
Re: origin crashed



Were you able to check the expiration date on your admin root cluster cert and verify it has not expired?
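
If not, a quick way to check from the master is openssl; this is just a sketch assuming the default /etc/origin/master layout that the Ansible install lays down:

    openssl x509 -noout -enddate -in /etc/origin/master/ca.crt
    openssl x509 -noout -enddate -in /etc/origin/master/admin.crt
    openssl x509 -noout -enddate -in /etc/origin/master/master.server.crt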

On Sep 6, 2016, at 5:19 AM, Candide Kemmler <candide intrinsic world> wrote:

Hi Clayton,

Thanks! Here's the result of running `sudo oadm diagnostics`. I'm particularly bothered by the "the server has asked for the client to provide credentials" message, as I'm seeing the same one when I try to run the Ansible scripts as well. Do you know how to solve it?

Any other ideas on things I should focus on?

Regards,

Candide


[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
[Note] Could not configure a client, so client diagnostics are limited to testing configuration and connection
Info:  Using context for cluster-admin access: 'default/paas-intrinsic-world:8443/system:admin'
[Note] Performing systemd discovery

[Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/admin]
       Description: Validate client config context is complete and has connectivity

ERROR: [DCli0014 from diagnostic ConfigContexts openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context 'logging/paas-intrinsic-world:8443/admin':
       The server URL is 'https://paas.intrinsic.world:8443'
       The user authentication is 'admin/paas-intrinsic-world:8443'
       The current project is 'logging'
       (*errors.StatusError) the server has asked for the client to provide credentials

       This means that when we tried to make a request to the master API
       server, the request required credentials that were not presented. This
       can happen with an expired or invalid authentication token. Try logging
       in with this user again.
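
A minimal sketch of working around this, assuming the 'admin' user can still authenticate against the configured identity provider: refresh the expired token with `oc login`, or switch to the still-working cert-based system:admin context.

    oc login https://paas.intrinsic.world:8443 -u admin
    oc config use-context default/paas-intrinsic-world:8443/system:admin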

[Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/system:admin]
       Description: Validate client config context is complete and has connectivity

Info:  For client config context 'logging/paas-intrinsic-world:8443/system:admin':
       The server URL is 'https://paas.intrinsic.world:8443'
       The user authentication is 'system:admin/paas-intrinsic-world:8443'
       The current project is 'logging'
       Successfully requested project list; has access to project(s):
         [openshift-infra dev ieml-demo logging management-infra misc openshift p2p default ieml-dev ...]

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

WARN:  [DClu1009 from diagnostic ClusterRegistry openshift/origin/pkg/diagnostics/cluster/registry.go:217]
       The "docker-registry-1-8w93s" pod for the "docker-registry" service is not running.
       This may be transient, a scheduling error, or something else.

ERROR: [DClu1001 from diagnostic ClusterRegistry openshift/origin/pkg/diagnostics/cluster/registry.go:173]
       The "docker-registry" service exists but no pods currently running, so it
       is not available. Builds and deployments that use the registry will fail.
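
A sketch of chasing this down once the node is schedulable again (see the NodeDefinitions warning further down), assuming the registry lives in the 'default' project:

    oc get pods -n default
    oc describe dc/docker-registry -n default
    # trigger a fresh deployment once the node accepts pods again
    oc deploy docker-registry --latest -n default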

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects

Info:  clusterrolebinding/cluster-admins has more subjects than expected.

       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.

Info:  clusterrolebinding/cluster-admins has extra subject {User  admin    }.

Info:  clusterrolebinding/cluster-readers has more subjects than expected.

       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.

Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin    }.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount logging aggregated-logging-fluentd    }.
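
For what it's worth, these extra subjects look intentional: aggregated-logging-fluentd needs cluster-reader to collect logs across namespaces, so reconciling them away would likely break the logging stack. Without --confirm the command only reports what it would change, which is a safe way to see the diff:

    oadm policy reconcile-cluster-role-bindings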

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

ERROR: [DClu2007 from diagnostic ClusterRouter openshift/origin/pkg/diagnostics/cluster/router.go:156]
       The "router" DeploymentConfig exists but has no running pods, so it
       is not available. Apps will not be externally accessible via the router.
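
Same approach as the registry above, assuming the router also lives in 'default':

    oc describe dc/router -n default
    oc deploy router --latest -n default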

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)

Info:  Found a node with same IP as master: paas.intrinsic.world

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master

WARN:  [DClu0003 from diagnostic NodeDefinition openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node paas.intrinsic.world is ready but is marked Unschedulable.
       This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
           oadm manage-node paas.intrinsic.world --schedulable=true

       While in this state, pods should not be scheduled to deploy on the node.
       Existing pods will continue to run until completed or evacuated (see
       other options for 'oadm manage-node').
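
If pods are meant to land on this node, the Unschedulable flag would also explain pods sitting in Pending; a minimal fix-and-verify using the command the diagnostic already suggests:

    oadm manage-node paas.intrinsic.world --schedulable=true
    oc get nodes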

[Note] Running diagnostic: AnalyzeLogs
       Description: Check for recent problems in systemd service logs

Info:  Checking journalctl logs for 'origin-master' service
Info:  Checking journalctl logs for 'origin-node' service
Info:  Checking journalctl logs for 'docker' service

[Note] Running diagnostic: MasterConfigCheck
       Description: Check the master config file

Info:  Found a master config file: /etc/origin/master/master-config.yaml

WARN:  [DH0005 from diagnostic MasterConfigCheck openshift/origin/pkg/diagnostics/host/check_master_config.go:58]
       Validation of master config file '/etc/origin/master/master-config.yaml' warned:
       assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
       assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console
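
These two warnings only matter for viewing aggregated logs and metrics in the web console; if that is wanted, the fields live in the same master-config.yaml. A sketch with made-up hostnames for the kibana and hawkular-metrics routes:

    assetConfig:
      loggingPublicURL: "https://kibana.paas.intrinsic.world"
      metricsPublicURL: "https://hawkular-metrics.paas.intrinsic.world/hawkular/metrics"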

[Note] Running diagnostic: NodeConfigCheck
       Description: Check the node config file

Info:  Found a node config file: /etc/origin/node/node-config.yaml

[Note] Running diagnostic: UnitStatus
       Description: Check status for related systemd units

[Note] Summary of diagnostics execution (version v1.1.6):
[Note] Warnings seen: 3
[Note] Errors seen: 4



On 05 Sep 2016, at 18:46, Clayton Coleman <ccoleman redhat com> wrote:

Did you change the IP of your master, or otherwise delete / alter the
openshift-infra namespace?  Or have your client certificates expired
(is this cluster 1 year old)?

Before deleting, try two things:

   oadm diagnostics

From the master (to see if it identifies anything).

Also check your certificate expiration dates.

On Sep 5, 2016, at 5:00 AM, Candide Kemmler <candide intrinsic world> wrote:

Hi,

I have a development setup made up of two nodes (1 master, 1 slave) running a bunch of different projects and environments, and it just crashed badly on me.

Symptoms: all containers in all projects are stuck in a Pending state (orange circle). When I try to `delete all`, things get removed but the pods hang in a 'Terminating' state. `oc describe` gives me uninteresting information I already know (basically that the pods are Pending), and `oc logs` tells me it "could not find the requested resource".

I tried `sudo systemctl restart origin-master`, as that has produced good results in the past, but it didn't help this time. I also tried it in combination with a full system reboot.
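
A quick way to look for clues around those restarts is the unit state and recent journal on the master:

    systemctl status origin-master origin-node docker
    journalctl -u origin-master --since "1 hour ago" --no-pager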

Finally, I tried running the Ansible scripts in the hope of upgrading Origin to the latest version (it's still running 1.1.6), but I got the following error log:

failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc", "create", "-n", "openshift", "-f", "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"], "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-09-05 07:20:11.869249", "stdout_lines": [], "warnings": []}
stderr: unable to connect to a server to handle "imagestreamlists": the server has asked for the client to provide credentials

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
         to retry, use: --limit @/Users/candide/config.retry

apps.intrinsic.world       : ok=48   changed=0    unreachable=0    failed=0
localhost                  : ok=15   changed=0    unreachable=0    failed=0
paas.intrinsic.world       : ok=207  changed=0    unreachable=0    failed=1
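
The failing step is a plain `oc create` on the master, so whichever kubeconfig it picks up needs working credentials; a quick sanity check against the cert-based admin kubeconfig, assuming the default path laid down by the install:

    oc --config=/etc/origin/master/admin.kubeconfig whoami
    oc --config=/etc/origin/master/admin.kubeconfig get nodes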

My last option is to reinstall everything from scratch, but before I do that I wanted to know if you had any other ideas on how to get on top of things again.

Candide

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users

