origin crashed


I have a development setup made up of two hosts (1 master, 1 node) running a number of different projects and environments, and it just crashed badly on me.

Symptoms: all containers in all projects are in the Pending state (orange circle). When I try to `delete all`, things get removed but the pods hang in a 'Terminating' state. `oc describe` only gives me information I already know (basically that the pods are Pending), and `oc logs` fails with "could not find the requested resource".

I tried to `sudo systemctl restart origin-master` as it seems to have produced good results in the past, but that didn't help this time. I also tried that in combination with a full system reboot.
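
For reference, here is a minimal sketch of how the services could be inspected after such a restart. It assumes an RPM-based install where the node service is a systemd unit called `origin-node` (only `origin-master` is confirmed above), and the `show_unit` helper is purely illustrative.

```shell
#!/bin/sh
# Sketch only: any unit name other than origin-master is an assumption.
# Run on the master; the node unit should also be checked on the second host.

# Print a unit's state and its most recent log lines, if systemd is available.
show_unit() {
    unit="$1"
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found, skipping $unit"
        return 0
    fi
    # Failures are tolerated so that one dead unit does not stop the checks.
    systemctl status "$unit" --no-pager || true
    journalctl -u "$unit" --no-pager -n 50 || true
}

show_unit origin-master
show_unit origin-node   # assumed name of the node service
```

If the node service turns out to be stopped or failing on either host, that might explain pods staying in Pending or Terminating, since nothing would be acting on them.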

Finally, I tried running the Ansible playbooks in the hope of updating Origin to the latest version (it is still running 1.1.6), but I got the following error:

failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc", "create", "-n", "openshift", "-f", "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"], "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-09-05 07:20:11.869249", "stdout_lines": [], "warnings": []}
stderr: unable to connect to a server to handle "imagestreamlists": the server has asked for the client to provide credentials

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/Users/candide/config.retry

apps.intrinsic.world       : ok=48   changed=0    unreachable=0    failed=0
localhost                  : ok=15   changed=0    unreachable=0    failed=0
paas.intrinsic.world       : ok=207  changed=0    unreachable=0    failed=1
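
The `stderr` line above says the server asked the `oc` client for credentials, so it may be worth checking which identity `oc` presents on the master. The sketch below assumes the admin kubeconfig lives at `/etc/origin/master/admin.kubeconfig`, which is a guess at the default location rather than something confirmed on this setup. The `oc_admin` helper is hypothetical.

```shell
#!/bin/sh
# Sketch only: the kubeconfig path is an assumed default, not verified here.
ADMIN_KUBECONFIG="${ADMIN_KUBECONFIG:-/etc/origin/master/admin.kubeconfig}"

# Run an oc subcommand with the admin kubeconfig, or say why it was skipped.
oc_admin() {
    if ! command -v oc >/dev/null 2>&1; then
        echo "oc not found, skipping: oc $*"
        return 0
    fi
    if [ ! -r "$ADMIN_KUBECONFIG" ]; then
        echo "cannot read $ADMIN_KUBECONFIG, skipping: oc $*"
        return 0
    fi
    # Errors are tolerated so the remaining checks still run.
    oc --config="$ADMIN_KUBECONFIG" "$@" || true
}

oc_admin whoami                    # which identity the client presents
oc_admin get nodes                 # whether both hosts report Ready
oc_admin get events -n openshift   # recent events in the namespace the playbook used
```

Set `ADMIN_KUBECONFIG` in the environment if the file is somewhere else.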

My last option is to reinstall everything from scratch, but before I do that I wanted to ask whether you guys have other ideas on how to get on top of things again.


