
Re: "atomic-openshift-controllers" service keeps respawning every 30 secs (multiple HA masters setup)



There was a bug at one point where the controllers failed to notify systemd that they had started and were terminated as a result. What service type is the atomic-openshift-master-controllers unit file set to? And what version of the binaries are you running?
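
To make the first question concrete: with Type=notify, systemd waits for the process to report readiness and terminates it if that never arrives before the start timeout. Below is a minimal sketch for reading the Type= setting out of a unit file. The unit text is a hypothetical stand-in, not the file shipped with the package, and service_type is an illustrative helper name.

```python
import configparser

# Hypothetical unit file contents, for illustration only.
# Inspect the real unit file on the affected master instead.
SAMPLE_UNIT = """\
[Unit]
Description=Atomic OpenShift Master Controllers

[Service]
Type=notify
ExecStart=/path/to/controllers-binary
"""


def service_type(unit_text):
    """Return the Type= value from the [Service] section, or None if unset."""
    # strict=False tolerates repeated keys, which unit files allow;
    # interpolation=None keeps systemd specifiers such as %n from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(unit_text)
    return parser.get("Service", "Type", fallback=None)


unit_type = service_type(SAMPLE_UNIT)
print(unit_type)
```

On a live system, "systemctl show -p Type atomic-openshift-master-controllers" should report the same setting without parsing the file by hand.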

On Feb 20, 2016, at 4:27 PM, Florian Daniel Otel <florian otel gmail com> wrote:

Hello all, 

I've installed a multi-master setup using "native HA" (i.e. HAProxy), just as described here:  

My problem: 

After a reboot, on two of my three masters -- namely "master2" and "master3" -- the "atomic-openshift-master-controllers" service keeps respawning every 30 seconds. 

The systemd logs for the service (here from master2): 


Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893    3145 plugins.go:71] No cloud provider specified.
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515    3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566    3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183    3145 master.go:232] Started health checks at 0.0.0.0:8444
Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747    3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
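
The timestamps in the log above suggest that systemd terminates the service when its start timeout expires, rather than the process exiting on its own. Here is a minimal sketch that measures the gap, assuming the two relevant journal lines are pasted in verbatim. The helper names are hypothetical, and the year is taken from the date of this message.

```python
from datetime import datetime

# Two lines copied from the journal excerpt above.
LOG = """\
Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
"""

YEAR = "2016"  # syslog-style timestamps omit the year; assumed from the mail date


def stamp(line):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a journal line."""
    return datetime.strptime(YEAR + " " + line[:15], "%Y %b %d %H:%M:%S")


def start_to_timeout_seconds(log):
    """Seconds between systemd starting the unit and timing it out."""
    started = None
    for line in log.splitlines():
        if "systemd[1]: Starting " in line:
            started = stamp(line)
        elif "start operation timed out" in line and started is not None:
            return (stamp(line) - started).total_seconds()
    return None


elapsed = start_to_timeout_seconds(LOG)
print(elapsed)
```

The gap comes out at roughly 90 seconds, which is consistent with systemd's default start timeout. The 30 seconds in the log is the lease renewal interval, so the restart cycle may be longer than it first appears. The unit file may also set its own TimeoutStartSec, so this is a hint and not a diagnosis.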


My questions: 

- What has gone wrong here?

- How do I recover from this?

- What is the recommended procedure to shut down / restart the OpenShift master services in a multi-master setup?

Normally, in a single-master environment, I do "systemctl stop/start/restart atomic-openshift-master", but it seems natural that the process in a multi-master environment would be more involved. I just cannot find any guidance on this.


Kindest thanks for the help, 


/Florian




_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
