
Re: "atomic-openshift-controllers" service keeps respawning every 30 secs (multiple HA masters setup)



I believe we fixed the issue with the restarting controller in 3.1.1 -
this looks like what I would expect in 3.1.0.4.  For now, the looping
has minimal impact beyond looking ugly.
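
To check how often a master has gone through this loop, a rough sketch
along these lines counts the start timeouts in the journal. In the log
quoted further down, systemd gives up 90 seconds after the start
(21:13:14 to 21:14:44), which is consistent with systemd's default start
timeout for a Type=notify unit that has not yet reported readiness. The
sketch assumes the journal wording matches that log ("start operation
timed out"); the function name count_start_timeouts is made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenShift): reads journal text on stdin
# and prints how many times systemd reported a start timeout.
# Assumes the message wording seen in the quoted log.
count_start_timeouts() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps that case from looking like an error.
    grep -c 'start operation timed out' || true
}

# Example use on a master (run as root):
#   journalctl -u atomic-openshift-master-controllers.service --since "1 hour ago" \
#       | count_start_timeouts
```

A count that keeps growing on the standby masters while the active one
stays at zero would match the behaviour described above.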

On Sun, Feb 21, 2016 at 2:11 AM, Florian Daniel Otel
<florian otel gmail com> wrote:
> Kindest thanks Clayton, Jason for being willing to help yet again:
>
> The info Clayton requested:
>
> The service status on e.g. "master2"
>
>
> [root@vspose-master2 ~]# systemctl status atomic-openshift-master-controllers.service
> ● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
>    Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
>    Active: activating (start) since Sun 2016-02-21 06:55:25 UTC; 9s ago
>      Docs: https://github.com/openshift/origin
>  Main PID: 54642 (openshift)
>    CGroup: /system.slice/atomic-openshift-master-controllers.service
>            └─54642 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
>
> ....
>
> The corresponding systemd unit file:
>
> [root@vspose-master2 systemd]# cat /usr/lib/systemd/system/atomic-openshift-master-controllers.service
> [Unit]
> Description=Atomic OpenShift Master Controllers
> Documentation=https://github.com/openshift/origin
> After=network.target
> After=atomic-openshift-master-api.service
> Before=atomic-openshift-node.service
> Requires=network.target
>
> [Service]
> Type=notify
> EnvironmentFile=/etc/sysconfig/atomic-openshift-master-controllers
> Environment=GOTRACEBACK=crash
> ExecStart=/usr/bin/openshift start master controllers --config=${CONFIG_FILE} $OPTIONS
> LimitNOFILE=131072
> LimitCORE=infinity
> WorkingDirectory=/var/lib/origin
> SyslogIdentifier=atomic-openshift-master-controllers
> Restart=on-failure
>
> [Install]
> WantedBy=multi-user.target
> WantedBy=atomic-openshift-node.service
>
>
>
> OSE version:
>
> [root@vspose-master2 systemd]# /usr/bin/openshift version
> openshift v3.1.0.4-16-g112fcc4
> kubernetes v1.1.0-origin-1107-g4c8e6f4
> etcd 2.1.2
>
>
> So far, the procedure I tried for stopping / starting the masters was:
>
>      systemctl stop atomic-openshift-master-controllers.service
>      systemctl stop atomic-openshift-master-api.service
>
>
> respectively:
>
>      systemctl start atomic-openshift-master-api.service
>      systemctl start atomic-openshift-master-controllers.service
>
>
> (stopping / starting "atomic-openshift-master-api" seems a bit redundant
> since it is a requirement for "atomic-openshift-master-controllers" , but
> still... )
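
One note on the unit file quoted above: it lists the API service only
under After=, which orders startup but does not pull the API service in
or stop it, so handling both units explicitly is not redundant. The
order you describe can be scripted roughly as below. This is only a
sketch of that same sequence, not an official procedure; restart_master
and the SYSTEMCTL dry-run override are names made up for illustration.

```shell
#!/bin/sh
# Unit names as used in the procedure quoted above.
API=atomic-openshift-master-api.service
CONTROLLERS=atomic-openshift-master-controllers.service

# Hypothetical helper: stop controllers, then API; start API, then controllers.
# SYSTEMCTL is an assumed override hook; SYSTEMCTL=echo gives a dry run that
# only prints the systemctl arguments.
restart_master() {
    ctl=${SYSTEMCTL:-systemctl}
    # Each step runs only if the previous one succeeded.
    "$ctl" stop "$CONTROLLERS" &&
    "$ctl" stop "$API" &&
    "$ctl" start "$API" &&
    "$ctl" start "$CONTROLLERS"
}

# Dry run:   SYSTEMCTL=echo restart_master
# Real run:  restart_master   (as root, on one master at a time)
```

On 3.1.0.4 the final start may still time out on a master that does not
hold the lease, as in the log below.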
>
>
> Thanks,
>
> /Florian
>
>
>
> On Sun, Feb 21, 2016 at 1:07 AM, Clayton Coleman <ccoleman redhat com>
> wrote:
>>
>>
>>
>> On Feb 20, 2016, at 6:59 PM, Jason DeTiberus <jdetiber redhat com> wrote:
>>
>>
>> On Feb 20, 2016 4:27 PM, "Florian Daniel Otel" <florian otel gmail com>
>> wrote:
>> >
>> > Hello all,
>> >
>> > I've installed a setup using multiple masters using "native HA" (i.e.
>> > HAproxy) -- just as described here:
>> >
>> > My problem:
>> >
>> > After a reboot, on two of my three masters -- namely "master2" and
>> > "master3" -- the "atomic-openshift-master-controllers" service keeps
>> > respawning every 30 seconds.
>>
>> This is expected. The controllers service can only be active on a single
>> host. The active service acquires a lock within etcd and the others will
>> continuously respawn and attempt to acquire the lock.
>>
>>
>> That is not expected - the controllers should start and block until they
>> are needed.  They should never restart unless they lose their leader lock.
>>
>>
>> >
>> > The systemd logs for the service (here master2).
>> >
>> >
>> > Feb 20 21:13:13 vspose-master2 systemd[1]: Starting Atomic OpenShift Master Controllers...
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.669893    3145 plugins.go:71] No cloud provider specified.
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818515    3145 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.818566    3145 start_master.go:414] Using images from "openshift3/ose-<component>:latest"
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.846183    3145 master.go:232] Started health checks at 0.0.0.0:8444
>> > Feb 20 21:13:14 vspose-master2 atomic-openshift-master-controllers[3145]: I0220 21:13:14.864747    3145 master_config.go:250] Attempting to acquire controller lease as master-xct012o4, renewing every 30 seconds
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service start operation timed out. Terminating.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=2/INVALIDARGUMENT
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: Failed to start Atomic OpenShift Master Controllers.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service failed.
>> > Feb 20 21:14:44 vspose-master2 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
>> >
>> >
>> > My questions:
>> >
>> > - What has gone wrong here?
>> >
>> > - How do I recover from this ?
>> >
>> > - What is the recommended procedure to shut down / restart the OpenShift
>> > master services in a multi-master setup ?
>> >
>> > Normally on a (single) master environment I do "systemctl
>> > stop/start/restart atomic-openshift-master", but it seems natural that the
>> > process on a multi-master environment should be more involved -- I just
>> > cannot find any guidance on this.
>> >
>> >
>> > Kindest thanks for the help,
>> >
>> >
>> > /Florian
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > users mailing list
>> > users lists openshift redhat com
>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>> >
>>
>
>

