Re: why all pods are redeployed daily?



The node controller will mark a node as down if it detects that the node has not contacted the master within a liveness window.  However, in this case I didn't see the expected series of log messages in your previous input that would indicate the node controller had decided to kill the node.  If you were losing network connectivity from node to master for more than 5 minutes every X hours, you'd see that series.
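
A minimal sketch of the liveness check described above, assuming a single fixed window equal to the 5 minutes mentioned here. The function name `is_node_down` is hypothetical; the real Kubernetes node controller tracks more state and uses its own configurable timeouts.

```python
from datetime import datetime, timedelta

# Assumed liveness window (the "more than 5 minutes" above); the real node
# controller uses its own configurable timeouts and tracks more state.
LIVENESS_WINDOW = timedelta(minutes=5)

def is_node_down(last_heartbeat: datetime, now: datetime,
                 window: timedelta = LIVENESS_WINDOW) -> bool:
    """Hypothetical check: the node counts as down once it has gone
    longer than `window` without contacting the master."""
    return now - last_heartbeat > window

if __name__ == "__main__":
    now = datetime(2015, 10, 8, 14, 58, 8)
    print(is_node_down(now - timedelta(minutes=2), now))  # False
    print(is_node_down(now - timedelta(minutes=6), now))  # True
```

A node that is marked down this way produces the timeout-related series of messages; a node deleted outright through the API never goes through this check, which is the distinction being drawn here.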

But we are *not* seeing that series of messages in the master log you sent before.  Instead, the node is "discovered" as deleted during one of the periodic sync loop events.  Can you find the master log entries between "NodeController observed a Node deletion" and "NodeController observed a new Node" (assuming both are present in the logs)?  They may appear out of order (although they shouldn't).
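
One way to pull that range out of a saved copy of the master log is a small filter like the sketch below. It assumes the log has already been exported to a text file; the function name `lines_between` and the command-line handling are hypothetical, and only the two marker strings come from the request above.

```python
import sys
from typing import Iterable, Iterator

START = "NodeController observed a Node deletion"
END = "NodeController observed a new Node"

def lines_between(lines: Iterable[str], start: str = START,
                  end: str = END) -> Iterator[str]:
    """Yield every line from one containing `start` through the next line
    containing `end`, inclusive; repeats for each deletion/re-add pair."""
    inside = False
    for line in lines:
        if not inside and start in line:
            inside = True
        if inside:
            yield line
            if end in line:
                inside = False

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python between.py master.log
    with open(sys.argv[1]) as f:
        for line in lines_between(f):
            print(line, end="")
```

Because the two messages may come out of order, a "new Node" line logged before its deletion line would be missed by this filter; in that case, listing everything within a few minutes of the deletion timestamp is the safer fallback.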

On Fri, Oct 9, 2015 at 1:06 PM, Viet Nguyen <vnguyen redhat com> wrote:
Here's the list of "Recording Removing Node" messages since 09/17, when the node was first created.

http://pastebin.com/NMHGEgaU

Two occurred around the same time (17:01) on Oct 4 and 5, but that might just be a coincidence.

There are no cron jobs on the master or node besides the usual sar ones.  Node CPU and memory utilization were about 60%.

For some reason Kubernetes thinks the node needs to be removed:

https://github.com/kubernetes/kubernetes/blob/release-1.1/pkg/controller/node/nodecontroller.go#L258



----- Original Message -----
From: "Clayton Coleman" <ccoleman redhat com>
To: "Viet Nguyen" <vnguyen redhat com>
Cc: users lists openshift redhat com
Sent: Thursday, October 8, 2015 5:12:25 PM
Subject: Re: why all pods are redeployed daily?

Does it happen at a fixed time every day?  Any correlation between that
and cron, network DHCP, or anything else?
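
To answer the fixed-time question from a saved master log, one could tally the hour of each "Recording Removing Node" line. A minimal sketch, assuming the log has been exported to a text file with lines in the journal format quoted further down this thread (`Oct 08 14:58:08 openshift-master1 ...`); the function name `removal_hours` is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the journal prefix, e.g. "Oct 08 14:58:08"; group 1 is the hour.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} (\d{2}):(\d{2}):\d{2}")
MARKER = "Recording Removing Node"

def removal_hours(lines: Iterable[str]) -> Counter:
    """Count removal messages per hour of day (0-23)."""
    hours = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        m = TIMESTAMP.match(line)
        if m:
            hours[int(m.group(1))] += 1
    return hours

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for hour, count in sorted(removal_hours(f).items()):
            print(f"{hour:02d}:00  {count}")
```

One hour dominating the counts would point toward something scheduled (cron, a DHCP lease renewal); removals spread across the day would argue against it.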

> On Oct 8, 2015, at 6:33 PM, Viet Nguyen <vnguyen redhat com> wrote:
>
> Node: b16<.lab...company.com>  I stripped the rest of the domain in pastebin for privacy reasons.  The node has a fixed IP.
>
>
>
> ----- Original Message -----
> From: "Clayton Coleman" <ccoleman redhat com>
> To: "Viet Nguyen" <vnguyen redhat com>
> Cc: users lists openshift redhat com
> Sent: Thursday, October 8, 2015 3:11:13 PM
> Subject: Re: why all pods are redeployed daily?
>
> What is your node name?  Is it by any chance based on something that
> dynamically changes every day, like say an IP address assigned by
> dhcp?
>
>> On Oct 8, 2015, at 5:49 PM, Viet Nguyen <vnguyen redhat com> wrote:
>>
>> This is an openshift-ansible install on bring-your-own-VMs.
>>
>> I can see "Killing unwanted pod..." in the node log.  I'm not sure about the source of that delete-node API call.  Really strange!
>>
>> Node journal around 14:58:
>>
>> http://pasted.co/2dd5fa80
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Clayton Coleman" <ccoleman redhat com>
>> To: "Viet Nguyen" <vnguyen redhat com>
>> Cc: users lists openshift redhat com
>> Sent: Thursday, October 8, 2015 1:39:06 PM
>> Subject: Re: why all pods are redeployed daily?
>>
>>     Oct 08 14:58:08 openshift-master1 openshift-master[48577]: I1008 14:58:08.204117   48577 nodecontroller.go:230] NodeController observed a Node deletion: b16
>>     Oct 08 14:58:08 openshift-master1 openshift-master[48577]: I1008 14:58:08.204147   48577 nodecontroller.go:356] Recording Removing Node b16 from NodeController event message for node b16
>>     Oct 08 14:58:08 openshift-master1 openshift-master[48577]: I1008 14:58:08.204174   48577 nodecontroller.go:160] Delete all pods from b16
>>
>>
>> This means something deleted your node via the API.  Check whether there is
>> a corresponding log entry at that time in the journal for docker and
>> openshift-node.  Is this an all-in-one containerized deployment, an
>> Ansible-installed cluster, or something else?
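
Looking for a corresponding entry amounts to listing node-side journal lines (docker, openshift-node) within a few seconds of the master's deletion timestamp, 14:58:08 in the lines above. A minimal sketch, assuming the node journal has been saved as text in the same short format with English month abbreviations; the helper names and the 30-second window are assumptions, and because journal lines carry no year, the year is taken from the reference time.

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, List

def parse_journal_time(line: str, year: int) -> datetime:
    """Parse the 15-character 'Oct 08 14:58:08' prefix of a journal line.
    %b assumes English month abbreviations (C locale)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def entries_near(lines: Iterable[str], when: datetime,
                 window: timedelta = timedelta(seconds=30)) -> List[str]:
    """Return the lines whose timestamp is within `window` of `when`."""
    hits = []
    for line in lines:
        try:
            ts = parse_journal_time(line, when.year)
        except ValueError:
            continue  # skip lines without a journal timestamp
        if abs(ts - when) <= window:
            hits.append(line)
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    deleted_at = datetime(2015, 10, 8, 14, 58, 8)  # from the master log above
    with open(sys.argv[1]) as f:
        for line in entries_near(f, deleted_at):
            print(line, end="")
```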
>>
>> On Oct 8, 2015, at 4:20 PM, Viet Nguyen <vnguyen redhat com> wrote:
>>
>> Corrected pastebin link:  http://pastebin.com/ky9dmJWB
>>
>>
>>
>> ----- Original Message -----
>> From: "Viet Nguyen" <vnguyen redhat com>
>> To: "Clayton Coleman" <ccoleman redhat com>
>> Cc: users lists openshift redhat com
>> Sent: Thursday, October 8, 2015 12:47:29 PM
>> Subject: Re: why all pods are redeployed daily?
>>
>> > Did your node restart 21 hours ago?
>>
>>
>> No, the node (RHEL 7.1 on bare metal) has been up for a while.  As I was
>> writing my reply, all the pods were restarted again.  The master system
>> journal shows a lot of "Attempting to schedule..." messages around that time.
>>
>> openshift master log:
>> http://pastebin.com/ky9dmJWBI
>>
>> I can see inactive containers on the node with 'docker ps -a'
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Clayton Coleman" <ccoleman redhat com>
>> To: "Viet Nguyen" <vnguyen redhat com>
>> Cc: users lists openshift redhat com
>> Sent: Thursday, October 8, 2015 10:50:00 AM
>> Subject: Re: why all pods are redeployed daily?
>>
>> Did your node restart 21 hours ago?  On the node, do you see the
>> previous docker container from that time, and is there any other info
>> in the system journal from that time?
>>
>> On Oct 8, 2015, at 1:21 PM, Viet Nguyen <vnguyen redhat com> wrote:
>>
>>
>> Dead cakephp containers have the following message in the log, though I
>> don't think it's the cause:
>>
>>
>> AH00558: httpd: Could not reliably determine the server's fully qualified
>> domain name, using 10.1.0.58.
>>
>>
>> I don't see any other errors in other pods.  The dead containers have
>> either exit code 0 or 137.
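
Exit code 137 follows the common convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL): those containers were killed rather than exiting on their own, which would be consistent with the "Killing unwanted pod..." messages reported elsewhere in this thread, while 0 is a normal exit. A small sketch of that decoding, assuming a POSIX system; the function name `describe_exit_code` is hypothetical.

```python
import signal

def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention
    (POSIX signal numbers assumed)."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exit code {code}"
        return f"killed by {name}"
    return f"exited with error status {code}"

for code in (0, 137):
    print(code, describe_exit_code(code))
# 0 exited normally
# 137 killed by SIGKILL
```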
>>
>>
>>
>> [root openshift-master1 ~]# oc describe pod cakephp-example-1-5v9jg
>> Name:                cakephp-example-1-5v9jg
>> ...
>> Conditions:
>>   Type        Status
>>   Ready       True
>> No events.
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Clayton Coleman" <ccoleman redhat com>
>> To: "Viet Nguyen" <vnguyen redhat com>
>> Cc: users lists openshift redhat com
>> Sent: Thursday, October 8, 2015 9:59:06 AM
>> Subject: Re: why all pods are redeployed daily?
>>
>> That's restarting, not redeployment.  What is in the pod logs, and what
>> are the recent pod events (do a describe of each pod)?
>>
>>
>> On Oct 8, 2015, at 12:38 PM, Viet Nguyen <vnguyen redhat com> wrote:
>>
>> I have a 1-node cluster with about 30 pods running in different
>> projects.  All of my pods (including the registry and router) are
>> automatically redeployed almost daily without my intervention.  Is this a
>> configurable option, or is something wrong with my cluster?
>>
>> # oadm manage-node b16 --list-pods
>>
>> NAME                            READY     STATUS    RESTARTS   AGE
>> cakephp-example-1-3exdw         1/1       Running   0          21h
>> cakephp-example-1-5v9jg         1/1       Running   0          21h
>> cakephp-example-1-dpiag         1/1       Running   0          21h
>> ...
>>
>> version:
>> oc v3.0.2.0
>> kubernetes v1.1.0-alpha.0-1605-g44c91b1
>>
>> _______________________________________________
>>
>> users mailing list
>>
>> users lists openshift redhat com
>>
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

