
Re: why all pods are redeployed daily?



All node deletion and new-node events since the cluster was created:

http://pastebin.com/rzqmtRQk


Full logs between the most recent node deletion and the new node:

http://pastebin.com/jjEEfW0t




----- Original Message -----
From: "Clayton Coleman" <ccoleman redhat com>
To: "Viet Nguyen" <vnguyen redhat com>
Cc: "users" <users lists openshift redhat com>
Sent: Friday, October 9, 2015 2:53:19 PM
Subject: Re: why all pods are redeployed daily?

The node controller - if it detects a node has not contacted the master
within a liveness window - will mark the node as down.  However, in this
case I didn't see the expected series of log messages in your previous
input that would indicate the node controller has decided to kill the
node.  If every X hours you lose network connectivity from node -> master
for > 5 minutes, you'd see that.

However, we are *not* seeing that in the master log you sent before.
Instead, the node is "discovered" deleted during one of the periodic sync
loop events.  Can you find the master logs between "NodeController observed
a Node deletion" and "NodeController observed a new Node" (assuming that
both are present in the logs)?  They may come out of order (although they
shouldn't).
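
One way to pull those spans out of a saved master log is sketched below. It is only an illustration: the function name `windows_between` is made up for this example, the two marker strings are the ones quoted above, and it assumes each deletion message appears before its matching new-node message (out-of-order pairs would need extra handling).

```python
DELETE_MARKER = "NodeController observed a Node deletion"
NEW_NODE_MARKER = "NodeController observed a new Node"


def windows_between(lines, start=DELETE_MARKER, end=NEW_NODE_MARKER):
    """Yield lists of lines from each `start` line through the next `end` line.

    A window that never reaches an `end` line is still yielded, so a
    deletion with no matching new-node message is not silently dropped.
    """
    window = None
    for line in lines:
        if window is None:
            # Not inside a window yet: wait for a deletion message.
            if start in line:
                window = [line]
        else:
            # Inside a window: keep every line until the new-node message.
            window.append(line)
            if end in line:
                yield window
                window = None
    if window is not None:
        yield window
```

Passing an open file (for example a journal dump saved under a name of your choosing) as `lines` yields one list per delete/new-node pair, which can then be printed or pasted as-is.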

On Fri, Oct 9, 2015 at 1:06 PM, Viet Nguyen <vnguyen redhat com> wrote:

> Here's the list of "Recording Removing Node" events since 09/17, when the
> node was first created.
>
> http://pastebin.com/NMHGEgaU
>
> Two occurred around the same time (17:01) on Oct 4 and 5, but that might
> just be a coincidence.
>
> There are no crons on master/node besides the usual sar ones.  Node
> CPU/mem utilizations were about 60%.
>
> For some reason Kubernetes thinks the node needs to be removed:
>
>
> https://github.com/kubernetes/kubernetes/blob/release-1.1/pkg/controller/node/nodecontroller.go#L258
>
>
>
> ----- Original Message -----
> From: "Clayton Coleman" <ccoleman redhat com>
> To: "Viet Nguyen" <vnguyen redhat com>
> Cc: users lists openshift redhat com
> Sent: Thursday, October 8, 2015 5:12:25 PM
> Subject: Re: why all pods are redeployed daily?
>
> Does it happen at a fixed time every day?  Any correlation between that
> and cron, network DHCP, or anything?
>
> > On Oct 8, 2015, at 6:33 PM, Viet Nguyen <vnguyen redhat com> wrote:
> >
> > Node: b16<.lab...company.com>  I stripped the rest of the domain in
> > pastebin for privacy reasons.  The node has a fixed IP.
> >
> >
> >
> > ----- Original Message -----
> > From: "Clayton Coleman" <ccoleman redhat com>
> > To: "Viet Nguyen" <vnguyen redhat com>
> > Cc: users lists openshift redhat com
> > Sent: Thursday, October 8, 2015 3:11:13 PM
> > Subject: Re: why all pods are redeployed daily?
> >
> > What is your node name?  Is it by any chance based on something that
> > dynamically changes every day, like say an IP address assigned by
> > dhcp?
> >
> >> On Oct 8, 2015, at 5:49 PM, Viet Nguyen <vnguyen redhat com> wrote:
> >>
> >> This is an openshift-ansible install on bring-your-own-VMs.
> >>
> >> I can see "Killing unwanted pod..." in the node log.  Not sure about
> >> the source of that delete node API call.  Really strange!
> >>
> >> node journal around 14:58
> >>
> >> http://pasted.co/2dd5fa80
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "Clayton Coleman" <ccoleman redhat com>
> >> To: "Viet Nguyen" <vnguyen redhat com>
> >> Cc: users lists openshift redhat com
> >> Sent: Thursday, October 8, 2015 1:39:06 PM
> >> Subject: Re: why all pods are redeployed daily?
> >>
> >>  1. Oct 08 14:58:08 openshift-master1 openshift-master[48577]: I1008
> >>  14:58:08.204117   48577 nodecontroller.go:230] NodeController observed a
> >>  Node deletion: b16
> >>  2. Oct 08 14:58:08 openshift-master1 openshift-master[48577]: I1008
> >>  14:58:08.204147   48577 nodecontroller.go:356] Recording Removing Node b16
> >>  from NodeController event message for node b16
> >>  3. Oct 08 14:58:08 openshift-master1 openshift-master[48577]: I1008
> >>  14:58:08.204174   48577 nodecontroller.go:160] Delete all pods from b16
> >>
> >>
> >> Which means something deleted your node via the API. Check whether there
> >> is a corresponding log entry at that time in the journal for docker and
> >> openshift-node.  Is this an all-in-one containerized deployment, an
> >> Ansible-installed cluster, or something else?
> >>
> >> On Oct 8, 2015, at 4:20 PM, Viet Nguyen <vnguyen redhat com> wrote:
> >>
> >> pastebin correct link:  http://pastebin.com/ky9dmJWB
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "Viet Nguyen" <vnguyen redhat com>
> >> To: "Clayton Coleman" <ccoleman redhat com>
> >> Cc: users lists openshift redhat com
> >> Sent: Thursday, October 8, 2015 12:47:29 PM
> >> Subject: Re: why all pods are redeployed daily?
> >>
> >> Did your node restart 21 hours ago?
> >>
> >>
> >> No, the node (RHEL 7.1 on bare metal) has been up for a while.  As I was
> >> writing my reply, all the pods were restarted again.  The master system
> >> journal shows a lot of "Attempting to schedule..." messages around that
> >> time.
> >>
> >> openshift master log:
> >> http://pastebin.com/ky9dmJWBI
> >>
> >> I can see inactive containers on the node with 'docker ps -a'
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "Clayton Coleman" <ccoleman redhat com>
> >> To: "Viet Nguyen" <vnguyen redhat com>
> >> Cc: users lists openshift redhat com
> >> Sent: Thursday, October 8, 2015 10:50:00 AM
> >> Subject: Re: why all pods are redeployed daily?
> >>
> >> Did your node restart 21 hours ago?  On the node, do you see the
> >> previous docker container from that time, and is there any other info
> >> in the system journal from that time?
> >>
> >> On Oct 8, 2015, at 1:21 PM, Viet Nguyen <vnguyen redhat com> wrote:
> >>
> >>
> >> Dead cakephp containers have the following message in the log though I
> >> don't think it's the cause:
> >>
> >>
> >> AH00558: httpd: Could not reliably determine the server's fully qualified
> >> domain name, using 10.1.0.58.
> >>
> >>
> >> I don't see any other errors in other pods.  The dead containers have
> >> either exit code 0 or 137.
> >>
> >>
> >>
> >> [root@openshift-master1 ~]# oc describe pod cakephp-example-1-5v9jg
> >> Name:                cakephp-example-1-5v9jg
> >> ...
> >> Conditions:
> >>   Type        Status
> >>   Ready       True
> >> No events.
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "Clayton Coleman" <ccoleman redhat com>
> >> To: "Viet Nguyen" <vnguyen redhat com>
> >> Cc: users lists openshift redhat com
> >> Sent: Thursday, October 8, 2015 9:59:06 AM
> >> Subject: Re: why all pods are redeployed daily?
> >>
> >> That's restarting, not redeployment. What is in the pod logs and what
> >> are the recent pod events (do a describe of each pod)?
> >>
> >>
> >> On Oct 8, 2015, at 12:38 PM, Viet Nguyen <vnguyen redhat com> wrote:
> >>
> >>
> >> I have a 1-node cluster with about 30 or so pods running in different
> >> projects.  All of my pods (including the registry and router) are
> >> automatically redeployed almost daily without my intervention.  Is this
> >> a configurable option, or is something wrong with my cluster?
> >>
> >>
> >>
> >> # oadm manage-node b16 --list-pods
> >>
> >> NAME                            READY     STATUS    RESTARTS   AGE
> >> cakephp-example-1-3exdw         1/1       Running   0          21h
> >> cakephp-example-1-5v9jg         1/1       Running   0          21h
> >> cakephp-example-1-dpiag         1/1       Running   0          21h
> >> ...
> >>
> >> version:
> >>
> >> oc v3.0.2.0
> >> kubernetes v1.1.0-alpha.0-1605-g44c91b1
> >>
> >> _______________________________________________
> >> users mailing list
> >> users lists openshift redhat com
> >> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

