
Re: Evacuation of pods and scheduling



Good Morning,

I’ve continued researching how this could have happened, but I’m still left with one remaining question.

What I can’t seem to find is information about how the replication controller and the evacuate command interact. If I mimic what the evac command does with this bash one-liner:

oadm manage-node node-002.ose.bld.f4tech.com --list-pods -o json | tail -n +4 | jq '.items[].metadata.name' | xargs oc delete pod

I’m able to recreate the problem. This makes me think that when many pods are deleted at once, the replication controller can’t keep up with the needs of the application. Something I found in the events log during this scenario is a little unnerving.

7:35:23 AM  sample-jvm-app-30-xunju Pod Normal  Scheduled   Successfully assigned sample-jvm-app-30-xunju to node-001.ose.bld.f4tech.com
7:35:23 AM  sample-jvm-app-30-g4drx Pod Normal  Scheduled   Successfully assigned sample-jvm-app-30-g4drx to node-003.ose.bld.f4tech.com
7:35:22 AM  sample-jvm-app-30-362hb Pod Normal  Scheduled   Successfully assigned sample-jvm-app-30-362hb to node-003.ose.bld.f4tech.com
7:35:19 AM  sample-jvm-app-30-qn5nt Pod Normal  Killing     Killing container with docker id 99a673abe7e3: Need to kill pod.
7:35:19 AM  sample-jvm-app-30-xo9w6 Pod Normal  Killing     Killing container with docker id 33c23ef1e7ac: Need to kill pod.
7:35:19 AM  sample-jvm-app-30-pcxlr Pod Normal  Killing     Killing container with docker id f1b3ce10a5c1: Need to kill pod.
7:34:22 AM  sample-jvm-app-30-362hb Pod Warning Failed scheduling   node 'node-002.ose.bld.f4tech.com' is not in cache
7 times in the last 2 minutes
7:34:22 AM  sample-jvm-app-30-xunju Pod Warning Failed scheduling   node 'node-002.ose.bld.f4tech.com' is not in cache
7 times in the last 2 minutes
7:34:22 AM  sample-jvm-app-30-g4drx Pod Warning Failed scheduling   node 'node-002.ose.bld.f4tech.com' is not in cache
7 times in the last 2 minutes

As seen above, the newly created pods first appear to be destined for node-002, but node-002 is not found in the cache, which suggests it’s failing to pass the predicate search of available nodes. That much is understandable, since it has been marked unschedulable. What I don’t understand is that during this period node-001 and node-003 are available and more than willing to accept these pods. I wonder whether the replication controller doesn’t have updated information about node availability until after the pods are finally killed off.

I’m still researching how I can prevent all three pods from ending up on a single node.
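In the meantime, the workaround I’m experimenting with is draining more gradually: delete one pod at a time and wait for the replication controller to bring a replacement up before deleting the next. A rough sketch is below; the rc name, desired replica count, and the deployment=<rc name> label are assumptions based on the pod names in the events above, so adjust to taste:

#!/bin/bash
# Drain one pod at a time instead of deleting them all at once.
NODE="node-002.ose.bld.f4tech.com"
RC="sample-jvm-app-30"   # replication controller backing the pods (assumed)
DESIRED=3                # desired replica count (assumed)

for POD in $(oadm manage-node "$NODE" --list-pods -o json | tail -n +4 | jq -r '.items[].metadata.name'); do
  oc delete pod "$POD"
  sleep 5   # give the deletion a moment to register
  # Wait until the full complement of pods is Running again before deleting
  # the next one; pods created by an OpenShift deployment carry a
  # deployment=<rc name> label.
  until [ "$(oc get pods -l deployment="$RC" --no-headers | grep -cw Running)" -ge "$DESIRED" ]; do
    sleep 5
  done
done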



-- 
John Skarbek

On June 7, 2016 at 16:05:12, Skarbek, John (john skarbek ca com) wrote:

Good Morning,

I’d like to ask a question regarding evacuating pods and how openshift/kubernetes schedules the replacements.

We have 3 nodes configured to run applications, and we recently went through a cycle of applying patches. For that we created an ansible playbook that, one node at a time, evacuates the pods and restarts the node.
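For context, the per-node steps the playbook runs are roughly the following (a sketch of the equivalent oadm commands, not the actual ansible tasks; the patch/reboot in the middle is handled by the playbook itself):

oadm manage-node node-001.ose.bld.f4tech.com --schedulable=false
oadm manage-node node-001.ose.bld.f4tech.com --evacuate
# ... patch and reboot the node ...
oadm manage-node node-001.ose.bld.f4tech.com --schedulable=true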

Prior to starting, we had an application running 3 pods, one on each node. When node1 was forced to evac its pods, kubernetes scheduled the replacement pod on node3. Node2 was next in line; when ansible forced the evac of its pods, that replacement was also placed on node3. So at this point, all pods were on the same physical node.

When ansible forced the evac of pods on node3, I then had an outage. The three pods were put in a “terminating” state, while 3 others were in a “pending” state. It took approximately 30 seconds to terminate the pods. The new “pending” pods sat pending for about 65 seconds, after which they were finally scheduled on nodes 1 and 2, plus additional time to start the containers.

Is this expected behavior? I was hoping that the replication controller would handle this scheduling a bit better and ensure pods don’t all get shifted to the same physical box when there are two boxes available. I’m also hoping that before pods are terminated, replacements are brought online.



-- 
John Skarbek

