I’ve continued researching how this could have happened, but I’m still left with one remaining question.
What I can’t seem to find is information about how the replication controller and the evacuate command interact. If I mimic what the evacuate command does via this bash one-liner:
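(The exact one-liner didn’t survive into this reply. The shape of it was roughly “find every pod on the node being evacuated and delete it,” which is more or less what the evacuate command does under the hood. A sketch of that pattern, with the cluster call mocked out so the pipeline itself is visible; the pod and node names are made up:)

```shell
#!/bin/sh
# Hypothetical sketch -- the original one-liner wasn't preserved here.
# mock_get_pods stands in for `kubectl get pods -o wide`; in a real run
# you'd pipe the actual kubectl output into the awk filter below.
mock_get_pods() {
  cat <<'EOF'
NAME        READY  STATUS   NODE
app-pod-1   1/1    Running  node-001
app-pod-2   1/1    Running  node-002
app-pod-3   1/1    Running  node-002
EOF
}
# Print the delete commands an evacuate-style loop would issue for node-002.
mock_get_pods | awk '$4 == "node-002" {print "kubectl delete pod " $1}'
```

Running the real equivalent in a tight loop deletes every pod on the node nearly at once, which is what seems to trigger the behavior below.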
I’m able to recreate the problem. This makes me think that when many deletions are issued in quick succession, the replication controller is not able to keep up with the needs of the application. Something I found in the events log during this scenario is a little unnerving.
As seen above, the newly created pods are first slated for placement on node-002, but node-002 is “not found in the cache,” which suggests it is failing the scheduler’s predicate search for available nodes. That much is understandable, since it has been marked unschedulable. What I don’t understand is that during this window, node-001 and node-003 are available and more than willing to accept these pods. I wonder whether the scheduler doesn’t have updated information about node availability until after the pods are finally killed off.
I’m still researching how I can prevent all three pods from ending up on a single node.
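(One candidate I’m looking at: newer Kubernetes releases expose pod anti-affinity, which asks the scheduler to spread pods that share a label across distinct nodes. A sketch of what that would look like in the pod template; the `app: my-app` label is a made-up placeholder, not something from our actual config:)

```yaml
# Hedged sketch: pod anti-affinity, assuming the replicated pods all
# carry the (hypothetical) label app: my-app. "preferred" makes this a
# soft constraint, so pods can still land together if no other node fits.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname
```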
On June 7, 2016 at 16:05:12, Skarbek, John (john.skarbek@ca.com) wrote: