
"Readiness probe failed" after a restart of the node



Hello,

I just ran into a very interesting problem with an OpenShift node. After a restart, many of our pods ended up like this:

root node02 ~ # oc get po
NAME               READY     STATUS    RESTARTS   AGE
zzz-1-vmu4g        0/1       Running   0          44s

root node02 ~ # oc describe po
  1m            1m              1       {kubelet node02.xyz.com}       spec.containers{zzz}            created         Created with docker id 060d48664a9a
  1m            1m              1       {kubelet node02.xyz.com}       spec.containers{zzz}            started         Started with docker id 060d48664a9a
  1m            35s             4       {kubelet node02.xyz.com}       spec.containers{zzz}            unhealthy       Readiness probe failed: Get http://10.1.0.21:8080/mgmt/health: dial tcp 10.1.0.21:8080: connection refused
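(For context: the probe is a plain HTTP GET against the path and port shown in the event above. The probe definition and the endpoint can be checked by hand from the node, roughly like this; the grep range is just a guess at how many lines the probe block spans:

root node02 ~ # oc get pod zzz-1-vmu4g -o yaml | grep -A 8 readinessProbe
root node02 ~ # curl -v http://10.1.0.21:8080/mgmt/health
)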

And they stayed like this. What was very odd was that "brctl show" listed lots of different veth interfaces:

root node02 ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.56847afe9799       no
lbr0            8000.0a0dac23c824       no              veth0e410ea
                                                        veth1b8a907
[many more lines like that, *snip*]
                                                        vethddc8aa5
                                                        vethf9ba02f
                                                        vlinuxbr

On our working nodes "brctl show" does not show any veth* interfaces.
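(If anyone wants to compare on their own nodes: besides brctl, the OVS side can be inspected too. A rough sketch, assuming the default openshift-sdn setup where the OVS bridge is called br0; adjust names if your setup differs:

root node02 ~ # brctl show
root node02 ~ # ovs-vsctl show
root node02 ~ # ip -o link show | grep veth

The last command simply lists every veth interface on the node, regardless of which bridge it is attached to.)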
We tried many things: restarting the node one more time, restarting the pods, restarting docker/origin-node, and restarting the iptables and openvswitch services. In the end, the only thing that helped was running

ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml -i ~/openshift-hosts

one more time and then restarting the node.
After that, all the veth* interfaces disappeared again and everything was fine.

Needless to say, running ansible-playbook every time something goes wrong is not a good solution for us. Does anyone have an idea what was going on here?

Regards,
v

