
Re: Pods stuck on 'ContainerCreating' when redhat/openshift-ovs-multitenant enabled



Hi Dan,
I checked the logs of all pods in the openshift-sdn namespace and didn’t find any errors in them.
I also reinstalled with ‘redhat/openshift-ovs-multitenant’ on a clean machine, and everything works well.

So I suspect the uninstall playbook didn’t clean up the calico plugin properly.
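A quick way to check for leftovers on the node (the paths below are the usual CNI defaults and may differ per install):

  ls -l /etc/cni/net.d/              # a stale Calico conflist would still show up here
  ls -l /opt/cni/bin/ | grep calico  # leftover Calico binaries
  ps -ef | grep -i calico            # should match nothing after a clean uninstall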

Thanks,
Jared


On Oct 16, 2019, at 1:09 AM, Dan Williams <dcbw@redhat.com> wrote:

On Tue, 2019-10-15 at 06:18 +0000, Yu Wei wrote:
I found the root cause of this issue.
On my machine, I first deployed OCP with calico, which worked well.
Then I ran the uninstall playbook and reinstalled with the SDN plugin openshift-ovs-multitenant, and it didn’t work anymore.
I found the following:

[root@buzz1 openshift-ansible]# systemctl status atomic-openshift-node.service
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-10-14 00:43:08 PDT; 22h ago
     Docs: https://github.com/openshift/origin
 Main PID: 87388 (hyperkube)
   CGroup: /system.slice/atomic-openshift-node.service
           ├─87388 /usr/bin/hyperkube kubelet --v=6 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-toke...
           └─88872 /opt/cni/bin/calico
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.289674   87388 common.go:71] Using namespace "kube-s....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.289809   87388 file.go:199] Reading config file "/et...yaml"
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.292556   87388 common.go:62] Generated UID "598eab3c....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.293602   87388 common.go:66] Generated Name "master-....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.294512   87388 common.go:71] Using namespace "kube-s....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.295667   87388 file.go:199] Reading config file "/et...yaml"
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.296350   87388 common.go:62] Generated UID "d71dc810....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.296367   87388 common.go:66] Generated Name "master-....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.296379   87388 common.go:71] Using namespace "kube-s....yaml
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.300194   87388 config.go:303] Setting pods for source file
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.361625   87388 kubelet.go:1884] SyncLoop (SYNC): 3 p...d33c)
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.361693   87388 config.go:100] Looking for [api file]...e:{}]
Oct 14 23:15:48 buzz1.fyre.ibm.com atomic-openshift-node[87388]: I1014 23:15:48.361716   87388 kubelet.go:1907] SyncLoop (housekeeping)
Hint: Some lines were ellipsized, use -l to show in full.
[root@buzz1 openshift-ansible]# ps -ef | grep calico
root      88872  87388  0 23:15 ?        00:00:00 /opt/cni/bin/calico
root      88975  74601  0 23:15 pts/0    00:00:00 grep --color=auto calico
[root@buzz1 openshift-ansible]#
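If the uninstall left Calico’s CNI config file behind, that alone would explain the stray process: the kubelet loads whichever config file in /etc/cni/net.d sorts first by name, and Calico’s file (commonly 10-calico.conflist) sorts ahead of openshift-sdn’s 80-openshift-network.conf, so pod networking keeps being handed to /opt/cni/bin/calico. A quick check, assuming the default config directory:

  ls /etc/cni/net.d/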

It seems the calico process shouldn’t be there. Using the same inventory file, OCP 3.11 could be deployed on a clean VM successfully, so I guess the uninstall playbook did not clean up calico thoroughly.
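A minimal manual cleanup sketch before reinstalling (the file names below are common Calico defaults, not confirmed from this install; check what is actually present before deleting anything):

  rm -f /etc/cni/net.d/10-calico.conflist /etc/cni/net.d/calico-kubeconfig
  rm -f /opt/cni/bin/calico /opt/cni/bin/calico-ipam
  systemctl restart atomic-openshift-node.service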


On Oct 12, 2019, at 11:52 PM, Yu Wei <yu2003w@hotmail.com> wrote:

Hi,
I tried to install OCP 3.11 with the following variables set.
openshift_use_openshift_sdn=true
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'

Some pods are stuck in ‘ContainerCreating’.
[root@buzz1 openshift-ansible]# oc get pods --all-namespaces
NAMESPACE               NAME                                   READY     STATUS              RESTARTS   AGE
default                 docker-registry-1-deploy               0/1       ContainerCreating   0          5h
default                 registry-console-1-deploy              0/1       ContainerCreating   0          5h
kube-system             master-api-buzz1.center1.com           1/1       Running             0          5h
kube-system             master-controllers-buzz1.center1.com   1/1       Running             0          5h
kube-system             master-etcd-buzz1.center1.com          1/1       Running             0          5h
openshift-node          sync-x8j7d                             1/1       Running             0          5h
openshift-sdn           ovs-ff7r7                              1/1       Running             0          5h
openshift-sdn           sdn-7frfw                              1/1       Running             10         5h
openshift-web-console   webconsole-85494cdb8c-s2dnh            0/1       ContainerCreating   0          5h
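A quick filter to list just the stuck pods:

  oc get pods --all-namespaces | grep ContainerCreating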

Running ‘oc describe pods’, I got the following:

Events:
  Type     Reason                  Age              From            Message
  ----     ------                  ----             ----            -------
  Warning  FailedCreatePodSandBox  2m               kubelet, buzz1  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "8570c350953e29185ef8ab05d628f90c6791a56ac392e40f2f6e30a14a76ab22" network for pod "network-diag-test-pod-qz7hv": NetworkPlugin cni failed to set up pod "network-diag-test-pod-qz7hv_network-diag-global-ns-q7vbn" network: context deadline exceeded, failed to clean up sandbox container "8570c350953e29185ef8ab05d628f90c6791a56ac392e40f2f6e30a14a76ab22" network for pod "network-diag-test-pod-qz7hv": NetworkPlugin cni failed to teardown pod "network-diag-test-pod-qz7hv_network-diag-global-ns-q7vbn" network: context deadline exceeded]
  Normal   SandboxChanged          2s (x8 over 2m)  kubelet, buzz1  Pod sandbox changed, it will be killed and re-created.

This means the openshift-sdn network plugin is somehow not responding.
You'd want to get the container logs of the openshift-sdn container on
that node to debug further.
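For example (a sketch; sdn-7frfw is the pod name from the listing above, and you may need -c sdn if the pod runs more than one container):

  oc logs -n openshift-sdn sdn-7frfw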

Dan

How could I resolve this problem?
Any thoughts?

Thanks,
Jared


