
Re: openshift-ansible release-3.10 - Install fails with control plane pods



Hi,
I hit this issue reproducibly after uninstalling a (failed or completed) installation and then reinstalling. Rebooting all involved nodes and masters resolves it, however, so I did not investigate further.

Greetings
Klaas

On 31.08.2018 21:26, Marc Schlegel wrote:
Sure, see attached.

Before each attempt I pull the latest release-3.10 branch for openshift-ansible.

@Scott Dodson: I am going to investigate again using your suggestions.

Marc,

Is it possible to share your Ansible inventory file so that we can review
your OpenShift installation? I know there are some changes in the 3.10
installation that might need to be reflected in the inventory.

On Thu, Aug 30, 2018 at 3:37 PM Marc Schlegel <marc schlegel gmx de> wrote:

Thanks for the link. It looks like the api-pod is not coming up at all!

Log from k8s_controllers_master-controllers-*

[vagrant master ~]$ sudo docker logs k8s_controllers_master-controllers-master.vnet.de_kube-system_a3c3ca56f69ed817bad799176cba5ce8_1
E0830 18:28:05.787358       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:594:
Failed to list *v1.Pod: Get
https://master.vnet.de:8443/api/v1/pods?fieldSelector=spec.schedulerName%3Ddefault-scheduler%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:05.788589       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1.ReplicationController: Get
https://master.vnet.de:8443/api/v1/replicationcontrollers?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:05.804239       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1.Node: Get
https://master.vnet.de:8443/api/v1/nodes?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:05.806879       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1beta1.StatefulSet: Get
https://master.vnet.de:8443/apis/apps/v1beta1/statefulsets?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:05.808195       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1beta1.PodDisruptionBudget: Get
https://master.vnet.de:8443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:06.673507       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1.PersistentVolume: Get
https://master.vnet.de:8443/api/v1/persistentvolumes?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:06.770141       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1beta1.ReplicaSet: Get
https://master.vnet.de:8443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:06.773878       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1.Service: Get
https://master.vnet.de:8443/api/v1/services?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:06.778204       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1.StorageClass: Get
https://master.vnet.de:8443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0830 18:28:06.784874       1 reflector.go:205]
github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
Failed to list *v1.PersistentVolumeClaim: Get
https://master.vnet.de:8443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0:
dial tcp 127.0.0.1:8443: getsockopt: connection refused
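
Since the excerpt above repeats one failure for many resource types, it can help to condense it. The following is a minimal Python sketch, not part of the original exchange; the function name `summarize_reflector_errors` and the regular expression are assumptions based only on the log lines shown here. It joins the hard-wrapped lines and counts, per resource kind, which address was dialed and refused.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the excerpt in this thread:
#   Failed to list *<kind>: Get <url>: dial tcp <addr>: getsockopt: connection refused
# The lazy ".*?" is good enough for a log where every entry has this shape.
REFLECTOR_ERROR = re.compile(
    r"Failed to list \*(?P<kind>[\w.]+): "
    r".*?dial tcp (?P<addr>[\d.]+:\d+): (?:getsockopt: )?connection refused"
)


def summarize_reflector_errors(log_text: str) -> Counter:
    """Count connection-refused reflector errors per (resource kind, dialed address)."""
    # The archived log is wrapped across several lines per entry,
    # so collapse all whitespace into single spaces before matching.
    joined = " ".join(log_text.split())
    return Counter(
        (match["kind"], match["addr"]) for match in REFLECTOR_ERROR.finditer(joined)
    )
```

Fed the excerpt above, every entry reports the same dialed address, 127.0.0.1:8443, even though the request URL names master.vnet.de. It may be worth checking how that hostname resolves on the master.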

The log is full of those. Since they are all about the API, I tried to get
the logs from k8s_POD_master-api-master.vnet.de_kube-system_*, which are
completely empty :-/

[vagrant master ~]$ sudo docker logs k8s_POD_master-api-master.vnet.de_kube-system_86017803919d833e39cb3d694c249997_1
[vagrant master ~]$
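
The container names in this thread appear to follow the layout `k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>`. As far as I know, the entry whose container field is `POD` is the pod's infrastructure ("pause") container, which normally logs nothing, so an empty log there would not be surprising; the API server itself would run in a separate container of the same pod. The sketch below is a hypothetical helper (`parse_k8s_container_name` is not an existing tool) that splits such names so the infrastructure containers can be told apart from the workload containers. The field layout is an assumption inferred from the names shown in this thread.

```python
from typing import NamedTuple


class K8sContainerName(NamedTuple):
    container: str   # container name inside the pod, "POD" for the infra container
    pod: str
    namespace: str
    pod_uid: str
    attempt: int     # trailing counter, e.g. _0 or _1 in the names in this thread

    @property
    def is_infra(self) -> bool:
        return self.container == "POD"


def parse_k8s_container_name(name: str) -> K8sContainerName:
    """Split a name such as k8s_POD_master-api-..._kube-system_<uid>_1 into fields."""
    parts = name.split("_")
    # Assumes none of the individual fields contains an underscore.
    if len(parts) != 6 or parts[0] != "k8s":
        raise ValueError(f"unexpected container name layout: {name!r}")
    _, container, pod, namespace, pod_uid, attempt = parts
    return K8sContainerName(container, pod, namespace, pod_uid, int(attempt))
```

Applied to the names in this thread, the only container listed for the master-api pod is its `POD` entry. Checking `sudo docker ps -a` for an exited container belonging to that pod, and reading its logs instead, may be more informative.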

Is there any special prerequisite for the api-pod?

regards
Marc


Marc,

Could you please look over issue [1], pull the master pod logs, and see
whether you hit the same issue mentioned by the other folks?
Also make sure your openshift-ansible release is the latest one.

Dani

[1] https://github.com/openshift/openshift-ansible/issues/9575

On Wed, Aug 29, 2018 at 7:36 PM Marc Schlegel <marc schlegel gmx de> wrote:
Hello everyone

I am having trouble getting a working Origin 3.10 installation using the
openshift-ansible installer. My install always fails because the control
plane pods are not available. I've checked out the release-3.10 branch of
openshift-ansible and configured the inventory accordingly.


TASK [openshift_control_plane : Start and enable self-hosting node] ******************
changed: [master]
TASK [openshift_control_plane : Get node logs] *******************************
skipping: [master]
TASK [openshift_control_plane : debug] ******************************************
skipping: [master]
TASK [openshift_control_plane : fail] *********************************************
skipping: [master]
TASK [openshift_control_plane : Wait for control plane pods to appear] ***************

failed: [master] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/bin/oc get pod master-etcd-master.vnet.de -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server master.vnet.de:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}

TASK [openshift_control_plane : Report control plane errors] *************************
fatal: [master]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}


I am using Vagrant to set up a local domain (vnet.de), which also includes
a dnsmasq node to have full control over DNS. The following VMs are
running, and DNS and SSH work as expected:

Hostname         IP               Notes
domain.vnet.de   192.168.60.100
master.vnet.de   192.168.60.150   DNS also works for openshift.vnet.de, which is configured as openshift_master_cluster_public_hostname; also runs etcd
infra.vnet.de    192.168.60.151   openshift_master_default_subdomain wildcard points to this node
app1.vnet.de     192.168.60.152
app2.vnet.de     192.168.60.153


When connecting to the master node, I can see that several Docker
containers are up and running:

[vagrant master ~]$ sudo docker ps
CONTAINER ID   IMAGE                                    COMMAND                  CREATED          STATUS          PORTS   NAMES
9a0844123909   ff5dd2137a4f                             "/bin/sh -c '#!/bi..."   19 minutes ago   Up 19 minutes           k8s_etcd_master-etcd-master.vnet.de_kube-system_a2c858fccd481c334a9af7413728e203_0
41d803023b72   f216d84cdf54                             "/bin/bash -c '#!/..."   19 minutes ago   Up 19 minutes           k8s_controllers_master-controllers-master.vnet.de_kube-system_a3c3ca56f69ed817bad799176cba5ce8_0
044c9d12588c   docker.io/openshift/origin-pod:v3.10.0   "/usr/bin/pod"           19 minutes ago   Up 19 minutes           k8s_POD_master-api-master.vnet.de_kube-system_86017803919d833e39cb3d694c249997_0
10a197e394b3   docker.io/openshift/origin-pod:v3.10.0   "/usr/bin/pod"           19 minutes ago   Up 19 minutes           k8s_POD_master-controllers-master.vnet.de_kube-system_a3c3ca56f69ed817bad799176cba5ce8_0
20f4f86bdd07   docker.io/openshift/origin-pod:v3.10.0   "/usr/bin/pod"           19 minutes ago   Up 19 minutes           k8s_POD_master-etcd-master.vnet.de_kube-system_a2c858fccd481c334a9af7413728e203_0

However, there is no port 8443 open on the master-node. No wonder the
ansible-installer complains.
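
Whether anything is accepting connections on that port can be confirmed independently of the installer. Below is a minimal sketch using only the Python standard library; `port_open` is a hypothetical helper name, and the host and port to probe (master.vnet.de, 8443) are taken from this thread. It only reports whether a TCP connection can be established, not whether the API behind it is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `port_open("master.vnet.de", 8443)` on the master itself and again from another VM can help distinguish "nothing is listening" from a name resolution or firewall problem, since the same refusal on both sides points at the API server not running.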

The machines are running plain CentOS 7.5, and I ran
openshift-ansible/playbooks/prerequisites.yml first and then
openshift-ansible/playbooks/deploy_cluster.yml.
I've double-checked the installation documentation and my Vagrant
config; it all looks correct.

Any ideas/advice?
regards
Marc


_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



