
Re: openshift-ansible release-3.10 - Install fails with control plane pods



Hello everyone

I was finally able to resolve the issue with the control plane.

The problem was caused by the master pod not being able to connect to the etcd pod, because the master's hostname always resolved to 127.0.0.1 instead of the host's cluster IP. This was due to the Vagrant box I used, and could be resolved by making sure that /etc/hosts only contains the plain localhost entry for 127.0.0.1.
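
For anyone hitting the same thing, a minimal sketch of the change (the exact line depends on the Vagrant box; the "before" entry is only an example of roughly what mine looked like):

  # /etc/hosts before: the box mapped the node's own hostname to loopback
  127.0.0.1   master.vnet.de master localhost localhost.localdomain

  # /etc/hosts after: only localhost stays on 127.0.0.1, so master.vnet.de
  # resolves via DNS to its real address (192.168.60.150 in my setup)
  127.0.0.1   localhost localhost.localdomain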

Now the installer gets past the control-plane-check.

Unfortunately the next issue arises when the installer waits for the "catalog api server". The check "curl -k https://apiserver.kube-service-catalog.svc/healthz" cannot connect, because the installer only adds "cluster.local" to resolv.conf, so a bare ".svc" hostname does not resolve on the host.
Either the installer should make sure that hostnames ending in ".svc" get resolved as well (my current workaround: adding server=/svc/172.30.0.1 to /etc/dnsmasq.d/origin-upstream-dns.conf), or all services should be addressed by hostnames ending in "cluster.local".
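
In case it helps someone else, the full workaround as I applied it (a sketch; 172.30.0.1 is the kubernetes service IP of my cluster, so adjust it if your service network differs):

  # appended to /etc/dnsmasq.d/origin-upstream-dns.conf:
  # forward bare *.svc names to the cluster DNS as well
  server=/svc/172.30.0.1

  # pick up the change and repeat the check the installer runs
  sudo systemctl restart dnsmasq
  curl -k https://apiserver.kube-service-catalog.svc/healthz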


On Friday, 31 August 2018 at 21:15:12 CEST, you wrote:
> The dependency chain for the control plane is node, then etcd, then api, then
> controllers. From your previous post it looks like there's no apiserver
> running. I'd look into what's wrong there.
> 
> Check `master-logs api api`; if that doesn't provide you any hints, then
> check the logs for the node service, but I can't think of anything that
> would fail there yet result in successfully starting the controller pods.
> The apiserver and controller pods use the same image. Each pod will have
> two containers; the k8s_POD containers are rarely interesting.
> 
> On Thu, Aug 30, 2018 at 2:37 PM Marc Schlegel <marc schlegel gmx de> wrote:
> 
> > Thanks for the link. It looks like the api-pod is not coming up at all!
> >
> > Log from k8s_controllers_master-controllers-*
> >
> > [vagrant master ~]$ sudo docker logs
> > k8s_controllers_master-controllers-master.vnet.de_kube-system_a3c3ca56f69ed817bad799176cba5ce8_1
> > E0830 18:28:05.787358       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:594:
> > Failed to list *v1.Pod: Get
> > https://master.vnet.de:8443/api/v1/pods?fieldSelector=spec.schedulerName%3Ddefault-scheduler%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:05.788589       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1.ReplicationController: Get
> > https://master.vnet.de:8443/api/v1/replicationcontrollers?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:05.804239       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1.Node: Get
> > https://master.vnet.de:8443/api/v1/nodes?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:05.806879       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1beta1.StatefulSet: Get
> > https://master.vnet.de:8443/apis/apps/v1beta1/statefulsets?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:05.808195       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1beta1.PodDisruptionBudget: Get
> > https://master.vnet.de:8443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:06.673507       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1.PersistentVolume: Get
> > https://master.vnet.de:8443/api/v1/persistentvolumes?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:06.770141       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1beta1.ReplicaSet: Get
> > https://master.vnet.de:8443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:06.773878       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1.Service: Get
> > https://master.vnet.de:8443/api/v1/services?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:06.778204       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1.StorageClass: Get
> > https://master.vnet.de:8443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> > E0830 18:28:06.784874       1 reflector.go:205]
> > github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:87:
> > Failed to list *v1.PersistentVolumeClaim: Get
> > https://master.vnet.de:8443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0:
> > dial tcp 127.0.0.1:8443: getsockopt: connection refused
> >
> > The log is full of those. Since it is all about the api, I tried to get the
> > logs from k8s_POD_master-api-master.vnet.de_kube-system_*, which is
> > completely empty :-/
> >
> > [vagrant master ~]$ sudo docker logs
> > k8s_POD_master-api-master.vnet.de_kube-system_86017803919d833e39cb3d694c249997_1
> > [vagrant master ~]$
> >
> > Is there any special prerequisite for the api-pod?
> >
> > regards
> > Marc
> >
> >
> > > Marc,
> > >
> > > could you please look over the issue [1], pull the master pod logs, and
> > > see if you bumped into the same issue mentioned by the other folks?
> > > Also make sure the openshift-ansible release is the latest one.
> > >
> > > Dani
> > >
> > > [1] https://github.com/openshift/openshift-ansible/issues/9575
> > >
> > > On Wed, Aug 29, 2018 at 7:36 PM Marc Schlegel <marc schlegel gmx de>
> > wrote:
> > >
> > > > Hello everyone
> > > >
> > > > I am having trouble getting a working Origin 3.10 installation using the
> > > > openshift-ansible installer. My install always fails because the control
> > > > plane pods are not available. I've checked out the release-3.10 branch from
> > > > openshift-ansible and configured the inventory accordingly.
> > > >
> > > >
> > > > TASK [openshift_control_plane : Start and enable self-hosting node]
> > > > ******************
> > > > changed: [master]
> > > > TASK [openshift_control_plane : Get node logs]
> > > > *******************************
> > > > skipping: [master]
> > > > TASK [openshift_control_plane : debug]
> > > > ******************************************
> > > > skipping: [master]
> > > > TASK [openshift_control_plane : fail]
> > > > *********************************************
> > > > skipping: [master]
> > > > TASK [openshift_control_plane : Wait for control plane pods to appear]
> > > > ***************
> > > >
> > > > failed: [master] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd",
> > > > "msg": {"cmd": "/bin/oc get pod master-etcd-master.vnet.de -o json -n kube-system",
> > > > "results": [{}], "returncode": 1, "stderr": "The connection to the server
> > > > master.vnet.de:8443 was refused - did you specify the right host or port?\n",
> > > > "stdout": ""}}
> > > >
> > > > TASK [openshift_control_plane : Report control plane errors]
> > > > *************************
> > > > fatal: [master]: FAILED! => {"changed": false, "msg": "Control plane
> > pods
> > > > didn't come up"}
> > > >
> > > >
> > > > I am using Vagrant to set up a local domain (vnet.de) which also includes
> > > > a dnsmasq node to have full control over the DNS. The following VMs are
> > > > running, and DNS and SSH work as expected:
> > > >
> > > > Hostname          IP
> > > > domain.vnet.de    192.168.60.100
> > > > master.vnet.de    192.168.60.150  (DNS also works for openshift.vnet.de, which is
> > > >                                    configured as openshift_master_cluster_public_hostname;
> > > >                                    this node also runs etcd)
> > > > infra.vnet.de     192.168.60.151  (the openshift_master_default_subdomain wildcard
> > > >                                    points to this node)
> > > > app1.vnet.de      192.168.60.152
> > > > app2.vnet.de      192.168.60.153
> > > >
> > > >
> > > > When connecting to the master node I can see that several docker containers
> > > > are up and running:
> > > >
> > > > [vagrant master ~]$ sudo docker ps
> > > > CONTAINER ID   IMAGE                                    COMMAND                  CREATED          STATUS          PORTS   NAMES
> > > > 9a0844123909   ff5dd2137a4f                             "/bin/sh -c '#!/bi..."   19 minutes ago   Up 19 minutes           k8s_etcd_master-etcd-master.vnet.de_kube-system_a2c858fccd481c334a9af7413728e203_0
> > > > 41d803023b72   f216d84cdf54                             "/bin/bash -c '#!/..."   19 minutes ago   Up 19 minutes           k8s_controllers_master-controllers-master.vnet.de_kube-system_a3c3ca56f69ed817bad799176cba5ce8_0
> > > > 044c9d12588c   docker.io/openshift/origin-pod:v3.10.0   "/usr/bin/pod"           19 minutes ago   Up 19 minutes           k8s_POD_master-api-master.vnet.de_kube-system_86017803919d833e39cb3d694c249997_0
> > > > 10a197e394b3   docker.io/openshift/origin-pod:v3.10.0   "/usr/bin/pod"           19 minutes ago   Up 19 minutes           k8s_POD_master-controllers-master.vnet.de_kube-system_a3c3ca56f69ed817bad799176cba5ce8_0
> > > > 20f4f86bdd07   docker.io/openshift/origin-pod:v3.10.0   "/usr/bin/pod"           19 minutes ago   Up 19 minutes           k8s_POD_master-etcd-master.vnet.de_kube-system_a2c858fccd481c334a9af7413728e203_0
> > > >
> > > >
> > > > However, there is no port 8443 open on the master node. No wonder the
> > > > ansible installer complains.
> > > >
> > > > The machines are running a plain CentOS 7.5, and I've run
> > > > openshift-ansible/playbooks/prerequisites.yml first and then
> > > > openshift-ansible/playbooks/deploy_cluster.yml.
> > > > I've double-checked the installation documentation and my Vagrant
> > > > config... all looks correct.
> > > >
> > > > Any ideas/advice?
> > > > regards
> > > > Marc
> > > >
> > > >
> > >
> >
> >
> >
> >
> 




