
Re: openshift dns: docker-registry-deploy times out, fails





On Tue, Jan 12, 2016 at 11:04 AM, Jon Cope <jcope redhat com> wrote:
Hi all,
I'm new to OpenShift and attempting to run a small proof-of-concept cluster. I can deploy OpenShift 3.1 using the quick install method without issue. The problem is that the docker-registry-1-deploy pod times out when it can't contact the master service. I suspect OpenShift DNS isn't working properly, but I'm unsure how to diagnose it.

My end goal is to set up the docker-registry using a GlusterFS PVC.

From the pod's log:
    [root master init-cluster]# oc logs docker-registry-1-deploy
    F0111 18:26:03.967979 1 deployer.go:65] couldn't get deployment default/docker-registry-1: Get https://master.rh71:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1: dial tcp: lookup master.rh71: no such host

General info:

- 2-VM cluster; RHEL Server 7.2 (also reproducible on 7.1)
- OSE 3.1
- Docker 1.8.2

- /etc/hosts is set across both nodes:
    [root slave init-cluster]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    192.168.122.78 master.rh71
    192.168.122.52 slave.rh71

Just FYI, pods and direct docker containers don't inherit /etc/hosts from the node. For these names to resolve inside containers, DNS itself needs to be able to resolve them.
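
A quick way to see the difference from the node itself (a minimal sketch; the busybox image and these exact commands are illustrative, not from your setup):

    getent hosts master.rh71                       # resolves via the node's /etc/hosts
    docker run --rm busybox cat /etc/hosts         # no master.rh71 entry inside the container
    docker run --rm busybox cat /etc/resolv.conf   # the container copies the node's resolv.conf

If the nameserver in that copied resolv.conf can't answer for master.rh71, lookups from containers will fail even though the node itself resolves the name.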
 

- /etc/resolv.conf looks right:
    [root slave ~]# cat /etc/resolv.conf
    # Generated by NetworkManager
    search rh71
    nameserver 192.168.122.1

- first indicator of something wrong:
    [root master ~]# oc get pods
    NAME                       READY     STATUS    RESTARTS   AGE
    docker-registry-1-deploy   0/1       Error     0          59s

- describing the pod
[root master ~]# oc describe pods
Name:                           docker-registry-1-deploy
Namespace:                      default
Image(s):                       openshift3/ose-deployer:v3.1.0.4
Node:                           slave.rh71/192.168.122.52
Start Time:                     Mon, 11 Jan 2016 18:25:59 -0500
Labels:                         openshift.io/deployer-pod-for.name=docker-registry-1
Status:                         Failed
Reason:
Message:
IP:
Replication Controllers:        <none>
Containers:
  deployment:
    Container ID:       docker://674407a0ac9187245dbb4df45b70465091cfd5368315a1a569d5987e62be9785
    Image:              openshift3/ose-deployer:v3.1.0.4
    Image ID:           docker://9580a28b3e18c64cff56f96e3f777464431accde6c98b3765d9bfc5a7e619ea2
    QoS Tier:
      memory:           BestEffort
      cpu:              BestEffort
    State:              Terminated
      Reason:           Error
      Exit Code:        255
      Started:          Mon, 11 Jan 2016 18:26:02 -0500
      Finished:         Mon, 11 Jan 2016 18:26:04 -0500
    Ready:              False
    Restart Count:      0
    Environment Variables:
      KUBERNETES_MASTER:                https://master.rh71:8443
      OPENSHIFT_MASTER:                 https://master.rh71:8443
      BEARER_TOKEN_FILE:                /var/run/secrets/kubernetes.io/serviceaccount/token
      OPENSHIFT_CA_DATA:                <left out to reduce the wall of text>
      OPENSHIFT_DEPLOYMENT_NAME:        docker-registry-1
      OPENSHIFT_DEPLOYMENT_NAMESPACE:   default
Conditions:
  Type          Status
  Ready         False
Volumes:
  deployer-token-duq7o:
    Type:       Secret (a secret that should populate this volume)
    SecretName: deployer-token-duq7o
Events:
  FirstSeen   LastSeen   Count   From                   SubobjectPath                       Reason       Message
  ─────────   ────────   ─────   ────                   ─────────────                       ──────       ───────
  23m         23m        1       {kubelet slave.rh71}   implicitly required container POD   Pulled       Container image "openshift3/ose-pod:v3.1.0.4" already present on machine
  23m         23m        1       {kubelet slave.rh71}   implicitly required container POD   Created      Created with docker id 9ffdbb0f8e6c
  23m         23m        1       {kubelet slave.rh71}   implicitly required container POD   Started      Started with docker id 9ffdbb0f8e6c
  23m         23m        1       {kubelet slave.rh71}   spec.containers{deployment}         Pulled       Container image "openshift3/ose-deployer:v3.1.0.4" already present on machine
  23m         23m        1       {kubelet slave.rh71}   spec.containers{deployment}         Created      Created with docker id 674407a0ac91
  23m         23m        1       {kubelet slave.rh71}   spec.containers{deployment}         Started      Started with docker id 674407a0ac91
  22m         22m        1       {kubelet slave.rh71}   implicitly required container POD   Killing      Killing with docker id 9ffdbb0f8e6c
  22m         22m        1       {kubelet slave.rh71}                                       FailedSync   Error syncing pod, skipping: failed to delete containers ([exit status 1])
  19m         19m        1       {scheduler }                                               Scheduled    Successfully assigned docker-registry-1-deploy to slave.rh71


To test, I created a busybox container on the slave and attempted an nslookup of the master. It cannot resolve the server name.

[root slave ~]# docker run -it --rm busybox nslookup master.rh71
Server: 192.168.122.1
Address 1: 192.168.122.1

nslookup: can't resolve 'master.rh71'

Kubernetes inserts the SkyDNS (master) IP into /etc/resolv.conf for containers it owns. When you run a docker container directly, it doesn't get this; it just gets what the host has (and no /etc/hosts). So be aware those are different environments. You could try `oc run` to run an image directly inside the cluster (not sure how long that has existed).
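
Something like this, roughly (a sketch only; the pod name is made up and the exact `oc run` flags may differ in your oc version):

    # run a throwaway pod so it gets the cluster's resolv.conf (SkyDNS), then check the lookup
    oc run dns-test --image=busybox --restart=Never --command -- nslookup master.rh71
    oc logs dns-test
    oc delete pod dns-test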
 


Following the instructions for configuring dnsmasq and OpenShift's SkyDNS to coexist on the master allowed the nodes to perform nslookups, but did nothing to fix the issue. Guide here: http://developerblog.redhat.com/2015/11/19/dns-your-openshift-v3-cluster/

If you've also set up /etc/hosts on your master, its dnsmasq server ought to be able to resolve the addresses (having read /etc/hosts). Is this the case? Check with dig:

dig master.rh71 @192.168.122.78 
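
If dnsmasq on the master has read the /etc/hosts above, you'd expect answers like these (a sketch; the expected IPs come from your /etc/hosts, and the `oc deploy --retry` at the end assumes that flag exists in your oc version):

    dig master.rh71 @192.168.122.78 +short    # expect 192.168.122.78
    dig slave.rh71  @192.168.122.78 +short    # expect 192.168.122.52

    # each node's /etc/resolv.conf should then point at the master's dnsmasq,
    # so containers (which copy the node's resolv.conf) can resolve both names:
    #     nameserver 192.168.122.78

    # once resolution works from a container, retry the failed deployment:
    oc deploy docker-registry --retry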
 


Here I'm stumped. What else can I do to further diagnose the cause? It appears that OpenShift DNS isn't working as expected; however, I'm lost as to where to look next.

Appreciatively,
Jon

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users

