I have been experiencing intermittent DNS lookup failures. This is preventing production deployment of OpenShift.
I see it in two cases: lookup of a remote Docker registry and lookup of an LDAP service. Neither of these runs on the server(s) in question, but both are served by our internal DNS servers.
The LDAP case is easier for me to replicate, as I just need to attempt to log in.
Feb 20 11:21:16 lab-stack1 atomic-openshift-master-api: E0220 11:21:16.924930 2005 login.go:176] Error authenticating "XXXX" with provider "ldap": LDAP Result Code 200 "": dial tcp: lookup ldap.xxx.xxx on xxx.xxx.xxx.xxx:53: no such host
I have obfuscated the user, provider name, and host for security.
xxx.xxx.xxx.xxx:53 is the master node, which is running dnsmasq with the default configuration provided by the openshift-ansible installation.
If I log on to a host and run ‘host ldap.xxx.xxx’, the lookups start resolving for a while and then revert to failing.
features: Basic-Auth GSSAPI Kerberos SPNEGO
What are the next steps to try? Using dig or host on the node in question always returns a valid lookup result.