1) Indeed, the "lb" host is configured (via dnsmasq) as a DNS forwarder, has the correct "/etc/hosts" (which is propagated to all the other hosts in the cluster), and all hosts have an entry pointing to it in "/etc/resolv.conf".
2) A bit puzzled wrt "system:node" vs "system:anonymous"....
I've just tested the corresponding curl call on another system where everything works as expected (at least so far...) and the response I get back from a GET to "/api/v1/namespaces" still refers to "system:anonymous", and not "system:node".
Also, to make things even weirder, if I copy the node "kubeconfig" into ".kube/config" I am identified accordingly (i.e. as "system:node") when doing an "oc whoami".
I'm probably missing something about the way the node identifies itself when using client certificate authentication; I'm seeing the same behavior on a system I have that is functioning as expected.
3) Thanks for pointing out that specifying "HTTP_PROXY" / "HTTPS_PROXY" and, respectively, "NO_PROXY" is not yet possible via the Ansible installer.
My remaining question is: is there any way to debug the authentication process, i.e. why "oc login" with the "htpasswd" back end doesn't work?
You will most likely need to increase the logging level to see authentication logs for the API service. In /etc/sysconfig/atomic-openshift-master-api, increasing the loglevel to 4 should provide output around the authentication failure.
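A minimal sketch of what that change looks like, assuming the standard `--loglevel` flag in the sysconfig OPTIONS line (adjust if your file differs):

```shell
# On each master: bump the API server log level from the default to 4
# (assumes OPTIONS already contains a --loglevel=N flag)
sed -i 's/--loglevel=[0-9]*/--loglevel=4/' /etc/sysconfig/atomic-openshift-master-api
systemctl restart atomic-openshift-master-api

# Then follow the logs while retrying the failing "oc login"
journalctl -u atomic-openshift-master-api -f
```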
I have a multimaster OSE setup consisting of the following:
- A LB with "native" HA
- Three masters (doubling as "etcd" nodes)
- Two nodes
All the hosts are themselves OpenStack instances (hence the ".novalocal" suffix). DNS is via an "/etc/hosts" file propagated to all hosts, with the "lb" host doubling as DNS forwarder (via dnsmasq). All Internet access is via an HTTP / HTTPS proxy.
So, if I'm understanding this correctly, the lb host is correctly resolving DNS for all of the *.novalocal addresses in use by the cluster, and all of the hosts were pre-configured to use the lb host as the DNS resolver prior to running the installation? If not, then there will definitely be issues, since /etc/hosts is not used by deployed containers.
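For reference, a forwarder setup like the one described can be as small as this dnsmasq fragment; the addresses below are illustrative, not taken from your cluster:

```
# /etc/dnsmasq.conf on the lb host (illustrative values)
# dnsmasq serves names from the local /etc/hosts copy by default,
# so the propagated hosts file answers the *.novalocal queries.
# Forward everything else to the upstream resolver:
server=10.0.0.2
# Listen on the address the other hosts point at in /etc/resolv.conf:
listen-address=10.0.0.31
```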
After many attempts we finally got a setup that is somewhat working (see the P.S. for why only "somewhat"). Attached is the "/etc/ansible/hosts" file. Installation is from the main "openshift-ansible" repo (https://github.com/openshift/openshift-ansible).
After installation, on one master I created two users in "/etc/origin/htpasswd". After creation I propagated the file to all the other masters. UNIX permissions on the file on all masters are "0600".
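For reference, the creation and propagation were along these lines; the usernames and master hostnames below are placeholders, not the real ones:

```shell
# On one master: create the identity provider file and add users
# (-c creates the file, so use it only for the first user; -B uses bcrypt)
htpasswd -c -B /etc/origin/htpasswd alice
htpasswd -B /etc/origin/htpasswd bob

# Propagate to the other masters and keep the 0600 permissions
for m in master2 master3; do            # placeholder hostnames
    scp /etc/origin/htpasswd "$m":/etc/origin/htpasswd
    ssh "$m" chmod 0600 /etc/origin/htpasswd
done
```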
However, doing an "oc login" returns a "401 Unauthorized", and I cannot find what the issue is, or how to debug it (there is no trace of it in the "atomic-openshift-master-api" or "atomic-openshift-master-controllers" logs).
* About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
* Trying 10.0.0.31...
* Connected to az1lb01.mydomain.novalocal (10.0.0.31) port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/origin/node/ca.crt
* NSS: client certificate not found: /etc/origin/node/system
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=10.0.0.24
* start date: Feb 24 19:40:56 2016 GMT
* expire date: Feb 23 19:40:57 2018 GMT
* common name: 10.0.0.24
* issuer: CN=openshift-signer 1456342841
> GET /api/v1/namespaces HTTP/1.1
> User-Agent: curl/7.29.0
> Host: az1lb01.mydomain.novalocal:8443
> Accept: */*
< HTTP/1.1 403 Forbidden
< Cache-Control: no-store
< Content-Type: application/json
< Date: Thu, 25 Feb 2016 14:42:41 GMT
< Content-Length: 255
"message": "User \"system:anonymous\" cannot list all namespaces in the cluster",
* Connection #0 to host az1lb01.mydomain.novalocal left intact
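Note the "NSS: client certificate not found: /etc/origin/node/system" line in the trace above: curl treats a colon in the `--cert` argument as the start of a password, so a certificate path containing "system:node:..." gets truncated at the first colon, no client cert is presented, and the server attributes the request to "system:anonymous". A call that actually presents the node's certificate would look roughly like this; the exact certificate file names vary per node, so the names below are assumptions:

```shell
# Escape the colons in the cert/key paths so curl does not split
# them off as a password (paths are illustrative)
curl -v \
  --cacert /etc/origin/node/ca.crt \
  --cert   /etc/origin/node/system\:node\:mynode.crt \
  --key    /etc/origin/node/system\:node\:mynode.key \
  https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
```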
Attached is also the master configuration file for one master.
- I had many issues getting the installation working, mostly due to the Ansible installer reading the OpenStack instance metadata, and inconsistencies between that and the "hostname".
Is there any particular repo / branch of the installer that is known to work in this particular setup ? Any particular settings I should use in the Ansible hosts file ?
I suspect the certificate issues I'm encountering are because of that (in combination with the proxy), but I'm not sure.
- Operating behind an HTTP / HTTPS proxy: Even before starting the Ansible installer, Docker was (properly) configured with the HTTP / HTTPS proxy settings, and working correctly. However, for the installer itself I found no way to express the "HTTP_PROXY", "HTTPS_PROXY" and, particularly, the "NO_PROXY" settings. For that I'm relying on exported environment variables in the shell. Is there a "proper" way to do this via the installer itself?
Post-install I have manually added those settings to "/etc/sysconfig/atomic-openshift-master", "/etc/sysconfig/atomic-openshift-master-controllers", "/etc/sysconfig/atomic-openshift-master-api" and, respectively for the nodes, "/etc/sysconfig/atomic-openshift-node", but I don't know how to do this via the installer itself.
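For reference, the lines added to each of those sysconfig files take this shape; the proxy host and the NO_PROXY list below are illustrative:

```
# Appended to each sysconfig file listed above (values are illustrative)
HTTP_PROXY=http://proxy.mydomain:3128
HTTPS_PROXY=http://proxy.mydomain:3128
NO_PROXY=.novalocal,.cluster.local,localhost,127.0.0.1
```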
- Is there an issue with the masters doubling as "etcd" nodes ?
No, there should not be any issues with co-locating the etcd service alongside the masters.
The most frustrating part is that I have this very setup working perfectly fine in a public cloud environment (namely on GCE), but with the (three) "etcd" hosts distinct from the masters (i.e. a total of 9 hosts instead of 6), and with unproxied Internet access... However, that installation is from a different repo branch (namely "https://github.com/detiber/openshift-ansible", branch "gceFixes").
I *believe* all of the fixes from gceFixes have been merged into master at this point.
Thanks a lot for the help,
P.S. The weirdest case wrt certificates is when trying to check the "etcd" cluster:
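For reference, the etcd cluster check being attempted looks roughly like this with the v2 etcdctl; the certificate paths and the endpoint hostname are assumptions, adjust them to what the installer laid down under /etc/etcd:

```shell
# Run on one master; point at any etcd client endpoint
# (paths and hostname are illustrative)
etcdctl --ca-file   /etc/etcd/ca.crt \
        --cert-file /etc/etcd/peer.crt \
        --key-file  /etc/etcd/peer.key \
        --endpoints https://az1master01.mydomain.novalocal:2379 \
        cluster-health
```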