
Re: Help debug "oc login" returning "401" / certificate issues

Hi Jason, 

Kindest thanks for trying to help. 

In order:

1) Indeed, the "lb" host is configured (via dnsmasq) as a DNS forwarder, has the correct "/etc/hosts" (which is propagated to all the other hosts in the cluster), and all hosts have an entry pointing to it in the "/etc/resolv.conf"

2) A bit puzzled wrt "system:node" vs "system:anonymous"....

I've just tested the corresponding curl call on another system where everything works as expected (at least so far...), and the response I get back from a GET to "/api/v1/namespaces" still refers to "system:anonymous", and not "system:node".

Also, to make things even weirder, if I copy the node "kubeconfig" to ".kube/config" I am identified accordingly (i.e. as "system:node") when doing an "oc whoami".

3) Thanks for pointing out that specifying "HTTP_PROXY" / "HTTPS_PROXY" and, respectively, "NO_PROXY" is not yet possible via the Ansible installer.

My remaining question is: is there any way to debug the authentication process, i.e. to find out why "oc login" with the "htpasswd" back end doesn't work?

Thanks again,


On Thu, Feb 25, 2016 at 10:30 PM, Jason DeTiberus <jdetiber redhat com> wrote:

On Thu, Feb 25, 2016 at 10:54 AM, Florian Daniel Otel <florian otel gmail com> wrote:

Hello all,

I have the following problems:

I have a multimaster OSE setup consisting of the following:
- A LB with "native" HA
- Three masters (doubling as "etcd" nodes)
- Two nodes

All the hosts are themselves OpenStack instances (hence the ".novalocal" suffix). DNS is via an "/etc/hosts" propagated across, with the "lb" host doubling as DNS forwarder (via dnsmasq). All Internet access is via an http / https proxy.

So, if I'm understanding this correctly, then the lb host is correctly resolving the dns for all of the *.novalocal addresses that are in use by the cluster and all of the hosts are pre-configured to use the lb host as the dns resolver prior to running the installation? If not, then there will definitely be issues, since /etc/hosts is not used by deployed containers.
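The resolution check asked about above can be sketched as follows. This is a minimal sketch, assuming `dig` (from bind-utils) is installed; the helper name `check_resolution` is hypothetical, and the resolver argument is meant to be the IP address of the "lb" host, which is not shown in this thread. Querying the forwarder directly, rather than going through "/etc/hosts", matches what deployed containers would see.

```shell
#!/usr/bin/env bash
# Sketch: ask one specific DNS server (the dnsmasq forwarder on the lb host)
# for every cluster name. check_resolution is a hypothetical helper name.

# Usage: check_resolution <resolver-ip> <host>...
# Prints one line per host; returns non-zero if any name has no answer.
check_resolution() {
    local resolver="$1" failed=0 host answer
    shift
    for host in "$@"; do
        # "+short" prints only the answer records; empty output means no answer.
        answer="$(dig +short "$host" "@$resolver" | head -n 1)"
        if [ -n "$answer" ]; then
            echo "ok   $host -> $answer"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example (host names from this thread; LB_IP is a placeholder to fill in):
#   check_resolution "$LB_IP" az1lb01.mydomain.novalocal \
#       az1master01.mydomain.novalocal az3master02.mydomain.novalocal \
#       az3master03.mydomain.novalocal az1node01.mydomain.novalocal
```

Running the same check from every host in the cluster would also confirm that each one can reach the forwarder.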

After many attempts we finally got a setup that is somewhat working (see P.S. for why "somewhat"). Attached is the "/etc/ansible/hosts" file. Installation is from the main "openshift-ansible" repo (https://github.com/openshift/openshift-ansible)

My problem:

After installation, on one master I created two users in "/etc/origin/htpasswd". After creation I have propagated the file to all the other masters. UNIX permissions to the file on all masters are "0600"

However, doing an "oc login" returns a "401 Unauthorized", and I cannot find what the issue is, or how to debug it (no trace for it in the "atomic-openshift-master-api" or "atomic-openshift-master-controllers" logs).  

[root az1node01 ~]# oc login
Authentication required for https://az1lb01.mydomain.novalocal:8443 (openshift)
Username: reguser
Login failed (401 Unauthorized)
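Before digging into the API server, a few local checks may narrow down a 401 like the one above. The following is a minimal sketch, not a definitive procedure: the helper names are hypothetical, the file path is the one used in this thread, and the master configuration path and the `HTPasswdPasswordIdentityProvider` / `file` setting mentioned in the comments are assumptions about a typical install that should be checked against the attached master configuration.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an htpasswd file on each master.
# Helper names are hypothetical; the default path is the one from this thread.
HTPASSWD_FILE="${HTPASSWD_FILE:-/etc/origin/htpasswd}"

# Succeeds if the file has an entry whose user field is exactly $2.
has_htpasswd_entry() {
    local file="$1" user="$2"
    awk -F: -v u="$user" '$1 == u { found = 1 } END { exit !found }' "$file"
}

# Prints a checksum so that the copies on different masters can be compared.
htpasswd_checksum() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Example (run on every master):
#   has_htpasswd_entry "$HTPASSWD_FILE" reguser && echo "entry present"
#   htpasswd_checksum "$HTPASSWD_FILE"
# Assumption: the identity provider's "file" setting should name the same path:
#   grep -n -B2 -A2 HTPasswdPasswordIdentityProvider /etc/origin/master/master-config.yaml
# Then retry the login with verbose client-side logging:
#   oc login https://az1lb01.mydomain.novalocal:8443 -u reguser --loglevel=8
```

If the entry is present, the checksums match on all masters, and the configured path is the same file, the verbose `oc login` output at least shows which request receives the 401.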

The puzzling thing is that using the "system:node" certificates and keys works (in the sense that I am identified as "system:anonymous"):

Something is definitely not right here, the user for the system:node certs should be identified as the system:node user and not anonymous. I suspect that there is a larger issue at play here.

It looks like the initial cluster creation may have had issues...  The atomic-openshift-master-api logs should provide more insight into what may have gone wrong.

curl -v --cacert  /etc/origin/node/ca.crt --cert "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" --key "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
* About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
*   Trying
* Connected to az1lb01.mydomain.novalocal ( port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/origin/node/ca.crt
  CApath: none
* NSS: client certificate not found: /etc/origin/node/system
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=
*       start date: Feb 24 19:40:56 2016 GMT
*       expire date: Feb 23 19:40:57 2018 GMT
*       common name:
*       issuer: CN=openshift-signer 1456342841
> GET /api/v1/namespaces HTTP/1.1
> User-Agent: curl/7.29.0
> Host: az1lb01.mydomain.novalocal:8443
> Accept: */*
< HTTP/1.1 403 Forbidden
< Cache-Control: no-store
< Content-Type: application/json
< Date: Thu, 25 Feb 2016 14:42:41 GMT
< Content-Length: 255
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:anonymous\" cannot list all namespaces in the cluster",
  "reason": "Forbidden",
  "details": {
    "kind": "namespaces"
  },
  "code": 403
}
* Connection #0 to host az1lb01.mydomain.novalocal left intact
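One detail in the verbose output above may be worth checking: curl reports "client certificate not found: /etc/origin/node/system", which is the certificate path cut at its first colon. curl's "--cert" option takes the form "<certificate[:password]>", so a colon in the file name can be read as the password separator. If that is what happened here, no client certificate was presented, which would be consistent with the request being treated as "system:anonymous". A minimal sketch of a workaround under that assumption is to copy the certificate and key to colon-free names and retry; the helper name `copy_without_colons` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a file into a directory under a name with ":" replaced by "_",
# then print the new path. copy_without_colons is a hypothetical helper name.
copy_without_colons() {
    local src="$1" dest_dir="$2" safe
    safe="$(basename "$src" | tr ':' '_')"
    # -p keeps the original mode, so a 0600 key stays 0600.
    cp -p "$src" "$dest_dir/$safe" && echo "$dest_dir/$safe"
}

# Example (paths from this thread):
#   tmp="$(mktemp -d)"
#   crt="$(copy_without_colons "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" "$tmp")"
#   key="$(copy_without_colons "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" "$tmp")"
#   curl -v --cacert /etc/origin/node/ca.crt --cert "$crt" --key "$key" \
#        https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
#   rm -rf "$tmp"
```

If the retried request is then identified as the node user, the earlier "system:anonymous" result came from the curl invocation rather than from the cluster certificates.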

Attached is also the master configuration file for one master.

My questions:

- I had many issues in getting the installation working, mostly due to the Ansible installer reading the OpenStack instance metadata, and inconsistencies btw. that and the "hostname".

  Is there any particular repo / branch of the installer that is known to work in this particular setup ? Any particular settings I should use in the Ansible hosts file ?

  I suspect the certificate issues I'm encountering are because of that (in combination with the proxy) but I'm not sure.

- Operating behind an HTTP / HTTPS proxy: Even before starting the Ansible installer, Docker was (properly) configured with the HTTP / HTTPS proxy settings, and working correctly. However, for the installer itself I found no way to express the "HTTP_PROXY", "HTTPS_PROXY" and, particularly, the "NO_PROXY" settings. For that I'm relying on exported environment variables in the shell. Is there a "proper" way to do this via the installer itself?

There is an openshift-ansible PR to expose this directly (https://github.com/openshift/openshift-ansible/pull/1385)

  Post installer I have manually added those settings into "/etc/sysconfig/atomic-openshift-master", "/etc/sysconfig/atomic-openshift-master-controllers", "/etc/sysconfig/atomic-openshift-master-api" and, respectively for the nodes, "/etc/sysconfig/atomic-openshift-node", but don't know how to do this via the installer itself.
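The manual step described above can be sketched as a small helper that appends the three variables to a sysconfig file only when they are not already set. This is a sketch under assumptions: the helper name `set_proxy_vars`, the proxy URL and the NO_PROXY value in the example are all hypothetical, since the actual values are not given in this thread.

```shell
#!/usr/bin/env bash
# Sketch: add proxy variables to a sysconfig-style file, skipping any variable
# that the file already defines. set_proxy_vars is a hypothetical helper name.

# Usage: set_proxy_vars <file> <proxy-url> <no-proxy-list>
set_proxy_vars() {
    local file="$1" proxy_url="$2" no_proxy="$3" line
    for line in "HTTP_PROXY=$proxy_url" "HTTPS_PROXY=$proxy_url" "NO_PROXY=$no_proxy"; do
        # ${line%%=*} is the variable name; append only if it is not set yet.
        grep -q "^${line%%=*}=" "$file" || echo "$line" >> "$file"
    done
}

# Example on a master (proxy URL and NO_PROXY list are placeholders):
#   for f in /etc/sysconfig/atomic-openshift-master \
#            /etc/sysconfig/atomic-openshift-master-api \
#            /etc/sysconfig/atomic-openshift-master-controllers; do
#       set_proxy_vars "$f" http://proxy.example.com:3128 .mydomain.novalocal
#   done
# On a node the file would be /etc/sysconfig/atomic-openshift-node.
# The corresponding services need a restart to pick up the change.
```

Because existing definitions are left untouched, the helper can be re-run safely, but it will not correct a value that is already present and wrong.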

- Is there an issue with the masters doubling as "etcd" nodes ?

No, there should not be any issues with co-locating the etcd service alongside the masters.

The most frustrating part  is that I have this very setup working perfectly fine in a public cloud environment (namely on GCE) , but with the (three) "etcd" hosts distinct from the masters (i.e. total of 9 hosts instead of 6), and with unproxied Internet access.... However, that installation is from a different repo branch (namely from "https://github.com/detiber/openshift-ansible" from the "gceFixes" branch )

I *believe* all of the fixes from gceFixes have been merged into master at this point.

Thanks a lot for the help,


P.S. The weirdest case wrt certificates is when trying to check the "etcd" cluster:

[root az1master01 ~]# etcdctl --debug  -C https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379 --ca-file /etc/origin/master/ca.crt  --cert-file /etc/origin/master/master.etcd-client.crt     --key-file /etc/origin/master/master.etcd-client.key cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate signed by unknown authority
error #1: x509: certificate signed by unknown authority
error #2: x509: certificate signed by unknown authority

You need to use the etcd ca cert here: etcdctl --debug  -C https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379 --ca-file /etc/origin/master/master.etcd-ca.crt  --cert-file /etc/origin/master/master.etcd-client.crt     --key-file /etc/origin/master/master.etcd-client.key cluster-health

Attempting a direct curl to "etcd" fails in a similar way:

[root az1master01 ~]# curl -v   --cacert /etc/origin/master/ca.crt --cert /etc/origin/master/master.etcd-client.crt     --key /etc/origin/master/master.etcd-client.key  https://az1master01.mydomain.novalocal:2379/v2/members
* About to connect() to az1master01.mydomain.novalocal port 2379 (#0)
*   Trying
* Connected to az1master01.mydomain.novalocal ( port 2379 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/origin/master/ca.crt
  CApath: none
* Server certificate:
* subject: CN=az1master01.mydomain.novalocal
* start date: Feb 24 19:38:07 2016 GMT
* expire date: Feb 23 19:38:07 2017 GMT
* common name: az1master01.mydomain.novalocal
* issuer: CN=etcd-signer 1456342665
* Peer's Certificate issuer is not recognized.
* Closing connection 0
curl: (60) Peer's Certificate issuer is not recognized.

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
[root az1master01 ~]#
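The output above shows the etcd server certificate issued by "etcd-signer", whereas the API server certificate earlier in this message was issued by "openshift-signer", so "/etc/origin/master/ca.crt" is not expected to validate it. A minimal sketch for finding which CA file validates a given certificate, assuming `openssl` is installed; the helper names are hypothetical, and it is an assumption that the etcd client certificate is signed by the same etcd CA.

```shell
#!/usr/bin/env bash
# Sketch: find out which CA file validates a certificate.
# signed_by and find_signing_ca are hypothetical helper names.

# Succeeds if $2 (a PEM certificate) verifies against the CA file $1.
# Matching on the "OK" text avoids relying on the exit status of older openssl builds.
signed_by() {
    local ca="$1" cert="$2"
    openssl verify -CAfile "$ca" "$cert" 2>/dev/null | grep -q ': OK$'
}

# Usage: find_signing_ca <cert> <ca-file>...
# Prints the first CA file that validates the certificate.
find_signing_ca() {
    local cert="$1" ca
    shift
    for ca in "$@"; do
        if signed_by "$ca" "$cert"; then
            echo "$ca"
            return 0
        fi
    done
    return 1
}

# Example (paths from this thread):
#   find_signing_ca /etc/origin/master/master.etcd-client.crt \
#       /etc/origin/master/ca.crt /etc/origin/master/master.etcd-ca.crt
# The curl call can then be retried with the etcd CA:
#   curl -v --cacert /etc/origin/master/master.etcd-ca.crt \
#        --cert /etc/origin/master/master.etcd-client.crt \
#        --key /etc/origin/master/master.etcd-client.key \
#        https://az1master01.mydomain.novalocal:2379/v2/members
```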


Jason DeTiberus
