
Help debug "oc login" returning "401" / certificate issues

Hello all,

I am running into login and certificate problems on the setup described below.

I have a multimaster OSE setup consisting of the following:
- A LB with "native" HA
- Three masters (doubling as "etcd" nodes)
- Two nodes

All the hosts are themselves OpenStack instances (hence the ".novalocal" suffix). Name resolution is via an "/etc/hosts" file propagated to all hosts, with the "lb" host doubling as DNS forwarder (via dnsmasq). All Internet access is via an HTTP / HTTPS proxy.

After many attempts we finally got a setup that is somewhat working (see the P.S. for why only "somewhat"). Attached is the "/etc/ansible/hosts" file. Installation is from the main "openshift-ansible" repo (https://github.com/openshift/openshift-ansible).

My problem:

After installation, I created two users in "/etc/origin/htpasswd" on one master, then propagated the file to all the other masters. The UNIX permissions on the file are "0600" on all masters.

However, "oc login" returns "401 Unauthorized", and I cannot find the cause or a way to debug it (there is no trace of it in the "atomic-openshift-master-api" or "atomic-openshift-master-controllers" logs).
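
Since the file was copied by hand, one thing worth ruling out is that the copies have drifted apart. A minimal sketch, assuming SSH access to the masters and that "sha256sum" is available there; the helper name "htpasswd_in_sync" is hypothetical, and the host names are the three masters of this setup:

```shell
# Reads "host digest" pairs on stdin and succeeds only if every
# digest equals the first one seen.
htpasswd_in_sync() {
    awk '
        NF < 2    { printf "no digest for %s\n", $1; bad = 1; next }
        ref == "" { ref = $2 }
        $2 != ref { printf "MISMATCH on %s\n", $1; bad = 1 }
        END       { exit bad }
    '
}

# Hypothetical usage from any host that can SSH to the masters:
# for m in az1master01 az3master02 az3master03; do
#     echo "$m $(ssh "$m.mydomain.novalocal" sha256sum /etc/origin/htpasswd | cut -d" " -f1)"
# done | htpasswd_in_sync && echo "htpasswd identical on all masters"
```

A mismatch would only show that the copies differ; it would not by itself explain the 401.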

[root@az1node01 ~]# oc login
Authentication required for https://az1lb01.mydomain.novalocal:8443 (openshift)
Username: reguser
Login failed (401 Unauthorized)

The puzzling thing is that using the "system:node" certificate and key works (in the sense that the TLS connection succeeds and I am identified as "system:anonymous"):

curl -v --cacert  /etc/origin/node/ca.crt --cert "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" --key "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
* About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
*   Trying
* Connected to az1lb01.mydomain.novalocal ( port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/origin/node/ca.crt
  CApath: none
* NSS: client certificate not found: /etc/origin/node/system
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=
*       start date: Feb 24 19:40:56 2016 GMT
*       expire date: Feb 23 19:40:57 2018 GMT
*       common name:
*       issuer: CN=openshift-signer 1456342841
> GET /api/v1/namespaces HTTP/1.1
> User-Agent: curl/7.29.0
> Host: az1lb01.mydomain.novalocal:8443
> Accept: */*
< HTTP/1.1 403 Forbidden
< Cache-Control: no-store
< Content-Type: application/json
< Date: Thu, 25 Feb 2016 14:42:41 GMT
< Content-Length: 255
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:anonymous\" cannot list all namespaces in the cluster",
  "reason": "Forbidden",
  "details": {
    "kind": "namespaces"
  },
  "code": 403
}
* Connection #0 to host az1lb01.mydomain.novalocal left intact
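
One possible reading of the "NSS: client certificate not found: /etc/origin/node/system" line above: curl's "--cert" option takes "certificate[:password]", so the path may have been cut at the first colon and no client certificate presented at all, which would be consistent with "system:anonymous". A minimal sketch that sidesteps the question by copying the files to colon-free names; the helper name "colon_free_copy" is hypothetical, and whether this changes the result is an assumption to be tested:

```shell
# Copy a file whose name contains ':' to a colon-free temporary file
# and print the new path. mktemp creates the file with mode 0600 and
# cp keeps the mode of the existing destination, so a private key is
# not left world-readable.
colon_free_copy() {
    src=$1
    dst=$(mktemp) || return 1
    cp -- "$src" "$dst" || return 1
    printf '%s\n' "$dst"
}

# Hypothetical usage on the node (paths taken from the command above):
# crt=$(colon_free_copy "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt")
# key=$(colon_free_copy "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key")
# curl -v --cacert /etc/origin/node/ca.crt --cert "$crt" --key "$key" \
#     https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
# rm -f "$crt" "$key"
```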

Attached is also the master configuration file for one master.

My questions:

- I had many issues getting the installation working, mostly because the Ansible installer reads the OpenStack instance metadata, which is inconsistent with the "hostname".

  Is there any particular repo / branch of the installer that is known to work in this particular setup? Any particular settings I should use in the Ansible hosts file?

  I suspect the certificate issues I'm encountering are caused by that (in combination with the proxy), but I'm not sure.

- Operating behind an HTTP / HTTPS proxy: Even before starting the Ansible installer, Docker was (properly) configured with the HTTP / HTTPS proxy settings, and working correctly. However, for the installer itself I found no way to express the "HTTP_PROXY", "HTTPS_PROXY" and, particularly, the "NO_PROXY" settings. For that I'm relying on environment variables exported in the shell. Is there a "proper" way to do this via the installer itself?

  After installation I manually added those settings to "/etc/sysconfig/atomic-openshift-master", "/etc/sysconfig/atomic-openshift-master-controllers", "/etc/sysconfig/atomic-openshift-master-api" and, for the nodes, "/etc/sysconfig/atomic-openshift-node", but I don't know how to do this via the installer itself.

- Is there an issue with the masters doubling as "etcd" nodes?
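
For the manual proxy step described in the list above, the lines added to each sysconfig file can be generated in one place so that every service gets the same values. A minimal sketch; the function name "proxy_env_lines" and the proxy URL are hypothetical, and the "NO_PROXY" entries are only examples from this setup:

```shell
# Print HTTP_PROXY / HTTPS_PROXY / NO_PROXY lines for a sysconfig file.
# $1 is the proxy URL; all remaining arguments are joined with commas
# to form the NO_PROXY value.
proxy_env_lines() {
    proxy_url=$1
    shift
    np_list=$(IFS=,; printf '%s' "$*")
    printf 'HTTP_PROXY=%s\nHTTPS_PROXY=%s\nNO_PROXY=%s\n' \
        "$proxy_url" "$proxy_url" "$np_list"
}

# Hypothetical usage on a master (the proxy URL is a placeholder):
# proxy_env_lines http://proxy.example.com:3128 \
#     .mydomain.novalocal az1lb01.mydomain.novalocal \
#     >> /etc/sysconfig/atomic-openshift-master-api
```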

The most frustrating part is that I have this very setup working perfectly fine in a public cloud environment (namely GCE), but with the three "etcd" hosts distinct from the masters (i.e. a total of 9 hosts instead of 6) and with unproxied Internet access. However, that installation is from a different repo and branch (namely the "gceFixes" branch of https://github.com/detiber/openshift-ansible).

Thanks a lot for the help,


P.S. The weirdest case with respect to certificates is when trying to check the "etcd" cluster:

[root@az1master01 ~]# etcdctl --debug -C https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379 --ca-file /etc/origin/master/ca.crt --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key cluster-health
Cluster-Endpoints: https://az3master02.mydomain.novalocal:2379, https://az1master01.mydomain.novalocal:2379, https://az3master03.mydomain.novalocal:2379
cURL Command: curl -X GET https://az3master02.mydomain.novalocal:2379/v2/members
cURL Command: curl -X GET https://az1master01.mydomain.novalocal:2379/v2/members
cURL Command: curl -X GET https://az3master03.mydomain.novalocal:2379/v2/members
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate signed by unknown authority
error #1: x509: certificate signed by unknown authority
error #2: x509: certificate signed by unknown authority

Attempting a direct curl to "etcd" fails in a similar way:

[root@az1master01 ~]# curl -v --cacert /etc/origin/master/ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key https://az1master01.mydomain.novalocal:2379/v2/members
* About to connect() to az1master01.mydomain.novalocal port 2379 (#0)
*   Trying
* Connected to az1master01.mydomain.novalocal ( port 2379 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/origin/master/ca.crt
  CApath: none
* Server certificate:
* subject: CN=az1master01.mydomain.novalocal
* start date: Feb 24 19:38:07 2016 GMT
* expire date: Feb 23 19:38:07 2017 GMT
* common name: az1master01.mydomain.novalocal
* issuer: CN=etcd-signer 1456342665
* Peer's Certificate issuer is not recognized.
* Closing connection 0
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
[root@az1master01 ~]#
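
In the curl output above the etcd server certificate is issued by "etcd-signer", whereas the API server certificate earlier was issued by "openshift-signer", so "/etc/origin/master/ca.crt" may simply be the wrong CA file for port 2379. A minimal sketch to check this with openssl; the helper name "issued_by" is hypothetical, and the path "/etc/origin/master/master.etcd-ca.crt" is an assumption about where the installer places the etcd CA:

```shell
# Succeeds if the subject of CA certificate $1 equals the issuer of
# certificate $2. This compares names only; it does not verify the
# signature.
issued_by() {
    ca_subject=$(openssl x509 -noout -subject -in "$1" | sed 's/^subject= *//')
    cert_issuer=$(openssl x509 -noout -issuer -in "$2" | sed 's/^issuer= *//')
    [ -n "$ca_subject" ] && [ "$ca_subject" = "$cert_issuer" ]
}

# Hypothetical usage on a master; /tmp/server.crt holds the certificate
# that etcd presents on port 2379:
# openssl s_client -connect az1master01.mydomain.novalocal:2379 </dev/null 2>/dev/null \
#     | openssl x509 > /tmp/server.crt
# for ca in /etc/origin/master/ca.crt /etc/origin/master/master.etcd-ca.crt; do
#     issued_by "$ca" /tmp/server.crt && echo "$ca matches the etcd issuer"
# done
```

If only the etcd CA file matches, passing it to "--ca-file" (etcdctl) or "--cacert" (curl) would be the next thing to try.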

Attachment: ansible-hosts-mydomain
Description: Binary data

Attachment: az1master01--master-config.yaml
Description: Binary data
