Re: ocp 4.3 nightly install on openstack queens

Thanks for the tips, Joel, but no luck so far with 4.3.0-0.nightly-2019-12-13-180405.

After the following:

- destroy cluster
- copy my backup install-config.yaml, with my CA cert in additionalTrustBundle, to an empty osp-nightly/ dir
- generate manifests: `openshift-install create manifests --dir osp-nightly`
- update osp-nightly/manifests/cluster-proxy-01-config.yaml, setting spec.trustedCA.name to user-ca-bundle
- run the install: `openshift-install create cluster --dir=osp-nightly --log-level=debug`
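
For anyone who wants to script the manifest edit in the third step, here is a minimal Python sketch. It assumes the generated manifest contains a `trustedCA:` key immediately followed by an indented `name:` line; the SAMPLE_MANIFEST text and the set_trusted_ca helper are illustrative, not something the installer provides.

```python
import re

# Assumed shape of the generated manifest when no proxy is configured.
SAMPLE_MANIFEST = """\
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: ""
"""

# Matches "trustedCA:" and the "name:" key on the following line.
TRUSTED_CA_RE = re.compile(r"(trustedCA:[ \t]*\n[ \t]+name:).*")


def set_trusted_ca(manifest_text, configmap_name):
    """Return manifest_text with spec.trustedCA.name set to configmap_name."""
    updated, count = TRUSTED_CA_RE.subn(
        lambda m: m.group(1) + " " + configmap_name, manifest_text, count=1
    )
    if count == 0:
        raise ValueError("no trustedCA name field found in manifest")
    return updated


if __name__ == "__main__":
    print(set_trusted_ca(SAMPLE_MANIFEST, "user-ca-bundle"))
```

In practice you would read osp-nightly/manifests/cluster-proxy-01-config.yaml, pass its text through set_trusted_ca, and write the result back before running the install.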

I still see certificate errors from the machine-api controller:

$ export KUBECONFIG=osp-nightly/auth/kubeconfig
$ oc logs -c machine-controller -f -n openshift-machine-api $(oc get pods -n openshift-machine-api  -l k8s-app=controller -o name)
I1214 07:34:19.124112       1 controller.go:164] Reconciling Machine "osp-nightly-rrzv5-worker-tk495"
I1214 07:34:19.124188       1 controller.go:376] Machine "osp-nightly-rrzv5-worker-tk495" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
E1214 07:34:19.132925       1 controller.go:279] Failed to check if machine "osp-nightly-rrzv5-worker-tk495" exists: Error checking if instance exists (machine/actuator.go 346):
Error getting a new instance service from the machine (machine/actuator.go 467): Create providerClient err: Post https://openstack.domain.com:13000/v3/auth/tokens: x509: certificate signed by unknown authority
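
To see at a glance which endpoints are failing verification, the controller log can be filtered with a small script. This is only a sketch: the regular expression assumes the Go-style "<url>: x509: <reason>" wording shown above, and find_x509_failures is a made-up helper name.

```python
import re
from urllib.parse import urlsplit

# Assumes the Go error format seen in the log: "<url>: x509: <reason>".
X509_RE = re.compile(r'(https?://[^\s"]+?)"?: x509: (.+)')


def find_x509_failures(log_text):
    """Return sorted unique (host, port, reason) tuples for x509 errors."""
    failures = set()
    for match in X509_RE.finditer(log_text):
        url = urlsplit(match.group(1))
        # Fall back to the scheme's default port when none is given.
        port = url.port or (443 if url.scheme == "https" else 80)
        failures.add((url.hostname, port, match.group(2).strip()))
    return sorted(failures)


LOG_LINE = (
    "Create providerClient err: Post "
    "https://openstack.domain.com:13000/v3/auth/tokens: "
    "x509: certificate signed by unknown authority"
)

if __name__ == "__main__":
    for host, port, reason in find_x509_failures(LOG_LINE):
        print(host, port, reason)
```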

I can confirm my cert is here:

$ oc get cm user-ca-bundle -n openshift-config -o json | jq -r '.data."ca-bundle.crt"'
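
A quick sanity check on that configmap is to count the PEM certificates in it. The sketch below takes the JSON printed by `oc get cm user-ca-bundle -n openshift-config -o json` as a string; count_bundle_certs is a hypothetical helper, and it only counts BEGIN/END CERTIFICATE blocks without validating the certificates themselves.

```python
import json
import re

# One PEM-encoded certificate block.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def count_bundle_certs(configmap_json, key="ca-bundle.crt"):
    """Count PEM certificate blocks under data[key] of a ConfigMap JSON document."""
    configmap = json.loads(configmap_json)
    bundle = (configmap.get("data") or {}).get(key, "")
    return len(PEM_RE.findall(bundle))


if __name__ == "__main__":
    # Placeholder body, not a real certificate.
    sample = json.dumps(
        {"data": {"ca-bundle.crt": "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"}}
    )
    print(count_bundle_certs(sample))
```

A result of 0 would mean the bundle is empty or stored under a different key.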

And that the proxy received the configmap name from the custom manifest rather than default "":

$ oc get proxy cluster -o json | jq .spec.trustedCA
{"name": "user-ca-bundle"}

I'm stuck with 3 masters and no workers, while the installer says:

DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring

I guess I'll keep watching https://bugzilla.redhat.com/show_bug.cgi?id=1769879 and https://github.com/openshift/enhancements/pull/115 and running 3.11 :)

On Wed, Dec 4, 2019 at 9:29 PM Joel Pearson <japearson agiledigital com au> wrote:

On Wed, 4 Dec 2019 at 08:02, Dale Bewley <dale bewley net> wrote:

On Tue, Nov 26, 2019 at 7:29 PM Joel Pearson <japearson agiledigital com au> wrote:

Thanks for taking the time to reply, Joel. 
On Sat, 23 Nov 2019 at 13:21, Dale Bewley <dale bewley net> wrote:
I'm testing OCP 4.3 2019-11-19 nightly on OSP 13.

I added my CA cert [1] to install-config.yaml [3] and the installer now progresses. I can even `oc get nodes` and see the masters [2].

I still have the following errors and no worker nodes though.

ERROR Cluster operator authentication Degraded is True with RouteStatusDegradedFailedHost: RouteStatusDegraded: route is not available at canonical host oauth-openshift.apps.osp-nightly.osp-nightly.domain.com: [] 

This sounds like ingress isn't deploying because the worker nodes are not deployed or your load balancer isn't making ingress available. Are your master nodes schedulable? I.e., are your masters also workers? If not, then ingress won't deploy.

$ oc describe node osp-nightly-tfz6p-master-0 | grep -i schedul
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false

They are schedulable, but there are no matching tolerations in openshift-ingress/router-default deployment, so those pods are indeed stuck in _pending_ without any worker nodes.
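
The reason the router pods stay pending can be illustrated with a simplified Python model of taint and toleration matching. It mirrors the Kubernetes matching rules as I understand them (empty effect or key on a toleration acts as a wildcard, operator defaults to Equal) and ignores every other scheduling constraint. The function names are made up, and the toleration in the usage note is purely illustrative.

```python
def toleration_matches(toleration, taint):
    """Return True if one toleration matches one taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    # Exists ignores the value; Equal (the default) compares it.
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations, node_taints):
    """True if every NoSchedule or NoExecute taint on the node is tolerated."""
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(
        any(toleration_matches(tol, taint) for tol in pod_tolerations)
        for taint in blocking
    )


MASTER_TAINT = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}

if __name__ == "__main__":
    print(can_schedule([], [MASTER_TAINT]))
```

With no tolerations, can_schedule([], [MASTER_TAINT]) is False, which matches the pending router pods. A pod carrying {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"} would pass this check.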

How is your load balancer configured for 80/443 traffic? If the masters aren't targets of that, then even if ingress deploys you still won't be able to use any routes.


This is likely a symptom of not yet having associated a floating IP to the app neutron port, and not having created an /etc/hosts entry on the installer host. I assume that's a nonfatal error.

I assume this one is fatal, however:

INFO Cluster operator image-registry Progressing is True with Error: Unable to apply resources: unable to sync storage configuration: Post https://openstack.domain.com:13000/v3/auth/tokens: x509: certificate signed by unknown authority

Have you added the CA that covers openstack.domain.com to install-config.yaml at .additionalTrustBundle like you mentioned in your previous post?
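
One way to check, outside the cluster, whether a given CA file actually covers the endpoint is to attempt a TLS handshake against it. The sketch below uses only the Python standard library; check_endpoint is a hypothetical helper, and the CA file path in the usage note is a placeholder.

```python
import socket
import ssl


def check_endpoint(host, port, cafile=None, timeout=5.0):
    """Return 'trusted', 'untrusted: <reason>' or 'error: <reason>'.

    With cafile=None the system trust store is used; otherwise only the
    certificates in cafile are trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake, including chain and hostname checks, runs here.
            with context.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError as err:
        return f"untrusted: {err.verify_message}"
    except OSError as err:
        # Connection refused, timeouts, DNS failures, other TLS errors.
        return f"error: {err}"
```

For example, check_endpoint("openstack.domain.com", 13000, cafile="my-ca.pem") run from the installer host should return "trusted" if my-ca.pem (a placeholder name) contains the CA that signed the Keystone certificate.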


Otherwise you might need to edit the Proxy config and set spec.trustedCA.name to user-ca-bundle:

apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: user-ca-bundle

I had to do this even though I don't have an explicit proxy. I do have a transparent proxy though, which was doing MITM, essentially breaking anything trying to talk to the internet.

Where did you make this change?

I did this before installation, for convenience mostly, after running "openshift-install create manifests --dir=ignition-files", I edited the ignition-files/manifests/cluster-proxy-01-config.yaml file.

Otherwise, it looks like you can do it after the fact using "oc edit proxies cluster"; then, I think, you'll need to wait for the masters to reboot. For me that sometimes takes about 10 minutes to get through all of them.

FYI, I managed to find out what name to use to edit that proxy config by running "oc api-resources --api-group=config.openshift.io" and then finding the name for apigroup "config.openshift.io" and kind "Proxy".

I was going to try the 12/02 4.3 nightly build, but based on the following 2 blockers it doesn't look like it will work:

https://bugzilla.redhat.com/show_bug.cgi?id=1769879 Machine-api cannot create workers on osp envs installed with self-signed certs

There is a fair chance the above proxy config will fix this one.

https://github.com/openshift/enhancements/pull/115 enhancements/x509-trust: Propose a new enhancement

I triggered this whole discussion from here: https://bugzilla.redhat.com/show_bug.cgi?id=1771564 originally, so the above proxy config should help.

It's disappointing that the 4.2 release notes claim that OpenStack is supported when it does not seem to be supported in what I presume to be the majority of OSP configurations. 

Is it safe to assume this BZ comment is related to that error? https://bugzilla.redhat.com/show_bug.cgi?id=1735192#c17

Bootstrap host has already been removed by the installer, so `openshift-install gather` does not seem usable, but the installer debug output can be found at  https://paste.fedoraproject.org/paste/SzIqAMU4DWHN3Bw3WDKfTQ

Any advice?


$ export KUBECONFIG=osp-nightly/auth/kubeconfig
$ oc get nodes
NAME                         STATUS    ROLES     AGE       VERSION
osp-nightly-tfz6p-master-0   Ready     master    102m      v1.16.2
osp-nightly-tfz6p-master-1   Ready     master    103m      v1.16.2
osp-nightly-tfz6p-master-2   Ready     master    103m      v1.16.2

[3] install-config.yaml
apiVersion: v1
baseDomain: ocp.domain.com
additionalTrustBundle: |
- hyperthreading: Enabled
  name: worker
        size: 10
  replicas: 3
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
  creationTimestamp: null
  name: osp-nightly
  - cidr:
    hostPrefix: 23
  networkType: OpenShiftSDN
    cloud: shiftstack
    computeFlavor: ocp4.worker.4x16
    externalDNS: null
    externalNetwork: floating
    octaviaSupport: "0"
    region: ""
    trunkSupport: "1"
publish: External
pullSecret: '{"...
sshKey: |
  ssh-rsa A...

users mailing list
users lists openshift redhat com
