
Re: ocp4 cluster fails to initialize on GCP



Trevor,

    I responded to your email with the logs, but that message is still awaiting moderator approval because it exceeded the list's size cap. While a moderator decides, I thought I'd repeat some of my observations:

E1006 11:39:34.041543       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:39:34.041641       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
W1006 11:40:12.831667       1 reflector.go:289] k8s.io/client-go/informers/factory.go:133: watch of *v1.ConfigMap ended with: too old resource version: 17351 (19676)
W1006 11:40:30.881153       1 reflector.go:289] k8s.io/client-go/informers/factory.go:133: watch of *v1.ConfigMap ended with: too old resource version: 18133 (19758)
E1006 11:45:13.003476       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:45:13.003613       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
W1006 11:46:25.850256       1 reflector.go:289] k8s.io/client-go/informers/factory.go:133: watch of *v1.ConfigMap ended with: too old resource version: 19478 (21246)
E1006 11:46:26.883137       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:46:26.883217       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
W1006 11:47:00.837769       1 reflector.go:289] k8s.io/client-go/informers/factory.go:133: watch of *v1.ConfigMap ended with: too old resource version: 19826 (21404)
W1006 11:47:03.833901       1 reflector.go:289] k8s.io/client-go/informers/factory.go:133: watch of *v1.Deployment ended with: too old resource version: 13636 (14162)
E1006 11:47:28.977942       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:47:28.978052       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
E1006 11:47:29.001738       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:47:29.001853       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
E1006 11:47:29.031826       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:47:29.031924       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
E1006 11:47:29.455882       1 status.go:71] RouteSyncProgressing FailedHost route is not available at canonical host []
E1006 11:47:29.456081       1 controller.go:129] {Console Console} failed with: route is not available at canonical host []
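The console errors above repeat with only the timestamp changing, which makes long excerpts hard to scan. Here is a minimal Python sketch that collapses such an excerpt into distinct messages with repeat counts. It assumes the standard klog/glog header layout ('Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg'). 'parse_klog' and 'summarize' are hypothetical helper names, not part of any OpenShift tool.

```python
import re
from collections import Counter

# klog/glog header: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
KLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+)\s+"
    r"(?P<source>[^:\s]+:\d+)\]\s"
    r"(?P<msg>.*)$"
)

LEVELS = "IWEF"  # INFO < WARNING < ERROR < FATAL


def parse_klog(line):
    """Return the header fields and message as a dict, or None if the line is not klog-formatted."""
    m = KLOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines, min_level="E"):
    """Count distinct (level, source, message) triples at or above min_level."""
    counts = Counter()
    for line in lines:
        rec = parse_klog(line)
        if rec is None or LEVELS.index(rec["level"]) < LEVELS.index(min_level):
            continue  # skip non-log lines and anything below the threshold
        counts[(rec["level"], rec["source"], rec["msg"])] += 1
    return counts
```

With the default min_level of 'E', the excerpt above reduces to the two console messages ('status.go:71' and 'controller.go:129'), each with its repeat count. Pass min_level='W' to include the reflector warnings as well.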

     This is a default install with three masters. Previously, I had done an install with one master, and even though that install hung, the console route had been defined (so, I imagine, there would have been an A record set up in DNS). Currently, I see an A record for the master, but not for the console. I'm wondering whether a mistake in my DNS setup is contributing to the problem.
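One way to test the DNS hypothesis is to check whether the names the cluster needs actually resolve. As far as I know, on cloud platforms the OpenShift 4 ingress operator usually publishes one wildcard record, '*.apps.<cluster>.<base_domain>', for the router load balancer rather than a per-route A record. If so, the useful check is whether any name under 'apps' resolves.

The minimal Python sketch below assumes the default naming. The cluster name 'one' and base domain 'discworld.a.random.domain' are read off the API URL quoted further down. The console host 'console-openshift-console.apps...' is the assumed default route host. 'wildcard-probe' is a made-up label used only to exercise the wildcard.

```python
import socket


def expected_hostnames(cluster_name, base_domain):
    """Hostnames an OpenShift 4 cluster normally needs resolvable (assumed default naming)."""
    apps = f"apps.{cluster_name}.{base_domain}"
    return {
        "api": f"api.{cluster_name}.{base_domain}",
        # Assumed default console route host; served by the *.apps wildcard record.
        "console": f"console-openshift-console.{apps}",
        # Hypothetical label: any name under *.apps should resolve if the wildcard exists.
        "wildcard-probe": f"wildcard-probe.{apps}",
    }


def resolve(host):
    """Return the sorted IPv4 addresses for host, or an empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    return sorted({info[4][0] for info in infos})


def check_dns(cluster_name, base_domain):
    """Resolve each expected hostname; an empty list marks a missing record."""
    return {role: resolve(host)
            for role, host in expected_hostnames(cluster_name, base_domain).items()}
```

For example, 'check_dns("one", "discworld.a.random.domain")' should report addresses for 'api'. If it does but both 'apps' entries come back empty, that points at the missing wildcard (ingress) record rather than at the console itself.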

    As of now (roughly 30 minutes after the install timed out), I still see the following messages in the cluster-version operator log:

I1006 12:31:16.462865       1 sync_worker.go:745] Update error 294 of 432: ClusterOperatorNotAvailable Cluster operator console has not yet reported success (*errors.errorString: cluster operator console is not done; it is available=false, progressing=true, degraded=false)
I1006 12:31:16.462876       1 sync_worker.go:745] Update error 135 of 432: ClusterOperatorNotAvailable Cluster operator authentication is still updating (*errors.errorString: cluster operator authentication is still updating)
I1006 12:31:16.462884       1 sync_worker.go:745] Update error 260 of 432: ClusterOperatorNotAvailable Cluster operator monitoring is still updating (*errors.errorString: cluster operator monitoring is still updating)
I1006 12:31:16.462890       1 sync_worker.go:745] Update error 183 of 432: ClusterOperatorNotAvailable Cluster operator ingress is still updating (*errors.errorString: cluster operator ingress is still updating)
I1006 12:31:16.462897       1 sync_worker.go:745] Update error 170 of 432: ClusterOperatorNotAvailable Cluster operator image-registry is still updating (*errors.errorString: cluster operator image-registry is still updating)
E1006 12:31:16.462944       1 sync_worker.go:311] unable to synchronize image (waiting 2m52.525702462s): Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
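The sync_worker lines above name each operator that is blocking the install, along with a short reason. A minimal Python sketch that pulls those names out of a saved cluster-version operator log follows. It assumes the log was fetched with something like 'oc logs -n openshift-cluster-version deployment/cluster-version-operator' (default namespace and deployment name assumed). The regular expressions only match the message shapes visible in this excerpt, and 'stuck_operators' is a hypothetical helper.

```python
import re

# Matches e.g. "Update error 294 of 432: ClusterOperatorNotAvailable Cluster operator console has not yet reported success (...)"
UPDATE_ERROR = re.compile(
    r"Update error (?P<n>\d+) of (?P<total>\d+): (?P<reason>\w+) "
    r"Cluster operator (?P<operator>[\w-]+) (?P<detail>[^(]+?) \("
)
# Matches the summary line "Some cluster operators are still updating: a, b, c"
STILL_UPDATING = re.compile(r"Some cluster operators are still updating: (?P<ops>.+)$")


def stuck_operators(log_lines):
    """Map each blocking operator to the reason text the CVO last logged for it."""
    stuck = {}
    for line in log_lines:
        m = UPDATE_ERROR.search(line)
        if m:
            stuck[m.group("operator")] = m.group("detail").strip()
            continue
        m = STILL_UPDATING.search(line)
        if m:
            # The summary line lists names only; keep any more specific reason already seen.
            for op in m.group("ops").split(","):
                stuck.setdefault(op.strip(), "still updating")
    return stuck
```

On this excerpt the result lists authentication, console, image-registry, ingress and monitoring. Only console reports something other than "is still updating", which is consistent with the route errors in the console log.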

Regards,
Marvin

On Sat, Oct 5, 2019 at 6:22 PM W. Trevor King <wking redhat com> wrote:
On Sat, Oct 5, 2019 at 11:22 AM Just Marvin wrote:
> INFO Destroying the bootstrap resources...
> INFO Waiting up to 30m0s for the cluster at https://api.one.discworld.a.random.domain:6443 to initialize...
> FATAL failed to initialize the cluster: Working towards 4.2.0-0.nightly-2019-10-01-210901: 99% complete
> ...
>     How do I track down what went wrong? And at this point, is it just a matter of waiting? If I let it go for a few hours, will there be a way to see whether the initialization completed?

Might be.  You can launch additional waiters with 'openshift-install
wait-for install-complete'.  You can also see exactly what the
cluster-version operator is stuck on by looking in the cluster-version
operator pod's logs.  Or you can inspect the ClusterOperator resources
and see whether any of the core operators have more specific
complaints.  Or you can run 'oc adm must-gather' to get a tarball of
OpenShift components to send to us if you'd rather have us poke
around.
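Inspecting the ClusterOperator resources can be scripted. A minimal Python sketch follows. It assumes 'oc' is on PATH with a working KUBECONFIG, and that each ClusterOperator reports 'Available', 'Progressing' and 'Degraded' conditions with string statuses, matching the 'available=false, progressing=true, degraded=false' wording in the log above. 'unhealthy_operators' and 'fetch_clusteroperators' are hypothetical helper names.

```python
import json
import subprocess


def unhealthy_operators(clusteroperators):
    """Return {name: {condition_type: status}} for operators that are not
    Available=True, or that are still Progressing=True or Degraded=True."""
    result = {}
    for item in clusteroperators.get("items", []):
        name = item["metadata"]["name"]
        conds = {c["type"]: c["status"]
                 for c in item.get("status", {}).get("conditions", [])}
        healthy = (
            conds.get("Available") == "True"
            and conds.get("Progressing") != "True"
            and conds.get("Degraded") != "True"
        )
        if not healthy:
            result[name] = conds
    return result


def fetch_clusteroperators():
    """Run 'oc get clusteroperators -o json' and parse the output (needs oc and cluster access)."""
    out = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

A typical use is 'for name, conds in unhealthy_operators(fetch_clusteroperators()).items(): print(name, conds)'. Rerunning that periodically, alongside 'openshift-install wait-for install-complete', gives a rough way to tell whether the install eventually settles.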

Cheers,
Trevor
