
Re: cluster stopped working - certificate problems



Hi Simon,

We've run those playbooks and all certs are reported as still being valid.

Tim

On 31/03/2020 15:59, Simon Krenger wrote:
Hi Tim,

Note that there are multiple sets of certificates, both external and
internal. So it would be worth checking the certificates again using
the Certificate Expiration Playbooks (see link below). The
documentation also has an overview of what can be done to renew
certain certificates:

- [ Redeploying Certificates ]
   https://docs.okd.io/3.11/install_config/redeploying_certificates.html

Apart from checking all certificates, I'd certainly review the time
synchronisation for the whole cluster, as we see the message "x509:
certificate has expired or is not yet valid".
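
To illustrate why clock skew matters here: the "expired or is not yet
valid" check compares the certificate's notBefore/notAfter dates against
the local clock of whichever machine does the verification. The sketch
below is only an illustration of that comparison, not how the API server
implements it. The function name `validity_status` and all dates are
hypothetical; the date strings are assumed to be in the format printed by
`openssl x509 -noout -dates`.

```python
import ssl
import time


def validity_status(not_before, not_after, now=None):
    """Classify a certificate validity window against a clock reading.

    not_before / not_after are strings such as "Mar 17 09:00:00 2020 GMT".
    now is seconds since the epoch; it defaults to the local clock.
    """
    if now is None:
        now = time.time()
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        return "not yet valid"
    if now > end:
        return "expired"
    return "valid"


# Hypothetical certificate renewed two weeks before the incident.
NOT_BEFORE = "Mar 17 09:00:00 2020 GMT"
NOT_AFTER = "Jun 15 09:00:00 2020 GMT"

# A host with a correct clock and a host whose clock is months behind.
good_clock = ssl.cert_time_to_seconds("Mar 31 12:47:44 2020 GMT")
skewed_clock = ssl.cert_time_to_seconds("Jan 05 12:47:44 2020 GMT")

print(validity_status(NOT_BEFORE, NOT_AFTER, good_clock))
print(validity_status(NOT_BEFORE, NOT_AFTER, skewed_clock))
```

The same certificate is accepted by the first host and rejected by the
second, which is why a freshly renewed certificate can still trigger this
message if one machine's clock is off.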

I hope this helps.

Kind regards
Simon

On Tue, Mar 31, 2020 at 4:33 PM Tim Dudgeon <tdudgeon ml gmail com> wrote:
One of our OKD 3.11 clusters has suddenly stopped working without any
obvious reason.

The origin-node service on the nodes does not start (times out).
The master-api pod is running on the master.
The nodes can access the master-api endpoints.

The logs of the master-api pod look mostly OK other than a huge number
of warnings about certificates that don't really make sense, as the
certificates are valid (we use named certificates from Let's Encrypt;
they were renewed about 2 weeks ago and all appear to be correct).

Examples of errors from the master-api pod are:

I0331 12:46:57.065147       1 establishing_controller.go:73] Starting EstablishingController
I0331 12:46:57.065561       1 logs.go:49] http: TLS handshake error from 192.168.160.17:58024: EOF
I0331 12:46:57.071932       1 logs.go:49] http: TLS handshake error from 192.168.160.19:48102: EOF
I0331 12:46:57.072036       1 logs.go:49] http: TLS handshake error from 192.168.160.19:37178: EOF
I0331 12:46:57.072141       1 logs.go:49] http: TLS handshake error from 192.168.160.17:58022: EOF

E0331 12:47:37.855023       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0331 12:47:37.856569       1 memcache.go:147] couldn't get resource list for servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request
E0331 12:47:44.115290       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
E0331 12:47:44.118976       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
E0331 12:47:44.122276       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]

There is a huge number of errors of this second sort (the x509 ones).
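
To put a number on that, a rough sketch like the following can tally log
lines by severity and source file. It assumes the lines follow the header
layout visible in the excerpts above (severity letter, MMDD, time, thread
id, file:line, then the message) and that each entry sits on a single
line, i.e. not wrapped as in this mail. The names `LINE_RE` and `tally`
are made up for illustration.

```python
import re
from collections import Counter

# Header layout as seen in the excerpts: e.g.
# "E0331 12:47:44.115290       1 authentication.go:62] message"
LINE_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([^:\s]+):(\d+)\] (.*)$"
)


def tally(lines):
    """Count log entries per (severity, source file); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            counts[(match.group(1), match.group(5))] += 1
    return counts


# Sample input: entries from this mail, joined back onto single lines.
sample = [
    "I0331 12:46:57.065561       1 logs.go:49] http: TLS handshake error from 192.168.160.17:58024: EOF",
    "E0331 12:47:44.115290       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]",
    "E0331 12:47:44.118976       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]",
]

for (severity, source), count in tally(sample).most_common():
    print(severity, source, count)
```

Fed with the full pod log instead of the sample list, this would show
which source locations dominate the output.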

Any ideas what is wrong?



_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



