
Re: cluster stopped working - certificate problems



Hi Tim,

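Note that there are multiple sets of certificates, both external and
internal. The named certificates you renewed are only the external
set; the cluster's internal certificates are separate and can expire
independently. So it would be worth checking all certificates again
using the Certificate Expiration Playbooks (see link below). The
documentation also has an overview of what can be done to renew
certain certificates: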

- [ Redeploying Certificates ]
  https://docs.okd.io/3.11/install_config/redeploying_certificates.html

Apart from checking all certificates, I'd certainly review the time
synchronisation for the whole cluster, as we see the message "x509:
certificate has expired or is not yet valid".
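
To illustrate why the clock matters: the same certificate can be
rejected as "expired" or as "not yet valid" depending only on what
time the checking host believes it is. Below is a minimal Python
sketch, assuming you have the notBefore/notAfter strings as printed
by `openssl x509 -noout -dates`. The function name and the dates in
the usage note are made up for illustration and are not taken from
your cluster.

```python
import ssl
import time


def classify_validity(not_before, not_after, now=None):
    """Classify a certificate validity window against a clock value.

    not_before / not_after use the text format printed by
    `openssl x509 -noout -dates`, e.g. 'Mar 17 09:00:00 2020 GMT'.
    now is seconds since the epoch; defaults to this host's clock.
    """
    if now is None:
        now = time.time()
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        # A host whose clock runs behind sees a fresh certificate this way.
        return "not yet valid"
    if now > end:
        return "expired"
    return "valid"
```

For example, calling classify_validity("Mar 17 09:00:00 2020 GMT",
"Jun 15 09:00:00 2020 GMT") on a host whose clock is stuck somewhere
before 17 March would report "not yet valid" even though the
certificate itself is fine. That is why I would compare the clocks on
the masters and nodes as well.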

I hope this helps.

Kind regards
Simon

On Tue, Mar 31, 2020 at 4:33 PM Tim Dudgeon <tdudgeon ml gmail com> wrote:
>
> One of our OKD 3.11 clusters has suddenly stopped working without any
> obvious reason.
>
> The origin-node service on the nodes does not start (times out).
> The master-api pod is running on the master.
> The nodes can access the master-api endpoints.
>
> The logs of the master-api pod look mostly OK, other than a huge number
> of warnings about certificates that don't really make sense, as the
> certificates are valid (we use named certificates from Let's Encrypt;
> they were renewed about 2 weeks ago and all appear to be correct).
>
> Examples of errors from the master-api pod are:
>
> I0331 12:46:57.065147       1 establishing_controller.go:73] Starting EstablishingController
> I0331 12:46:57.065561       1 logs.go:49] http: TLS handshake error from 192.168.160.17:58024: EOF
> I0331 12:46:57.071932       1 logs.go:49] http: TLS handshake error from 192.168.160.19:48102: EOF
> I0331 12:46:57.072036       1 logs.go:49] http: TLS handshake error from 192.168.160.19:37178: EOF
> I0331 12:46:57.072141       1 logs.go:49] http: TLS handshake error from 192.168.160.17:58022: EOF
>
> E0331 12:47:37.855023       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
> E0331 12:47:37.856569       1 memcache.go:147] couldn't get resource list for servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request
> E0331 12:47:44.115290       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
> E0331 12:47:44.118976       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
> E0331 12:47:44.122276       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
>
> There is a huge number of this second sort.
>
> Any ideas what is wrong?



-- 
Simon Krenger
Technical Account Manager
Red Hat


