
Re: cluster stopped working - certificate problems



Brian,

That's fixed it. THANK YOU.

On 31/03/2020 17:05, Brian Jarvis wrote:
Hello Tim,

Each node has a client certificate that expires after one year.
Run "oc get csr"; you should see many pending requests, possibly thousands.

To clear those, run "oc get csr -o name | xargs oc adm certificate approve"
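
The command above hands every CSR name to "approve", including requests that were already approved. To count or approve only the pending ones, the names can be filtered first. This is a minimal sketch, assuming the tabular output of "oc get csr" has a header row and that CONDITION is the last column of each row; the function name pending_csr_names is made up for this example.

```shell
# Print the names of CSRs whose last column (CONDITION) is exactly "Pending".
# Reads the tabular output of "oc get csr" on stdin and skips the header row.
# Assumption: CONDITION is the last column of each row.
pending_csr_names() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage on a master (not run here):
#   oc get csr | pending_csr_names | wc -l
#   oc get csr | pending_csr_names | xargs -r oc adm certificate approve
```

The "-r" flag is a GNU xargs option that skips the approve call when there is nothing pending.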

One way to prevent this in the future is to deploy/enable the auto approver statefulset with the following command.
ansible-playbook -vvv -i [inventory_file] /usr/share/ansible/openshift-ansible/playbooks/openshift-master/enable_bootstrap.yml -e openshift_master_bootstrap_auto_approve=true

On Tue, Mar 31, 2020 at 11:53 AM Tim Dudgeon <tdudgeon ml gmail com> wrote:

Maybe an uncanny coincidence, but we think the cluster was created almost EXACTLY 1 year before it failed.
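
Whether a node's client certificate has actually run out can be checked by reading its expiry date. This is a minimal sketch, assuming openssl is installed on the node and that the client certificate is at /etc/origin/node/certificates/kubelet-client-current.pem (the path is an assumption, so adjust it for your install). The function names are made up for this example.

```shell
# Print the certificate's expiry date as a line starting with "notAfter=".
cert_not_after() {
    openssl x509 -noout -enddate -in "$1"
}

# Exit status 0 if the certificate has already expired, 1 if it is still
# valid, 2 if the file is missing or is not a readable PEM certificate.
cert_is_expired() {
    openssl x509 -noout -in "$1" >/dev/null 2>&1 || return 2
    # "-checkend 0" succeeds only if the certificate is still valid right now.
    if openssl x509 -noout -checkend 0 -in "$1" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Usage on a node (not run here; the path is an assumption):
#   cert=/etc/origin/node/certificates/kubelet-client-current.pem
#   cert_not_after "$cert"
#   cert_is_expired "$cert" && echo "node client certificate has expired"
```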

On 31/03/2020 16:17, Ben Holmes wrote:
Hi Tim,

Can you verify that the hosts' clocks are being synced correctly, as per Simon's other suggestion?

Ben

On Tue, 31 Mar 2020 at 16:05, Tim Dudgeon <tdudgeon ml gmail com> wrote:
Hi Simon,

We've run those playbooks and all certs are reported as still being valid.

Tim

On 31/03/2020 15:59, Simon Krenger wrote:
> Hi Tim,
>
> Note that there are multiple sets of certificates, both external and
> internal. So it would be worth checking the certificates again using
> the Certificate Expiration Playbooks (see link below). The
> documentation also has an overview of what can be done to renew
> certain certificates:
>
> - [ Redeploying Certificates ]
>    https://docs.okd.io/3.11/install_config/redeploying_certificates.html
>
> Apart from checking all certificates, I'd certainly review the time
> synchronisation for the whole cluster, as we see the message "x509:
> certificate has expired or is not yet valid".
>
> I hope this helps.
>
> Kind regards
> Simon
>
> On Tue, Mar 31, 2020 at 4:33 PM Tim Dudgeon <tdudgeon ml gmail com> wrote:
>> One of our OKD 3.11 clusters has suddenly stopped working without any
>> obvious reason.
>>
>> The origin-node service on the nodes does not start (times out).
>> The master-api pod is running on the master.
>> The nodes can access the master-api endpoints.
>>
>> The logs of the master-api pod look mostly OK, other than a huge number
>> of warnings about certificates that don't really make sense, as the
>> certificates are valid (we use named certificates from Let's Encrypt;
>> they were renewed about 2 weeks ago and all appear to be correct).
>>
>> Examples of errors from the master-api pod are:
>>
>> I0331 12:46:57.065147       1 establishing_controller.go:73] Starting
>> EstablishingController
>> I0331 12:46:57.065561       1 logs.go:49] http: TLS handshake error from
>> 192.168.160.17:58024: EOF
>> I0331 12:46:57.071932       1 logs.go:49] http: TLS handshake error from
>> 192.168.160.19:48102: EOF
>> I0331 12:46:57.072036       1 logs.go:49] http: TLS handshake error from
>> 192.168.160.19:37178: EOF
>> I0331 12:46:57.072141       1 logs.go:49] http: TLS handshake error from
>> 192.168.160.17:58022: EOF
>>
>> E0331 12:47:37.855023       1 memcache.go:147] couldn't get resource
>> list for metrics.k8s.io/v1beta1: the server is currently unable to
>> handle the request
>> E0331 12:47:37.856569       1 memcache.go:147] couldn't get resource
>> list for servicecatalog.k8s.io/v1beta1: the server is currently unable
>> to handle the request
>> E0331 12:47:44.115290       1 authentication.go:62] Unable to
>> authenticate the request due to an error: [x509: certificate has expired
>> or is not yet valid, x509: certificate
>>    has expired or is not yet valid]
>> E0331 12:47:44.118976       1 authentication.go:62] Unable to
>> authenticate the request due to an error: [x509: certificate has expired
>> or is not yet valid, x509: certificate
>>    has expired or is not yet valid]
>> E0331 12:47:44.122276       1 authentication.go:62] Unable to
>> authenticate the request due to an error: [x509: certificate has expired
>> or is not yet valid, x509: certificate
>>    has expired or is not yet valid]
>>
>> Huge number of this second sort.
>>
>> Any ideas what is wrong?
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> users lists openshift redhat com
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>



--

BENJAMIN HOLMES

SENIOR Solution ARCHITECT

Red Hat UKI Presales

bholmes redhat com    M: 07876-885388    


--


Brian Jarvis, RHCE

Technical Account Manager

Red Hat North America
Partnering with you to help achieve your business goals

bjarvis redhat com    

T: 631-685-7519   M: 610-587-1736    


