
How to recover from failed update in OpenShift 4.2.x?

So, I'm running OpenShift 4.2 on Azure UPI, following this blog article with a few customisations on the Terraform side: https://blog.openshift.com/openshift-4-1-upi-environment-deployment-on-microsoft-azure-cloud/

One of the main differences, it seems, is how the router/ingress is handled. A normal Azure install uses load balancers, but UPI on Azure uses a regular router (the kind I'm used to seeing in 3.x), which is configured by setting the endpoint publishing strategy to "HostNetwork".

It was all working fine in OpenShift 4.2.0 and 4.2.2, but when I upgraded to OpenShift 4.2.4, the router stopped listening on ports 80 and 443. I could see the pod running with "crictl ps", but "netstat -tpln" didn't show anything listening on those ports.
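The same "is anything listening?" check can also be scripted. Below is a minimal sketch in Python, assuming it is run on the node that hosts the router pod (or against that node's address); the function names `is_listening` and `check_router_ports` are made up for illustration and are not part of any OpenShift tooling.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_router_ports(host: str, ports=(80, 443)) -> dict:
    """Map each port to whether something accepts connections on it."""
    return {port: is_listening(host, port) for port in ports}
```

Calling `check_router_ports("127.0.0.1")` on the affected node would be expected to report both ports as not listening in the broken state described above. Note that this only tests whether a TCP connection is accepted; unlike "netstat -tpln" it does not show which process owns the socket.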

I tried moving the version back from 4.2.4 to 4.2.2, but I accidentally used the image digest for 4.1.22, so I quickly reverted to 4.2.4 once I saw the apiservers coming up as 4.1.22. I then noticed that there was a 4.2.7 release on the candidate-4.2 channel, so I switched to that, and ingress started working properly again.
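A simple guard against this kind of mix-up is to compare version strings before applying a target release. The following is a minimal sketch in Python with hypothetical helper names (`parse_version`, `is_downgrade`, `crosses_minor`); it assumes plain dotted numeric versions such as "4.2.4" and does not resolve an image digest to a version, which would still have to be checked separately.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '4.2.4' into (4, 2, 4)."""
    return tuple(int(part) for part in version.split("."))


def is_downgrade(current: str, target: str) -> bool:
    """True if target is an older release than current."""
    return parse_version(target) < parse_version(current)


def crosses_minor(current: str, target: str) -> bool:
    """True if target is on a different major.minor stream than current."""
    return parse_version(current)[:2] != parse_version(target)[:2]
```

With these helpers, going from "4.2.4" to "4.1.22" is flagged both as a downgrade and as a change of minor stream, while "4.2.4" to "4.2.7" is flagged as neither.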

So my question is: what is the strategy for recovering from a failed update? Do I need to have etcd backups and then recover the cluster by restoring etcd, i.e. https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html

The upgrade page specifically says "Reverting your cluster to a previous version, or a rollback, is not supported. Only upgrading to a newer version is supported." So is the expectation for a production cluster that you would restore from backup if the cluster isn't usable after an update?

Maybe the upgrade page should mention taking backups, especially since there is no rollback option?
