
Re: How to recover from failed update in OpenShift 4.2.x?

On Thu, 21 Nov 2019 at 10:58, Clayton Coleman <ccoleman redhat com> wrote:

On Nov 17, 2019, at 9:34 PM, Joel Pearson <japearson agiledigital com au> wrote:

So, I'm running OpenShift 4.2 on Azure UPI following this blog article: https://blog.openshift.com/openshift-4-1-upi-environment-deployment-on-microsoft-azure-cloud/, with a few customisations on the Terraform side.

One of the main differences, it seems, is how the router/ingress is handled. Normal Azure uses load balancers, but UPI Azure uses a regular router (like the one I'm used to seeing in 3.x), which is configured by setting "HostNetwork" as the endpoint publishing strategy.

This sounds like a bug in Azure UPI. IPI is the reference architecture; UPI shouldn't have a default that diverges from the ref arch.

In the blog, he mentions that he changed the architecture because the default creates a public-facing load balancer. In my case I'm not allowed to create a public load balancer at all. Additionally, I can't use Azure's Public or Private DNS either, so I had to customise the Terraform templates even more.

Maybe supported UPI Azure will allow internally facing load balancers?

It was all working fine in OpenShift 4.2.0 and 4.2.2, but when I upgraded to OpenShift 4.2.4 the router stopped listening on ports 80 and 443. I could see the pod running with "crictl ps", but "netstat -tpln" didn't show anything listening.
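For anyone wanting to script the same check, here is a minimal sketch that probes the router ports over TCP from the node itself. It assumes the router is meant to answer on 127.0.0.1 ports 80 and 443 (as with a host-network router); the function names are made up for illustration and are not part of any OpenShift tooling.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_router_ports(host: str = "127.0.0.1", ports=(80, 443)) -> dict:
    """Map each port to whether something accepts connections on it."""
    return {port: is_listening(host, port) for port in ports}


if __name__ == "__main__":
    for port, ok in check_router_ports().items():
        print(f"port {port}: {'listening' if ok else 'NOT listening'}")
```

This only shows whether something accepts connections on the port; it does not tell you which process owns the socket, which is what "netstat -tpln" adds.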

I tried rolling the version back from 4.2.4 to 4.2.2, but I accidentally used the image digest for 4.1.22, so I quickly switched back to 4.2.4 once I saw the apiservers coming up as 4.1.22. I then noticed that there was a 4.2.7 release on the candidate-4.2 channel, so I switched to that, and ingress started working properly again.
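A small guard like the following could catch a mix-up of this kind before an update is applied. It is only a sketch: it compares plain "x.y.z" version strings, and it assumes you have already looked up the version behind the image digest yourself (for example from the release metadata), since a digest carries no version information. The helper names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'x.y.z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def is_downgrade(current: str, target: str) -> bool:
    """Return True if target is an older version than current."""
    return parse_version(target) < parse_version(current)


if __name__ == "__main__":
    current = "4.2.4"
    for target in ("4.1.22", "4.2.2", "4.2.7"):
        verdict = "downgrade" if is_downgrade(current, target) else "ok"
        print(f"{current} -> {target}: {verdict}")
```

Comparing integer tuples rather than raw strings matters here: as strings, "4.1.22" and "4.2.4" would be compared character by character, which gives wrong answers for versions such as "4.10.0".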

So my question is: what is the strategy for recovering from a failed update? Do I need to have etcd backups and then restore the cluster by restoring etcd? I.e. https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html

The upgrade page specifically says "Reverting your cluster to a previous version, or a rollback, is not supported. Only upgrading to a newer version is supported." So is the expectation for a production cluster that you would restore from backup if the cluster isn't usable?

Backup, yes.  If you could open a bug for the documentation that would be great.

Thanks, raised it here: https://bugzilla.redhat.com/show_bug.cgi?id=1777155

Maybe the upgrade page should mention taking backups? Especially if there is no rollback option.
users mailing list
users lists openshift redhat com
