
Re: OKD 3.9 to 3.10 upgrade failure on CentOS



Hi Dan,

In-line responses below.

On 30/11, Dan Pungă wrote:
> Hi Dharmit,
> 
> What you're experiencing looks a lot like a problem I had with the upgrade.
> I ended up doing a fresh install.
> 
> I've tried fiddling around with the ansible config and, as I was trying to
> get my head around what was happening, I discovered an issue about node names,
> with this reply from Michael Gugino that shed some light on the matter: https://github.com/openshift/openshift-ansible/issues/9935#issuecomment-423268110
> 
> Basically my problem was that the upgrade playbook of OKD 3.10 expected
> the node names from the previously installed version to be the short names
> and not the FQDNs.

My understanding is that with 3.10 you are required to have DNS set up
properly in the cluster. The inventory file needs to list the FQDNs of
the systems in the cluster and not their IP addresses.
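
To make that check concrete, here is a minimal Python sketch that scans
an INI-style inventory and flags host entries that are IP addresses or
short names. The function names and the sample inventory are
hypothetical, and treating "contains a dot" as "is an FQDN" is my own
simplification; it does not verify that the name actually resolves.

```python
import ipaddress


def classify_host(entry: str) -> str:
    """Classify an inventory host entry as 'ip', 'short', or 'fqdn'."""
    host = entry.split()[0]  # drop per-host variables after the name
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        pass
    # Assumption: any name with an inner dot is treated as an FQDN.
    return "fqdn" if "." in host.strip(".") else "short"


def non_fqdn_hosts(inventory_text: str) -> list:
    """Return (host, kind) pairs for host entries that are not FQDNs."""
    bad = []
    in_host_section = False
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            # [group:vars] and [group:children] sections hold no host names
            in_host_section = ":" not in line
            continue
        if in_host_section:
            host = line.split()[0]
            kind = classify_host(host)
            if kind != "fqdn":
                bad.append((host, kind))
    return bad


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    sample = """
[masters]
master.example.com

[nodes]
192.168.1.10 openshift_node_group_name='node-config-compute'
"""
    for host, kind in non_fqdn_hosts(sample):
        print(f"{host}: {kind}")
```

Running it against the real hosts file before starting the playbook
should at least show which entries need attention.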

> I guess I was precisely in your position and I really didn't know what else
> to try except doing a fresh install. I have no idea if there is a way of
> changing node names of a running cluster. Maybe someone who knows more about
> the internals could be of help in this respect...

I'm not sure how to change the node names either. But I *think* it could
be done by removing a node from the cluster and then adding it back
through the scale-up playbook. There's documentation for both steps. It's
easier said than done, but if you're careful it should be doable.

> Since I see your installation is also a fresh one, maybe it would be worth
> uninstalling 3.9 and installing 3.10. Or maybe have a try at the newest,
> 3.11.

This is my test environment where I can play however I wish to.
Unfortunately, I can't do the same with production where we are supposed
to upgrade as well. :(

I managed to fix the issue in my test environment and am going to
upgrade the production cluster soon.

Since the 3.9 to 3.10 upgrade wasn't working, we planned to uninstall OKD
by executing the uninstall playbook
(/usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml). We
planned to re-use the Jenkins PV after a successful 3.11 deployment.
While doing the 3.11 setup I faced this issue [1].

I decided to completely remove the "kubeletArguments" configuration from
the hosts file. In 3.9, the kubelet arguments could be configured by
setting "openshift_node_kubelet_args". With 3.10, that variable is
deprecated and the arguments have to be specified in
"openshift_node_groups". I'm guessing I was doing something wrong there.
Or maybe it's an issue with the OKD documentation mentioning arguments
for container garbage collection [2] that are not available in the
upstream kubelet documentation [3]. I have no clue!

But after removing the kubeletArguments from "openshift_node_groups",
the 3.9 to 3.10 upgrade using the playbook went just fine!
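
For anyone who wants to script that removal, here is a rough Python
sketch. It assumes the value of "openshift_node_groups" is a
Python-style list of dicts in which each group may carry an 'edits' list
whose entries have a 'key' such as 'kubeletArguments.<name>'. The
function name and the sample value are hypothetical, not copied from my
hosts file.

```python
import ast


def strip_kubelet_edits(node_groups_value: str) -> list:
    """Drop every edit whose key starts with 'kubeletArguments'."""
    groups = ast.literal_eval(node_groups_value)  # parse the literal safely
    cleaned = []
    for group in groups:
        group = dict(group)  # copy so the parsed input is left untouched
        edits = [
            edit for edit in group.get("edits", [])
            if not str(edit.get("key", "")).startswith("kubeletArguments")
        ]
        if edits:
            group["edits"] = edits
        else:
            group.pop("edits", None)  # no edits left, drop the key entirely
        cleaned.append(group)
    return cleaned


if __name__ == "__main__":
    # Hypothetical value, for illustration only.
    value = (
        "[{'name': 'node-config-compute', "
        "'labels': ['node-role.kubernetes.io/compute=true'], "
        "'edits': [{'key': 'kubeletArguments.pods-per-core', "
        "'value': ['20']}]}]"
    )
    print(strip_kubelet_edits(value))
```

The printed list can be pasted back as the value of
"openshift_node_groups" in the hosts file. Edits that do not touch
kubeletArguments are kept as they are.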

Hope that helps. :)

Regards,
Dharmit

[1] https://github.com/openshift/openshift-ansible/issues/10774
[2] https://docs.okd.io/3.10/admin_guide/garbage_collection.html#container-garbage-collection
[3] https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

> 
> Hope it helps,
> 
> Dan
> 
> On 20.11.2018 04:38, Dharmit Shah wrote:
> > Hi,
> > 
> > I'm trying to upgrade my OKD 3.9 cluster to 3.10 using
> > openshift-ansible. I have already described the problem in detail and
> > provided logs on the GitHub issue [1].
> > 
> > I could really use some help on this issue!
> > 
> > Regards,
> > Dharmit
> > 
> > [1] https://github.com/openshift/openshift-ansible/issues/10690
> > 

-- 
Dharmit Shah
Red Hat Developer Tools (https://developers.redhat.com/)
irc, mattermost: dharmit
https://dharmitshah.com

