
Re: Master/ETCD Migration





On Tue, Dec 13, 2016 at 1:49 PM, Diego Castro <diego castro getupcloud com> wrote:
2016-12-13 15:24 GMT-03:00 Jason DeTiberus <jdetiber redhat com>:


On Tue, Dec 13, 2016 at 12:37 PM, Diego Castro <diego castro getupcloud com> wrote:
Thanks John, it's very helpful.
Looking over the playbook code, it seems to replace all certificates and trigger node evacuation to update the CA for all pods; I definitely don't want that!

It should only do that when openshift_certificates_redeploy_ca is set to True, otherwise it should just redeploy certificates on the masters. 
Perfect! 

There is also a PR for splitting out the certificate redeploy playbooks to allow for more flexibility when running: https://github.com/openshift/openshift-ansible/pull/2671
 
- etcd won't be a problem, since I can replace the certs, migrate the data dir and restart the masters.

We don't currently support automated resizing or migration of etcd, but this approach should work just fine.

That said, one *could* do the following (rough etcdctl commands are sketched after the list):
- Add the new etcd hosts to the inventory
- Run Ansible against the hosts (I suspect it will fail on service startup)
- Add the newly provisioned etcd hosts manually to the cluster using etcdctl
- If Ansible failed on the previous step, re-run it to finish landing the etcd config change
- Remove the old etcd hosts from the etcd cluster using etcdctl
- Update the inventory to remove the old etcd hosts
- Run Ansible to remove the old etcd hosts from the master configs
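
Roughly, the etcdctl portion of the steps above could look like this (a sketch only: the host names are placeholders, and the cert paths assume the usual /etc/etcd layout that openshift-ansible lays down):

    # run from an existing, healthy etcd host
    ETCD="etcdctl --ca-file /etc/etcd/ca.crt --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --endpoints https://old-etcd1.example.com:2379"

    # add a newly provisioned member by its peer URL (repeat per new host)
    $ETCD member add new-etcd1 https://new-etcd1.example.com:2380
    # start etcd on new-etcd1 with the ETCD_INITIAL_CLUSTER* values printed
    # by the command above, then verify the cluster
    $ETCD cluster-health

    # once the new members are healthy, drop the old ones by member ID
    $ETCD member list
    $ETCD member remove <old-member-id>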

I'll do it! 

- The masters are a big issue, since I had to change the public cluster hostname.

Indeed, but doing a rolling update of the master services to land the new certificate shouldn't cause much disruption. The controllers service will migrate (possibly multiple times), but that should be mostly transparent to running apps and users.

What do you mean by 'rolling update'? Is it the same process as for nodes, which I do by running the scaleup playbook?
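
(In this context a 'rolling update' just means restarting the master services on one master at a time, rather than running the node scaleup playbook. A minimal sketch, with placeholder host names and assuming a native-HA enterprise install; Origin uses origin-master-api/origin-master-controllers instead:)

    # restart the master services one host at a time, checking health before moving on
    for m in master1.example.com master2.example.com master3.example.com; do
        ssh "$m" 'systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers'
        until curl -sk "https://$m:8443/healthz" | grep -q ok; do sleep 5; done
    done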

For masters, this might work (a rough inventory sketch follows the list):
- If you are using named certificates:
  - update inventory:
    - update openshift_master_named_certificates to add the cert for the new cluster name(s)
    - add the additional master hosts to the inventory without updating the cluster hostname(s)
  - Run Ansible to land the new named_certificate on the existing hosts and install/configure the new hosts
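
A rough sketch of what those inventory changes might look like (every host name, path and certificate name below is a made-up example, and depending on your openshift-ansible version the added hosts may belong in a [new_masters] group used by the master scaleup playbook rather than directly in [masters]):

    [OSEv3:vars]
    # serve a certificate for the new public name alongside the existing one;
    # openshift_master_cluster_public_hostname stays on the old name for now
    openshift_master_named_certificates=[{"certfile": "/root/certs/new-console.crt", "keyfile": "/root/certs/new-console.key", "names": ["new-console.example.com"]}]

    [masters]
    old-master1.example.com
    old-master2.example.com
    old-master3.example.com
    new-master1.example.com
    new-master2.example.com
    new-master3.example.com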

At this point the cluster should be up and functional with all masters, and it should respond and serve the API/console using the new cluster hostname, but the nodes will still be configured to use the old cluster hostname.

The certificate redeploy PR covers how to update the node kubeconfigs to point to the new master host, which would need to be done on each host (along with a node reboot), before the old cluster hostname/load balancer is removed.
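
On each node that step might look roughly like the following (the kubeconfig path and service name vary between Origin and enterprise installs and between versions, so treat these as placeholders; drain the node first if you can):

    # point the node's kubeconfig at the new cluster hostname, then restart the node
    NEW_API=https://new-console.example.com:8443    # example value only
    sed -i.bak "s|^\( *server:\).*|\1 ${NEW_API}|" \
        /etc/origin/node/system:node:$(hostname).kubeconfig
    systemctl restart atomic-openshift-node         # origin-node on Origin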


One other thing to keep in mind is that you will want to migrate /etc/etcd/generated_certs and /etc/origin/generated_configs to the new "first etcd" and "first master", respectively, after removing the old hosts.
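
For example (host names are placeholders; run this once the old hosts have been removed from the cluster but are still reachable):

    # from the old "first etcd" host
    rsync -a /etc/etcd/generated_certs/ new-etcd1.example.com:/etc/etcd/generated_certs/
    # from the old "first master" host
    rsync -a /etc/origin/generated_configs/ new-master1.example.com:/etc/origin/generated_configs/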
 

Once I get the new nodes up and running, can I just shut down the old servers and update the inventory? Just wondering what happens if something goes wrong while replacing masters[0].
 
 
 


---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade

2016-12-13 11:17 GMT-03:00 Skarbek, John <John Skarbek ca com>:

Diego,

We’ve done a similar thing in our environment. I’m not sure if the openshift-ansible guys have a better way, but this is what we did at that time.

We created a custom playbook to run through all the steps as necessary. Due to the version of openshift-ansible we were running, we had to be careful when we replaced whichever server was at index 0 in the array of hosts. (I think they’ve resolved that problem now.)

First we created a play that copied the necessary certificates to all the nodes, so that it didn’t matter which node was at index 0 of the list of nodes. We had the playbook limited to operate on one node at a time, which dealt with tearing it down. Then we’d run the deploy on the entire cluster. For the new node, everything was installed as necessary; for the rest of the cluster it was mostly a no-op. We use static addresses, so the only thing that really changed was the underlying host, and certificate regeneration was limited.
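
(Roughly: a custom teardown/replace play run with --limit against the host being swapped, then a full run of the normal config playbook; the playbook and host names here are only illustrative of our setup:)

    # replace one node at a time
    ansible-playbook -i inventory/hosts replace-node.yml --limit node03.example.com
    # then reconcile the whole cluster; mostly a no-op for untouched hosts
    ansible-playbook -i inventory/hosts playbooks/byo/config.yml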

For the master nodes, this was pretty easy. For the etcd nodes, we had to do a bit of extra work, as the nodes being added to the cluster had different member IDs than what the cluster thought those nodes ought to have. Following etcd’s docs on Member Migration should help you out here.
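
(For reference, the mismatch shows up in 'etcdctl member list'; the member migration doc boils down to stopping the old member, copying its data directory to the new box, pointing the existing member ID at the new peer URL, and starting etcd there. Host names and the member ID below are placeholders:)

    ETCD="etcdctl --ca-file /etc/etcd/ca.crt --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --endpoints https://etcd1.example.com:2379"
    $ETCD member list
    # keep the existing member ID, but point it at the replacement host's peer URL
    $ETCD member update <member-id> https://new-etcd1.example.com:2380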

The only major part we had to be careful of was doing the work on the node that was going to be the first node. Due to the way the playbooks operated, that node ended up holding a lot of config and certificate details that would get copied around. If they’ve addressed this, it shouldn’t be an issue, but at the time we got around it by simply adjusting the order of the nodes defined in our inventory file.

A wee bit laborious, but definitely doable.

In our case, we didn’t experience any downtime, the master nodes cycled through the haproxy box appropriately, and the etcd nodes were removed and added to the cluster without any major headaches.

Though I’m now curious whether the team at Red Hat working on openshift-ansible has addressed any of these sorts of issues to make it easier.



-- 
John Skarbek

On December 13, 2016 at 08:35:54, Diego Castro (diego castro getupcloud com) wrote:

Hello, I have to migrate my production HA masters/etcd servers to new boxes.

Steps intended (rough commands are sketched after the list):

1) Create new masters and etcd machines using the byo/config playbook.
2) Stop the old masters and move the etcd data directory to the new etcd servers.
3) Start the new masters.
4) Run byo/openshift-cluster/redeploy-certificates.yml against the cluster to update the CA and node configuration.
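
Roughly, in commands (host names, paths and service names are illustrative; the redeploy playbook path matches step 4):

    # 1) provision the new masters/etcd hosts
    ansible-playbook -i new-inventory playbooks/byo/config.yml
    # 2) stop the old control plane and copy the etcd data over
    systemctl stop atomic-openshift-master-api atomic-openshift-master-controllers   # on each old master
    systemctl stop etcd                                                              # on each old etcd host
    rsync -a /var/lib/etcd/ new-etcd1.example.com:/var/lib/etcd/
    # 3) start the new masters
    systemctl start atomic-openshift-master-api atomic-openshift-master-controllers  # on each new master
    # 4) redeploy certificates and update node configuration
    ansible-playbook -i new-inventory playbooks/byo/openshift-cluster/redeploy-certificates.yml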

Question:
- Is this the best/right way to do it, given that this is a production cluster and I want minimal downtime?


---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade
_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users




--
Jason DeTiberus




--
Jason DeTiberus
