
Re: Node not joining cluster during ansible install



Thanks for the response, and sorry for the delay on my end - I've been away for a week.

I ran through the process again and got the same result. On the node, the OpenShift services appear to be running OK:

systemctl list-units --all | grep -i origin
  origin-node.service loaded active running OpenShift Node
But from the master, it looks like the node has not joined the cluster:

oc get nodes
NAME                                                           STATUS                     AGE       VERSION
2c0e37ab-f41e-40f1-a466-a575c85823b6.priv.cloud.scaleway.com   Ready,SchedulingDisabled   26m       v1.6.1+5115d708d7
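One way to confirm which inventory hosts never registered is to compare the [nodes] list in the inventory with the first column of the oc get nodes output. A minimal Python sketch; registered_nodes and missing_nodes are hypothetical helpers, and the comparison assumes each node registers under the same name it has in the inventory, which is not always true:

```python
def registered_nodes(oc_output):
    """Node names from `oc get nodes` output: the first column, header row skipped."""
    rows = [row for row in oc_output.splitlines() if row.strip()]
    return {row.split()[0] for row in rows[1:]}


def missing_nodes(inventory_nodes, oc_output):
    """Inventory hosts that do not appear in `oc get nodes`, sorted for stable output."""
    return sorted(set(inventory_nodes) - registered_nodes(oc_output))
```

Pass it only the command's output (starting at the NAME header line). Any host it reports has either not registered or registered under a different name.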
The install process seems to have gone OK. There were no obvious errors, though it stalled twice at a point like this:

TASK [openshift_hosted : Ensure OpenShift router correctly rolls out (best-effort today)] ******************
But after about 5-10 minutes, it continued.

There were a lot of 'skipping' messages during the install, but no obvious errors. The output was huge and not captured to a file, so I would have to run the install again to get a full log.
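Since the output was not captured, one way to keep a full log on the next run is to copy the playbook output to a file while still watching it. A minimal Python sketch, assuming Python 3.7+; run_and_log and the log file name are hypothetical choices, not part of openshift-ansible:

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echo its combined stdout/stderr as it arrives, and save a copy to log_path.

    Returns the command's exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error messages land in the log too
        text=True,
    ) as proc:
        for line in proc.stdout:  # read the output line by line
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()


# Example (hypothetical log file name; needs "import os" for expanduser):
# run_and_log(["ansible-playbook",
#              os.path.expanduser("~/openshift-ansible/playbooks/byo/config.yml")],
#             "install.log")
```

In a shell, piping through tee does the same job: `ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml 2>&1 | tee install.log`.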

Any thoughts as to what is wrong?

Tim


On 04/08/2017 16:07, Tim Bielawa wrote:
(reposting: forgot to reply-all the first time)


Based on the number of tasks your summary says completed, I am not sure your installation actually finished. I would expect to see upwards of 1,000 to 2,000 tasks.
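The task-count reasoning above can be checked mechanically by reading the PLAY RECAP lines. A rough Python sketch, assuming the recap format shown in this thread; parse_recap and looks_incomplete are hypothetical helpers, and the 1000-task threshold is only the rule of thumb quoted above, not a documented openshift-ansible figure:

```python
import re

# One PLAY RECAP line looks like:
#   <host> : ok=235  changed=56 unreachable=0    failed=0
RECAP_LINE = re.compile(r"^(?P<host>\S.*?)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")
STAT = re.compile(r"(\w+)=(\d+)")


def parse_recap(text):
    """Return {host: {stat_name: count}} for every recap line found in text."""
    recap = {}
    for line in text.splitlines():
        m = RECAP_LINE.match(line.strip())
        if m:
            recap[m.group("host")] = {k: int(v) for k, v in STAT.findall(m.group("stats"))}
    return recap


def looks_incomplete(recap, expected_total=1000, ignore=("localhost",)):
    """True if the summed 'ok' count over the cluster hosts is below expected_total.

    Ansible also counts 'changed' tasks under 'ok', so 'ok' alone is summed here.
    expected_total is a rough rule of thumb, not an openshift-ansible constant.
    """
    total = sum(stats.get("ok", 0) for host, stats in recap.items() if host not in ignore)
    return total < expected_total
```

Fed the recap from this thread, parse_recap reports ok=235 for the node against ok=623 for the master, and the combined total stays under 1000.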


A while back we changed the node integration behavior so that if a node fails to provision, it does not stop your entire installation. This eases the pain of provisioning large (100+ node) clusters.

<private node1 dns name> : ok=235  changed=56 unreachable=0    failed=0

That node did not fully install. Open a shell on that node and check the OpenShift services. I'm willing to bet that

systemctl list-units --all | grep -i origin

would show the node service is not running. Find the name of the node service, then examine its journal logs:

journalctl -x -u <node-service-name>
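A long journal can be narrowed down by filtering it for error-like lines. A minimal Python sketch, assuming journalctl is available and Python 3.7+; fetch_journal and suspicious_lines are hypothetical helpers, and the keyword list is a guess, so anything it misses still needs a manual read:

```python
import re
import subprocess

# Keywords that often appear in failing service logs. This list is a guess,
# not an exhaustive or official set.
SUSPICIOUS = re.compile(r"error|fail|refused|timed out|timeout", re.IGNORECASE)


def fetch_journal(unit, lines=200):
    """Return the last `lines` journal entries for a systemd unit (requires journalctl)."""
    result = subprocess.run(
        ["journalctl", "-x", "-u", unit, "-n", str(lines), "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def suspicious_lines(journal_text):
    """Return the journal lines that mention errors, failures, or timeouts."""
    return [line for line in journal_text.splitlines() if SUSPICIOUS.search(line)]


# Example, run as root on the node:
# for line in suspicious_lines(fetch_journal("origin-node")):
#     print(line)
```

The unit name origin-node in the example comment is the one systemctl showed in the follow-up message in this thread; substitute whatever name your node reports.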


I think we (the openshift-ansible team) will want to add detection of failed node integrations to our error summary report in the future. Would you mind opening an issue for this on our GitHub page with this information?


Thanks!



On Sun, Jul 30, 2017 at 10:57 AM, Tim Dudgeon <tdudgeon ml gmail com> wrote:
I'm trying to get to grips with the advanced (Ansible) installer.
Initially I'm trying to do something very simple: fire up a cluster with one master and one node.
My inventory file looks like this:

[OSEv3:children]
masters
nodes


[OSEv3:vars]
ansible_ssh_user=root
openshift_hostname=<private master dns name>
openshift_master_cluster_hostname=<private master dns name>
openshift_master_cluster_public_hostname=<public master dns name>
openshift_disable_check=docker_storage,memory_availability
openshift_deployment_type=origin

[masters]
<private master dns name>

[etcd]
<private master dns name>


[nodes]
<private master dns name>
<private node1 dns name>
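In Ansible, variables under [OSEv3:vars] apply to every host in the groups listed under [OSEv3:children], so in the inventory above they reach the node as well as the master. A rough Python sketch of that inheritance, for checking which group vars each host actually receives; parse_inventory and effective_group_vars are hypothetical helpers, and the sketch handles only plain INI inventories: it drops inline host vars, ignores the implicit all group, assumes no cyclic [X:children] links, and does not reproduce every detail of Ansible's variable precedence:

```python
def parse_inventory(text):
    """Split a simple INI inventory into host groups, child groups, and group vars."""
    hosts, children, group_vars = {}, {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section is None:
            continue
        group, _, kind = section.partition(":")
        if kind == "children":
            children.setdefault(group, []).append(line)
        elif kind == "vars":
            key, _, value = line.partition("=")
            group_vars.setdefault(group, {})[key.strip()] = value.strip()
        else:
            # Host line. Inline host vars ("host key=value") are dropped in this sketch.
            host = line.split()[0] if "=" in line else line
            hosts.setdefault(group, []).append(host)
    return hosts, children, group_vars


def effective_group_vars(text):
    """Map each host to the group vars it inherits, parent groups applied before child groups."""
    hosts, children, group_vars = parse_inventory(text)
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)

    depth_by_host = {}  # host -> {group: distance from the host's own group}
    for group, members in hosts.items():
        stack = [(group, 0)]
        while stack:  # walk up the [X:children] links (assumes no cycles)
            g, depth = stack.pop()
            for host in members:
                seen = depth_by_host.setdefault(host, {})
                seen[g] = max(seen.get(g, -1), depth)
            stack.extend((p, depth + 1) for p in parents.get(g, []))

    result = {}
    for host, groups in depth_by_host.items():
        merged = {}
        # Farthest ancestors first, so closer (child) groups override them.
        for g in sorted(groups, key=groups.get, reverse=True):
            merged.update(group_vars.get(g, {}))
        result[host] = merged
    return result
```

For example, effective_group_vars(open("inventory").read()) lists, per host, every [OSEv3:vars] setting that host inherits, including openshift_hostname and openshift_master_cluster_hostname.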


I run:
ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml
and (after a long time) it completes, without any noticeable errors:

...
PLAY RECAP *********************************************************************************************************************************************************
<private node1 dns name> : ok=235  changed=56 unreachable=0    failed=0
<private master dns name> : ok=623  changed=166 unreachable=0    failed=0
localhost                  : ok=12   changed=0    unreachable=0 failed=0

Both nodes seem to have been set up OK.
But when I look on the master, only the master is in the cluster; there is no second node:

oc get nodes
NAME                        STATUS                     AGE
<private master dns name>   Ready,SchedulingDisabled   32m

and of course, like this, nothing can get scheduled.

Presumably the node should be added to the cluster, so any ideas what is going wrong here?

Thanks
Tim

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



--
Tim Bielawa, Sr. Software Engineer [ED-C137]
IRC: tbielawa (#openshift)
1BA0 4FAB 4C13 FBA0 A036  4958 AD05 E75E 0333 AE37

