Can you please provide more information? Your full inventory would be very useful right now for debugging. Feel free to mask your hostnames if you wish. What I need to see to debug this further are all the parameters you're setting in the [OSEv3] section and applying to each host in [masters] and [nodes].

If you wish to encrypt the inventory file instead, you will find my GPG public key fingerprint in my signature.

As for those two stalls you mentioned:

"Ensure OpenShift <THING> correctly rolls out (best-effort today)"

The delays you experienced are normal and expected. They typically occur because the pod images are being downloaded to your hosts. However, in your 'oc get nodes' output I noticed your master said "Ready,SchedulingDisabled". Because your master is marked 'SchedulingDisabled', it should *NOT* be running any pods, which means it wasn't downloading pod images.

Can you please provide the following information:

* The output from `oc get all` on your master
* The output from `docker images` on your node *AND* your master
* Your complete inventory file. As I said before, feel free to mask your hostnames or IPs if you prefer.
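
If it helps, the command output requested above could be gathered into files with a small helper like the one below. This is only a sketch: the function name `save_output` and the `./diag` directory are made up for illustration, and the only real commands involved are the `oc get all` and `docker images` calls already mentioned.

```shell
# save_output OUT_DIR NAME COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and stores what it prints in OUT_DIR/NAME.txt.
save_output() {
    out_dir=$1
    name=$2
    shift 2
    mkdir -p "$out_dir"
    {
        # Record the command line first so the file is self-describing.
        printf '$ %s\n' "$*"
        # Then the command's stdout and stderr together.
        "$@" 2>&1
    } > "$out_dir/$name.txt"
}

# Example use (on the master, then repeat the docker line on the node):
#   save_output ./diag oc-get-all oc get all
#   save_output ./diag docker-images docker images
```

The resulting text files can then be masked or encrypted the same way as the inventory before sending.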

Your logs would also be helpful. Ensure you run ansible-playbook with the -vv option for extra verbosity. You can capture the log in two ways:

1) If you run the install again you can set:

log_path = /tmp/ansible.log

in the [defaults] section of your ansible.cfg file. 

2) Alternatively you can capture the output of ansible using the `tee` command like so:

ansible-playbook -vv -i <INVENTORY> ./playbooks/byo/config.yml | tee /tmp/ansible.log
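
One caveat with the plain pipe above is that `tee` only sees standard output, so anything written to standard error would not land in the log file. A minimal sketch of a wrapper that merges both streams before handing them to `tee` follows; the name `run_logged` is an assumption made up for this example, not part of Ansible.

```shell
# run_logged LOG_FILE COMMAND [ARGS...]
# Hypothetical wrapper: shows COMMAND's output on the terminal and copies it to LOG_FILE.
run_logged() {
    log_file=$1
    shift
    # 2>&1 merges stderr into stdout so warnings and errors reach the log too.
    "$@" 2>&1 | tee "$log_file"
}

# Example use:
#   run_logged /tmp/ansible.log ansible-playbook -vv -i <INVENTORY> ./playbooks/byo/config.yml
```

Note that the function's exit status is that of `tee`, not of the wrapped command, so check the PLAY RECAP rather than the exit code.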

Again, if you wish to keep this information private, my GPG key is in my signature. Short ID is 0333AE37. 


On Tue, Aug 15, 2017 at 6:15 AM, Tim Dudgeon <tdudgeon ml gmail com> wrote:

Thanks for the response, and sorry for the delay on my end - I've been away for a week.

I ran through the process again and got the same result. On the node it looks like the openshift services are running OK:

systemctl list-units --all | grep -i origin
  origin-node.service loaded active running OpenShift Node

But from the master the node has not joined the cluster:

oc get nodes
2c0e37ab-f41e-40f1-a466-a575c85823b6.priv.cloud.scaleway.com Ready,SchedulingDisabled 26m v1.6.1+5115d708d7

The install process seems to have gone OK. There were no obvious errors, though it did twice stall at a point like this:

### TASK [openshift_hosted : Ensure OpenShift router correctly rolls out (best-effort today)] ******************

But after waiting for about 5-10 mins it continued.

There were a lot of 'skipping' messages during the install, but no obvious errors. The output was huge and not captured to a file, so I'd have to run it again to try to get a full log.

Any thoughts as to what is wrong?


On 04/08/2017 16:07, Tim Bielawa wrote:
(reposting: forgot to reply-all the first time)

Based on the number of tasks your summary says completed, I am not sure your installation actually completed in full. I would expect to see upwards of 1,000 to 2,000 tasks.

A while back we changed node integration behavior such that if a node fails to provision it does not stop your entire installation. This is to ease the pain felt when provisioning large (hundred+) node clusters. 

<private node1 dns name> : ok=235  changed=56 unreachable=0    failed=0

That node did not fully install. Open a shell on that node and check the openshift services. I'm willing to bet that

systemctl list-units --all | grep -i origin

would show the node service is not running. Find the name of the node service and then examine the journal logs for that node

journalctl -x -u <node-service-name>

I think we (the openshift-ansible team) will want to add detection of failed node integrations into our error summary report in the future. Would you mind please opening an issue for this on our GitHub page with this information?
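
Until something like that exists in the installer, a rough manual check is to compare the hosts you expected to become nodes against what `oc get nodes` reports. The sketch below is not openshift-ansible functionality; the function name `missing_nodes` and both input files are hypothetical. It assumes one file with one expected hostname per line (for example, copied by hand from the [nodes] section) and another holding saved `oc get nodes` output.

```shell
# missing_nodes EXPECTED_FILE OC_OUTPUT_FILE
# Hypothetical helper: prints each expected hostname that does not appear
# in the first column of saved `oc get nodes` output.
missing_nodes() {
    expected_file=$1
    oc_output_file=$2
    # Node names are the first column; drop the NAME header row if present.
    registered=$(awk '$1 != "NAME" { print $1 }' "$oc_output_file")
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines in the expected list.
        [ -n "$host" ] || continue
        # -x matches the whole line, -F treats the hostname as a literal string.
        if ! printf '%s\n' "$registered" | grep -qxF -- "$host"; then
            printf '%s\n' "$host"
        fi
    done < "$expected_file"
}

# Example use:
#   oc get nodes > /tmp/oc-nodes.txt
#   missing_nodes expected-nodes.txt /tmp/oc-nodes.txt
```

Any hostname it prints is a host that was expected to join but never registered with the master, which is the situation described in this thread.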


On Sun, Jul 30, 2017 at 10:57 AM, Tim Dudgeon <tdudgeon ml gmail com> wrote:
I'm trying to get to grips with the advanced (Ansible) installer.
Initially I'm trying to do something very simple: fire up a cluster with one master and one node.
My inventory file looks like this:


openshift_hostname=<private master dns name>
openshift_master_cluster_hostname=<private master dns name>
openshift_master_cluster_public_hostname=<public master dns name>

<private master dns name>

<private master dns name>

<private master dns name>
<private node1 dns name>

I run:
ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml
and (after a long time) it completes, without any noticeable errors:

PLAY RECAP *********************************************************************************************************************************************************
<private node1 dns name> : ok=235  changed=56 unreachable=0    failed=0
<private master dns name> : ok=623  changed=166 unreachable=0    failed=0
localhost                  : ok=12   changed=0    unreachable=0 failed=0

Both nodes seem to have been set up OK.
But when I look on the master node there is only the master in the cluster, no second node:

oc get nodes
NAME STATUS                     AGE
<private master dns name> Ready,SchedulingDisabled   32m

and of course like this nothing can get scheduled.

Presumably the node should be added to the cluster, so any ideas what is going wrong here?


Tim Bielawa, Sr. Software Engineer [ED-C137]
IRC: tbielawa (#openshift)
1BA0 4FAB 4C13 FBA0 A036  4958 AD05 E75E 0333 AE37

