
Re: Missing OpenShift Nodes - Unable to Join Cluster



On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <ichristoffersen vizuri com> wrote:
>
> I'm running Origin in AWS and after adding some shared EFS volumes to the node instances, the nodes seem to be unable to rejoin the cluster.  
>
> It's a 3 Master + ETCD setup with 4 application Nodes.  An 'oc get nodes' returns an empty list and of course, none of the pods will start.
>
>
> The relevant error messages I see are:
>
> "Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found"
> "Could not find an allocated subnet for node: ip-10-0-37-217..... , Waiting..."
>
> and 
>
> "Error updating node status, will retry: error getting node "ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
>
>
> Any insights into how to start troubleshooting further?  I'm baffled.

Did the nodes come back up with a new IP address? If so, the internal DNS name would have also changed and the node would need to be reconfigured accordingly.

Items that would need to be updated:
- node name in the node config
- node serving certificate
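
To confirm whether the node name is the problem, you can compare the name recorded in the node config against the hostname the instance has now. Below is a rough Python sketch of that check, not an official tool. It assumes the node config is a YAML file with a top-level `nodeName:` key (on a default Origin install that is usually /etc/origin/node/node-config.yaml, but verify the path on your hosts). The function names and the hostnames in the usage note are made up for illustration.

```python
import re


def configured_node_name(config_text):
    """Return the value of the top-level nodeName key, or None if it is absent.

    Assumes a plain 'nodeName: <value>' line at column 0; this is a simple
    line match, not a full YAML parse.
    """
    match = re.search(r"^nodeName:[ \t]*(\S+)[ \t]*$", config_text, flags=re.MULTILINE)
    if match is None:
        return None
    # Drop optional surrounding quotes around the value.
    return match.group(1).strip("\"'")


def node_name_is_stale(config_text, current_hostname):
    """True when the config names a node that differs from the current hostname.

    DNS names are case-insensitive, so the comparison ignores case.
    Returns False when no nodeName is configured at all.
    """
    name = configured_node_name(config_text)
    return name is not None and name.lower() != current_hostname.lower()


def check_file(path, current_hostname):
    """Read the node config at 'path' and report whether its nodeName is stale."""
    with open(path) as handle:
        return node_name_is_stale(handle.read(), current_hostname)
```

For example, `check_file("/etc/origin/node/node-config.yaml", "ip-10-0-0-2.ec2.internal")` would return True if the file still carries an older name such as ip-10-0-0-1.ec2.internal. Pass in the instance's current internal DNS name as AWS reports it, since the local hostname may not match it exactly.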

There is an Ansible playbook that can automate the redeployment of certificates as well (playbooks/byo/openshift-cluster/redeploy-certificates.yml).

--
Jason DeTiberus

