No, the hostnames are the same. Because I was getting the "external ID from cloud provider" error, I disabled the AWS cloud provider settings and left it as solely a BYO configuration. This allowed me to get my nodes back up. There's definitely something off with the AWS cloud provider settings and how instance names for nodes are being looked up.

I only need the AWS config for EBS storage for Persistent Volumes, so I can't fully disable the AWS settings.

How does the external ID lookup work? Can I verify the settings it expects?
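If I understand the AWS cloud provider correctly, it resolves a node by matching the node name against the EC2 instance's private DNS name. Assuming that's right, is something like this the proper way to verify it, run on one of the affected nodes? (Just a sketch; it assumes the aws CLI is installed and configured for the cluster's region.)

    # Does EC2 know this node by the name the kubelet is using?
    aws ec2 describe-instances \
        --filters "Name=private-dns-name,Values=$(hostname -f)" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text

If that returns no instance ID, the node name and the instance's private-dns-name have diverged, which would match the "instance not found" error.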
On Thu, Sep 8, 2016 at 9:24 PM, Jason DeTiberus <jdetiber@redhat.com> wrote:
On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <ichristoffersen@vizuri.com> wrote:
> I'm running Origin in AWS and after adding some shared EFS volumes to the node instances, the nodes seem to be unable to rejoin the cluster.
> It's a 3-master + etcd setup with 4 application nodes. 'oc get nodes' returns an empty list and, of course, none of the pods will start.
> The relevant error messages I'm seeing are:
> "Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found
> "Could not find an allocated subnet for node: ip-10-0-37-217..... , Waiting..."
> ""Error updating node status, will retry: error getting node "ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
> Any insights into how to start troubleshooting further? I'm baffled.
Did the nodes come back up with a new IP address? If so, the internal DNS name would have also changed and the node would need to be reconfigured accordingly.
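A quick way to compare the two on a node (assuming the EC2 metadata service is reachable) is:

    # What the OS calls the node vs. what EC2 calls it:
    hostname -f
    curl -s http://169.254.169.254/latest/meta-data/local-hostname

The two should agree, since the AWS cloud provider looks nodes up by their internal DNS name.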
Items that would need to be updated (a quick way to check both is sketched after the list):
- node name in the node config
- node serving certificate
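One way to eyeball both on an affected node (paths assume a default Origin install; the commands are only a sketch):

    # The kubelet's node name must match the instance's current private DNS name:
    grep nodeName /etc/origin/node/node-config.yaml

    # The serving certificate's SANs must cover the node's current name and IP:
    openssl x509 -noout -text -in /etc/origin/node/server.crt \
        | grep -A1 'Subject Alternative Name'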
There is an Ansible playbook that can automate the redeployment of certificates as well (playbooks/byo/openshift-clust