So the hostnames did not change, and after rolling back to just the BYO configuration and removing the AWS settings, I was able to get back up and running. That means the certificates were good as well. I lost the ability to use EBS volumes by doing this, but we're in the process of moving to EFS anyway.

I suspect the issue is tied up in the fact that these nodes have multiple aliases and a different local hostname than the one shown in the EC2 console. However, I'm not sure why this only manifested itself after running successfully for 4 weeks.
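For what it's worth, comparing the name the node reports locally with what EC2 has for the instance makes the aliasing visible. A rough sketch using the standard EC2 instance metadata endpoints:

    # Name the node reports locally
    hostname -f

    # Private DNS name EC2 associates with this instance
    curl -s http://169.254.169.254/latest/meta-data/local-hostname

On my nodes the two values differ, which fits the mismatch described above.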
Either way, I'm moving on with just BYO.

thanks,
Isaac

On Thu, Sep 8, 2016 at 10:36 PM, Isaac Christoffersen <ichristoffersen@vizuri.com> wrote:

No, the hostnames are the same. Because I was getting the "external ID from cloud provider" error, I disabled the AWS configuration settings and left it as solely a BYO setup. This allowed me to get my nodes back up. There's definitely something off with the AWS cloud provider settings and how instance names for nodes are being resolved.

I only need the AWS config for EBS storage backing Persistent Volumes, so I can't fully disable the AWS settings.

How does the external ID lookup work? Can I verify the settings it expects?
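My best guess at reproducing that lookup from outside the cluster, assuming the AWS CLI is configured with the same region and credentials the cluster uses (as I understand it, the cloud provider matches the node name against the instance's private DNS name):

    # Approximate the cloud provider's external ID lookup for this node.
    # An empty result would line up with the "instance not found" error.
    aws ec2 describe-instances \
        --filters "Name=private-dns-name,Values=$(hostname -f)" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text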
On Thu, Sep 8, 2016 at 9:24 PM, Jason DeTiberus <jdetiber@redhat.com> wrote:

On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <ichristoffersen@vizuri.com> wrote:
> I'm running Origin in AWS, and after adding some shared EFS volumes to the node instances, the nodes seem unable to rejoin the cluster.
> It's a 3 master + etcd setup with 4 application nodes. 'oc get nodes' returns an empty list, and of course none of the pods will start.
> The relevant error messages I'm seeing are:
> "Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found"
> "Could not find an allocated subnet for node: ip-10-0-37-217....., Waiting..."
> "Error updating node status, will retry: error getting node "ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
> Any insights into how to start troubleshooting further? I'm baffled.
Did the nodes come back up with a new IP address? If so, the internal DNS name would have also changed and the node would need to be reconfigured accordingly.
Items that would need to be updated:
- node name in the node config (a quick check is sketched below)
- node serving certificate
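A quick way to spot the mismatch on an affected node; the path below is the Origin default, so adjust if your install differs:

    # Name the kubelet registers with
    grep nodeName /etc/origin/node/node-config.yaml

    # Current private DNS name of the instance
    curl -s http://169.254.169.254/latest/meta-data/local-hostname

If those two disagree, the node config and the serving certificate both need to be regenerated for the new name.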
There is an Ansible playbook that can automate the redeployment of certificates as well (playbooks/byo/openshift-clust...).