
Re: Issues with AWS-based setup -- wrong "public_ip" and hostnames, issues deploying docker registry

On Fri, Jan 8, 2016 at 4:51 AM, Florian Daniel Otel <florian otel gmail com> wrote:
Hello all, 

For several days I've been struggling to get a working deployment on AWS. 

My setup consists of 1 x master + 3 x etcd servers + 2 nodes (similar to how it is described here).  My "/etc/ansible/hosts":

# Create an OSEv3 group that contains the masters, nodes, and etcd groups

# Set variables common for all OSEv3 hosts

openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/htpasswd'}]

# host group for masters
ip-XX-YY-ZZ-219.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.219

# host group for etcd
ip-XX-YY-ZZ-220.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.220
ip-XX-YY-ZZ-221.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.221
ip-XX-YY-ZZ-222.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.222

# host group for nodes, includes region info
ip-XX-YY-ZZ-219.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.219 openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
ip-XX-YY-ZZ-223.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.223 openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
ip-XX-YY-ZZ-224.nuage-vpcZZ.internal openshift_public_ip=XX.YY.ZZ.224 openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

(In the above, XX.YY.ZZ.0/24 is the VPC subnet address. I have only one subnet per VPC. All hostnames are registered with the local DNS server + forwarder.) 

My issues -- in increasing order of severity 

1) (FYI / can live with this): Ignoring customized hostname for instances

My AWS VPC has an internal DNS server acting as a local DNS server and forwarder. If I try to use customized hostnames -- i.e. hostnames other than the ones EC2 assigns to the instances, as in the above file -- then even after registering them in DNS and setting the hostname on the instances, the Ansible install still tries to use the AWS hostnames and the install fails. So I'm forced to stick with the EC2-provided hostnames (as per the above file). 

If the override variables are not set, then we use the values provided by the instance metadata, which is why you are seeing the EC2 instance names.
To override them, set openshift_hostname and openshift_public_hostname; these control the hostname used internally and the hostname exposed publicly, respectively.


2)  Ignoring overriding the "openshift_public_ip" 

This is definitely not expected...  can you provide the output of running the following playbook using your inventory? openshift-ansible/playbooks/byo/openshift_facts.yml 
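For reference, a minimal sketch of how that playbook run might look. It assumes the inventory lives at /etc/ansible/hosts (as in the original message) and that openshift-ansible is checked out in the current directory; the function name and log file name are only illustrative.

```shell
# Assumed locations; adjust to your environment.
INVENTORY="${INVENTORY:-/etc/ansible/hosts}"
PLAYBOOK="${PLAYBOOK:-openshift-ansible/playbooks/byo/openshift_facts.yml}"

# Hypothetical helper: run the facts playbook and keep a copy of the
# output so it can be pasted into a reply on the list.
run_facts() {
    ansible-playbook -i "$INVENTORY" "$PLAYBOOK" 2>&1 | tee openshift_facts.log
}

# Invoke with: run_facts
```

The saved log should show which hostname and IP values the installer gathered for each host, which is the part worth comparing against the inventory overrides.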


The setup is a strictly VPC-internal deployment, with no external access needed / required.  Hence all instances have temporary public IPs (no EIPs). 

However, even after overriding the "openshift_public_ip" parameter in the "/etc/ansible/hosts" file as per above, the temporary AWS public IPs are used (e.g. in YAML files, certificate files, etc.). 

3) Failing to deploy Docker registry 

Using the AWS assigned hostnames (registered in the DNS server), the installation completes, but with above issue wrt "openshift_public_ip" 

Still, when trying to deploy the docker registry: 

(On the master, i.e. node with IP address "XX.YY.ZZ.219" and hostname "ip-XX-YY-ZZ-219.nuage-vpcZZ.internal"  ) 

oadm registry --service-account=registry \
    --config=/etc/origin/master/admin.kubeconfig \
    --credentials=/etc/origin/master/openshift-registry.kubeconfig \

the "docker-registry" pod fails to deploy: 

[root@ip-XX-YY-ZZ-219 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1       Error     0          30m
[root@ip-XX-YY-ZZ-219 ~]#

[root@ip-XX-YY-ZZ-219 ~]# oc logs docker-registry-1-deploy
F0108 04:06:46.675922       1 deployer.go:65] couldn't get deployment default/docker-registry-1: Get https://ip-XX-YY-ZZ-219.nuage-vpcZZ.internal:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1: net/http: TLS handshake timeout

I would assume this is due to the "public IP" issue above, but not sure. 

This should not be related to the public IP being set wrong...  that error seems to indicate a problem reaching the master host using the URL seen in the error message.  The deployer is running on one of the nodes, and I suspect the pods are unable to contact the master over port 8443, possibly because of an SDN connectivity issue between the master and the nodes. 
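One rough way to check reachability of the master API, sketched under a few assumptions: the master URL is taken from the error message above, the /healthz path is assumed to be served by the master, and check_master is just an illustrative name.

```shell
# Master API URL as it appears in the deployer error message.
MASTER_URL="${MASTER_URL:-https://ip-XX-YY-ZZ-219.nuage-vpcZZ.internal:8443}"

# Hypothetical helper: print the HTTP status code returned by the master.
# -k skips certificate verification (only connectivity is being tested);
# --max-time bounds the wait so a stalled TLS handshake fails quickly.
check_master() {
    curl -k -sS --max-time 10 -o /dev/null -w '%{http_code}\n' "$MASTER_URL/healthz"
}

# Invoke with: check_master
```

Running this on a node host exercises the plain network path to the master, while running the same curl from inside a container on that node exercises the path the pods use. If the first works and the second times out, that would point at the SDN rather than at DNS or the master itself.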

First, I would verify that your security group config is allowing UDP 4789 traffic between the hosts (I seem to recall having to explicitly configure the security group to allow the port from itself to get it to work in the past).
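As a sketch of what that security group rule could look like with the AWS CLI, assuming all hosts share a single security group: the group ID below is a placeholder, and the function names are only illustrative.

```shell
# Placeholder; replace with the security group attached to the instances.
SG_ID="${SG_ID:-sg-REPLACE_ME}"

# Hypothetical helper: allow UDP 4789 (VXLAN) from the group to itself.
allow_vxlan() {
    aws ec2 authorize-security-group-ingress \
        --group-id "$SG_ID" \
        --protocol udp \
        --port 4789 \
        --source-group "$SG_ID"
}

# Hypothetical helper: list the group's current ingress rules.
show_rules() {
    aws ec2 describe-security-groups \
        --group-ids "$SG_ID" \
        --query 'SecurityGroups[0].IpPermissions'
}

# Invoke with: show_rules, then allow_vxlan if the rule is missing
```

Listing the existing rules first avoids an error from trying to add a rule that is already present.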

TIA for prompt help -- need to get the setup up and running before Monday  (got my weekend sorted....:((( ) 

I wish I had seen these messages earlier in the day... 


(P.S. Btw, this is with the latest version of OSE, namely "oc v3.1.0.4-16-g112fcc4  / kubernetes v1.1.0-origin-1107-g4c8e6f4 ") 

users mailing list
users lists openshift redhat com

Jason DeTiberus
