
Re: OKD installation on CentOS 7.6



Hi Wilfried,
did you check that you could connect to these servers via SSH before you ran the Ansible installer?
Have you applied any custom iptables rules to these servers (via cloud-init or similar, maybe)?
Do these servers have only one NIC, i.e. only one IP address over which you access them?
Maybe try opening port 22 explicitly via iptables on one node, to test whether it's the firewall that blocks the requests.
See what happens if you stop the origin-node service on a compute node (systemctl stop origin-node). If that doesn't help, try flushing all applied iptables rules and afterwards adding only port 22, for example (better to back them up first; I think kube-proxy will regenerate them, but I'm not 100% sure).
And please provide the output of "netstat -tupln" and "ip address show" from one compute node and the infra node (and check whether the sshd service is bound to your external IP).
Even if the cluster is causing this behaviour, I think the issue might stem from a particular server config (e.g. firewall, network). I'm trying to isolate the possible cause with these questions.
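
The sshd binding check requested above can be scripted. This is only a minimal sketch, not something taken from the cluster in question: it assumes the CentOS 7 net-tools layout of `netstat -tupln` (local address in column 4, state in column 6, PID/program name in column 7), and `sshd_listen_addrs` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `netstat -tupln` output on stdin and prints
# the local address:port of every TCP socket that sshd is listening on.
# Assumes the net-tools column layout: $4 = local address, $6 = state,
# $7 = PID/program name.
sshd_listen_addrs() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" && $7 ~ /\/sshd/ { print $4 }'
}

# On a node, as root (so that program names are shown):
#   netstat -tupln | sshd_listen_addrs
```

A result of 0.0.0.0:22 (or :::22) would mean sshd listens on all addresses; a single specific address would mean it is bound to that address only.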

Best Regards,
Nikolas


On Tue, 16 Apr 2019 at 16:42, ANUZET Wilfried <wilfried anuzet uclouvain be> wrote:

Hello Nikolas,

 

I just tested something, and it clearly seems to be a network problem on the OpenShift cluster itself.

I just rebooted the master as a test, and the server is reachable for a short window after the TCP/IP stack is up but before the firewall / OKD starts.

I don't know what I missed.

 


Wilfried Anuzet
Service Infrastructure
Département Information & Systèmes
Tél: +32 2 764 2488


Avenue Hippocrate, 10 - 1200 Bruxelles - Belgique - Tel: + 32 2 764 11 11 - www.saintluc.be

Support our Hospital, support Fondation Saint-Luc

 

 

From: ANUZET Wilfried
Sent: Tuesday, 16 April 2019 15:17
To: 'Nikolas Philips' <nikolas philips gmail com>
Subject: RE: OKD installation on CentOS 7.6

 

Hello Nikolas,

 

Here are the points you asked me to check:

·         On all servers, NM_CONTROLLED=yes is set in the network interface definitions. The service itself is running:

[root okdmst01t ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-04-15 10:05:21 CEST; 1 day 5h ago
     Docs: man:NetworkManager(8)
 Main PID: 10304 (NetworkManager)
   CGroup: /system.slice/NetworkManager.service
           └─10304 /usr/sbin/NetworkManager --no-daemon

Apr 15 10:17:40 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316260.6944] device (veth726db232): enslaved to non-master-type device ovs-system; ignoring
Apr 15 10:18:35 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316315.3185] device (veth138e5060): carrier: link connected
Apr 15 10:18:35 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316315.3188] manager: (veth138e5060): new Veth device (/org/freedesktop/NetworkManager/Devices/12)
Apr 15 10:18:35 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316315.3346] device (veth138e5060): enslaved to non-master-type device ovs-system; ignoring
Apr 15 10:18:44 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316324.3338] manager: (veth95ee3ae7): new Veth device (/org/freedesktop/NetworkManager/Devices/13)
Apr 15 10:18:44 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316324.3347] device (veth95ee3ae7): carrier: link connected
Apr 15 10:18:44 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316324.3555] device (veth95ee3ae7): enslaved to non-master-type device ovs-system; ignoring
Apr 15 10:20:39 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316439.2149] device (vethb5a95288): carrier: link connected
Apr 15 10:20:39 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316439.2155] manager: (vethb5a95288): new Veth device (/org/freedesktop/NetworkManager/Devices/14)
Apr 15 10:20:39 okdmst01t.stluc.ucl.ac.be NetworkManager[10304]: <info>  [1555316439.2515] device (vethb5a95288): enslaved to non-master-type device ovs-system; ignoring

 

·         I can reach a server in another internal subnet, outside the /24 subnet the OKD servers are in. (I can't test against a server outside our internal network, because outbound SSH and ICMP are blocked at our firewall.)

 

·         The default route is the same on the master and the LB node:

LB:
[root okdlb01t ~]$ ip route show
default via 10.244.246.2 dev ens192 proto static metric 100
10.244.246.0/24 dev ens192 proto kernel scope link src 10.244.246.84 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

MASTER:
[root okdmst01t ~]# ip route show
default via 10.244.246.2 dev ens192 proto static metric 100
10.128.0.0/14 dev tun0 scope link
10.244.246.0/24 dev ens192 proto kernel scope link src 10.244.246.66 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.30.0.0/16 dev tun0

 

·         Here's the result of the oc commands regarding the openshift SDN pods:

[root okdmst01t ~]# oc get pods -n openshift-sdn
NAME        READY     STATUS    RESTARTS   AGE
ovs-h6vqq   1/1       Running   0          1d
ovs-prm2z   1/1       Running   0          1d
ovs-r5wll   1/1       Running   0          1d
ovs-stnc5   1/1       Running   0          1d
sdn-4g5fk   1/1       Running   0          1d
sdn-4vlpr   1/1       Running   0          1d
sdn-5775r   1/1       Running   0          1d
sdn-j87dp   1/1       Running   0          1d

 

 

Thanks for your help.

 

Wilfried Anuzet

 

 

From: Nikolas Philips <nikolas philips gmail com>
Sent: Tuesday, 16 April 2019 14:47
To: ANUZET Wilfried <wilfried anuzet uclouvain be>
Subject: Re: OKD installation on CentOS 7.6

 

Hey Wilfried,

it looks like you have some networking issues. I think the [lb] node isn't affected because it only runs an HAProxy deployment and is probably not integrated into the SDN of your cluster. So I guess the Ansible installer, or more precisely the installation of the SDN, messed up your network settings.

Can you reach hosts outside of your subnet from the master node? E.g. 1.1.1.1, or an internal host in a different subnet?

Is NetworkManager enabled and running on all nodes (required!)?

Are the default routes correct on all nodes? Check with "ip route show" and look for the line starting with "default". Is the gateway correct? Is it the same one the LB node has?

When you are connected to the master node, can you execute "oc get nodes"? If yes, can you check whether the SDN pods are running ("oc get pods -n openshift-sdn")? And are the nodes ready?
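
The default-route comparison suggested above can be reduced to one value per node. A minimal sketch, assuming the iproute2 `default via <gateway> ...` line format; `default_gateway` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: reads `ip route show` output on stdin and prints
# the gateway of the first default route.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# On each node:
#   ip route show | default_gateway
# If all nodes share one subnet and one gateway, the printed address
# should be identical on the LB, master, infra and compute nodes.
```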

 

Best Regards,

Nikolas

 

 

On Tue, 16 Apr 2019 at 14:21, ANUZET Wilfried <wilfried anuzet uclouvain be> wrote:

Hello,

 

I tried to install OKD on brand-new CentOS 7.6 VMs.

As I had already set up a simple cluster on my cloud server to learn OpenShift (1 master, 1 node / CentOS 7.6 running on Proxmox), I assumed it would be just as easy using the openshift-ansible project.

Here are the servers I want to deploy:

okdlb01t => OKD load balancer / 1 CPU / 2G RAM / 1 NIC
okdmst01t => OKD master / 8 CPU / 16G RAM / 1 NIC
okdnod01t / okdnod02t => 2 OKD nodes / 4 CPU / 8G RAM / 1 NIC
okdinf01t => OKD infrastructure node / 4 CPU / 8G RAM / 1 NIC

 

All servers are configured to:

- use one of our internal /24 networks
- use the corporate proxy at user-space and Docker level
- use Red Hat Satellite as the repository source
- use Active Directory as the user authentication method
- be accessible through SSH.

 

Here's my inventory file:

---------------------
[masters]
okdmst01t.stluc.ucl.ac.be

[etcd]
okdmst01t.stluc.ucl.ac.be openshift_master_cluster_hostname="okdmst01t.stluc.ucl.ac.be" openshift_schedulable=true

[nodes]
okdmst01t.stluc.ucl.ac.be openshift_node_group_name="node-config-master"
okdinf01t.stluc.ucl.ac.be openshift_node_group_name="node-config-infra"
okdnod0[1:2]t.stluc.ucl.ac.be openshift_node_group_name="node-config-compute"

[lb]
okdlb01t.stluc.ucl.ac.be

[OSEv3:children]
masters
nodes
etcd
lb

[OSEv3:vars]
openshift_deployment_type=origin
openshift_master_default_subdomain=okdt.stluc.ucl.ac.be
debug_level=2
ansible_become=true
openshift_docker_insecure_registries=172.30.0.0/16
openshift_release=3.11
openshift_install_examples=true
os_firewall_use_firewalld=true
openshift_disable_check=docker_image_availability
---------------------

 

I use the Ansible Tower upstream (AWX) to deploy OKD and built the following workflow:

prerequisites.yml == on-success ==> deploy_cluster.yml == on-failure ==> uninstall.yml

Everything seems to run well and my workflow executes correctly.

But, for a reason I don't understand, once OKD is deployed none of the master / node / infra servers are accessible through SSH, and none respond to ping.

I can still use the VMware console and see that all containers are up and running.

I can still log in to the LB, and all nodes are visible from it.

 

As a result, I can't connect to the web console or log in with oc:

- In the browser (tested with the latest Firefox and Chromium): https://okdmst01t.stluc.ucl.ac.be:8443/
  Connection timed out

- CLI:
  oc login https://okdmst01t.stluc.ucl.ac.be:8443
  error: dial tcp 10.244.246.66:8443: i/o timeout - verify you have provided the correct host and port and that the server is currently running.
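
A timeout, as opposed to "connection refused", usually points to dropped packets rather than a stopped service. A quick TCP probe from the client can tell the two apart. A minimal sketch, assuming bash and the coreutils `timeout` command are available on the client; `port_open` is a hypothetical helper name.

```shell
# Hypothetical helper: tries to open a TCP connection to host $1, port $2.
# Exit status 0 = connected, 124 = no answer within 3 seconds (timeout),
# anything else = the attempt failed right away (e.g. connection refused
# or host name not resolved).
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, using the master from the report above:
#   port_open okdmst01t.stluc.ucl.ac.be 8443; echo "exit status: $?"
#   port_open okdmst01t.stluc.ucl.ac.be 22;   echo "exit status: $?"
```

Running the probe against both ports shows whether only the API port or all inbound traffic is affected.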

 

Do you have any idea what I should check?

Is there something I missed?

I have already read the latest OKD docs and the Server World tutorial (https://www.server-world.info/en/note?os=CentOS_7&p=openshift311&f=1), but I couldn't find anything that helps me solve this.

I don't really know what to search for…

If you have any pointer that could help, please share it.

Best regards.

 

Wilfried Anuzet

 

 

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users

