
RE: OKD installation on CentOS 7.6



Hi Nikolas,

 

I'm continuing to investigate where my problem is, and to keep you up to date with my troubleshooting.

Reading this troubleshooting guide: https://docs.openshift.com/container-platform/3.7/admin_guide/sdn_troubleshooting.html

I was able to dig further into the OpenShift configuration.

 

I checked my router and it's indeed on my infrastructure node:

[root okdmst01t ~]# oc get pods --all-namespaces --selector=router --template='{{range .items}}HostIP: {{.status.hostIP}}   PodIP: {{.status.podIP}}{{end}}{{"\n"}}'

HostIP: 10.244.246.67   PodIP: 10.244.246.67

 

On the master there's an openshift process listening on port 8443 on every interface:

[root okdmst01t ~]# netstat -tunlp|grep 8443

tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN      24759/openshift    

 

I also checked the routes defined on the infrastructure node and tried them in a browser:

[root okdmst01t ~]# oc get route --all-namespaces

NAMESPACE                          NAME                HOST/PORT                                                        PATH      SERVICES            PORT      TERMINATION          WILDCARD

openshift-console                  console             console.okdt.stluc.ucl.ac.be                                               console             https     reencrypt/Redirect   None

[…]

 

https://console.okdt.stluc.ucl.ac.be:8443 redirects to https://okdmst01t.stluc.ucl.ac.be:8443/console/, but I still get a timeout.
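To narrow down where that timeout happens, here's a quick check I could script (untested sketch; the hostname and port are the ones from above):

```shell
# Separate a DNS failure from a filtered TCP path to the console route.
HOST=console.okdt.stluc.ucl.ac.be
PORT=8443

# 1. Name resolution: which IP does the route hostname point to?
getent hosts "$HOST" || echo "no DNS entry for $HOST"

# 2. Raw TCP connect with a 3s cap: a silent drop (firewall) times out,
#    an active reject fails immediately.
timeout 3 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" \
    && echo "TCP $PORT open" || echo "TCP $PORT blocked or filtered"

# 3. Full TLS request, ignoring the self-signed certificate.
curl -kIs --connect-timeout 3 "https://$HOST:$PORT/" | head -1
```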

 

Checking the iptables configuration on the master, I found this in the NAT table, so incoming traffic from any interface should be redirected to the pods/containers that host the OKD web console:

[root okdmst01t ~]# iptables -L -t nat|grep 8443

DNAT       tcp  --  anywhere             anywhere             /* openshift-template-service-broker/apiserver: */ tcp to:10.128.0.23:8443

DNAT       tcp  --  anywhere             anywhere             /* default/kubernetes:https */ tcp to:10.244.246.66:8443

DNAT       tcp  --  anywhere             anywhere             /* openshift-web-console/webconsole:https */ tcp to:10.128.0.19:8443

DNAT       tcp  --  anywhere             anywhere             /* openshift-console/console:https */ tcp to:10.128.0.20:8443
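Given those DNAT rules, one more thing worth trying (my untested sketch): hit the console pod IP from the rule directly from the master, to see whether the break is before or after the NAT step.

```shell
# The console DNAT target from the rule above.
CONSOLE_POD_IP=10.128.0.20
CONSOLE_POD_PORT=8443

# If this answers from the master but the route still times out from
# outside, the problem is upstream of the SDN (routing/firewall),
# not in the pod itself.
curl -k --connect-timeout 3 -o /dev/null -s \
    -w 'HTTP %{http_code}\n' "https://$CONSOLE_POD_IP:$CONSOLE_POD_PORT/" \
    || echo "pod $CONSOLE_POD_IP unreachable from here"

# Count how many packets actually matched the console DNAT rule.
iptables -t nat -L -n -v 2>/dev/null | grep "$CONSOLE_POD_IP" || true
```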

 

Keep searching…

 

Wilfried Anuzet
Service Infrastructure
Département Information & Systèmes
Tél: +32 2 764 2488


Avenue Hippocrate, 10 - 1200 Bruxelles - Belgique - Tel: + 32 2 764 11 11 - www.saintluc.be

Soutenez les Cliniques, soutenez la Fondation Saint-Luc
Support our Hospital, support Fondation Saint-Luc

 

 

From: ANUZET Wilfried
Sent: Thursday, April 18, 2019 13:11
To: ANUZET Wilfried <wilfried anuzet uclouvain be>; Nikolas Philips <nikolas philips gmail com>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: RE: OKD installation on CentOS 7.6

 

Nikolas,

 

Here's the log for the master:

 

[root okdmst01t ~]# oc logs sdn-pgbv6 -n openshift-sdn

2019/04/18 10:27:16 socat[31800] E connect(5, AF=1 "/var/run/openshift-sdn/cni-server.sock", 40): No such file or directory

User "sa" set.

Context "default/okdmst01t-stluc-ucl-ac-be:8443/system:admin" modified.

which: no openshift-sdn in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)

I0418 10:27:17.451014   31757 start_network.go:193] Reading node configuration from /etc/origin/node/node-config.yaml

I0418 10:27:17.463248   31757 start_network.go:200] Starting node networking okdmst01t.stluc.ucl.ac.be (v3.11.0+5a84bad-168)

W0418 10:27:17.463579   31757 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.

I0418 10:27:17.463660   31757 feature_gate.go:230] feature gates: &{map[]}

I0418 10:27:17.465941   31757 transport.go:160] Refreshing client certificate from store

I0418 10:27:17.465990   31757 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".

I0418 10:27:17.485515   31757 node.go:147] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "okdmst01t.stluc.ucl.ac.be" (IP ""), iptables sync period "30s"

I0418 10:27:17.486771   31757 node.go:289] Starting openshift-sdn network plugin

I0418 10:27:17.578455   31757 sdn_controller.go:139] [SDN setup] full SDN setup required (Link not found)

I0418 10:27:17.738041   31757 node.go:348] Starting openshift-sdn pod manager

E0418 10:27:17.741166   31757 cniserver.go:148] failed to remove old pod info socket: remove /var/run/openshift-sdn: device or resource busy

E0418 10:27:17.741240   31757 cniserver.go:151] failed to remove contents of socket directory: remove /var/run/openshift-sdn: device or resource busy

I0418 10:27:17.745593   31757 node.go:392] openshift-sdn network plugin registering startup

I0418 10:27:17.745749   31757 node.go:410] openshift-sdn network plugin ready

I0418 10:27:17.749460   31757 network.go:95] Using iptables Proxier.

W0418 10:27:17.751033   31757 proxier.go:298] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended

I0418 10:27:17.751237   31757 network.go:131] Tearing down userspace rules.

I0418 10:27:17.766170   31757 proxier.go:189] Setting proxy IP to 10.244.246.66 and initializing iptables

I0418 10:27:17.802110   31757 config.go:202] Starting service config controller

I0418 10:27:17.802127   31757 proxy.go:82] Starting multitenant SDN proxy endpoint filter

I0418 10:27:17.802138   31757 controller_utils.go:1025] Waiting for caches to sync for service config controller

I0418 10:27:17.808249   31757 config.go:102] Starting endpoints config controller

I0418 10:27:17.808291   31757 controller_utils.go:1025] Waiting for caches to sync for endpoints config controller

I0418 10:27:17.808638   31757 network.go:239] Started Kubernetes Proxy on 0.0.0.0

I0418 10:27:17.808917   31757 network.go:53] Starting DNS on 127.0.0.1:53

I0418 10:27:17.809809   31757 server.go:76] Monitoring dnsmasq to point cluster queries to 127.0.0.1

I0418 10:27:17.809869   31757 logs.go:49] skydns: ready for queries on cluster.local. for tcp://127.0.0.1:53 [rcache 0]

I0418 10:27:17.809879   31757 logs.go:49] skydns: ready for queries on cluster.local. for udp://127.0.0.1:53 [rcache 0]

I0418 10:27:17.816840   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/kubernetes:dns to [10.244.246.66:8053]

I0418 10:27:17.816962   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/kubernetes:https to [10.244.246.66:8443]

I0418 10:27:17.816981   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for default/kubernetes:dns-tcp to [10.244.246.66:8053]

I0418 10:27:17.902461   31757 controller_utils.go:1032] Caches are synced for service config controller

I0418 10:27:17.902689   31757 proxier.go:635] Not syncing iptables until Services and Endpoints have been received from master

I0418 10:27:17.908568   31757 controller_utils.go:1032] Caches are synced for endpoints config controller

I0418 10:27:17.908745   31757 service.go:319] Adding new service port "default/kubernetes:dns-tcp" at 172.30.0.1:53/TCP

I0418 10:27:17.908787   31757 service.go:319] Adding new service port "default/kubernetes:https" at 172.30.0.1:443/TCP

I0418 10:27:17.908817   31757 service.go:319] Adding new service port "default/kubernetes:dns" at 172.30.0.1:53/UDP

I0418 10:27:17.908869   31757 proxier.go:649] Stale udp service default/kubernetes:dns -> 172.30.0.1

I0418 10:28:49.384762   31757 service.go:319] Adding new service port "default/router:80-tcp" at 172.30.214.100:80/TCP

I0418 10:28:49.384825   31757 service.go:319] Adding new service port "default/router:443-tcp" at 172.30.214.100:443/TCP

I0418 10:28:49.384854   31757 service.go:319] Adding new service port "default/router:1936-tcp" at 172.30.214.100:1936/TCP

I0418 10:28:56.207799   31757 service.go:319] Adding new service port "default/docker-registry:5000-tcp" at 172.30.82.81:5000/TCP

I0418 10:29:10.096055   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for default/router:1936-tcp to [10.244.246.67:1936]

I0418 10:29:10.096164   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for default/router:80-tcp to [10.244.246.67:80]

I0418 10:29:10.096192   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for default/router:443-tcp to [10.244.246.67:443]

I0418 10:29:14.491670   31757 service.go:319] Adding new service port "default/registry-console:registry-console" at 172.30.117.141:9000/TCP

I0418 10:29:29.257127   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for default/registry-console:registry-console to [10.128.0.18:9090]

I0418 10:29:33.972756   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for default/docker-registry:5000-tcp to [10.129.0.17:5000]

I0418 10:29:46.691272   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/prometheus-operator:http to [10.129.0.19:8080]

I0418 10:30:05.848458   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for openshift-monitoring/cluster-monitoring-operator:http to [10.129.0.18:8080]

I0418 10:30:07.466454   31757 service.go:319] Adding new service port "openshift-web-console/webconsole:https" at 172.30.216.23:443/TCP

I0418 10:30:16.639997   31757 service.go:319] Adding new service port "openshift-monitoring/grafana:https" at 172.30.38.255:3000/TCP

I0418 10:30:28.016768   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-web-console/webconsole:https to [10.128.0.19:8443]

I0418 10:30:34.233535   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/grafana:https to [10.129.0.20:3000]

I0418 10:30:37.856715   31757 service.go:319] Adding new service port "openshift-console/console:https" at 172.30.33.85:443/TCP

I0418 10:30:42.138983   31757 service.go:319] Adding new service port "openshift-monitoring/prometheus-k8s:web" at 172.30.131.38:9091/TCP

I0418 10:30:42.337726   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for kube-system/kube-controllers:http-metrics to [10.244.246.66:8444]

I0418 10:30:58.111265   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/prometheus-k8s:web to [10.129.0.21:9091]

I0418 10:30:58.112870   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/prometheus-operated:web to [10.129.0.21:9091]

I0418 10:31:04.180763   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-console/console:https to [10.128.0.20:8443]

I0418 10:31:10.161250   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/prometheus-k8s:web to [10.129.0.21:9091 10.129.0.22:9091]

I0418 10:31:10.161278   31757 roundrobin.go:240] Delete endpoint 10.129.0.22:9091 for service "openshift-monitoring/prometheus-k8s:web"

I0418 10:31:10.161310   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/prometheus-operated:web to [10.129.0.21:9091 10.129.0.22:9091]

I0418 10:31:10.161319   31757 roundrobin.go:240] Delete endpoint 10.129.0.22:9091 for service "openshift-monitoring/prometheus-operated:web"

I0418 10:31:15.442480   31757 service.go:319] Adding new service port "openshift-monitoring/alertmanager-main:web" at 172.30.62.59:9094/TCP

I0418 10:31:24.934229   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-operated:web to [10.129.0.23:9093]

I0418 10:31:24.934258   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-operated:mesh to [10.129.0.23:6783]

I0418 10:31:24.938493   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-main:web to [10.129.0.23:9094]

I0418 10:31:32.325915   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-operated:mesh to [10.129.0.23:6783 10.129.0.24:6783]

I0418 10:31:32.325954   31757 roundrobin.go:240] Delete endpoint 10.129.0.24:6783 for service "openshift-monitoring/alertmanager-operated:mesh"

I0418 10:31:32.325975   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-operated:web to [10.129.0.23:9093 10.129.0.24:9093]

I0418 10:31:32.325989   31757 roundrobin.go:240] Delete endpoint 10.129.0.24:9093 for service "openshift-monitoring/alertmanager-operated:web"

I0418 10:31:32.326045   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-main:web to [10.129.0.23:9094 10.129.0.24:9094]

I0418 10:31:32.326059   31757 roundrobin.go:240] Delete endpoint 10.129.0.24:9094 for service "openshift-monitoring/alertmanager-main:web"

I0418 10:31:40.159223   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-main:web to [10.129.0.23:9094 10.129.0.24:9094 10.129.0.25:9094]

I0418 10:31:40.159262   31757 roundrobin.go:240] Delete endpoint 10.129.0.25:9094 for service "openshift-monitoring/alertmanager-main:web"

I0418 10:31:40.182527   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-operated:web to [10.129.0.23:9093 10.129.0.24:9093 10.129.0.25:9093]

I0418 10:31:40.182559   31757 roundrobin.go:240] Delete endpoint 10.129.0.25:9093 for service "openshift-monitoring/alertmanager-operated:web"

I0418 10:31:40.182580   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/alertmanager-operated:mesh to [10.129.0.23:6783 10.129.0.24:6783 10.129.0.25:6783]

I0418 10:31:40.182594   31757 roundrobin.go:240] Delete endpoint 10.129.0.25:6783 for service "openshift-monitoring/alertmanager-operated:mesh"

I0418 10:31:45.625750   31757 service.go:319] Adding new service port "kube-service-catalog/apiserver:secure" at 172.30.68.143:443/TCP

I0418 10:31:51.274700   31757 service.go:319] Adding new service port "kube-service-catalog/controller-manager:secure" at 172.30.27.87:443/TCP

I0418 10:31:52.609451   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/node-exporter:https to [10.244.246.67:9100]

I0418 10:31:59.528367   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/node-exporter:https to [10.244.246.67:9100 10.244.246.69:9100]

I0418 10:31:59.528435   31757 roundrobin.go:240] Delete endpoint 10.244.246.69:9100 for service "openshift-monitoring/node-exporter:https"

I0418 10:32:04.446974   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-monitoring/node-exporter:https to [10.244.246.66:9100 10.244.246.67:9100 10.244.246.69:9100]

I0418 10:32:04.447040   31757 roundrobin.go:240] Delete endpoint 10.244.246.66:9100 for service "openshift-monitoring/node-exporter:https"

I0418 10:32:24.590287   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for kube-service-catalog/apiserver:secure to [10.128.0.21:6443]

I0418 10:32:55.395109   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for kube-system/kubelet:http-metrics to [10.244.246.66:10255 10.244.246.67:10255 10.244.246.68:10255 10.244.246.69:10255]

I0418 10:32:55.395143   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for kube-system/kubelet:cadvisor to [10.244.246.66:4194 10.244.246.67:4194 10.244.246.68:4194 10.244.246.69:4194]

I0418 10:32:55.395159   31757 roundrobin.go:276] LoadBalancerRR: Setting endpoints for kube-system/kubelet:https-metrics to [10.244.246.66:10250 10.244.246.67:10250 10.244.246.68:10250 10.244.246.69:10250]

I0418 10:33:17.051657   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for kube-service-catalog/controller-manager:secure to [10.128.0.22:6443]

I0418 10:33:41.819842   31757 service.go:319] Adding new service port "openshift-ansible-service-broker/asb:port-1338" at 172.30.178.1:1338/TCP

I0418 10:33:41.819910   31757 service.go:319] Adding new service port "openshift-ansible-service-broker/asb:port-1337" at 172.30.178.1:1337/TCP

I0418 10:33:54.764975   31757 service.go:319] Adding new service port "openshift-template-service-broker/apiserver:" at 172.30.122.2:443/TCP

I0418 10:34:30.064016   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-ansible-service-broker/asb:port-1338 to [10.129.0.27:1338]

I0418 10:34:30.064135   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-ansible-service-broker/asb:port-1337 to [10.129.0.27:1337]

I0418 10:34:33.608916   31757 roundrobin.go:310] LoadBalancerRR: Setting endpoints for openshift-template-service-broker/apiserver: to [10.128.0.23:8443]
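Two things in that log catch my eye: the "device or resource busy" errors on /var/run/openshift-sdn, and the br-netfilter warning from the proxier. Quick checks I plan to run for both (untested sketch; standard CentOS 7 paths assumed):

```shell
# 1. rm on /var/run/openshift-sdn fails with "device or resource busy"
#    when the directory is a bind mount into the sdn pod; if findmnt
#    shows a mount, those two errors are probably harmless noise.
SOCK_DIR=/var/run/openshift-sdn
findmnt "$SOCK_DIR" 2>/dev/null || echo "$SOCK_DIR is not a mountpoint here"

# 2. The proxier warning: kube-proxy needs br_netfilter loaded and the
#    bridge sysctl enabled for service DNAT to apply to bridged traffic.
BR_SYSCTL=net.bridge.bridge-nf-call-iptables
lsmod | grep -q '^br_netfilter' && echo "br_netfilter loaded" || echo "br_netfilter missing"
sysctl "$BR_SYSCTL" 2>/dev/null || true
# Fix as root:  modprobe br_netfilter && sysctl -w net.bridge.bridge-nf-call-iptables=1
```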

 


 

 

From: users-bounces lists openshift redhat com <users-bounces lists openshift redhat com> on behalf of ANUZET Wilfried
Sent: Thursday, April 18, 2019 12:53
To: Nikolas Philips <nikolas philips gmail com>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: RE: OKD installation on CentOS 7.6

 

Nikolas,

 

The redirection to a short name was a misconfiguration on my side ;)

I had left my /etc/hosts file on the master with the short name first, like:

10.244.246.66   okdmst01t okdmst01t.stluc.ucl.ac.be

I removed the short name and the redirection is now fine.
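In other words, the master's /etc/hosts should list the FQDN first; the corrected entry looks like:

```
10.244.246.66   okdmst01t.stluc.ucl.ac.be okdmst01t
```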

 

I checked all sdn pods and found something interesting:

2019/04/18 10:27:52 socat[1836] E connect(5, AF=1 "/var/run/openshift-sdn/cni-server.sock", 40): No such file or directory

warning: Cannot find existing node-config.yaml, waiting 15s ...

User "sa" set.

Context "default-context" modified.

which: no openshift-sdn in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)

I0418 10:28:08.264526    1788 start_network.go:193] Reading node configuration from /etc/origin/node/node-config.yaml

I0418 10:28:08.269879    1788 start_network.go:200] Starting node networking okdnod01t.stluc.ucl.ac.be (v3.11.0+5a84bad-168)

W0418 10:28:08.270019    1788 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.

I0418 10:28:08.270054    1788 feature_gate.go:230] feature gates: &{map[]}

I0418 10:28:08.271366    1788 transport.go:160] Refreshing client certificate from store

I0418 10:28:08.271394    1788 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".

I0418 10:28:08.286495    1788 node.go:147] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "okdnod01t.stluc.ucl.ac.be" (IP ""), iptables sync period "30s"

I0418 10:28:08.287436    1788 node.go:289] Starting openshift-sdn network plugin

I0418 10:28:08.347033    1788 sdn_controller.go:139] [SDN setup] full SDN setup required (local subnet gateway CIDR not found)

 

It seems you're right about the SDN not using the gateway defined at host level...
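To compare what the host and the SDN think about routing, a check I'll run next (untested sketch; tun0 is the SDN gateway interface on OKD 3.11):

```shell
# Host-level default route vs the SDN's tun0 interface.
DEFAULT_ROUTE=$(ip route show default 2>/dev/null)
echo "host default route: ${DEFAULT_ROUTE:-<none>}"

# tun0 carries the local subnet gateway CIDR the sdn pod complained about;
# if it is absent or has no address, the SDN setup never finished.
ip -4 addr show tun0 2>/dev/null || echo "tun0 not present on this machine"
```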

 

 


 

 

From: Nikolas Philips <nikolas philips gmail com>
Sent: Thursday, April 18, 2019 11:20
To: ANUZET Wilfried <wilfried anuzet uclouvain be>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: Re: OKD installation on CentOS 7.6

 

Hi Wilfried,

This seems like a routing issue then. The cluster doesn't seem to use the defined default gateway. Maybe there's more info in one of the SDN pods? On a master node:

oc get pods -n openshift-sdn

 

Choose one pod and run:

oc logs <pod> -n openshift-sdn

 

Regarding the URL redirect: I need to look this up, but I had a similar issue. Check that all the hostnames are set correctly (always FQDN), and look in the /etc/origin/master/master-config.yaml file for the domain name of the API.

 

Regards,

Nikolas

 

 

ANUZET Wilfried <wilfried anuzet uclouvain be> wrote on Thu., Apr. 18, 2019 at 11:09:

Hi Nikolas,

 

Today I'll retry the installation from scratch; I made a snapshot of all my VMs just after installing all the prerequisites and before applying the playbooks ;)

 

Two things:

- First:

When I set the openshift_http(s)_proxy variables in my inventory I lost my connection to the Docker Hub registry -_-'

I know our proxy here is a complete mess (a long story…), so I disabled these variables, reinstalled, and went back to the old situation.

The servers are accessible from the subnet they're on but not from another subnet. I tried to access the console from a Windows server on the same subnet and it was OK, except that https://okdmst01.stluc.ucl.ac.be:8443 redirects to a short name like https://okdmst01t:8443/console; I can still access and log in to the OKD console.

 

- Second:

I monitored a basic ping to the master server from outside its subnet and found the moment when I lost the connection:

thu avr 18 10:16:02 CEST 2019: 64 bytes from okdmst01t.stluc.ucl.ac.be (10.244.246.66): icmp_seq=1044 ttl=63 time=0.684 ms
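For reference, the monitoring was just a ping piped through a timestamping loop, something like this (reconstructed sketch; -c 4 bounds it here, drop it to run indefinitely):

```shell
# Prefix every ping reply with the wall clock so it can be lined up
# against journalctl output, as in the line above.
TARGET=okdmst01t.stluc.ucl.ac.be
ping -c 4 "$TARGET" 2>&1 | while IFS= read -r line; do
    echo "$(date): $line"
done
```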

 

At the same moment in journalctl it seems that there are some operations with NetworkManager:

Apr 18 10:16:02 okdmst01t.stluc.ucl.ac.be sudo[27063]: pam_unix(sudo:session): session closed for user root

Apr 18 10:16:02 okdmst01t.stluc.ucl.ac.be sudo[27149]:   aw1538 : TTY=unknown ; PWD=/home/aw1538 ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-jqpitossiewjikeadherhxalutqivypc; /usr/bin/python

Apr 18 10:16:02 okdmst01t.stluc.ucl.ac.be sudo[27149]: pam_unix(sudo:session): session opened for user root by (uid=0)

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device ovs-system entered promiscuous mode

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.0222] manager: (ovs-system): new Generic device (/org/freedesktop/NetworkManager/Devices/4)

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be python[27156]: ansible-stat Invoked with checksum_algorithm=sha1 get_checksum=True follow=True checksum_algo=sha1 path=/usr/share/openshift/examples/ get_md5=None get_mime=True get_attributes=True

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27149]: pam_unix(sudo:session): session closed for user root

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device br0 entered promiscuous mode

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.0530] manager: (br0): new Generic device (/org/freedesktop/NetworkManager/Devices/5)

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device vxlan_sys_4789 entered promiscuous mode

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.0931] manager: (vxlan_sys_4789): new Vxlan device (/org/freedesktop/NetworkManager/Devices/6)

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.0933] device (vxlan_sys_4789): enslaved to non-master-type device ovs-system; ignoring

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.1002] device (vxlan_sys_4789): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.1027] device (vxlan_sys_4789): enslaved to non-master-type device ovs-system; ignoring

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.1040] device (vxlan_sys_4789): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device tun0 entered promiscuous mode

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.1180] manager: (tun0): new Generic device (/org/freedesktop/NetworkManager/Devices/7)

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info>  [1555575363.1326] device (tun0): carrier: link connected

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27336]:   aw1538 : TTY=unknown ; PWD=/home/aw1538 ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-oqtjywlqnddthrxxoriwvlubusutuskn; /usr/bin/python

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27336]: pam_unix(sudo:session): session opened for user root by (uid=0)

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: ctnetlink v0.93: registering with nfnetlink.

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be python[27350]: ansible-unarchive Invoked with directory_mode=None force=None remote_src=False exclude=[] owner=None follow=False group=None unsafe_writes=None keep_newer=False setype=None content=NOT_LOGGING_PARAMETER serole=None extra_opts=[] dest=/usr/share/openshift/examples/ selevel=None regexp=None src="" validate_certs=True list_files=False seuser=None creates=None delimiter=None mode=None attributes=None backup=None

Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27336]: pam_unix(sudo:session): session closed for user root

Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: setting upstream servers from DBus

Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using nameserver 10.97.200.151#53

Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using nameserver 10.244.244.151#53

Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using nameserver 127.0.0.1#53 for domain in-addr.arpa

Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using nameserver 127.0.0.1#53 for domain cluster.local

 

At first sight it seems related to the OpenShift SDN?

It happened a few seconds after the task [ openshift_sdn: Apply the config ].

 

 


 

 

From: Nikolas Philips <nikolas philips gmail com>
Sent: Wednesday, April 17, 2019 12:00


To: ANUZET Wilfried <wilfried anuzet uclouvain be>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: Re: OKD installation on CentOS 7.6

 

Hi Wilfried,

maybe you should also define the proxy used by the system in the inventory file:

I don't think this causes the issue, but you should define them there anyway.
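(These are the usual openshift-ansible 3.11 inventory variables for that; the proxy host and values below are placeholders, adjust for your environment:)

```
openshift_http_proxy=http://proxy.example.com:3128
openshift_https_proxy=http://proxy.example.com:3128
openshift_no_proxy='.stluc.ucl.ac.be,10.244.246.0/24'
```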

Keep me up to date when you get some news :)

 

Regards,
Nikolas

 

On Wed., Apr. 17, 2019 at 11:46, ANUZET Wilfried <wilfried anuzet uclouvain be> wrote:

Hi Nikolas,

 

When I restart origin-node.service I do indeed lose the connection.

I will check and monitor the firewall and services as you recommended and get back to you.

 

I'm already using the ansible installer playbooks from GitHub, branch release-3.11.

 

The main differences from a vanilla CentOS for these servers are:

- Added a Red Hat Satellite subscription to use our internal Satellite as the CentOS master repositories

- Installed some packages (mostly debugging tools like net-utils, iotop… and tools to integrate Active Directory: oddjob, adcli ...)

- The proxy and proxy credentials were configured at profile level

- NTP uses our internal NTP servers

- Security rules are applied to be compliant with the SCAP content "Standard System Security Profile for Red Hat Enterprise Linux 7"; it's mostly auditing rules and a few rules to disable root login via SSH and prevent empty-password logins

 

Thanks

 

 


 

 

From: Nikolas Philips <nikolas philips gmail com>
Sent: Wednesday, April 17, 2019 11:18
To: ANUZET Wilfried <wilfried anuzet uclouvain be>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: Re: OKD installation on CentOS 7.6

 

Hi Wilfried,

just as some input: when you can access your node while origin-node isn't running/is disabled, what happens when you start docker and origin-node? The access should go down, I guess. That way you should be able to track down the process causing the issue. For example, set up an external port check on 22, and log ps -ef / netstat -tupln / docker ps / journalctl / iptables -L frequently to get the time/process when the node becomes unavailable.
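A minimal version of that snapshot logger could look like this (untested sketch; pick whatever interval and commands you like):

```shell
# Dump process, socket and firewall state into one timestamped file per
# snapshot; diff the last file written before the node vanished against
# an earlier one to see what changed.
LOGDIR=/tmp/okd-debug
mkdir -p "$LOGDIR"

snapshot() {
    ts=$(date +%Y%m%d-%H%M%S)
    {
        ps -ef
        netstat -tupln 2>/dev/null
        docker ps 2>/dev/null
        iptables -L -n 2>/dev/null
    } > "$LOGDIR/state-$ts.log"
}

snapshot   # one shot; loop with: while sleep 10; do snapshot; done
```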

OKD does not limit network access based on subnet or similar. So this behaviour is an unwanted side effect caused by the environment (network, sysconfig, external firewall etc.). What is different from a vanilla CentOS installation? Are there any routines while starting the node? Maybe it's an issue of services going up in the wrong order. I wouldn't study the ansible installer itself, as it seems to be working correctly. Try to find the exact moment/process when the access gets denied.

And are you using the openshift-installer from GitHub or from the CentOS repository? The RPMs from the CentOS repo are not updated that often, so maybe try the openshift-installer from GitHub (branch release-3.11) and use the playbooks from there. Sometimes there are relevant bug fixes included.

 

Regards,
Nikolas

 

On Wed., Apr. 17, 2019 at 10:44, ANUZET Wilfried <wilfried anuzet uclouvain be> wrote:

Hi Nikolas,

 

I just asked the network team here whether there's something blocking OKD at the network level, and it seems not.

 

And since you can access the servers only from certain hosts after the installation, it really looks like an external component breaks something.

Because you have a short window to access the server from your client, I'm pretty sure it's not a local firewalld issue, as network/firewall go up together. So a different service is causing the issue. I would try to identify these processes until it's clear which component causes that behaviour.

To me as well the issue seems related to an OpenShift component, as the server is inaccessible when OKD starts. I'll try to identify which one…

 

I asked you once for the wrong nodes. Is dnsmasq running on the LB node?

I just checked and dnsmasq is not running on the LB.

 

You could maybe verify that by stopping the origin-node and docker services and trying to get rid of all openshift-specific processes (also dnsmasq), so only basic services are running (or by disabling them and rebooting).

I stopped the origin-node.service and docker.service units and nothing changed.

I disabled origin-node.service and docker.service, rebooted, and the node server is accessible from outside its subnet.

The issue seems clearly related to OKD ;)

 

On the LB, using a CLI browser (lynx), I can access the master URL (https://okdmst01t.stluc.ucl.ac.be:8443, which redirects correctly to https://okdmst01t:8443/console/, but obviously there's a message to activate JavaScript on the login page).

I just saw that I forgot to put the OKD master/node IPs in the /etc/hosts of the LB.

I just added them but it changed nothing.

 

I'm also out of ideas, but I will check every OKD pod and read the openshift installer more carefully (although it's well written, it's also deeply nested, with a lot of import_playbook, import_tasks …)

 

:'(

 


 

 

From: Nikolas Philips <nikolas philips gmail com>
Sent: Wednesday, 17 April 2019 09:48
To: ANUZET Wilfried <wilfried anuzet uclouvain be>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: Re: OKD installation on CentOS 7.6

 

Hi Wilfried,

sadly I'm a bit out of ideas about what could cause this issue. All the settings and configs I saw from you looked good.

And since you can only access the servers from certain hosts after the installation, it really looks like an external component is breaking something.

My guess would be that maybe an external/internal firewall blocks external traffic to your nodes when certain ports are open (or something similar). Maybe because of DNS, to prevent spoofing? (I asked you once about the wrong nodes. Is dnsmasq running on the LB node?)

You could maybe verify that by stopping the origin-node and docker services and trying to get rid of all OpenShift-specific processes (also dnsmasq), so that only basic services are running (or by disabling them and rebooting).

Because you have a short window to access the server from your client, I'm pretty sure it's not a local firewalld issue, since the network and firewall come up together. So a different service is causing the issue. I would try to identify this process until it's clear which component causes that behaviour.

 

But you can access the cluster through the LB (e.g. on 8443 or 443), right?
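A quick way to answer that from any client is a status probe (a sketch: assumes curl is installed, uses -k because the cluster cert is self-signed, and /healthz as the usual OpenShift 3.x master health endpoint):

```shell
#!/bin/sh
# Probe the master URL through the LB and print the HTTP status code.
# URL from this thread; /healthz is the usual OpenShift 3.x health endpoint.
MASTER_URL="https://okdmst01t.stluc.ucl.ac.be:8443"
if command -v curl >/dev/null 2>&1; then
    code=$(curl -k -s --connect-timeout 5 -o /dev/null -w '%{http_code}' \
           "${MASTER_URL}/healthz" || true)
fi
code=${code:-000}   # 000 means no connection could be made (or curl missing)
echo "GET ${MASTER_URL}/healthz -> HTTP ${code}"
```

A 200 here from an off-subnet client would prove the LB path works even while direct node access is blocked.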

 

Regards,

Nikolas

 

On Wed, 17 Apr 2019 at 09:12, ANUZET Wilfried <wilfried anuzet uclouvain be> wrote:

Hello Nikolas,

 

Here is the output of the firewall-cmd command on the LB and master:

LB:

public (active)

  target: default

  icmp-block-inversion: no

  interfaces: ens192

  sources:

  services: ssh dhcpv6-client

  ports: 10250/tcp 10256/tcp 80/tcp 443/tcp 4789/udp 9000-10000/tcp 1936/tcp

  protocols:

  masquerade: no

  forward-ports:

  source-ports:

  icmp-blocks:

  rich rules:

 

MASTER:

public (active)

  target: default

  icmp-block-inversion: no

  interfaces: ens192

  sources:

  services: ssh dhcpv6-client

  ports: 10250/tcp 10256/tcp 80/tcp 443/tcp 4789/udp 9000-10000/tcp 1936/tcp 2379/tcp 2380/tcp 9000/tcp 8443/tcp 8444/tcp 8053/tcp 8053/udp

  protocols:

  masquerade: no

  forward-ports:

  source-ports:

  icmp-blocks:

  rich rules:

 

 


 

 

From: Nikolas Philips <nikolas philips gmail com>
Sent: Tuesday, 16 April 2019 19:17
To: ANUZET Wilfried <wilfried anuzet uclouvain be>
Cc: OpenShift Users List <users lists openshift redhat com>
Subject: Re: OKD installation on CentOS 7.6

 

Sorry Wilfried,

I missed the line with "os_firewall_use_firewalld" in your inventory file. 

What's the output of "firewall-cmd --list-all" on the LB and master?
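Since --list-all only shows the zone queried, it may also be worth confirming that ens192 is actually in the zone being listed (a hedged sketch using standard firewalld options; the interface name comes from this thread):

```shell
#!/bin/sh
# Check which zone the NIC actually belongs to before trusting --list-all,
# since rules in a different active zone would not show up otherwise.
fw=$(command -v firewall-cmd || echo "none")
if [ "$fw" != "none" ]; then
    firewall-cmd --get-active-zones
    firewall-cmd --get-zone-of-interface=ens192   # NIC name from this thread
    firewall-cmd --list-all --zone=public
else
    echo "firewalld not installed"
fi
```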

 

 

On Tue, 16 Apr 2019 at 17:52, ANUZET Wilfried <wilfried anuzet uclouvain be> wrote:

Thanks Nikolas,

 

Here are some answers to better identify the source of the problem:

 

·         I can connect via SSH before running the Ansible installer; I run another Ansible playbook beforehand to be compliant with our enterprise policy.

In this playbook I just ensure that firewalld is up and running, but I keep the default values (just the ssh service open and ICMP responses not blocked).

If I uninstall Openshift and reboot the server I can connect to it again.

 

·         All of these servers have only one NIC.

 

·         I tried disabling firewalld and flushing all iptables rules, but I still can't reach the server.

/!\ I just noticed that I can reach the server from another server in the same subnet, without deactivating and flushing the firewall /!\
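Since same-subnet clients get through while off-subnet clients do not, a capture on the node would show whether the off-subnet SYNs even arrive (a sketch: requires tcpdump and root; the interface name comes from this thread, the client IP is a placeholder to replace):

```shell
#!/bin/sh
# Capture SSH traffic from one off-subnet client on the node's NIC.
# If no SYN shows up, the packets die upstream (routing/external firewall);
# if SYNs arrive with no reply, the drop is local to the node.
EXT_CLIENT="203.0.113.10"   # placeholder: replace with your real client IP
FILTER="tcp port 22 and host ${EXT_CLIENT}"
if command -v tcpdump >/dev/null 2>&1; then
    timeout 5 tcpdump -ni ens192 -c 10 "${FILTER}" || true
else
    echo "tcpdump not installed; filter would be: ${FILTER}"
fi
```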

 

·         Connected to one node:

disabled origin-node via systemd => still no connection

added the SSH port and ICMP in iptables => still no connection or ICMP response

it seems that Kubernetes recreates some rules (via the pods / Docker containers which are still running? Do I have to stop them all via docker container stop $(docker container ls -q)?)
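To rule out rule re-creation, the sequence the question hints at could look like this sketch (hedged, run as root on the node): stop the node agent first so it cannot restart pods, then stop every container, then check whether the kube/openshift-managed rules survive.

```shell
#!/bin/sh
# Stop the agent before the containers, otherwise it restarts them and the
# kube-proxy iptables rules come right back.
systemctl stop origin-node >/dev/null 2>&1 || true
ids=$(docker container ls -q 2>/dev/null || true)
[ -n "$ids" ] && docker container stop $ids >/dev/null 2>&1
# Count iptables rules that look kube/openshift-managed; 0 means all gone.
kube_rules=$(iptables-save 2>/dev/null | grep -cE 'KUBE|OPENSHIFT' || true)
echo "kube/openshift iptables rules remaining: ${kube_rules:-0}"
```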

 

·         Here is the information about one node:

ip a sh

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000

    link/ether 00:50:56:92:79:03 brd ff:ff:ff:ff:ff:ff

    inet 10.244.246.68/24 brd 10.244.246.255 scope global noprefixroute ens192

       valid_lft forever preferred_lft forever

    inet6 fe80::250:56ff:fe92:7903/64 scope link

       valid_lft forever preferred_lft forever

3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default

    link/ether 02:42:cb:3e:8f:86 brd ff:ff:ff:ff:ff:ff

    inet 172.17.0.1/16 scope global docker0

       valid_lft forever preferred_lft forever

4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000

    link/ether 82:94:30:55:98:12 brd ff:ff:ff:ff:ff:ff

5: br0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000

    link/ether ee:73:3d:25:b7:48 brd ff:ff:ff:ff:ff:ff

6: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000

    link/ether 5a:63:33:de:9f:70 brd ff:ff:ff:ff:ff:ff

    inet6 fe80::5863:33ff:fede:9f70/64 scope link

       valid_lft forever preferred_lft forever

7: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000

    link/ether b6:35:b5:77:d4:60 brd ff:ff:ff:ff:ff:ff

    inet 10.131.0.1/23 brd 10.131.1.255 scope global tun0

       valid_lft forever preferred_lft forever

    inet6 fe80::b435:b5ff:fe77:d460/64 scope link

       valid_lft forever preferred_lft forever

 

netstat -tunlp

Active Internet connections (only servers)

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   

tcp        0      0 127.0.0.1:9101          0.0.0.0:*               LISTEN      16787/node_exporter

tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd          

tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      13155/openshift    

tcp        0      0 10.131.0.1:53           0.0.0.0:*               LISTEN      9666/dnsmasq       

tcp        0      0 10.244.246.68:53        0.0.0.0:*               LISTEN      9666/dnsmasq       

tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      9666/dnsmasq       

tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      6515/sshd          

tcp        0      0 127.0.0.1:11256         0.0.0.0:*               LISTEN      13155/openshift    

tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      6762/master        

tcp6       0      0 :::9100                 :::*                    LISTEN      16837/./kube-rbac-p

tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd          

tcp6       0      0 :::10256                :::*                    LISTEN      13155/openshift    

tcp6       0      0 fe80::5863:33ff:fede:53 :::*                    LISTEN      9666/dnsmasq       

tcp6       0      0 fe80::b435:b5ff:fe77:53 :::*                    LISTEN      9666/dnsmasq       

tcp6       0      0 fe80::250:56ff:fe92::53 :::*                    LISTEN      9666/dnsmasq       

tcp6       0      0 :::22                   :::*                    LISTEN      6515/sshd          

tcp6       0      0 ::1:25                  :::*                    LISTEN      6762/master        

udp        0      0 127.0.0.1:53            0.0.0.0:*                           13155/openshift    

udp        0      0 10.131.0.1:53           0.0.0.0:*                           9666/dnsmasq       

udp        0      0 10.244.246.68:53        0.0.0.0:*                           9666/dnsmasq       

udp        0      0 172.17.0.1:53           0.0.0.0:*                           9666/dnsmasq       

udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd          

udp        0      0 127.0.0.1:323           0.0.0.0:*                           5855/chronyd       

udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -                  

udp        0      0 0.0.0.0:922             0.0.0.0:*                           5857/rpcbind       

udp6       0      0 fe80::5863:33ff:fede:53 :::*                                9666/dnsmasq       

udp6       0      0 fe80::b435:b5ff:fe77:53 :::*                                9666/dnsmasq       

udp6       0      0 fe80::250:56ff:fe92::53 :::*                                9666/dnsmasq       

udp6       0      0 :::111                  :::*                                1/systemd          

udp6       0      0 ::1:323                 :::*                                5855/chronyd       

udp6       0      0 :::4789                 :::*                                -                   

udp6       0      0 :::922                  :::*                                5857/rpcbind    
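For reference, the dnsmasq instances above (bound to port 53 on every real interface) come from the OpenShift node DNS setup; the drop-in configs can be inspected like this (file names hedged, they may differ per version, so check /etc/dnsmasq.d/ on the node for the exact set):

```shell
#!/bin/sh
# Inspect the dnsmasq drop-ins that openshift-ansible typically installs.
# File names are assumptions; list /etc/dnsmasq.d/ to see what is really there.
n=0
for f in /etc/dnsmasq.d/origin-dns.conf /etc/dnsmasq.d/origin-upstream-dns.conf; do
    n=$((n + 1))
    if [ -f "$f" ]; then
        echo "== $f"; cat "$f"
    else
        echo "missing: $f"
    fi
done
echo "checked $n files"
```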

