Hi,

This is quite puzzling... Could you share your inventory with us? Make sure to obfuscate any sensitive data first (LDAP/htpasswd credentials, among others). I'm mostly interested in potential openshift_node_groups edits, although something else might come up.

At first glance you are right, it does sound like a firewalling issue. Yet, from your description, you did open all required ports. I would suggest you check back on these and make sure your data is accurate, although I would assume it is. Also: if you are using CRI-O as a runtime, note that you would be missing port 10010, which should be opened on all nodes. That said, I don't think that one would be related to nodes registering against your master API.

Another explanation could be DNS: can your infra/compute nodes properly resolve your masters' names? The contrary would be unusual, but it could explain what's going on.

As a general rule, at that stage, I would restart the origin-node service on the hosts that fail to register, keeping an eye on /var/log/messages (or journalctl -f).

If that doesn't help, I would raise the log level in /etc/sysconfig/origin-node (there's a variable that defaults to 2; you can change it to 99, but beware: that produces a lot of logs and could eventually saturate your disks, so don't keep it that way for long). When dealing with large volumes of logs, note that OpenShift services tend to prefix messages with their severity: you may be able to pipe through grep -E 'E[0-9][0-9]' to focus on error messages, or 'W[0-9][0-9]' for warnings.

Your issue being potentially related to firewalling, I might also use tcpdump to look into what's being exchanged between nodes. Look for any packets with a SYN flag ("[S]") that are not followed by a SYN-ACK ("[S.]").

Let us know how that goes. Good luck.

Failing during the "Approve node certificate" step is relatively common and can have several causes, from node group configuration to DNS, firewalls, a broken TCP handshake, or an MTU that doesn't allow certificates to go through. We'll want to dig deeper to elucidate that issue.

Regards.

On Sat, Jun 1, 2019 at 12:19 PM Punga Dan <dan punga gmail com> wrote:

Hello all!

I'm hitting a problem when trying to install OKD 3.11 on one master, 2 infra, and 2 compute nodes. The hosts are VMs that run CentOS 7.

I've gone through the issues related to this subject, e.g. https://access.redhat.com/solutions/3680401, which suggests naming the hosts with FQDNs. I tried that, with the same problem appearing for the same set of hosts (all except the master). In my case the error appears only on the 2 infra nodes and the 2 compute nodes, so not on the master as well.

oc get nodes gives me just the master node, but I guess this is expected, as the other OKD nodes stand to be created by the process that fails. Am I wrong?

oc get csr gives me 3 CSRs:

[root@master ~]# oc get csr
NAME AGE REQUESTOR CONDITION
csr-4xjjb 24m system:admin Approved,Issued
csr-b6x45 24m system:admin Approved,Issued
csr-hgmpf 20m system:node:master Approved,Issued

Here I believe I have 2 CSRs for system:admin because I ran playbooks/openshift-node/join.yml a second time.

The bootstrapping certificates on the master look fine(??):

[root@master ~]# ll /etc/origin/node/certificates/
-rw-------. 1 root root 2830 iun 1 11:30 kubelet-client-2019-06-01-11-30-04.pem
-rw-------. 1 root root 1135 iun 1 11:31 kubelet-client-2019-06-01-11-31-23.pem
lrwxrwxrwx. 1 root root 68 iun 1 11:31 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2019-06-01-11-31-23.pem
-rw-------. 1 root root 1179 iun 1 11:35 kubelet-server-2019-06-01-11-35-42.pem
lrwxrwxrwx. 1 root root 68 iun 1 11:35 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2019-06-01-11-35-42.pem

I've rechecked the open ports, thinking the issue lies in some network-related config:
- all hosts have the node-related ports opened: 53/udp, 10250/tcp, 4789/udp
- the master (with etcd) additionally has: 8053/udp+tcp, 2049/udp+tcp, 8443/tcp, 8444/tcp, 4789/udp, 53/udp
- the infra nodes have, on top of the node ones, the ports for the router/routes and logging components they will host

The chosen SDN is os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant', with no extra config in the inventory file. (Do I need any?)

Any hints about where and what to check would be much appreciated!

Best regards,
Dan Pungă
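To make the DNS check suggested above concrete, here's a small sketch; the hostnames are placeholders for your actual node names:

```shell
# Verify that each cluster hostname resolves from this host.
# The example hostnames below are placeholders; substitute your own.
check_resolve() {
  if getent hosts "$1" >/dev/null; then
    echo "ok $1"
  else
    echo "FAIL $1"
  fi
}

for h in master.example.com infra1.example.com compute1.example.com; do
  check_resolve "$h"
done
```

Run it on each infra/compute node; any FAIL line means that node cannot resolve the name through /etc/hosts or DNS.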
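For the log-level change, something along these lines; note I'm demonstrating on a scratch copy, and the variable name DEBUG_LOGLEVEL is an assumption, so check what your /etc/sysconfig/origin-node actually contains:

```shell
# Bump node log verbosity from the default 2 to 99.
# Demonstrated on a scratch copy; on a real node you would edit
# /etc/sysconfig/origin-node instead, then restart the origin-node service.
# DEBUG_LOGLEVEL is an assumed variable name; verify it in your own file.
cat > /tmp/origin-node <<'EOF'
OPTIONS=
DEBUG_LOGLEVEL=2
EOF

sed -i 's/^DEBUG_LOGLEVEL=.*/DEBUG_LOGLEVEL=99/' /tmp/origin-node
grep '^DEBUG_LOGLEVEL' /tmp/origin-node
```

Remember to set it back to 2 afterwards; level 99 fills disks quickly.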
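As for the severity prefixes mentioned above: OpenShift components log in the glog style, where each line starts with a severity letter (I/W/E) followed by the date digits. A quick illustration on invented sample lines:

```shell
# glog-style lines start with I (info), W (warning) or E (error),
# immediately followed by MMDD. These sample lines are made up
# purely to demonstrate the filter:
cat > /tmp/sample.log <<'EOF'
I0601 11:31:20.000000    1234 kubelet.go:100] node starting
W0601 11:31:22.000000    1234 kubelet.go:200] something looks off
E0601 11:31:23.000000    1234 kubelet.go:300] registration failed
EOF

# Errors only:
grep -E '^E[0-9][0-9]' /tmp/sample.log

# Errors and warnings together:
grep -E '^[EW][0-9][0-9]' /tmp/sample.log
```

The same filters work piped after journalctl -u origin-node.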
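The tcpdump check could look like this; the interface, master hostname, and the 8443 API port are assumptions to adapt to your environment:

```shell
# On a node that fails to register, watch the TCP handshake toward the
# master API (hostname and port are placeholders; needs root).
# A SYN ("Flags [S]") with no SYN-ACK ("Flags [S.]") coming back
# points at filtering somewhere on the path.
tcpdump -nn -i any 'host master.example.com and tcp port 8443'
```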
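And since the installer fails at the node certificate approval step, it can also be worth listing and approving outstanding CSRs by hand from the master; a sketch, to run as a cluster admin, and only if you trust the requesting hosts:

```shell
# List current CSRs, then approve any outstanding ones.
# Run on the master with cluster-admin credentials.
oc get csr
oc get csr -o name | xargs -r oc adm certificate approve
```

If new node CSRs keep appearing as Pending and never get approved automatically, that points back at the node group / bootstrap configuration.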
users mailing list
users lists openshift redhat com
--
Samuel Martín Moro
"Nobody wants to say how this works.
Maybe nobody knows ..."