I'm trying to bootstrap a disconnected (air-gapped) 4.2 cluster using the bare metal method. It is technically VMware, but I'm following the bare metal version as our VMware cluster wasn't quite compatible with the VMware instructions.
After a few false starts I managed to get the bootstrapping to start. One strange thing that happened was that it was trying to download images from "quay.io/openshift-release-dev/ocp-v4.0-art-dev" instead of the documented "quay.io/openshift-release-dev/ocp-release". I found this rather odd, and I couldn't find many references to "ocp-v4.0-art-dev" on the internet, so I'm not sure exactly where it came from. I did a "strings openshift-install | grep ocp-v4.0-art-dev" but that didn't show anything, so it's a bit of a strange one.
So my image content sources ended up being:
I was watching journalctl on the bootstrap server and saw each etcd server join one by one. Once they had all joined, the apiserver on the bootstrap server seemed to lock up: when I tried to connect to https://localhost:6443 the connections would hang. Initially, I thought this meant that bootstrap had completed, but then I noticed that none of the master nodes were listening on 6443. They were all trying to look themselves up in etcd at "api-int.<cluster_name>.<base_domain>", but nothing was listening.
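To tell a hanging port apart from a closed one on each node, a check along these lines could be run from any machine inside the air gap. This is only a rough sketch: the `probe` helper is a name I made up, it tests nothing beyond the TCP connect (no TLS, no API call), and the host names passed on the command line would be whatever the bootstrap and master nodes are actually called.

```python
import socket
import sys


def probe(host, port=6443, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error'.

    Hypothetical helper: it only checks whether something accepts the
    connection, not whether the apiserver behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # The connect attempt hung until the timeout expired.
        return "timeout"
    except OSError:
        # DNS failure, no route to host, and similar problems.
        return "error"


if __name__ == "__main__":
    # Hosts are taken from the command line; with no arguments nothing is probed.
    for host in sys.argv[1:]:
        print(f"{host}:6443 -> {probe(host)}")
```

Saved as, say, `probe.py`, it would be called with the node names as arguments, for example `python3 probe.py localhost api-int.<cluster_name>.<base_domain>`. A "refused" result on the masters would match what I saw (nothing listening), whereas "open" on the bootstrap node followed by a hung request would point at the apiserver process rather than the network.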
I then scoured the journal on the bootstrap node, but I struggled to find logs related to why the apiserver had disappeared. The journal was mostly full of the bootstrap node trying to connect to https://localhost:6443, which suggested to me that bootstrap was not yet complete.
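Since the localhost:6443 retries drown out everything else, one way to thin the journal down is a small filter like the one below. It is a sketch under my own assumptions: the two patterns are just guesses at what counts as noise and what counts as interesting, and `interesting_lines` is a hypothetical name, not part of any OpenShift tooling.

```python
import re
import sys

# Assumption: any line mentioning localhost:6443 is retry noise.
NOISE = re.compile(r"localhost:6443")
# Assumption: these keywords are a reasonable first guess at real problems.
INTERESTING = re.compile(r"error|fail|denied|panic", re.IGNORECASE)


def interesting_lines(lines):
    """Yield lines that look like errors but are not localhost:6443 retries."""
    for line in lines:
        if NOISE.search(line):
            continue
        if INTERESTING.search(line):
            yield line.rstrip("\n")


# Only read stdin when something is piped in, so an interactive run does not block.
if __name__ == "__main__" and not sys.stdin.isatty():
    for line in interesting_lines(sys.stdin):
        print(line)
```

On the bootstrap node it could be fed with `journalctl -b --no-pager | python3 filter.py`, assuming Python 3 is available somewhere the journal text can be copied to. The keyword list is deliberately broad, so it will still need reading by eye.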
I tried rebooting the bootstrap node, but I think that made it worse: it seemed to be in a crash loop, whinging about files in /etc/kubernetes already existing or something like that. I had a look through /var/log and found this error message in some pod logs:
exiting because of error: log: unable to create log: open /var/log/bootstrap-control-plane/kube-apiserver.log: permission denied
I'm not sure if that error is because I restarted before bootstrap was successful, or if that is actually some sort of problem.
I tried reinstalling from scratch a few times, and it always got stuck in the same place, so it doesn't seem to be transient.
Where can I look for errors? Is "ocp-v4.0-art-dev" an indication of a problem? Since it's an air-gapped solution it's difficult to get logs out of the system, so I don't know if I'll be able to use must-gather. However, if I'm understanding it correctly, must-gather can only be used after bootstrap has succeeded.