
Re: OpenShift 3.6 on AWS creating EBS volumes in wrong region



Is the volume at least attached to the node where you were expecting?

Can you post following:

1. oc get pvc <pvc_name> -o json
2. oc get pv <pv> -o json
3. oc get pod <pod> -o json
4. oc describe pod <pod>
5. output of lsblk and /proc/self/mountinfo on the node where the volume was supposed to get attached and mounted.
6. Both kubelet and controller-manager logs. The controller-manager logs are important for debugging why the volume did not attach in time. You can find the controller-manager's logs via journalctl -u atomic-openshift-master-controller-manager (or whatever the name of the controller-manager systemd unit is on your install), for example:
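Something along these lines should capture both logs (the unit names here are assumptions; substitute whatever systemctl list-units shows on your master and node):

journalctl -u origin-node --since "2 hours ago" > node.log
journalctl -u atomic-openshift-master-controller-manager --since "2 hours ago" > controller-manager.log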


You can send them to me personally if you would rather not post sensitive information to the public mailing list.





On Sun, Jan 7, 2018 at 2:58 PM, Marc Boorshtein <mboorshtein gmail com> wrote:
Sounds like the SELinux error is a red herring; I found a Red Hat bug report showing this isn't an issue. This is all I'm seeing in the node's system log:

Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.381938    1750 kubelet.go:1854] SyncLoop (ADD, "api"): "mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)"
Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.495545    1750 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.595841    1750 reconciler.go:257] operationExecutor.MountVolume started for volume "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.608039    1750 operation_generator.go:481] MountVolume.SetUp succeeded for volume "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
Jan  7 19:52:11 ip-10-0-4-69 origin-node: E0107 19:52:11.395023    1750 kubelet.go:1594] Unable to mount volumes for pod "mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)": timeout expired waiting for volumes to attach/mount for pod "test2"/"mariadb-3-5425j". list of unattached/unmounted volumes=[mariadb-data]; skipping pod
Jan  7 19:52:11 ip-10-0-4-69 origin-node: E0107 19:52:11.395068    1750 pod_workers.go:186] Error syncing pod f6e9aa44-f3e3-11e7-96b9-0abad0f909f2 ("mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)"), skipping: timeout expired waiting for volumes to attach/mount for pod "test2"/"mariadb-3-5425j". list of unattached/unmounted volumes=[mariadb-data]

I'm kind of at a loss where else to look. There are other EBS volumes on the server handling local disks and the Docker storage volume. No SELinux errors. Any ideas where to look?

Thanks

On Sun, Jan 7, 2018 at 2:28 PM Marc Boorshtein <mboorshtein gmail com> wrote:
The only errors I can find are in dmesg on the node that's running the pod:

[ 1208.768340] XFS (dm-6): Mounting V5 Filesystem
[ 1208.907628] XFS (dm-6): Ending clean mount
[ 1208.937388] XFS (dm-6): Unmounting Filesystem
[ 1209.016985] XFS (dm-6): Mounting V5 Filesystem
[ 1209.148183] XFS (dm-6): Ending clean mount
[ 1209.167997] XFS (dm-6): Unmounting Filesystem
[ 1209.218989] XFS (dm-6): Mounting V5 Filesystem
[ 1209.342131] XFS (dm-6): Ending clean mount
[ 1209.386249] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 1217.550065] pci 0000:00:1d.0: [1d0f:8061] type 00 class 0x010802
[ 1217.550128] pci 0000:00:1d.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 1217.551181] pci 0000:00:1d.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff]
[ 1217.559756] nvme nvme3: pci function 0000:00:1d.0
[ 1217.568601] nvme 0000:00:1d.0: enabling device (0000 -> 0002)
[ 1217.575951] nvme 0000:00:1d.0: irq 33 for MSI/MSI-X
[ 1218.500526] nvme 0000:00:1d.0: irq 33 for MSI/MSI-X
[ 1218.500547] nvme 0000:00:1d.0: irq 34 for MSI/MSI-X

Google turned up some issues with CoreOS, but nothing for OpenShift and EBS. I'm running CentOS 7.4 with Docker version 1.12.6 (build ec8512b/1.12.6) on M5.large instances.

On Sat, Jan 6, 2018 at 10:19 PM Hemant Kumar <hekumar redhat com> wrote:
The message you posted is a generic message that is logged (or surfaced via events) when the openshift-node process couldn't find the attached volumes within the specified time. That message in itself does not mean the node process will not retry (in fact it will retry more than once), and if the volume is attached and mounted, the pod will start correctly.

There may be something else going on here - I can't say for sure without looking at OpenShift's node and controller-manager logs.
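A quick way to confirm whether the retries eventually succeed is to watch the events in the affected project, e.g. (pod and project names taken from your error message):

oc get events -n test -w
oc describe pod jenkins-2-lrgjb -n test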





On Sat, Jan 6, 2018 at 9:38 PM, Marc Boorshtein <mboorshtein gmail com> wrote:
Thank you for the explanation; that now makes sense. I redeployed with 3.7 and the correct tags on the EC2 instances. Now my new issue is that I'm continuously getting the error "Unable to mount volumes for pod "jenkins-2-lrgjb_test(ca61f578-f352-11e7-9237-0abad0f909f2)": timeout expired waiting for volumes to attach/mount for pod "test"/"jenkins-2-lrgjb". list of unattached/unmounted volumes=[jenkins-data]" when trying to deploy Jenkins. The EBS volume is created and attached to the node; when I run lsblk I see the device, but the pod just times out.

Thanks
Marc 

On Sat, Jan 6, 2018 at 6:43 AM Hemant Kumar <hekumar redhat com> wrote:
Correction to the last sentence below; it should read:

"... hence it will NOT pick a zone in which the OpenShift cluster does not exist."

On Sat, Jan 6, 2018 at 6:36 AM, Hemant Kumar <hekumar redhat com> wrote:
Let me clarify - I did not say that you have to "label" nodes and masters. 

I was suggesting tagging nodes and masters the way you tag any cloud resource via the AWS console or AWS CLI. I meant AWS tags, not OpenShift labels.

The reason you have volumes created in another zone is that your AWS account has instances in more than one zone, possibly not part of the OpenShift cluster. When you request a dynamically provisioned volume, OpenShift considers all the nodes it can find and "randomly" selects a zone from among the zones it discovered.

But if you use the AWS console or CLI to tag all nodes (including the masters) in your cluster with "KubernetesCluster": "cluster_id", then it will only consider the tagged nodes and hence it will pick the zone in which the OpenShift cluster did not exist.
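As a rough sketch, tagging via the AWS CLI looks something like this (the instance IDs and cluster id are placeholders):

aws ec2 create-tags \
    --resources i-0123456789abcdef0 i-0fedcba9876543210 \
    --tags Key=KubernetesCluster,Value=my-cluster-id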



On Fri, Jan 5, 2018 at 11:48 PM, Marc Boorshtein <mboorshtein gmail com> wrote:
How do I label a master? When I create PVCs it switches between 1c and 1a. Looking on the master, I see:

Creating volume for PVC "wtf3"; chose zone="us-east-1c" from zones=["us-east-1a" "us-east-1c"]

Where did us-east-1c come from???

On Fri, Jan 5, 2018 at 11:07 PM Hemant Kumar <hekumar redhat com> wrote:
Both nodes and masters. The tag information is picked up from the master itself (where the controller-manager is running), and then OpenShift uses the same value to find all the nodes in the cluster.




On Fri, Jan 5, 2018 at 10:26 PM, Marc Boorshtein <mboorshtein gmail com> wrote:
Nodes and masters, or just nodes? (It sounded like just nodes from the docs.)

On Fri, Jan 5, 2018 at 9:16 PM Hemant Kumar <hekumar redhat com> wrote:
Make sure that you configure ALL instances in the cluster with the tag "KubernetesCluster": "value". The value of the tag for the key "KubernetesCluster" should be the same for all instances in the cluster; you can choose any string you want for the value.

At the very minimum, you will probably have to restart the OpenShift controller-manager after the change, for example:
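On an Origin install that would typically be something like the following on the master (the unit name is a guess; check systemctl list-units for the exact master/controllers unit on your version):

systemctl restart origin-master-controllers    # or origin-master on a single-unit install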



On Fri, Jan 5, 2018 at 8:21 PM, Marc Boorshtein <mboorshtein gmail com> wrote:
Hello,

I have a brand new Origin 3.6 cluster running on AWS. The master and all nodes are in us-east-1a, but whenever I have AWS create a new volume, it puts it in us-east-1c, so nothing can access it and my pods sit in a permanent Pending state because of NoVolumeZoneConflict. Looking at aws.conf, it states us-east-1a. What am I missing?

Thanks

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users







