Re: Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes

Unfortunately, I didn't run it before I made the manual change.

I ran it just now, and I can see error messages in the output. Is it still worth sending it to you?

The errors seemed to come from "azure_controller_standard.go", which appears to be the code responsible for attaching and detaching Azure disks. I'm guessing the code that decides when to detach a disk lives somewhere else, though?

On Mon, 25 Nov 2019 at 15:26, Clayton Coleman <ccoleman redhat com> wrote:
Did you run must-gather while it couldn’t detach?

Without deeper debug info from that interval, it’s hard to say. If you can recreate it and run must-gather, we might be able to find it.

On Nov 24, 2019, at 10:25 PM, Joel Pearson <japearson agiledigital com au> wrote:


I updated some machine configs to configure chrony on the masters and workers, and found that one of my containers got stuck after the masters restarted.

That container still couldn't start after 15 minutes, because its disk was still attached to master-2 while the pod had been scheduled on master-1.

In the end I manually detached the disk in the azure console.

Is this a known issue? Or should I have waited for more than 15 minutes?

Maybe this happened because the masters restarted: whatever is responsible for detaching the disk got restarted too, and there was no cleanup process to detach it from the original node? I'm also not sure whether this is further complicated by the fact that my masters are also workers.

Here is the event information from the pod:

  Warning  FailedMount         57s (x8 over 16m)   kubelet, resource-group-prefix-master-1  Unable to mount volumes for pod "odoo-3-m9kxs_odoo(c0a31c68-0f2c-11ea-b695-000d3a970043)": timeout expired waiting for volumes to attach or mount for pod "odoo"/"odoo-3-m9kxs". list of unmounted volumes=[odoo-data]. list of unattached volumes=[odoo-1 odoo-data default-token-5d6x7]

  Warning  FailedAttachVolume  55s (x15 over 15m)  attachdetach-controller                       AttachVolume.Attach failed for volume "pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" : Attach volume "resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" to instance "resource-group-prefix-master-1" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="ConflictingUserInput" Message="A disk with name resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2 already exists in Resource Group RESOURCE-GROUP-PREFIX-RG and is attached to VM /subscriptions/xxxx-xxx-xxxx-xxxx-xxxxx/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/virtualMachines/resource-group-prefix-master-2. 'Name' is an optional property for a disk and a unique name will be generated if not provided." Target="/subscriptions/xxxx-xxx-xxxx-xxxx-xxxxx/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/disks/resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2"
