[libvirt] [frankschreuder at outlook.com: Fwd: Reproducible live disk detach bug]

Stefan Hajnoczi stefanha at gmail.com
Mon Aug 10 12:14:27 UTC 2015


On Sat, Aug 8, 2015 at 12:16 PM, Frank Schreuder <fschreuder at transip.nl> wrote:
> Op 8/7/2015 om 10:18 AM schreef Frank Schreuder:
>> Op 8/7/2015 om 12:24 AM schreef Laine Stump:
>>>>>
>>>>> Subject: Reproducible live disk detach bug
>>>>>
>>>>> Hey guys,
>>>>>
>>>>> I am currently running into a reproducible libvirt bug when I detach a
>>>>> disk from a running VM using libvirt 1.2.18 and qemu 2.3.0.
>>>>>
>>>>> I can attach a disk to a running VM without any problems.
>>>>> During the attach operation I don't create a disk alias name, which
>>>>> results in libvirt generating an unique alias.
>>>>> Using virsh dumpxml I get the following output(snippet):
>>>>>
>>>>> <disk type='file' device='disk'>
>>>>>        <driver name='qemu' type='raw' cache='none'/>
>>>>>        <source file='/var/local/mnt/test-storage/test-storage.raw'/>
>>>>>        <backingStore/>
>>>>>        <target dev='vde' bus='virtio'/>
>>>>>        <iotune>
>>>>> <read_bytes_sec>94371840</read_bytes_sec>
>>>>> <write_bytes_sec>31457280</write_bytes_sec>
>>>>>          <read_iops_sec>1000</read_iops_sec>
>>>>>          <write_iops_sec>200</write_iops_sec>
>>>>>        </iotune>
>>>>>        <alias name='virtio-disk4'/>
>>>>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x09'
>>>>> function='0x0'/>
>>>>> </disk>
>>>>>
>>>>> As you can see, libvirt generates 'virtio-disk4' as an alias. But as
>>>>> soon as I try to detach the disk from the running VM, I will receive
>>>>> the
>>>>> following two errors in my libvirt log:
>>>>>
>>>>> 2015-08-06 10:48:59.943+0000: 11657: warning :
>>>>> qemuMonitorJSONHandleDeviceDeleted:939 : missing device in device
>>>>> deleted event
>>>>> 2015-08-06 10:48:59.945+0000: 11662: error :
>>>>> qemuMonitorTextDriveDel:2594 : operation failed: deleting
>>>>> drive-virtio-disk4 drive failed: 2015-08-06T10:48:59.945058Z Device
>>>>> 'drive-virtio-disk4' not found
>>>>>
>>>>> Apparently libvirt tries to detach the disk with a drive- prefix, which
>>>>> doesn't exist?
>>>
>>> I can't tell you right off why the failure message is there, but the
>>> alias with the added "drive-" is easily explained.
>>>
>>> Each <disk> element in the libvirt config corresponds to two entities in
>>> the qemu virtual machine - a "device" component (what is seen by the
>>> guest) and a "drive" component (what resources are used on the host to
>>> back up that device). Each of these corresponds to a separate qemu
>>> commandline argument (named "-device" and "-drive"), and each of those
>>> must have a different, unique "id" option. For example, this is what is
>>> added to my qemu commandline when I add a virtio disk named
>>> "rawhide.qcow2" to my virtual machine:
>>>
>>> -drive
>>>
>>> file=/var/lib/libvirt/images/rawhide.qcow2,if=none,id=drive-virtio-disk0,format=qcow2
>>> -device
>>>
>>> virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>>>
>>>
>>> If you look at the status XML of the domain while it is running, you'll
>>> see that the <alias> for the disk element is "virtio-disk0", and that's
>>> what is given to the -device half of the disk as its qemu id. The -drive
>>> half uses "drive-$alias" as its qemu id though (you'll notice that the
>>> -device half references that id with its "drive" option - that's how the
>>> two halves are tied together by qemu).
>>>
>>> So, there is nothing unusual about qemu wanting to delete something
>>> called drive-virtio-disk4 when you have a <disk> with
>>> alias='virtio-disk4'. It obviously isn't right that an error message is
>>> being logged. Something sounds familiar about the "missing device in
>>> device deleted event" message, but I can't recall details. Does anyone
>>> else have a better memory? Frank - would it be possible for you to try
>>> this same sequence of events with the same libvirt and an older qemu
>>> (and/or vice-versa)?
>>>
>>
>> I have some more interesting debug information.
>>
>> I'm -not- able to reproduce it on libvirt 1.2.11 - qemu 2.3.0 and libvirt
>> 1.2.11 - qemu 1.7.1 using a FreeBSD VM.
>> I'm able to reproduce the errors with libvirt 1.2.11 - qemu 2.3.0 and
>> libvirt 1.2.18 - qemu 2.3.0 using a Debian VM.
>> Seems like the error triggers depending on the guest OS?
>>
>> During hot-unplugging of network interfaces libvirt also triggers the same
>> error on a Debian VM:
>>
>> 2015-08-06 10:48:59.945+0000: 11662: error :
>> qemuMonitorJSONHandleDeviceDeleted:939 : missing device in device deleted
>> event
>>
>>
>> I gathered some logging information with info level logging enabled while
>> detaching a disk:
>> http://pastebin.com/raw.php?i=vVLuavG2
>>
>> Debug level logging output was too big for a pastebin, contact me if you
>> are interested in the debug level output logging.
>>
>> Hope this will help solving this issue, if there is more information I can
>> provide or tests I can execute, please let me know.
>>
>>
>
> I may found something:
>
> src/qemu/qemu_monitor_text.c
> /* (qemu) drive_del wark
>       * Device 'wark' not found */
>     } else if (STRPREFIX(reply, "Device '") && (strstr(reply, "not found")))
> {
>          /* NB: device not found errors mean the drive was auto-deleted and
> we
>           * ignore the error */
>
> This piece of code should ignore the following error:
> 2015-08-06 21:14:56.847+0000: 8039: error : qemuMonitorTextDriveDel:2594 :
> operation failed: deleting drive-virtio-disk4 drive failed:
> 2015-08-06T21:14:56.847359Z Device 'drive-virtio-disk4' not found
>
> When I look at the error message in the logs again, it seems like (newer
> versions of) qemu or libvirt prefixes the command output with a time stamp
> (2015-08-06T21:14:56.847359Z).
> This means the string no longer starts with "Device '".
>
> I made a patch to make this check less strict (untested).
>
> diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c
> index 2aa0460..1052450 100644
> --- a/src/qemu/qemu_monitor_text.c
> +++ b/src/qemu/qemu_monitor_text.c
> @@ -2586,7 +2586,7 @@ int qemuMonitorTextDriveDel(qemuMonitorPtr mon,
>
>      /* (qemu) drive_del wark
>       * Device 'wark' not found */
> -    } else if (STRPREFIX(reply, "Device '") && (strstr(reply, "not
> found"))) {
> +    } else if (strstr(reply, "Device '") && (strstr(reply, "not found"))) {
>          /* NB: device not found errors mean the drive was auto-deleted and
> we
>           * ignore the error */
>      } else if (STRNEQ(reply, "")) {
>
> I hope somebody with more knowledge about libvirt can confirm this.

Weird that this hasn't been found before.  Maybe something changed recently?

With regards to guest OS, the behavior that libvirt sees may depend on
guest hot unplug support.  If the guest doesn't implement hot unplug,
then the Device foo not found error from drive_del will not occur.
That explains why FreeBSD doesn't hit this bug but Debian does.

Stefan




More information about the libvir-list mailing list