[libvirt] [PATCH] snapshot: fix rollback failure in transaction mode

Wed Sep 12 18:33:00 UTC 2012

On 09/12/2012 01:47 PM, Eric Blake wrote:
> On 09/12/2012 09:22 AM, Guannan Ren wrote:
>> After failure of qemu transaction command, the snapshot file still
>> be there with non-zero in size. In order to unlink the file, the
>> patch removes the file size checking.
> 
> Can you give some exact steps to reproduce this, so that I know who is
> making the file have non-zero size?  I'm worried that unlinking a
> non-empty file is discarding data, so the commit message needs a lot
> more details about how we are proving that the only way the file can be
> non-zero size is because qemu happened to put data into a previously
> empty file prior to the failed 'transaction' attempt.
> 
> That is, after re-reading context code just now, I'm fairly confident
> that this code can only be reached when qemu supports the 'transaction'
> monitor command, and libvirt's --reuse-ext flag was not specified, and
> therefore libvirt must have just created the file.  But I want that in
> the commit message rather than having to re-read the code every time I
> visit this commit in future reads of the git log.  It may also be that
> qemu has a bug in that the 'transaction' command is modifying files even
> when it fails, so even while this works around the bug, I'm cc'ing Jeff
> to see if qemu also needs a bug fix.

If QEMU creates the snapshot file, upon transaction failure it is
possible to have a newly created image file left, depending on when the
failure occurs.  The running QEMU instance, however, will not be
affected.

For instance, if we are performing qcow2 snapshots on 2 drives using the
transaction command (e.g. drive0 and drive1), we will:

1. create qcow2 image file for drive0 new active layer
2. create qcow2 image file for drive1 new active layer
3. If 1 & 2 were successful, then we modify the live image chain
   structure in memory to use the newly created files.  Otherwise, we
   abandon the change, notify libvirt of the error, and leave any newly
   created files intact.

That means, on a snapshot failure, QEMU's live running operation will
not be affected, but the management software (libvirt) should clean up
any resulting image files, if appropriate.

It sounds like you expect QEMU to unlink any of the newly created
snapshot files on failure - is that an accurate statement?

> 
>> ---
>>  src/qemu/qemu_driver.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 8e8e00c..1fedfb8 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -10833,7 +10833,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct qemud_driver *driver,
>>      if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0)
>>          VIR_WARN("Unable to release lock on %s", disk->src);
>>      if (need_unlink && stat(disk->src, &st) == 0 &&
>> -        st.st_size == 0 && S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>> +        S_ISREG(st.st_mode) && unlink(disk->src) < 0)
>>          VIR_WARN("Unable to remove just-created %s", disk->src);
>>  
>>      /* Update vm in place to match changes.  */
>>
>