[libvirt] [PATCH] qemu: Fix regression in snapshot-revert

Daniel Henrique Barboza danielhb413 at gmail.com
Tue Sep 10 19:34:07 UTC 2019



On 9/10/19 3:59 PM, Eric Blake wrote:
> On 9/9/19 5:17 PM, Daniel Henrique Barboza wrote:
>>
>> On 9/9/19 5:52 PM, Eric Blake wrote:
>>> Commit f10562799 introduced a regression: if reverting to a snapshot
>>> fails early (such as when we refuse to revert to an external
>>> snapshot), we lose track of the domain's current snapshot.
>>>
>>> See: https://bugzilla.redhat.com/1738747
>>> Signed-off-by: Eric Blake <eblake at redhat.com>
>>> ---
>> I don't understand why f10562799 broke this code like this - I tried
>> looking at the changes made in that commit, and the "if (snap)" had
>> been there for a long time, with no changes made to the 'snap'
>> variable either - but this change didn't break anything else, so:
>>
>> Reviewed-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
> I'll update the commit message to give more details:
>
> Before that patch, we maintained the notion of the current snapshot in
> two places: vm->current_snapshot, and snap->def->current.  If you have a
> domain with two snapshots, s1 and s2 (where s2 is current) and want to
> revert to s1, the old code cleared the def->current flag on s2 before
> noticing that reverting to s1 was not possible, but at least left
> vm->current_snapshot unchanged.  That meant we had a _different_ bug -
> the code was inconsistent on whether the domain had a current snapshot
> (as long as libvirtd didn't shut down, vm->current_snapshot ruled, so s2
> was still claimed as current; but if you restart libvirtd, the XML for
> s2 didn't track that it was current, so after restart you'd have no
> current snapshot).  After that patch, the code was unconditionally
> changing vm->current_snapshot to NULL (but at least was consistent
> everywhere else that the XML never got out of sync with the notion of
> which snapshot was current).  It also didn't help that after that patch,
> the code clearing the current snapshot occurred twice in the function -
> once right after determining that the early checks had succeeded, and
> again unconditionally on all failure paths.
>
> So the fix is as simple as removing the unconditional clearing of s2 as
> the current snapshot in the cleanup code, in favor of the earlier
> clearing that happens only after the early checks succeed.

Thanks for the explanation!
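
To make sure I'm reading it right, here's a minimal standalone sketch of
the fixed flow - stub types and names only (setCurrent, doRevert, etc.),
not the real qemu_driver.c code - where the only thing that matters is
when the current-snapshot marker gets cleared:

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for libvirt's types; the real code uses virDomainObjPtr,
 * virDomainMomentObjPtr and virDomainSnapshotSetCurrent(). */
typedef struct { void *current; } SnapshotList;
typedef struct { bool external; } Snapshot;
typedef struct { SnapshotList *snapshots; } Domain;

static void setCurrent(SnapshotList *list, Snapshot *snap) { list->current = snap; }
static int doRevert(Domain *vm, Snapshot *snap) { (void)vm; (void)snap; return 0; }

/* Fixed flow: the old current marker is only cleared once the early
 * checks have passed. */
static int
revertToSnapshot(Domain *vm, Snapshot *snap)
{
    int ret = -1;

    /* Early checks: refuse e.g. reverting to an external snapshot.
     * Failing here must leave the old current snapshot (s2) untouched. */
    if (snap->external)
        goto cleanup;

    /* Checks passed: only now drop the old current marker; it is re-set
     * to snap on success below, or left NULL on a late failure. */
    setCurrent(vm->snapshots, NULL);

    if (doRevert(vm, snap) == 0) {
        setCurrent(vm->snapshots, snap);
        ret = 0;
    }

 cleanup:
    /* The regression: the failure path additionally had
     *     else if (snap) setCurrent(vm->snapshots, NULL);
     * so a revert refused by the early checks still lost track of the
     * current snapshot.  Removing that branch is the whole fix. */
    return ret;
}

int main(void)
{
    SnapshotList list = { NULL };
    Snapshot s1 = { false }, s2 = { false }, ext = { true };
    Domain vm = { &list };

    setCurrent(vm.snapshots, &s2);                 /* s2 starts out current */
    revertToSnapshot(&vm, &ext);                   /* refused by early check */
    printf("current still s2? %s\n", list.current == &s2 ? "yes" : "no");

    revertToSnapshot(&vm, &s1);                    /* succeeds */
    printf("current now s1? %s\n", list.current == &s1 ? "yes" : "no");
    return 0;
}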

>
>>
>> ps: I think adding a test case for this failure (in tests/virsh-snapshot ?)
>> is worth considering, especially given that we have changes from
>> Maxiwell (adding an inactive XML to the snapshot) that can
>> add more complexity to the snapshot mechanics.
> I tried. But the test driver doesn't forbid 'snapshot-revert' on
> external snapshots, and also doesn't try to write state to XML files, so
> it never copied the problematic code from the qemu driver that could
> even trigger the bug.

I see. But now that I understand what you did and how much neater the
code is, I've changed my mind - I don't think Maxiwell's changes are going
to be too hard of a hit there. We can enhance the test_driver to
behave like the regular driver in this case of external snapshots, and
add a test case for this scenario, another day.


Thanks,


DHB

>
>
>>>    src/qemu/qemu_driver.c | 2 --
>>>    1 file changed, 2 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>> index b28a26c3d6..093b15f500 100644
>>> --- a/src/qemu/qemu_driver.c
>>> +++ b/src/qemu/qemu_driver.c
>>> @@ -16941,8 +16941,6 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
>>>                virDomainSnapshotSetCurrent(vm->snapshots, NULL);
>>>                ret = -1;
>>>            }
>>> -    } else if (snap) {
>>> -        virDomainSnapshotSetCurrent(vm->snapshots, NULL);
>>>        }
>>>        if (ret == 0 && config && vm->persistent &&
>>>            !(ret = virDomainSaveConfig(cfg->configDir, driver->caps,
>>



