[libvirt] [PATCH for-4.2?] block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap

Eric Blake eblake at redhat.com
Fri Dec 6 14:36:29 UTC 2019


[adding in Peter Maydell, as there is now potential talk of other 
4.2-worthy patches]

On 12/6/19 4:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> 05.12.2019 23:16, John Snow wrote:
>>
>>
>> On 12/5/19 3:09 PM, Eric Blake wrote:
>>> On 12/5/19 1:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Here is double bug:
>>>>
>>>> First, return error but not set errp. This may lead to:
>>>> qmp block-dirty-bitmap-remove may report success when actually failed
>>>>
>>>> block-dirty-bitmap-remove used in a transaction will crash, as
>>>> qmp_transaction will think that it returned success and will cal
>>>
>>> call
>>>
>>>> block_dirty_bitmap_remove_commit which will crash, as state->bitmap is
>>>> NULL
>>>>
>>>> Second (like in anecdote), this case is not an error at all. As it is
>>>> documented in the comment above bdrv_co_remove_persistent_dirty_bitmap
>>>> definition, absence of bitmap is not an error, and similar case handled
>>>> at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when
>>>> there is no bitmaps at all..
>>>
>>> double .
>>>
>>>>
>>>> But when there are some bitmaps, but not the requested one, it return
>>>> error with errp unset.
>>>>
>>>> Fix that.
>>>>
>>>> Fixes: b56a1e31759b750
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov at virtuozzo.com>
>>>> ---
>>>>
>>>> Hi all!
>>>>
>>>> Ohm, suddenly we faced this bug. It's a regression in 4.2. I'm very
>>>> sorry for introducing it, and it sad that it's found so late..
>>>>
>>>> Personally, I think that this one worth rc5, as it makes new bitmap
>>>> interfaces unusable. But the decision is yours.
>>>>
>>>> Last minute edit: hmm, actually, transaction action introduced in
>>>> 4.2, so crash is not a regression, only broken block-dirty-bitmap-remove
>>>> command is a regression... Maybe it's OK for stable.
>>>
>>> Libvirt REALLY wants to use transaction bitmap management (and require
>>> qemu 4.2) for its incremental backup patches that Peter is almost ready
>>> to merge in.  I'm trying to ascertain:
>>>
>>> When exactly can this bug hit?  Can you give a sequence of QMP commands
>>> that will trigger it for easy reproduction?  Are there workarounds (such
>>> as checking that a bitmap exists prior to attempting to remove it)?  If
>>> it does NOT get fixed in rc5, how will libvirt be able to probe whether
>>> the fix is in place?
>>>
>>
>> It looks like:
>>
>> - You need to have at least one bitmap
>> - You need to use transactional remove
>> - you need to misspell the bitmap name
>> - The remove will fail (return -EINVAL) but doesn't set errp
>> - The caller chokes on incomplete information, state->bitmap is NULL
> 
> 
> No, that would be too simple, the thing is worse. Absolutele correct removing is broken, without any misspelling
> 
> Bug triggers when we are removing persistent bitmap that is not stored yet in the image AND at least one (another) bitmap already stored in the image. So, something like:
> 
> 1. create persistent bitmap A
> 2. shutdown vm  (bitmap A is synced)
> 3. start vm
> 4. create persistent bitmap B
> 5. remove bitmap B - it fails (and crashes if in transaction)
> 
> ====
> 
> Hmm, workaround..
> 
> I'm afraid that the only thing is not remove persistent bitmaps, which were never synced to the image. So, instead the sequence above, we need
> 
> 
> 1. create persistent bitmap A
> 2. shutdown vm
> 3. start vm
> 4. create persistent bitmap B
> 5. remember, that we want to remove bitmap B after vm shutdown
> ...
>    some other operations
> ...
> 6. vm shutdown
> 7. start vm in stopped mode, and remove all bitmaps marked for removing
> 8. stop vm
> 
> But, I think that in real circumstances, vm shutdown is rare thing...

This is sounding a bit more serious. As I said earlier, it shouldn't 
delay 4.2 on its own, but if the fix is obvious (and other than 
comments, it is a single change from 'ret = -EINVAL' to 'ret = 0' which 
fixes a definite reproducible crash), I think it rises to the level of 
acceptable.

I've been so worried about the question of which release, that I don't 
know if I've previously offered:
Reviewed-by: Eric Blake <eblake at redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




More information about the libvir-list mailing list