[PATCH] qemu: Don't report error from domblkinfo if vm is inactive

Hyman huangy81 at chinatelecom.cn
Tue Sep 6 09:43:53 UTC 2022



在 2022/9/6 15:47, Peter Krempa 写道:
> On Tue, Sep 06, 2022 at 10:29:01 +0800, huangy81 at chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81 at chinatelecom.cn>
>>
>> Libvirt logs and reports error when executing domblkinfo if vm
>> configured rbd storage and in inactive state.
>>
>> The steps to reproduce the problem:
>> 1. define and start domain with its storage configured rbd disk,
>>     and corresponding disk label is 'vda'.
>> 2. set vm in inactive state.
>> 3. call 'virsh domblklinfo' as the following and problem reproduced.
>>
>> $ virsh domblkinfo vm1 vda
>> error: internal error: missing storage backend for network files using
>> rbd protocol
>> Meanwhile, libvirtd log message also report the same error.
>>
>> To fix this, validate the disk type if vm is inactive before
>> refreshing capacity and allocation limits of a given storage source
>> in qemuDomainGetBlockInfo in advance, if storage source type is
>> VIR_STORAGE_TYPE_NETWORK and net protocol is
>> VIR_STORAGE_NET_PROTOCAOL_RBD, set info to 0 like 'domblkinfo --all'
>> command does and return directly.
> 
> This problem is not specific for RBD, but for everything where the
> backend is not loaded or doesn't exist.
> 
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1724808
> 
> This BZ is not appropriate, this was reported for log entries in the
> bulk stats code and more importantly it's actually closed as resolved,
> so pointing to it makes no sense.
> 
>> Signed-off-by: Hyman Huang(黄勇) <huangy81 at chinatelecom.cn>
>> Signed-off-by: Pengcheng Deng <dengpc12 at chinatelecom.cn>
>> ---
>>   src/qemu/qemu_driver.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index c7cca64..bfe1fa2 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -11118,6 +11118,19 @@ qemuDomainGetBlockInfo(virDomainPtr dom,
>>   
>>       /* for inactive domains we have to peek into the files */
>>       if (!virDomainObjIsActive(vm)) {
>> +        /* for rbd protocol, virStorageFileBackend not loaded if vm is inactive,
>> +         * so generate 0 based info like 'domblkinfo --all' does and return directly
>> +         * */
>> +        if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_NETWORK &&
>> +            disk->src->protocol == VIR_STORAGE_NET_PROTOCOL_RBD) {
> 
> So with this piece of code you plaster over a very specific sub-part of
> the problem noted above, but leave other bits unadressed. What's worse
> that the behaviour will differ between protocols and states which is
> unacceptable.
> 
> If you look at the BZ you've linked above you'll find there patches
> which fix the issue in the bulk stats code. Apart from commits for
> adding the infrastructure there is the following commit:
> 
>    commit 60b862cf9d6a335db65bbf2b011499dfa729ca2e
>    Author: Peter Krempa <pkrempa at redhat.com>
>    Date:   Wed Aug 14 18:46:09 2019 +0200
>    
>        qemu: Don't report some ignored errors in qemuDomainGetStatsOneBlockFallback
>    
>        The function ignores all errors from qemuStorageLimitsRefresh by calling
>        virResetLastError. This still logs them. Since qemuStorageLimitsRefresh
>        allows suppressing some, do so.
>    
>        Signed-off-by: Peter Krempa <pkrempa at redhat.com>
>        Reviewed-by: Ján Tomko <jtomko at redhat.com>
>    
>    diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>    index 243a8ac4cf..f44d134b92 100644
>    --- a/src/qemu/qemu_driver.c
>    +++ b/src/qemu/qemu_driver.c
>    @@ -21301,7 +21301,7 @@ qemuDomainGetStatsOneBlockFallback(virQEMUDriverPtr driver,
>         if (virStorageSourceIsEmpty(src))
>             return 0;
>    
>    -    if (qemuStorageLimitsRefresh(driver, cfg, dom, src, false) < 0) {
>    +    if (qemuStorageLimitsRefresh(driver, cfg, dom, src, true) <= 0) {
>             virResetLastError();
>             return 0;
>    
> 
> So at the very least to properly address all instances this patch would
> look similarly to the above.
> 
> 
>> +            info->capacity = 0;
>> +            info->allocation = 0;
>> +            info->physical = 0;
>> +
>> +            ret = 0;
>> +            goto endjob;
>> +        }
>> +
>>           if ((qemuStorageLimitsRefresh(driver, cfg, vm, disk->src, false)) < 0)
>>               goto endjob;
> 
> Now for this very specific instance 'qemuDomainGetBlockInfo' is querying
> data for only one disk (in contrast to the bulk stats API). As of such
> it's okay to report an error if the required data can't be obtained.
> Additionally that is what this code was doing for a long time.
> 
> In case of individual query APIs it's usually better if the user gets an
> error if the required data can't be fetched.
> 
> Based on the above, my stance is that the behaviour of reporting an
> error is correct here. If you need to collect stats without reporting
> error I strongly suggest using the bulk stats API as that is
> specifically designed to omit information it can't gather. Here where
> the query is specific, failure to obtain the information should produce
> an error.
> 

Thanks Peter for commenting this patch and expaining why error reporting 
is resonable. :)
So the conclusion we draw for upper app is that query data for specific 
disk should substituded with bulk stats API. Got it !



More information about the libvir-list mailing list