[PATCH] qemu: Don't report error from domblkinfo if vm is inactive
Hyman
huangy81 at chinatelecom.cn
Tue Sep 6 09:43:53 UTC 2022
在 2022/9/6 15:47, Peter Krempa 写道:
> On Tue, Sep 06, 2022 at 10:29:01 +0800, huangy81 at chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81 at chinatelecom.cn>
>>
>> Libvirt logs and reports error when executing domblkinfo if vm
>> configured rbd storage and in inactive state.
>>
>> The steps to reproduce the problem:
>> 1. define and start domain with its storage configured rbd disk,
>> and corresponding disk label is 'vda'.
>> 2. set vm in inactive state.
>> 3. call 'virsh domblklinfo' as the following and problem reproduced.
>>
>> $ virsh domblkinfo vm1 vda
>> error: internal error: missing storage backend for network files using
>> rbd protocol
>> Meanwhile, libvirtd log message also report the same error.
>>
>> To fix this, validate the disk type if vm is inactive before
>> refreshing capacity and allocation limits of a given storage source
>> in qemuDomainGetBlockInfo in advance, if storage source type is
>> VIR_STORAGE_TYPE_NETWORK and net protocol is
>> VIR_STORAGE_NET_PROTOCAOL_RBD, set info to 0 like 'domblkinfo --all'
>> command does and return directly.
>
> This problem is not specific for RBD, but for everything where the
> backend is not loaded or doesn't exist.
>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1724808
>
> This BZ is not appropriate, this was reported for log entries in the
> bulk stats code and more importantly it's actually closed as resolved,
> so pointing to it makes no sense.
>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81 at chinatelecom.cn>
>> Signed-off-by: Pengcheng Deng <dengpc12 at chinatelecom.cn>
>> ---
>> src/qemu/qemu_driver.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index c7cca64..bfe1fa2 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -11118,6 +11118,19 @@ qemuDomainGetBlockInfo(virDomainPtr dom,
>>
>> /* for inactive domains we have to peek into the files */
>> if (!virDomainObjIsActive(vm)) {
>> + /* for rbd protocol, virStorageFileBackend not loaded if vm is inactive,
>> + * so generate 0 based info like 'domblkinfo --all' does and return directly
>> + * */
>> + if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_NETWORK &&
>> + disk->src->protocol == VIR_STORAGE_NET_PROTOCOL_RBD) {
>
> So with this piece of code you plaster over a very specific sub-part of
> the problem noted above, but leave other bits unadressed. What's worse
> that the behaviour will differ between protocols and states which is
> unacceptable.
>
> If you look at the BZ you've linked above you'll find there patches
> which fix the issue in the bulk stats code. Apart from commits for
> adding the infrastructure there is the following commit:
>
> commit 60b862cf9d6a335db65bbf2b011499dfa729ca2e
> Author: Peter Krempa <pkrempa at redhat.com>
> Date: Wed Aug 14 18:46:09 2019 +0200
>
> qemu: Don't report some ignored errors in qemuDomainGetStatsOneBlockFallback
>
> The function ignores all errors from qemuStorageLimitsRefresh by calling
> virResetLastError. This still logs them. Since qemuStorageLimitsRefresh
> allows suppressing some, do so.
>
> Signed-off-by: Peter Krempa <pkrempa at redhat.com>
> Reviewed-by: Ján Tomko <jtomko at redhat.com>
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 243a8ac4cf..f44d134b92 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -21301,7 +21301,7 @@ qemuDomainGetStatsOneBlockFallback(virQEMUDriverPtr driver,
> if (virStorageSourceIsEmpty(src))
> return 0;
>
> - if (qemuStorageLimitsRefresh(driver, cfg, dom, src, false) < 0) {
> + if (qemuStorageLimitsRefresh(driver, cfg, dom, src, true) <= 0) {
> virResetLastError();
> return 0;
>
>
> So at the very least to properly address all instances this patch would
> look similarly to the above.
>
>
>> + info->capacity = 0;
>> + info->allocation = 0;
>> + info->physical = 0;
>> +
>> + ret = 0;
>> + goto endjob;
>> + }
>> +
>> if ((qemuStorageLimitsRefresh(driver, cfg, vm, disk->src, false)) < 0)
>> goto endjob;
>
> Now for this very specific instance 'qemuDomainGetBlockInfo' is querying
> data for only one disk (in contrast to the bulk stats API). As of such
> it's okay to report an error if the required data can't be obtained.
> Additionally that is what this code was doing for a long time.
>
> In case of individual query APIs it's usually better if the user gets an
> error if the required data can't be fetched.
>
> Based on the above, my stance is that the behaviour of reporting an
> error is correct here. If you need to collect stats without reporting
> error I strongly suggest using the bulk stats API as that is
> specifically designed to omit information it can't gather. Here where
> the query is specific, failure to obtain the information should produce
> an error.
>
Thanks Peter for commenting this patch and expaining why error reporting
is resonable. :)
So the conclusion we draw for upper app is that query data for specific
disk should substituded with bulk stats API. Got it !
More information about the libvir-list
mailing list