Question about a bug: current master has lost the ability to "Cancel disk mirrors after libvirtd restart"

Tue Sep 21 10:02:06 UTC 2021

Please fix your address book; it's 'libvir-list at redhat.com', not
'libvirt-list at redhat.com'.

On Tue, Sep 21, 2021 at 00:52:57 +0800, wangjie (P) wrote:
> Steps to reproduce the bug:
> 1. Perform migrateToURI3.
> 2. Kill libvirtd once the memory migration phase has started, then restart libvirtd.

I presume this is a reproducer and not a normal approach.

> 3. Perform migrateToURI3 again; every subsequent attempt fails with the error message "Requested operation is not valid: domain has active block job".
> 
> 
> I found the reason that triggers the bug, as follows:
> 
> 1. The qemuBlockJobData does not persist across a libvirtd restart, so the job returned by qemuBlockJobDiskGetJob is always NULL and qemuMigrationSrcNBDCopyCancel is never called.
> 
> 2. Call trace:
> qemuProcessReconnect
> ->qemuProcessRecoverJob
>   ->qemuProcessRecoverMigrationOut
>     ->qemuMigrationSrcCancel
> 
> 3. The code is as follows:
> qemuMigrationSrcCancel(virQEMUDriver *driver,
>                        virDomainObj *vm)
> {
>     ... ...
>     for (i = 0; i < vm->def->ndisks; i++) {
>         virDomainDiskDef *disk = vm->def->disks[i];
>         qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>         qemuBlockJobData *job;
> 
>         if (!(job = qemuBlockJobDiskGetJob(disk)) ||              //the job is always NULL !!!
>             !qemuBlockJobIsRunning(job))

I'll have a look. The blockjob data should have been recovered at this
point. It's possible that this is just a wrong ordering of function
calls.

>             diskPriv->migrating = false;
> 
>         if (diskPriv->migrating) {
>             qemuBlockJobSyncBegin(job);
>             storage = true;
>         }
> 
>         virObjectUnref(job);
>     }
>     ... ...
> 
>     if (storage &&
>         qemuMigrationSrcNBDCopyCancel(driver, vm, true,
>                                       QEMU_ASYNC_JOB_NONE, NULL) < 0)
>         return -1;
>     ... ...
> }
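
To make the failure path in the quoted loop easier to follow, here is a
minimal, self-contained C sketch of its control flow. It is a simplified
model, not libvirt code: the sketch_* types and the function name are
hypothetical stand-ins, and a NULL job pointer stands in for
qemuBlockJobDiskGetJob() returning NULL after the restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT libvirt types. */
struct sketch_job {
    bool running;
};

struct sketch_disk {
    struct sketch_job *job;  /* NULL models a block job that was not recovered */
    bool migrating;          /* models diskPriv->migrating */
};

/* Models the shape of the quoted loop: returns true only if at least one
 * disk still counts as migrating, i.e. the mirror cancel would be attempted. */
bool
sketch_needs_storage_cancel(struct sketch_disk *disks, size_t ndisks)
{
    bool storage = false;
    size_t i;

    assert(disks != NULL || ndisks == 0);

    for (i = 0; i < ndisks; i++) {
        struct sketch_job *job = disks[i].job;

        /* No job, or a job that is not running: the flag is cleared. */
        if (!job || !job->running)
            disks[i].migrating = false;

        /* Only disks that are still flagged request a cancel. */
        if (disks[i].migrating)
            storage = true;
    }

    return storage;
}
```

In this model, a disk whose job pointer is NULL has its migrating flag
cleared, so the function returns false and the cancel step is skipped, which
matches the behaviour described in the report. A disk with a running job
keeps its flag and the function returns true.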

Next time please file an issue in the upstream bug tracker.