question about a bug: current master has lost the ability to "Cancel disk mirrors after libvirtd restart"

wangjie (P) wangjie88 at huawei.com
Mon Sep 20 16:52:57 UTC 2021


Steps to reproduce the bug:
1. Call migrateToURI3.
2. Kill libvirtd once the migration enters the memory migration phase, then restart libvirtd.
3. Call migrateToURI3 again. Every subsequent attempt fails with the error message "Requested operation is not valid: domain has active block job".


I found the reason that triggers the bug, as follows:

1. The qemuBlockJobData is not persisted across a libvirtd restart, so the job returned by qemuBlockJobDiskGetJob is always NULL and qemuMigrationSrcNBDCopyCancel is never called.

2. Call trace:
qemuProcessReconnect
->qemuProcessRecoverJob
  ->qemuProcessRecoverMigrationOut
    ->qemuMigrationSrcCancel

3. The code is as follows:
qemuMigrationSrcCancel(virQEMUDriver *driver,
                       virDomainObj *vm)
{
    ... ...
    for (i = 0; i < vm->def->ndisks; i++) {
        virDomainDiskDef *disk = vm->def->disks[i];
        qemuDomainDiskPrivate *diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
        qemuBlockJobData *job;

        if (!(job = qemuBlockJobDiskGetJob(disk)) ||              // the job is always NULL here!
            !qemuBlockJobIsRunning(job))
            diskPriv->migrating = false;

        if (diskPriv->migrating) {
            qemuBlockJobSyncBegin(job);
            storage = true;
        }

        virObjectUnref(job);
    }
    ... ...

    if (storage &&
        qemuMigrationSrcNBDCopyCancel(driver, vm, true,
                                      QEMU_ASYNC_JOB_NONE, NULL) < 0)
        return -1;
    ... ...
}


4. I think current master has lost the functionality introduced by the following patch:
http://10.175.124.40/cgit/cgit.cgi/code.huawei.com/libvirt.git/commit/?id=e8f263e0d006390c3764aaa07093b2d174b61379
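
To make the control flow described in points 1 and 3 easier to follow, here is a simplified standalone model of the loop. It is only a sketch and not libvirt code: ModelJob, ModelDisk and model_needs_storage_cancel are hypothetical names, and a NULL job pointer stands in for what qemuBlockJobDiskGetJob appears to return after the restart. It only shows that once the job lookup yields NULL, the "storage" flag can never become true, so the cancel path is skipped.

```c
/* Simplified standalone model of the disk loop in qemuMigrationSrcCancel.
 * NOT libvirt code: all types and function names here are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool running;      /* stands in for qemuBlockJobIsRunning(job) */
} ModelJob;

typedef struct {
    bool migrating;    /* stands in for diskPriv->migrating */
    ModelJob *job;     /* NULL models a job that was lost across the restart */
} ModelDisk;

/* Returns true if at least one disk would reach the NBD copy cancel path. */
static bool
model_needs_storage_cancel(ModelDisk *disks, size_t ndisks)
{
    bool storage = false;
    size_t i;

    for (i = 0; i < ndisks; i++) {
        ModelJob *job = disks[i].job;

        /* Same condition as in the excerpt: a missing job, or a job that
         * is not running, clears the migrating flag. */
        if (!job || !job->running)
            disks[i].migrating = false;

        if (disks[i].migrating)
            storage = true;
    }

    return storage;
}
```

With a disk whose job pointer is NULL, the function returns false even if the disk was marked as migrating before the call. That matches the behaviour I see: the cancel call is skipped, which would be consistent with the mirror staying active and the later "domain has active block job" error.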


Can you give some suggestions on how to fix it?







