[libvirt] [libvirt-3.2] NBD-based storage migration fails with "error: invalid argument: monitor must not be NULL"

Kashyap Chamarthy kchamart at redhat.com
Mon Apr 10 15:26:39 UTC 2017


On Fri, Apr 07, 2017 at 02:12:31PM +0200, Kashyap Chamarthy wrote:
> On Fri, Apr 07, 2017 at 08:22:01AM +0200, Jiri Denemark wrote:
> > On Thu, Apr 06, 2017 at 18:14:07 +0200, Kashyap Chamarthy wrote:
> > > [Filed this bug -- https://bugzilla.redhat.com/show_bug.cgi?id=1439841]
> > > 
> > > Easy reproducer:
> > > 
> > >     $ virsh migrate --verbose --copy-storage-all \
> > >         --p2p --live l2-f25 qemu+ssh://root@devstack-a/system
> > >     error: invalid argument: monitor must not be NULL
> > 
> > This is caused by the TLS migration code and most likely fixed by
> > https://www.redhat.com/archives/libvir-list/2017-April/msg00219.html
> 
> Thanks.  I'll test with your series & report back on that thread.

[Since the above series is pushed, responding here.]

I just built RPMs from libvirt Git, which includes the above series ("qemu:
Properly reset all migration capabilities").  This is the commit I tested:

    $ git describe
    v3.2.0-80-gbe193c4

I did two tests (same reproducer command-line as above):

(Test-1) Migrate a guest from source to destination:

         Result: Succeeds (the migrated guest runs successfully on the
         destination).

(Test-2) Once Test-1 has finished and the guest is running successfully
         on the destination, migrate it back to the source:

         Result: Fails.

          $ virsh migrate --verbose --copy-storage-all \
                --p2p --live l2-f25 qemu+ssh://root@l1-f25/system

          error: operation failed: migration job: is not active

Looking at the source debug log (URLs to complete logs further below), I
see the dreaded "cannot acquire state change lock" error.

[...]
2017-04-10 06:29:23.322+0000: 22676: warning : qemuDomainObjBeginJobInternal:3607 : Cannot start job (modify, none) for domain l2-f25; current job is (none, migration out) owned by (0 <null>, 16698 remoteDispatchDomainMigratePerform3Params) for (0s, 96s)
2017-04-10 06:29:23.322+0000: 22676: error : qemuDomainObjBeginJobInternal:3619 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMigratePerform3Params)
[...]
2017-04-10 06:31:57.525+0000: 16698: error : qemuMigrationCheckJobStatus:1420 : operation failed: migration job: is not active
2017-04-10 06:31:57.525+0000: 16698: debug : qemuMigrationCancelDriveMirror:785 : Cancelling drive mirrors for domain l2-f25
[...]
2017-04-10 06:31:57.538+0000: 16698: debug : qemuMigrationDriveMirrorCancelled:700 : All disk mirrors are gone
2017-04-10 06:31:57.538+0000: 16698: debug : doPeer2PeerMigrate3:4428 : Finish3 0x7f39d801e3d0 ret=-1
2017-04-10 06:31:57.539+0000: 16698: debug : qemuDomainObjEnterRemote:3918 : Entering remote (vm=0x563b26a60e60 name=l2-f25)
2017-04-10 06:31:57.783+0000: 16698: error : virNetClientProgramDispatchError:177 : migration successfully aborted
[...]
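In the excerpt above, thread 22676 fails to get the job lock, which is held by thread 16698 (the one dispatching remoteDispatchDomainMigratePerform3Params), and 16698 later logs the "is not active" error. When sifting through the complete logs, a small helper along these lines may make that pattern easier to spot. It is a minimal sketch, assuming the line layout seen in this excerpt ("<timestamp>: <thread-id>: <level> : <function>:<line> : <message>"); the regexes and the names parse_line, job_lock_holder and errors_by_thread are hypothetical and are not part of libvirt.

```python
import re
from typing import NamedTuple, Optional

# Assumption: libvirtd debug-log lines follow the layout seen in the excerpt:
#   "<timestamp>: <thread-id>: <level> : <function>:<line> : <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}): "
    r"(?P<tid>\d+): (?P<level>\w+) : (?P<func>\w+):(?P<lineno>\d+) : (?P<msg>.*)$"
)

# Assumption: wording of the "Cannot start job" warning, copied from the excerpt.
JOB_RE = re.compile(
    r"current job is \((?P<job>[^)]*)\) owned by \((?P<owner>[^)]*)\) "
    r"for \((?P<durations>[^)]*)\)"
)


class LogRecord(NamedTuple):
    ts: str
    tid: int
    level: str
    func: str
    lineno: int
    msg: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its fields; return None if it doesn't match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return LogRecord(m["ts"], int(m["tid"]), m["level"], m["func"],
                     int(m["lineno"]), m["msg"])


def job_lock_holder(rec: LogRecord) -> Optional[dict]:
    """Extract the current job, its owner(s) and the durations from a
    'Cannot start job' warning; return None for any other record."""
    m = JOB_RE.search(rec.msg)
    if m is None:
        return None
    # "0s, 96s" -> [0, 96]
    durations = [int(d.strip().rstrip("s")) for d in m["durations"].split(",")]
    return {"job": m["job"], "owner": m["owner"], "durations": durations}


def errors_by_thread(lines):
    """Group error-level messages by thread id, keeping log order."""
    out = {}
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec.level == "error":
            out.setdefault(rec.tid, []).append(rec.msg)
    return out
```

Lines that don't match the assumed layout (such as the "[...]" elision markers) are simply skipped, so the helpers can be pointed at a raw log file without pre-filtering.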

Complete libvirt debug logs (with appropriate log filters):

- libvirtd debug log of source host (after a failed migration from
  destination to source) --
  https://bugzilla.redhat.com/attachment.cgi?id=1270407

- libvirtd debug log of destination host (after a failed migration from
  destination to source) --
  https://bugzilla.redhat.com/attachment.cgi?id=1270406

-- 
/kashyap
