[libvirt] Question about verifying the same uid:gid on src and dst for live migration

Wed May 9 05:45:53 UTC 2018

Hi,

When I live-migrate a guest with the virsh command line between two hosts
that use NFS shared storage, have the same security mechanism, and run
the same kvm/qemu/libvirt versions, I encounter the following error:

debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1524893525, "microseconds": 522686},
"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "nospace": false, "node-name": "#block120",
"reason": "Permission denied", "operation": "write", "action": "report"}}]
...
error: internal error: qemu unexpectedly closed the monitor: 
qemu-system-x86_64: load of migration failed: Input/output error
...
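The "Permission denied" / "write" diagnosis comes from QEMU's BLOCK_IO_ERROR monitor event. A minimal Python sketch of pulling those fields out of the logged line for a human-readable summary, assuming the event has been copied out of the libvirtd debug log as a single JSON object (`summarize_block_io_error` is a hypothetical helper, not part of libvirt or QEMU):

```python
import json
from typing import Optional

# The monitor line from the debug log, rejoined onto a single line
# (it was wrapped when pasted into the mail).
LINE = (
    '{"timestamp": {"seconds": 1524893525, "microseconds": 522686}, '
    '"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", '
    '"nospace": false, "node-name": "#block120", '
    '"reason": "Permission denied", "operation": "write", "action": "report"}}'
)


def summarize_block_io_error(line: str) -> Optional[str]:
    """Return a short summary if `line` is a BLOCK_IO_ERROR event, else None."""
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_IO_ERROR":
        return None
    data = msg.get("data", {})
    # Use .get() so that a missing field does not raise.
    reason = data.get("reason", "unknown reason")
    return f'{data.get("device")}: {data.get("operation")} failed ({reason})'


print(summarize_block_io_error(LINE))
# prints: drive-virtio-disk0: write failed (Permission denied)
```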

Based on the "Permission denied" and "write" information, I found two
ways to fix this error:
- Change the mode of the guest's .qcow2 file from 644 to 646
- Make qemu's uid the same on the src and dst hosts (they differed
  before I changed them)
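The only difference between mode 644 and 646 is the write bit for "others", i.e. any process whose uid is not the file's owner and whose groups do not include the file's group. A small illustration with Python's standard `stat` module (`describe_mode` is a hypothetical helper):

```python
import stat
from typing import Tuple


def describe_mode(perm: int) -> Tuple[str, bool]:
    """Return the ls-style mode string of a regular file with permission bits
    `perm`, and whether "others" (neither owner nor group) may write to it."""
    return stat.filemode(stat.S_IFREG | perm), bool(perm & stat.S_IWOTH)


for perm in (0o644, 0o646):
    print(oct(perm), *describe_mode(perm))
# prints:
# 0o644 -rw-r--r-- False
# 0o646 -rw-r--rw- True
```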

My environment and test cases:

src:~ # id qemu
uid=473(qemu) gid=476(qemu) groups=488(kvm),476(qemu)
dst:~ # id qemu
uid=467(qemu) gid=470(qemu) groups=488(kvm),470(qemu)
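Side by side, the qemu uid and gid differ between the hosts while only the kvm group (488) is shared. A minimal Python sketch that parses this `id` output layout (assumed to be the usual coreutils format shown above) and compares the two hosts; `parse_id` and `Ident` are hypothetical names:

```python
import re
from typing import NamedTuple, Tuple


class Ident(NamedTuple):
    uid: int
    gid: int
    groups: Tuple[int, ...]


# Matches lines such as "uid=473(qemu) gid=476(qemu) groups=488(kvm),476(qemu)".
_ID_RE = re.compile(r"uid=(\d+)\([^)]*\) gid=(\d+)\([^)]*\) groups=(\S+)")


def parse_id(output: str) -> Ident:
    m = _ID_RE.search(output)
    if m is None:
        raise ValueError(f"unrecognised id output: {output!r}")
    # "488(kvm),476(qemu)" -> (488, 476)
    groups = tuple(int(g.split("(")[0]) for g in m.group(3).split(","))
    return Ident(int(m.group(1)), int(m.group(2)), groups)


src = parse_id("uid=473(qemu) gid=476(qemu) groups=488(kvm),476(qemu)")
dst = parse_id("uid=467(qemu) gid=470(qemu) groups=488(kvm),470(qemu)")
print(src.uid == dst.uid, src.gid == dst.gid)  # prints: False False
print(sorted(set(src.groups) & set(dst.groups)))  # prints: [488]
```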

In /etc/libvirt/qemu.conf, my configuration is the following default:
# The user for QEMU processes run by the system instance. It can be
# specified as a user name or as a user id. The qemu driver will try to
# parse this value first as a name and then, if the name doesn't exist,
# as a user id.
#
# Since a sequence of digits is a valid user name, a leading plus sign
# can be used to ensure that a user id will not be interpreted as a user
# name.
#
# Some examples of valid values are:
#
#       user = "qemu"   # A user named "qemu"
#       user = "+0"     # Super user (uid=0)
#       user = "100"    # A user named "100" or a user with uid=100
#
#user = "root"

# The group for QEMU processes run by the system instance. It can be
# specified in a similar way to user.
#group = "root"

# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
#dynamic_ownership = 1
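The comment above describes a name-first, then numeric, lookup for `user`. A Python sketch of that rule, with a plain dict standing in for the host's passwd database (an assumption for illustration; `resolve_user` is hypothetical and libvirt's own lookup is written in C). Because a name such as qemu is resolved against each host's own user database, the same name can map to different uids, which matches the 473 vs 467 situation above:

```python
from typing import Mapping


def resolve_user(value: str, users: Mapping[str, int]) -> int:
    """Resolve a qemu.conf `user` value to a uid using the rules quoted above.

    `users` is a stand-in for the host's passwd database (name -> uid).
    """
    if value.startswith("+"):
        # A leading '+' forces a numeric uid; the value is never a name.
        return int(value[1:])
    if value in users:
        # Otherwise try the value as a user name first ...
        return users[value]
    # ... and only then as a numeric uid (raises ValueError if not a number).
    return int(value)


# The same setting resolves against each host's own user database:
print(resolve_user("qemu", {"qemu": 473}))  # src: 473
print(resolve_user("qemu", {"qemu": 467}))  # dst: 467
print(resolve_user("+0", {}))               # 0
```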

On the src, I run the live migration with "virsh -d 0 migrate --live vm-name qemu+ssh://dst-ip/system" and observe the following ownership of the guest's .qcow2 file:
- after a vm is defined, user:group=root:root
- after a vm is started, user:group=qemu:qemu
- after migration begins, user:group=467:470 (that is dst's uid:gid)
- after migration succeeds, user:group=467:470 (that is dst's uid:gid)
- after a vm is destroyed, user:group=root:root (back to the src's)
- after migration fails, user:group=467:470; the vm is still running on the src, but the file inside the guest
   becomes read-only even though its mode is 644
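Both workarounds follow from the ordinary owner/group/other permission check applied to numeric ids (with NFS using the common AUTH_SYS flavour, requests carry numeric uids and gids, not names). A simplified Python model of that check, which deliberately ignores root, ACLs, security labels and root squashing; `may_write` is a hypothetical helper:

```python
import stat
from typing import Iterable


def may_write(uid: int, gids: Iterable[int], owner: int, group: int, perm: int) -> bool:
    """Simplified POSIX write check on numeric ids: the first matching class
    (owner, then group, then others) decides. Ignores root privileges, ACLs,
    security labels and NFS root squashing."""
    if uid == owner:
        return bool(perm & stat.S_IWUSR)
    if group in set(gids):
        return bool(perm & stat.S_IWGRP)
    return bool(perm & stat.S_IWOTH)


# The src QEMU process: uid 473, groups qemu(476) and kvm(488).
SRC_UID, SRC_GIDS = 473, {476, 488}

# After the image is chowned to the dst's ids 467:470:
print(may_write(SRC_UID, SRC_GIDS, owner=467, group=470, perm=0o644))  # False
print(may_write(SRC_UID, SRC_GIDS, owner=467, group=470, perm=0o646))  # True
# With identical ids on both hosts, the src process is still the owner:
print(may_write(SRC_UID, SRC_GIDS, owner=473, group=476, perm=0o644))  # True
```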

Other notes:
- I ran the same test with libvirt v3.3.0 and v4.0.0; both show this
  error.

Having confirmed that keeping qemu's uid identical on the src and dst
hosts fixes this issue, my question is: should a fix in libvirt be
pursued, or is it enough to document that all hosts in a migration
cluster must use the same uid:gid?
BTW, if a fix is needed, perhaps libvirt's pre-migration checks could
detect a differing uid and/or gid and fail earlier with a clearer,
explicit error such as "The qemu uid must be the same on src and dst,
otherwise migration will fail"?
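The proposed early check could look roughly like this. It is only a Python sketch: it assumes each host's QEMU (uid, gid) has already been collected somehow (how libvirt would exchange this during migration is not specified here), and `precheck_error` is a hypothetical name, not a libvirt API. The same check could also be run over a whole cluster if the requirement is only documented:

```python
from typing import Dict, Optional, Tuple


def precheck_error(hosts: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Return an explicit error message if the QEMU (uid, gid) differs between
    any of the given hosts, or None if they all match."""
    if len(set(hosts.values())) <= 1:
        return None
    detail = ", ".join(f"{h}={u}:{g}" for h, (u, g) in sorted(hosts.items()))
    return f"qemu uid:gid must be identical on all migration hosts ({detail})"


print(precheck_error({"src": (473, 476), "dst": (467, 470)}))
# prints: qemu uid:gid must be identical on all migration hosts (dst=467:470, src=473:476)
```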

Has anyone noticed this, and could you give some suggestions? Thanks a lot!

Have a nice day, thanks again
Fei
