[libvirt] [BUG?] qemu was accidentally killed by libvirtd after we restart libvirtd by service command
Laine Stump
laine at laine.org
Wed Jan 6 19:13:37 UTC 2016
On 12/17/2015 10:45 AM, Hailiang Zhang wrote:
> Hi all,
>
> We came across a problem: qemu was accidentally killed by libvirtd
> after we restarted libvirtd with the service command.
> Steps to reproduce:
> 1) Attach a net device to the VM with the 'virsh attach-device' command
> 2) Detach the net device added in step 1 with the 'virsh detach-device'
> command
> 3) Repeat the above two steps, and at the same time restart the
> libvirtd daemon.
> Occasionally, after one of these restarts, we find that qemu has
> been killed accidentally.
>
> The libvirt log message is:
> [2015-12-17 11:34:36]: libvirtd : 216046: info : libvirt version: 1.2.17
> [2015-12-17 11:34:36]: libvirtd : 216046: warning :
> virDriverLoadModule:65 : Module
> /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not
> accessible
> [2015-12-17 11:34:36]: libvirtd : 216046: info :
> virNetlinkEventServiceStart:662 : starting netlink event service with
> protocol 0
> [2015-12-17 11:34:36]: libvirtd : 216046: info :
> virNetlinkEventServiceStart:712 : set netlink socket big buffer size
> [2015-12-17 11:34:36]: libvirtd : 216046: info :
> virNetlinkEventServiceStart:662 : starting netlink event service with
> protocol 15
> [2015-12-17 11:34:36]: libvirtd : 216046: info :
> virNetlinkEventServiceStart:712 : set netlink socket big buffer size
> [2015-12-17 11:34:36]: libvirtd : 216079: error :
> dnsmasqCapsRefreshInternal:741 : Cannot check dnsmasq binary
> /usr/sbin/dnsmasq: No such file or directory
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> networkReloadFirewallRules:1861 : Reloading iptables rules
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> networkRefreshDaemons:1830 : Refreshing network daemons
> [2015-12-17 11:34:36]: libvirtd : 216079: error :
> virNodeSuspendSupportsTarget:332 : internal error: Cannot probe for
> supported suspend types
> [2015-12-17 11:34:36]: libvirtd : 216079: warning :
> virQEMUCapsInit:1037 : Failed to get host power management capabilities
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23039 : Scanning for configs in
> /var/run/libvirt/qemu
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23039 : Scanning for configs in
> /etc/libvirt/qemu
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName
> in /var/lib/libvirt/qemu/snapshot/VMName
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName2
> in /var/lib/libvirt/qemu/snapshot/VMName2
> [2015-12-17 11:34:36]: libvirtd : 216079: info :
> qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName3
> in /var/lib/libvirt/qemu/snapshot/VMName3
> [2015-12-17 11:34:36]: libvirtd : 216047: info :
> remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by
> client: 127.0.0.1;0
> [2015-12-17 11:34:36]: libvirtd : 216050: info :
> remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by
> client: 127.0.0.1;0
> [2015-12-17 11:34:36]: libvirtd : 216047: info :
> virDomainDetachDevice:8508 : enter virDomainDetachDevice
> domainname=VMName2, xml= <interface type='bridge'>^M
> <source bridge='br-1'/>^M
> <mac address='00:16:3e:76:85:17'/>^M
> <virtualport type='openvswitch' />^M
> <model type='virtio'/>^M
> </interface>^M
> ^M
>
> [2015-12-17 11:34:36]: libvirtd : 216050: info :
> virDomainDetachDevice:8508 : enter virDomainDetachDevice
> domainname=VMName, xml= <interface type='bridge'>^M
> <source bridge='br-1'/>^M
> <mac address='00:16:3e:76:85:15'/>^M
> [2015-12-17 11:34:36]: libvirtd : 216095: error :
> qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU
> command 'netdev_del': Device 'hostnet5' not found
> [2015-12-17 11:34:36]: libvirtd : 216096: error :
> qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU
> command 'netdev_del': Device 'hostnet2' not found
> [2015-12-17 11:34:36]: libvirtd : 216097: error :
> qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU
> command 'netdev_del': Device 'hostnet1' not found
> [2015-12-17 11:34:37]: libvirtd : 216095: info :
> virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC
> user and group on '/data/suse11_sp3_64_1'
> [2015-12-17 11:34:37]: libvirtd : 216095: info :
> virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on
> '/data/suse11_sp3_64_1' to '0:0'
> [2015-12-17 11:34:37]: libvirtd : 216096: info :
> virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC
> user and group on '/data/suse11_sp3_64_2'
> [2015-12-17 11:34:37]: libvirtd : 216096: info :
> virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on
> '/data/suse11_sp3_64_2' to '0:0'
> [2015-12-17 11:34:37]: libvirtd : 216095: info : qemuProcessStop:5720
> : VM 'VMName' is dying,remove vBMC
> [2015-12-17 11:34:37]: libvirtd : 216095: error : rtev_vBMC_send:209 :
> Failed to remove domain vBMC bind path for
> 33ae1cb0-c451-4620-8cee-1bbddd124b82, errno is 2
> [2015-12-17 11:34:37]: libvirtd : 216050: info :
> remoteDispatchDomainAttachDevice:2855 : Domain is attached a device by
> client: 127.0.0.1;0
> [2015-12-17 11:34:37]: libvirtd : 216050: info :
> virDomainAttachDevice:8390 : enter virDomainAttachDevice
> domainname=VMName, xml=<interface type='bridge'>^M
>
> The VM's qemu log is:
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
> device_del, qmp_cmd_arguments: {"id": "net4"}
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1450352070, "microseconds": 402473}, "event":
> "DEVICE_DELETED", "data": {"path":
> "/machine/peripheral/net4/virtio-backend"}}
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1450352070, "microseconds": 402586}, "event":
> "DEVICE_DELETED", "data": {"device": "net4", "path":
> "/machine/peripheral/net4"}}
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
> netdev_del, qmp_cmd_arguments: {"id": "hostnet4"}
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: qom-list,
> qmp_cmd_arguments: {"path": "/machine/peripheral"}
> ------------------------------------------------------------
> At this point, we restarted libvirtd
> ------------------------------------------------------------
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
> device_del, qmp_cmd_arguments: {"id": "net5"}
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1450352070, "microseconds": 810030}, "event":
> "DEVICE_DELETED", "data": {"path":
> "/machine/peripheral/net5/virtio-backend"}}
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1450352070, "microseconds": 810150}, "event":
> "DEVICE_DELETED", "data": {"device": "net5", "path":
> "/machine/peripheral/net5"}}
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name:
> netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
> [2015-12-17 11:34:35] send_control_event:225 virtio serial port 2 send
> control message event = 6, value = 0
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name:
> qmp_capabilities
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name:
> query-commands
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: add-fd,
> qmp_cmd_arguments: {"fdset-id": 0, "opaque": "/dev/null"}
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name:
> block-commit, qmp_cmd_arguments: {"device": "bogus"}
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-events
> [2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send
> control message event = 6, value = 1
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-status
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-block
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-chardev
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: qom-list,
> qmp_cmd_arguments: {"path": "/machine/peripheral"}
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name:
> netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
> [2015-12-17 11:34:36]: shutting down
> [2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send
> control message event = 6, value = 0
> qemu: terminating on signal 15 from pid 216046(libvirtd)
> [2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1450352076, "microseconds": 267427}, "event": "SHUTDOWN"}
> [2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1450352076, "microseconds": 267773}, "event":
> "DEVICE_TRAY_MOVED", "data": {"device": "drive-ide0-0-1", "tray-open":
> true}}
>
> We have analyzed this problem, and believe it is triggered by
> inconsistent net-device state between qemu and the newly started
> libvirtd.
> After libvirtd starts, it reconnects to qemu, updates the device
> state with the 'qom-list' command, and also reads the stored XML
> configuration.
> The code call path is:
> qemuProcessReconnect
>   -> qemuProcessUpdateDevices
>     -> if (qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE) < 0)
>        ...
>        if ((tmp = old)) {
>            while (*tmp) {
>                if (!virStringArrayHasString(priv->qemuDevices, *tmp) &&
>                    virDomainDefFindDevice(vm->def, *tmp, &dev, false) == 0 &&
>                    qemuDomainRemoveDevice(driver, vm, &dev) < 0) {
>                    goto cleanup;  /* If this fails, qemuProcessUpdateDevices
>                                      returns -1, and qemuProcessReconnect
>                                      then calls qemuProcessStop() to kill
>                                      the VM. */
>                }
>                tmp++;
>            }
>        }
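The effect of that loop can be simulated with a short Python sketch. The device names and the `remove_device` callback here are hypothetical stand-ins for libvirt's C internals, just to show how one failed removal condemns the whole VM:

```python
# Simulate qemuProcessUpdateDevices: compare the device list recorded
# before the restart ("old") with what 'qom-list' reports after the
# reconnect, and try to remove every device that disappeared while
# libvirtd was down.

def update_devices(old, qemu_devices, remove_device):
    """Return True on success, False if any removal fails
    (the real code then calls qemuProcessStop and kills the VM)."""
    for alias in old:
        if alias not in qemu_devices:
            if not remove_device(alias):
                return False  # -> goto cleanup; VM gets killed
    return True

# Before the restart libvirtd knew about net5; qemu already deleted
# it (and the 'hostnet5' backend), so the removal attempt fails.
old = ["net0", "net5"]
qemu_devices = ["net0"]  # what qom-list reports after reconnect

def remove_device(alias):
    # netdev_del for the matching hostnet fails: "Device not found"
    return False

print(update_devices(old, qemu_devices, remove_device))  # -> False
```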
> This problem is a little hard to reproduce, but we can trigger it
> reliably in a special way:
> 1) Attach a net device with the virsh command
> 2) Stop libvirtd
> 3) Detach the net device by sending QMP commands directly to QEMU
> through the monitor, like this:
> # ./qmp-shell /var/lib/libvirt/qemu/redhat.monitor
> Welcome to the QMP low-level shell!
> Connected to QEMU 2.3.0
>
> (QEMU) qom-list path=/machine/peripheral
> {u'return': [{u'type': u'child<virtio-net-pci>', u'name': u'net1'},
> {u'type': u'child<cirrus-vga>', u'name': u'video0'}, {u'type':
> u'child<usb-tablet>', u'name': u'input0'}, {u'type':
> u'child<isa-serial>', u'name': u'serial0'}, {u'type':
> u'child<virtio-net-pci>', u'name': u'net0'}, {u'type':
> u'child<scsi-hd>', u'name': u'scsi0-0-0-0'}, {u'type':
> u'child<virtio-scsi-pci>', u'name': u'scsi0'}, {u'type':
> u'child<piix3-usb-uhci>', u'name': u'usb'}, {u'type': u'string',
> u'name': u'type'}]}
> (QEMU) device_del id=net1
> {u'return': {}}
> (QEMU) netdev_del id=hostnet1
> {u'return': {}}
> 4) Restart the libvirtd daemon, and you will find that qemu is killed
> by libvirtd.
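The qmp-shell session above can also be driven programmatically. Below is a minimal Python sketch of the raw QMP wire protocol; the monitor path and the `net1`/`hostnet1` ids are placeholders taken from the session above, and it should only be run while libvirtd is stopped (step 2):

```python
import json
import socket

def qmp_cmd(name, **args):
    """Frame a QMP command as a single JSON message, as qmp-shell does."""
    msg = {"execute": name}
    if args:
        msg["arguments"] = args
    return json.dumps(msg)

def detach_behind_libvirtds_back(monitor_path):
    """Connect to the domain's monitor socket and delete net1/hostnet1.
    Run only while libvirtd is stopped, to reproduce the stale state."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(monitor_path)
    s.recv(4096)                                   # greeting banner
    for cmd in (qmp_cmd("qmp_capabilities"),       # capability negotiation
                qmp_cmd("device_del", id="net1"),
                qmp_cmd("netdev_del", id="hostnet1")):
        s.sendall(cmd.encode() + b"\n")
        s.recv(4096)                               # {"return": ...} reply
    s.close()

# Example framing (no socket needed):
print(qmp_cmd("netdev_del", id="hostnet1"))
# -> {"execute": "netdev_del", "arguments": {"id": "hostnet1"}}
```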
>
> The main cause of the inconsistent state is that libvirtd was
> restarted before it had finished handling the 'detach device' work.
> Killing such a VM outright seems rather drastic.
> Could we instead finish the pending cleanup work for the VM after
> libvirtd starts?
> (IMHO, we cannot guarantee that libvirtd is never restarted while it
> is handling a device detach.)
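One way to read that suggestion: treat a removal that fails only because the device is already gone as cleanup that has effectively succeeded. A hedged Python sketch of the idea (not libvirt's actual code; `DeviceNotFound` is a placeholder for qemu's "Device ... not found" error):

```python
class DeviceNotFound(Exception):
    """Placeholder for qemu reporting 'Device ... not found'."""

def update_devices_tolerant(old, qemu_devices, remove_device):
    """Variant of the reconnect-time reconciliation in which a device
    qemu no longer knows about counts as already cleaned up, so the
    VM is not killed."""
    for alias in old:
        if alias not in qemu_devices:
            try:
                remove_device(alias)
            except DeviceNotFound:
                # The detach completed while libvirtd was down; drop
                # our stale record instead of treating this as fatal.
                pass
    return True
```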
Thanks for the detailed debugging and analysis!
While I don't think that libvirt should be required to deal properly
with someone going into the qmp shell and detaching a device while
libvirtd is down, it *should* properly handle the case where libvirtd is
restarted while a device detach is in progress (I had thought that
worked properly, but confess I've never actually tried it).
So that we don't lose track of it, can you file a bug for this in the
libvirt upstream bugzilla:
https://libvirt.org/bugs.html