[libvirt] [BUG?] qemu was accidently killed by libvirtd after we restart libvirtd by service command

Laine Stump laine at laine.org
Wed Jan 6 19:13:37 UTC 2016


On 12/17/2015 10:45 AM, Hailiang Zhang wrote:
> Hi all,
>
> We came across a problem, qemu was accidently killed by libvirtd after 
> we restart libvirtd by service command.
> Steps to reproduce:
> 1) Attach net device to VM by 'virsh attach-device' command
> 2) Detach the net device that added before by 'virsh detach-device' 
> command
> 3) Do the above two steps repeatedly, and at the same time, we restart 
> libvirtd daemon.
> And then we have chance to find that, after libvirtd is restarted at 
> one time, qemu is killed accidently.
>
> The libvirt log message is:
> [2015-12-17 11:34:36]: libvirtd : 216046: info : libvirt version: 1.2.17
> [2015-12-17 11:34:36]: libvirtd : 216046: warning : 
> virDriverLoadModule:65 : Module 
> /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not 
> accessible
> [2015-12-17 11:34:36]: libvirtd : 216046: info : 
> virNetlinkEventServiceStart:662 : starting netlink event service with 
> protocol 0
> [2015-12-17 11:34:36]: libvirtd : 216046: info : 
> virNetlinkEventServiceStart:712 : set netlink socket big buffer size
> [2015-12-17 11:34:36]: libvirtd : 216046: info : 
> virNetlinkEventServiceStart:662 : starting netlink event service with 
> protocol 15
> [2015-12-17 11:34:36]: libvirtd : 216046: info : 
> virNetlinkEventServiceStart:712 : set netlink socket big buffer size
> [2015-12-17 11:34:36]: libvirtd : 216079: error : 
> dnsmasqCapsRefreshInternal:741 : Cannot check dnsmasq binary 
> /usr/sbin/dnsmasq: No such file or directory
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> networkReloadFirewallRules:1861 : Reloading iptables rules
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> networkRefreshDaemons:1830 : Refreshing network daemons
> [2015-12-17 11:34:36]: libvirtd : 216079: error : 
> virNodeSuspendSupportsTarget:332 : internal error: Cannot probe for 
> supported suspend types
> [2015-12-17 11:34:36]: libvirtd : 216079: warning : 
> virQEMUCapsInit:1037 : Failed to get host power management capabilities
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23039 : Scanning for configs in 
> /var/run/libvirt/qemu
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23039 : Scanning for configs in 
> /etc/libvirt/qemu
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName2.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> virDomainObjListLoadAllConfigs:23063 : Loading config file 'VMName3.xml'
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName 
> in /var/lib/libvirt/qemu/snapshot/VMName
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName2 
> in /var/lib/libvirt/qemu/snapshot/VMName2
> [2015-12-17 11:34:36]: libvirtd : 216079: info : 
> qemuDomainSnapshotLoad:490 : Scanning for snapshots for domain VMName3 
> in /var/lib/libvirt/qemu/snapshot/VMName3
> [2015-12-17 11:34:36]: libvirtd : 216047: info : 
> remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by 
> client: 127.0.0.1;0
> [2015-12-17 11:34:36]: libvirtd : 216050: info : 
> remoteDispatchDomainDetachDevice:4105 : Domain is detached a device by 
> client: 127.0.0.1;0
> [2015-12-17 11:34:36]: libvirtd : 216047: info : 
> virDomainDetachDevice:8508 : enter virDomainDetachDevice 
> domainname=VMName2, xml= <interface type='bridge'>^M
>   <source bridge='br-1'/>^M
>           <mac address='00:16:3e:76:85:17'/>^M
> <virtualport type='openvswitch' />^M
>       <model type='virtio'/>^M
> </interface>^M
> ^M
>
> [2015-12-17 11:34:36]: libvirtd : 216050: info : 
> virDomainDetachDevice:8508 : enter virDomainDetachDevice 
> domainname=VMName, xml= <interface type='bridge'>^M
>   <source bridge='br-1'/>^M
>           <mac address='00:16:3e:76:85:15'/>^M
> 2015-12-17 11:34:36]: libvirtd : 216095: error : 
> qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU 
> command 'netdev_del': Device 'hostnet5' not found
> [2015-12-17 11:34:36]: libvirtd : 216096: error : 
> qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU 
> command 'netdev_del': Device 'hostnet2' not found
> [2015-12-17 11:34:36]: libvirtd : 216097: error : 
> qemuMonitorJSONCheckError:380 : internal error: unable to execute QEMU 
> command 'netdev_del': Device 'hostnet1' not found
> [2015-12-17 11:34:37]: libvirtd : 216095: info : 
> virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC 
> user and group on '/data/suse11_sp3_64_1'
> [2015-12-17 11:34:37]: libvirtd : 216095: info : 
> virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on 
> '/data/suse11_sp3_64_1' to '0:0'
> [2015-12-17 11:34:37]: libvirtd : 216096: info : 
> virSecurityDACRestoreSecurityFileLabelInternal:325 : Restoring DAC 
> user and group on '/data/suse11_sp3_64_2'
> [2015-12-17 11:34:37]: libvirtd : 216096: info : 
> virSecurityDACSetOwnershipInternal:248 : Setting DAC user and group on 
> '/data/suse11_sp3_64_2' to '0:0'
> [2015-12-17 11:34:37]: libvirtd : 216095: info : qemuProcessStop:5720 
> : VM 'VMName' is dying,remove vBMC
> [2015-12-17 11:34:37]: libvirtd : 216095: error : rtev_vBMC_send:209 : 
> Failed to remove domain vBMC bind path for 
> 33ae1cb0-c451-4620-8cee-1bbddd124b82, errno is 2
> [2015-12-17 11:34:37]: libvirtd : 216050: info : 
> remoteDispatchDomainAttachDevice:2855 : Domain is attached a device by 
> client: 127.0.0.1;0
> [2015-12-17 11:34:37]: libvirtd : 216050: info : 
> virDomainAttachDevice:8390 : enter virDomainAttachDevice 
> domainname=VMName, xml=<interface type='bridge'>^M
>
> The VM's qemu log is:
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: 
> device_del, qmp_cmd_arguments: {"id": "net4"}
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status 
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status 
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": 
> {"seconds": 1450352070, "microseconds": 402473}, "event": 
> "DEVICE_DELETED", "data": {"path": 
> "/machine/peripheral/net4/virtio-backend"}}
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": 
> {"seconds": 1450352070, "microseconds": 402586}, "event": 
> "DEVICE_DELETED", "data": {"device": "net4", "path": 
> "/machine/peripheral/net4"}}
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: 
> netdev_del, qmp_cmd_arguments: {"id": "hostnet4"}
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: qom-list, 
> qmp_cmd_arguments: {"path": "/machine/peripheral"}
> ---------------------------------------------- At this time, We 
> restart libvirtd just right 
> ------------------------------------------------------------
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: 
> device_del, qmp_cmd_arguments: {"id": "net5"}
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status 
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_set_status:524 virtio-net device status 
> is 1 that means ACKNOWLEDGE
> [2015-12-17 11:34:30] virtio_pci_device_unplugged:966 virtio-net
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": 
> {"seconds": 1450352070, "microseconds": 810030}, "event": 
> "DEVICE_DELETED", "data": {"path": 
> "/machine/peripheral/net5/virtio-backend"}}
> [2015-12-17 11:34:30] monitor_qapi_event_emit:483 {"timestamp": 
> {"seconds": 1450352070, "microseconds": 810150}, "event": 
> "DEVICE_DELETED", "data": {"device": "net5", "path": 
> "/machine/peripheral/net5"}}
> [2015-12-17 11:34:30] handle_qmp_command:5115 qmp_cmd_name: 
> netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
> [2015-12-17 11:34:35] send_control_event:225 virtio serial port 2 send 
> control message event = 6, value = 0
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: 
> qmp_capabilities
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: 
> query-commands
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: add-fd, 
> qmp_cmd_arguments: {"fdset-id": 0, "opaque": "/dev/null"}
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: 
> block-commit, qmp_cmd_arguments: {"device": "bogus"}
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-events
> [2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send 
> control message event = 6, value = 1
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-status
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-block
> [2015-12-17 11:34:36] handle_qmp_command:5105 qmp_cmd_name: query-chardev
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: qom-list, 
> qmp_cmd_arguments: {"path": "/machine/peripheral"}
> [2015-12-17 11:34:36] handle_qmp_command:5115 qmp_cmd_name: 
> netdev_del, qmp_cmd_arguments: {"id": "hostnet5"}
> [2015-12-17 11:34:36]: shutting down
> [2015-12-17 11:34:36] send_control_event:225 virtio serial port 2 send 
> control message event = 6, value = 0
> qemu: terminating on signal 15 from pid 216046(libvirtd)
> [2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp": 
> {"seconds": 1450352076, "microseconds": 267427}, "event": "SHUTDOWN"}
> [2015-12-17 11:34:36] monitor_qapi_event_emit:483 {"timestamp": 
> {"seconds": 1450352076, "microseconds": 267773}, "event": 
> "DEVICE_TRAY_MOVED", "data": {"device": "drive-ide0-0-1", "tray-open": 
> true}}
>
> We have analyzed this problem, and thought it was triggered by the 
> inconsistent state for the net device
> between qemu and the new libvirtd.
> After libvirtd is started, it will connect to qemu, and updates the 
> devices state by command 'qom-list',
> and also it will read the configure information from XML stored.
> The codes call process is:
> qemuProcessReconnect
>   ->qemuProcessUpdateDevices
>        ->if (qemuDomainUpdateDeviceList(driver, vm, 
> QEMU_ASYNC_JOB_NONE) < 0)
>        ->if ((tmp = old)) {
>         while (*tmp) {
>             if (!virStringArrayHasString(priv->qemuDevices, *tmp) &&
>                 virDomainDefFindDevice(vm->def, *tmp, &dev, false) == 
> 0 &&
>                 qemuDomainRemoveDevice(driver, vm, &dev) < 0) {
>                 goto cleanup;  --->If we fail here, it will return -1; 
> and it will call
>                                    qemuProcessStop() in 
> qemuProcessReconnect to kill VM.
>             }
>             tmp++;
>         }
>     }
> It is a little hard to reproduce this problem, we can reproduce it in 
> a special way:
> 1) Attach a net device by virsh command
> 2) Stop libvirtd
> 3) Detach the net device by directly send qmp command to QEMU though 
> the monitor, just like:
> # ./qmp-shell /var/lib/libvirt/qemu/redhat.monitor
> Welcome to the QMP low-level shell!
> Connected to QEMU 2.3.0
>
> (QEMU) qom-list path=/machine/peripheral
> {u'return': [{u'type': u'child<virtio-net-pci>', u'name': u'net1'}, 
> {u'type': u'child<cirrus-vga>', u'name': u'video0'}, {u'type': 
> u'child<usb-tablet>', u'name': u'input0'}, {u'type': 
> u'child<isa-serial>', u'name': u'serial0'}, {u'type': 
> u'child<virtio-net-pci>', u'name': u'net0'}, {u'type': 
> u'child<scsi-hd>', u'name': u'scsi0-0-0-0'}, {u'type': 
> u'child<virtio-scsi-pci>', u'name': u'scsi0'}, {u'type': 
> u'child<piix3-usb-uhci>', u'name': u'usb'}, {u'type': u'string', 
> u'name': u'type'}]}
> (QEMU) device_del id=net1
> {u'return': {}}
> (QEMU) netdev_del id=hostnet1
> {u'return': {}}
> 4) Restart libvirtd daemon and then you will found that qemu is killed 
> by libvirtd.
>
> The main reason for the inconsistent state is, we accidently restarted 
> libvirtd before it finished treating
> 'detach device' work.
> It seems a little violent to kill such VM directly.
> Can we just do the unfinished cleanup work for the VM after libvirtd 
> is starting ?
> (IMHO, we can't ensure not to restart libvirtd while it is treating 
> the detach device work.)

Thanks for the detailed debugging and analysis!

While I don't think that libvirt should be required to deal properly 
with someone going into the qmp shell and detaching a device while 
libvirtd is down, it *should* properly handle the case where libvirtd is 
restarted while a device detach is in progress (I had thought that 
worked properly, but confess I've never actually tried it).

So that we don't lose track of it, can you file a bug for this in the 
libvirt upstream bugzilla:

   https://libvirt.org/bugs.html




More information about the libvir-list mailing list