[libvirt-users] "virsh list --all" is intermittently causing a shutdown client to boot?

Peter Kukla fruviad at yahoo.com
Thu Dec 13 13:34:28 UTC 2018


 The /etc/libvirt/libvirtd.conf file was modified to include the following:
   log_level = 1   log_filters="3:remote 4:event 3:json 3:rpc"   log_outputs="1:file:/var/log/libvirt/libvirtd.log"
The service was then restarted, and I ran "tail -f" against the log to determine exactly which entries were being created as a result of specific actions on my part.
I then reproduced the error.
I ran "virsh console proxy2" expecting it to tell me whether the VM was running, and that resulted in the VM booting.  (So it's not just "list --all" that's causing the unexpected boots.)  That resulted in the following chunk of data being written to the log:

2018-12-12 18:47:57.361+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:47:57.361+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:47:57.361+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:47:57.361+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:47:57.361+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:47:57.361+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:47:57.373+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'add'2018-12-12 18:47:57.373+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'tap3'2018-12-12 18:47:57.374+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'net' for device with sysname 'tap3'2018-12-12 18:47:57.374+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'INTERFACE' value 'tap3' for device with sysname 'tap3'2018-12-12 18:47:57.374+0000: 21323: debug : udevGetDeviceSysfsAttr:217 : Found sysfs attribute 'address' value 'fe:52:16:f6:1a:8d' for device with sysname 'tap3'2018-12-12 18:47:57.374+0000: 21323: debug : udevGetDeviceSysfsAttr:217 : Found sysfs attribute 'addr_len' value '6' for device with sysname 'tap3'2018-12-12 18:47:57.374+0000: 21323: debug : virFileClose:110 : Closed fd 292018-12-12 18:47:57.375+0000: 21323: debug : virPCIGetDeviceAddressFromSysfsLink:2624 : '/sys/class/net/tap3/device' does not exist2018-12-12 18:47:57.375+0000: 21323: debug : virFileClose:110 : Closed fd 292018-12-12 18:47:57.375+0000: 21323: info : virObjectNew:254 : OBJECT_NEW: obj=0x7f39f8001f70 classname=virNodeDeviceObj2018-12-12 18:47:57.375+0000: 21323: info : virObjectRef:388 : OBJECT_REF: obj=0x7f39f8001f702018-12-12 18:47:57.375+0000: 21323: info : virObjectNew:254 : OBJECT_NEW: obj=0x7f39f8002030 classname=virNodeDeviceEventLifecycle2018-12-12 18:47:57.375+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f8001f702018-12-12 18:47:57.375+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f80020302018-12-12 18:47:57.375+0000: 21323: info : virObjectUnref:352 : OBJECT_DISPOSE: obj=0x7f39f80020302018-12-12 18:47:57.375+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'add'2018-12-12 18:47:57.375+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'rx-0'2018-12-12 18:47:57.375+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'queues' for device with sysname 'rx-0'2018-12-12 18:47:57.375+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'rx-0'2018-12-12 18:47:57.375+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001900 /sys/devices/virtual/net/tap3/queues/rx-02018-12-12 18:47:57.375+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'add'2018-12-12 18:47:57.375+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'tx-0'2018-12-12 18:47:57.375+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'queues' for device with sysname 'tx-0'2018-12-12 18:47:57.375+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'tx-0'2018-12-12 18:47:57.375+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001900 /sys/devices/virtual/net/tap3/queues/tx-02018-12-12 18:47:57.460+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:47:57.460+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:47:57.464+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'change'2018-12-12 18:47:57.464+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'kvm'2018-12-12 18:47:57.464+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'misc' for device with sysname 'kvm'2018-12-12 18:47:57.464+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'kvm'2018-12-12 18:47:57.464+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001900 /sys/devices/virtual/misc/kvm


I then ran "virsh shutdown proxy2" to stop the VM, resulting in the following log output:
2018-12-12 18:49:24.556+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:49:24.556+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:49:24.556+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:49:24.556+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:49:24.556+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:49:24.556+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:49:24.561+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'remove'2018-12-12 18:49:24.561+0000: 21323: debug : udevRemoveOneDevice:1409 : Failed to find device to remove that has udev name '/sys/devices/virtual/net/tap3/queues/tx-0'2018-12-12 18:49:24.561+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'remove'2018-12-12 18:49:24.562+0000: 21323: debug : udevRemoveOneDevice:1409 : Failed to find device to remove that has udev name '/sys/devices/virtual/net/tap3/queues/rx-0'2018-12-12 18:49:24.562+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'remove'2018-12-12 18:49:24.562+0000: 21323: info : virObjectRef:388 : OBJECT_REF: obj=0x7f39f8001f702018-12-12 18:49:24.562+0000: 21323: info : virObjectNew:254 : OBJECT_NEW: obj=0x7f39f80016f0 classname=virNodeDeviceEventLifecycle2018-12-12 18:49:24.562+0000: 21323: debug : udevRemoveOneDevice:1419 : Removing device 'net_tap3_fe_52_16_f6_1a_8d' with sysfs path '/sys/devices/virtual/net/tap3'2018-12-12 18:49:24.562+0000: 21323: info : virObjectRef:388 : OBJECT_REF: obj=0x7f39f8001f702018-12-12 18:49:24.562+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f8001f702018-12-12 18:49:24.563+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f8001f702018-12-12 18:49:24.563+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f8001f702018-12-12 18:49:24.563+0000: 21323: info : virObjectUnref:352 : OBJECT_DISPOSE: obj=0x7f39f8001f702018-12-12 18:49:24.563+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f80016f02018-12-12 18:49:24.563+0000: 21323: info : virObjectUnref:352 : OBJECT_DISPOSE: obj=0x7f39f80016f02018-12-12 18:49:24.633+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 18:49:24.633+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 18:49:24.646+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'change'2018-12-12 18:49:24.646+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'kvm'2018-12-12 18:49:24.648+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'misc' for device with sysname 'kvm'2018-12-12 18:49:24.648+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'kvm'2018-12-12 18:49:24.648+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001560 /sys/devices/virtual/misc/kvm
I then ran "watch virsh list --all" for 10 minutes, and it continuously showed the VM in a "shut off" state.  Next I ran "virsh console proxy2" and "virsh list --all" immediately after killing the "watch" process, and it dumped the following output to the screen:
[pkukla at vs2 ~]$ virsh console proxy2error: The domain is not running
[pkukla at vs2 ~]$ virsh list --all Id    Name                           State---------------------------------------------------- -     proxy2                         shut off -     proxy2-clone                   shut off

Then I waited ~2-3 minutes and ran "virsh list --all" again, and it caused the VM to boot:

[pkukla at vs2 ~]$ virsh list --all Id    Name                           State---------------------------------------------------- 1     proxy2                         running -     proxy2-clone                   shut off

The debug output from the log corresponding to this boot sequence:

2018-12-12 19:03:53.712+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 19:03:53.712+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 19:03:53.712+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 19:03:53.712+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 19:03:53.713+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 19:03:53.713+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 19:03:53.723+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'add'2018-12-12 19:03:53.723+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'tap3'2018-12-12 19:03:53.724+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'net' for device with sysname 'tap3'2018-12-12 19:03:53.724+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'INTERFACE' value 'tap3' for device with sysname 'tap3'2018-12-12 19:03:53.724+0000: 21323: debug : udevGetDeviceSysfsAttr:217 : Found sysfs attribute 'address' value 'fe:c9:eb:76:37:d8' for device with sysname 'tap3'2018-12-12 19:03:53.724+0000: 21323: debug : udevGetDeviceSysfsAttr:217 : Found sysfs attribute 'addr_len' value '6' for device with sysname 'tap3'2018-12-12 19:03:53.724+0000: 21323: debug : virFileClose:110 : Closed fd 292018-12-12 19:03:53.725+0000: 21323: debug : virPCIGetDeviceAddressFromSysfsLink:2624 : '/sys/class/net/tap3/device' does not exist2018-12-12 19:03:53.725+0000: 21323: debug : virFileClose:110 : Closed fd 292018-12-12 19:03:53.725+0000: 21323: info : virObjectNew:254 : OBJECT_NEW: obj=0x7f39f8001760 classname=virNodeDeviceObj2018-12-12 19:03:53.725+0000: 21323: info : virObjectRef:388 : OBJECT_REF: obj=0x7f39f80017602018-12-12 19:03:53.725+0000: 21323: info : virObjectNew:254 : OBJECT_NEW: obj=0x7f39f80017d0 classname=virNodeDeviceEventLifecycle2018-12-12 19:03:53.725+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f80017602018-12-12 19:03:53.725+0000: 21323: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f39f80017d02018-12-12 19:03:53.725+0000: 21323: info : virObjectUnref:352 : OBJECT_DISPOSE: obj=0x7f39f80017d02018-12-12 19:03:53.725+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'add'2018-12-12 19:03:53.725+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'tx-0'2018-12-12 19:03:53.725+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'queues' for device with sysname 'tx-0'2018-12-12 19:03:53.725+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'tx-0'2018-12-12 19:03:53.725+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001f60 /sys/devices/virtual/net/tap3/queues/tx-02018-12-12 19:03:53.725+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'add'2018-12-12 19:03:53.725+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'rx-0'2018-12-12 19:03:53.725+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'queues' for device with sysname 'rx-0'2018-12-12 19:03:53.725+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'rx-0'2018-12-12 19:03:53.725+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001f60 /sys/devices/virtual/net/tap3/queues/rx-02018-12-12 19:03:53.800+0000: 21264: debug : virNetlinkEventCallback:707 : dispatching to max 0 clients, called from event watch 72018-12-12 19:03:53.800+0000: 21264: debug : virNetlinkEventCallback:720 : event not handled.2018-12-12 19:03:53.804+0000: 21323: debug : udevHandleOneDevice:1677 : udev action: 'change'2018-12-12 19:03:53.804+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'DRIVER' value '<null>' for device with sysname 'kvm'2018-12-12 19:03:53.804+0000: 21323: debug : udevGetDeviceProperty:149 : Found property key 'SUBSYSTEM' value 'misc' for device with sysname 'kvm'2018-12-12 19:03:53.804+0000: 21323: debug : udevGetDeviceType:1349 : Could not determine device type for device with sysfs name 'kvm'2018-12-12 19:03:53.804+0000: 21323: debug : udevAddOneDevice:1541 : Discarding device -1 0x7f39f8001f60 /sys/devices/virtual/misc/kvm

I then disabled logging and restarted the service.

I've been doing has been under my personal account ("pkukla") on the server.  The "root" account has production instances, so I have not been working under that account because I don't want to break anything.  In fact, I had only noticed this and reproduced it while working in my personal account, and I originally wasn't sure that it was happening with any other user accounts.
While collecting this data, however, I am reasonably sure that I managed to boot one of the VMs in the "root" account with this bug.  I didn't monitor the VM in the same way that I've been monitoring the bug in my own account, but a server that shouldn't have been booted did boot around the time that I was working on the server as "root", and I've confirmed with the other people with the appropriate privileges to login as "root", and none of them started the VM.  The VM in the "root" userspace that was booted was the "slo_redmine" VM.  I do not know whether it was booted during the time that logging was enabled, so maybe that is reflected in the log, also?
If I *did* managed to repro the bug while working as "root", then this implies something else:  This is apparently only affecting VM's in the user's on environment.  For example, I've been reproducing the trigger conditions frequently over the past few days to diagnose this, but all of my testing as user "pkukla" does NOT seem to have affected the VMs in the "root" user space.  I'm not willing to guarantee that the effects of this are only confined to a user's own VMs because I haven't exhaustively tested the theory, but it seems likely that's the case.
The full log has been provided to you as a link in another email.
Please let me know if you have any questions, and thanks again for looking at this!

Peter Kukla

    On Wednesday, December 12, 2018, 2:19:10 AM EST, Jiri Denemark <jdenemar at redhat.com> wrote:  
 
 On Tue, Dec 11, 2018 at 14:39:40 +0000, Peter Kukla wrote:
> Hello,
> I have a CentOS 7 client running on a CentOS 7 server.  "virsh --version" reports that 3.9.0 is being used.
> I'm issuing a "virsh shutdown" command to shutdown the client.  I then confirm that the client has actually shutdown using "virsh list --all".
> Lately I've been seeing instances where I shutdown the client, turn away to work on something else, and then turn back to find that the client has booted again.
> Initially I thought I was mistaken, and that I somehow hadn't properly shutdown the client, but further experimentation shows that's definitely not the case.  I've been able to reproduce this repeatedly using the following steps.
> 1.  Run "virsh shutdown CLIENT"2.  Wait ~30-60 seconds3.  Run "virsh list --all; virsh console CLIENT"
> If I run "virsh console CLIENT" on its own when the client is in a shutdown state, I get the expected "error: The domain is not running" message.  If I omit step #2 and run step #3 immediately after shutting down the client, virsh also complains that the client's not started and refuses to open a console.
> However, if I add a pause -- step 2, above -- between the shutdown and the "list --all" commands, then "list --all" shows that the client is running, and a console is opened to the VM.
> I have confirmed that the client boot is definitely being started by the "list --all" command (and not being started by something else while I'm not looking) by combining the "list --all" and "console" commands in step 3.  This is ensuring that the console is started as soon as the "list --all" is complete, and the resulting screen output shows the BIOS boot options quickly followed by the kernel boot messages being dumped to the console.  I can watch the entire boot process from start to finish.
> That pause between the shutdown and the "list --all" is critical.  I've shutdown the client and then immediately followed it up with the command:
>         watch virsh list --all
> ...which will run the "list --all" command every 2 seconds.  This will run indefinitely without triggering the boot, which eliminates the possibility of the boot occurring due to the amount of time since the shutdown.
> It may be that other virsh commands are also causing the client to boot, but thus far I've only reproduced it using the list --all command.
> Can anyone tell me why this is happening?  A bug?  Something in my config?  Something else?
> Thanks in advance,

This is indeed very strange and we would need to look at the logs to
check what's going on. Could you follow the steps in
https://wiki.libvirt.org/page/DebugLogs#Runtime_setting to enable debug
logs for libvirtd and share the (compressed) libvirtd.log somewhere?

Jirka
  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/libvirt-users/attachments/20181213/bb861aa2/attachment.htm>


More information about the libvirt-users mailing list