[libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats

Ján Tomko jtomko at redhat.com
Thu Jan 18 07:25:30 UTC 2018


On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
>Hello libvirt-users list,
>
>We have been hitting the same bug since version 3.4.0 (3.3.0 works OK).
>We have a process that is permanently connected to libvirtd via a socket;
>it collects stats, listens to events and controls the VPSes.
>
>When we try to 'shutdown' a number of VPSes we often hit the bug. One of
>the VPSes gets stuck in the 'in shutdown' state, no related 'qemu' process
>is present, and the following errors appear in the log:
>
>Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000:
>20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
>Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>20441: error : virFileReadAll:1420 : Failed to open file
>'/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>No such file or directory
>Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>20441: error : virCgroupGetValueStr:844 : Unable to read from
>'/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>No such file or directory
>Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu
>account: Operation not permitted
>Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job
>(destroy, none) for domain DOMAIN1; current job is (query, none) owned by
>(20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
>Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during
>operation: cannot acquire state change lock (held by
>remoteDispatchConnectGetAllDomainStats)
>
>I think only the last line matters.
>The bug is highly reproducible. We can easily trigger it even by running
>several 'virsh shutdown' commands one by one in a shell.
>
>When we shut down the process connected to the socket, everything becomes
>OK and the bug is gone.
>
>The system is Gentoo Linux. We tried all recent versions of libvirt
>(3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, 4.0.0-rc2 (today's version from git))
>and they all have this bug. 3.3.0 works OK.
>

I don't see anything obviously stats-related in the diff between 3.3.0 and
3.4.0. We have added reporting of the shutdown reason, but that's just
parsing one more JSON reply we previously ignored.

Can you try running 'git bisect' to pinpoint the exact commit that
caused this issue?
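
If it helps, the good/bad decision at each bisect step can be scripted.
Below is a minimal sketch, not a tested procedure. `bisect_exit_code` is a
hypothetical helper name, and it assumes you have already built and
restarted libvirtd at the commit under test, run your shutdown test, and
captured the daemon log as text. It only maps that log to the exit codes
that `git bisect run` understands (0 for good, 1 for bad, 125 for "this
commit cannot be tested, skip it").

```python
# Error text taken from the log excerpt quoted in this thread.
MARKER = "cannot acquire state change lock"

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(log_text, built_ok=True):
    """Classify one bisect step from a libvirtd log excerpt.

    built_ok is an assumption of this sketch: the caller sets it to False
    when the commit failed to build or the test could not be run.
    """
    if not built_ok:
        return SKIP
    # Collapse whitespace so a message wrapped across lines still matches.
    normalized = " ".join(log_text.split())
    return BAD if MARKER in normalized else GOOD
```

A wrapper script handed to `git bisect run` could exit with this value; how
you build libvirt, restart the daemon and collect the log depends on your
setup.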

Jan

>Thanks for any help in advance.
>We can send any additional info if needed.
>
>~Serhii

