[libvirt-users] VM crash : Failed to terminate process X with SIGKILL: Device or resource busy

Michel Villeneuve Michel.Villeneuve at univ-brest.fr
Fri Apr 22 14:54:43 UTC 2016


Hello

To partially answer my own question: the problem seems to occur when the VMs
time out while accessing their virtual hardware. In my case the VMs are on
NFS shared storage, and I know that the NFS server sometimes times out and
then comes back.

With the previous versions of libvirt and qemu on CentOS 6, the result was
that the VM ended up with a read-only file system.

I tried to reproduce the error by switching NFS off and on on the servers.
I got error messages like:

INFO: task kjournald:360 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.....
[<fffffffff8000a08bd>] wake_bit_function+0x0/0x23......
   [<fffffffff8800a0ead>] :jbd:journal_get_write_access+0x22/0x33
;;;;
   [<fffffffff800013ccd>] :ext3:ext3_dirty_inode+0x63/0x7b
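
To spot these warnings in a saved guest console log, something like the following could be used. It is only a minimal sketch: the function name `find_hung_tasks` is made up for illustration, and the pattern assumes the message format shown above.

```python
import re

# Matches the kernel hung-task warning, for example:
# "INFO: task kjournald:360 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task_name, pid, seconds) for every hung-task warning found."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "INFO: task kjournald:360 blocked for more than 120 seconds."
    print(find_hung_tasks(sample))  # [('kjournald', 360, 120)]
```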

If I try any action on the VM (suspend, shutdown, destroy),
the process becomes defunct and I can't do anything else:

qemu      8142     1 17 10:35 ?        00:48:51 [qemu-system-x86] <defunct>
root      8151     2  0 10:35 ?        00:00:40 [vhost-8142]
root      8153     2  0 10:35 ?        00:00:00 [kvm-pit/8142]
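
The link between the defunct qemu process and the kernel threads named after its PID can be checked mechanically. The sketch below is only an illustration: it assumes `ps -ef` style output like the listing above, and the helper names `parse_ps_ef` and `defunct_qemu_with_helpers` are hypothetical.

```python
import re


def parse_ps_ef(text):
    """Parse `ps -ef` lines into dicts with uid, pid, ppid and cmd."""
    procs = []
    for line in text.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.strip().split(None, 7)
        if len(fields) < 8:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skips the header line and malformed lines
        procs.append({
            "uid": fields[0],
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "cmd": fields[7].strip(),
        })
    return procs


def defunct_qemu_with_helpers(procs):
    """Map each defunct qemu PID to the kernel threads named after it."""
    result = {}
    for p in procs:
        if "qemu" in p["cmd"] and "<defunct>" in p["cmd"]:
            # Kernel threads appear as [vhost-<pid>] and [kvm-pit/<pid>]
            pattern = re.compile(r"^\[(vhost-|kvm-pit/)%d\]$" % p["pid"])
            result[p["pid"]] = [
                q["pid"] for q in procs if pattern.match(q["cmd"])
            ]
    return result


if __name__ == "__main__":
    sample = (
        "qemu      8142     1 17 10:35 ?        00:48:51 [qemu-system-x86] <defunct>\n"
        "root      8151     2  0 10:35 ?        00:00:40 [vhost-8142]\n"
        "root      8153     2  0 10:35 ?        00:00:00 [kvm-pit/8142]\n"
    )
    print(defunct_qemu_with_helpers(parse_ps_ef(sample)))  # {8142: [8151, 8153]}
```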

The main problem is not that ps shows a defunct process. It is that libvirt
shows the domain in an abnormal state, and we cannot stop or restart it.

So my questions are:
Is there a way to make libvirt hide (not list) processes in this state, so
that the VM can be reused without rebooting the hypervisor?
Can libvirt list only active processes?
Or is there an elegant way to remove the stuck processes?

Can the choice of disk cache setting (none, writethrough, writeback, ...)
have an impact on qemu here?
Currently I'm using Fedora 23 with libvirt 1.3.3. I will try an older
version on CentOS 7.2 to see what happens and whether I get the same errors.
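
For reference, in the libvirt domain XML the cache mode is the `cache` attribute of each disk's `<driver>` element. The sketch below lists it per disk from XML such as `virsh dumpxml` produces. The sample XML is a hypothetical fragment written for illustration (not my real configuration), and the function name `disk_cache_modes` is made up.

```python
import xml.etree.ElementTree as ET


def disk_cache_modes(domain_xml):
    """Return {target device: cache mode, or None if not set} per disk."""
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else None
        # A missing attribute means the hypervisor default applies
        modes[dev] = driver.get("cache") if driver is not None else None
    return modes


if __name__ == "__main__":
    # Hypothetical domain XML, for illustration only
    sample = """
    <domain type='kvm'>
      <name>TEST-VM-A</name>
      <devices>
        <disk type='file' device='disk'>
          <driver name='qemu' type='raw' cache='none'/>
          <target dev='vda' bus='virtio'/>
        </disk>
        <disk type='file' device='cdrom'>
          <target dev='hdc' bus='ide'/>
          <readonly/>
        </disk>
      </devices>
    </domain>
    """
    print(disk_cache_modes(sample))  # {'vda': 'none', 'hdc': None}
```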

Thanks

  Michel Villeneuve <Michel.Villeneuve at univ-brest.fr> wrote:

> Hello
>
> Since I moved my hypervisor from CentOS 6.3 to Fedora 23,
> I have had many problems with different VMs.
>
> Very often, about once a day (I have about 150 VMs),
> some VMs crash or freeze, seemingly at random, and I get
> messages like this on the console:
>
>    [<fffffffff8000a08bd>] wake_bit_function+0x0/0x23
> .....
>    [<fffffffff8800a0ead>] :jbd:journal_get_write_access+0x22/0x33
> ;;;;
>    [<fffffffff800013ccd>] :ext3:ext3_dirty_inode+0x63/0x7b
>
> The VM has then crashed and can't be accessed, although it often
> still responds to ping.
>
> Before the migration I never had these problems. They are strictly the
> same VMs on CentOS 6.3 and on Fedora 23. I only changed this parameter:
>
>   <type arch='x86_64' machine='rhel-6.0.0'>hvm</type>
> to
>   <type arch='x86_64' machine='pc-i440fx-2.4'>hvm</type>
> and ran virsh define.
>
> I tried some other values for this parameter, without success.
>
> I also added a lock manager (lockd) on Fedora 23.
>
> Before, I used libvirt 0.9.5 or 1.0.2 on CentOS 6.3 without lockd.
>
> A major problem with these crashes is that the VMs can't be destroyed
> with virsh; ps reports the qemu process as defunct.
> If I try a virsh destroy,
> I get this in the log file:
>
> 2016-04-20 20:32:47.318+0000: 5541: info : virEventPollRunOnce:641 : EVENT_POLL_RUN: nhandles=11 timeout=-1
> 2016-04-20 20:32:55.028+0000: 5567: debug : virProcessKillPainfully:368 : Timed out waiting after SIGTERM to process 8720, sending SIGKILL
> 2016-04-20 20:33:00.032+0000: 5567: error : virProcessKillPainfully:398 : Failed to terminate process 8720 with SIGKILL: Device or resource busy
>
> or on the console:
>
> Failed to terminate process xxx with SIGTERM: Device or resource busy
> and the VM stays in the list in a "Stopping" state.
>
> Output of ps for the qemu process attached to the VM:
>
> qemu      8720     1  0 Apr20 ?      00:07:16 [qemu-system-x86] <defunct>
> root      8733     2  0 Apr20 ?      00:00:01 [vhost-8720]
> root      8735     2  0 Apr20 ?      00:00:00 [kvm-pit/8720]
>
> libvirtd seems to be in an abnormal state. If I restart libvirtd,
> the virsh command just hangs and the VM is never removed from the list.
>
> The only solution seems to be rebooting the hypervisor, but that also
> takes down all the production VMs.
>
> Is there a way to remove the defunct qemu process without
> rebooting the hypervisor?
> Perhaps the problem comes from the VM parameters, which were created
> on CentOS 6.3 with a libvirt version older than 1.0. Do I need to convert
> some other parameters?
>
> I am trying to set up a new hypervisor at a version level older than
> Fedora 23, perhaps CentOS 7.2, to see what happens and whether the same
> problem occurs there.
>
> Thanks
>
> PS:
> I set log_level to 1.
> ----------------------information in logfile
> [root at kvmserver6 ~]# ls -al /var/lib/libvirt/qemu/domain-1-TEST-VM-A
> total 8
> drwxr-x---   2 qemu qemu 4096 20 Apr 22:01 .
> drwxr-x--x. 18 qemu qemu 4096 20 Apr 22:01 ..
> srwxrwxr-x   1 qemu qemu    0 20 Apr 22:01 monitor.sock
> /var/lib/libvirt/qemu/channel/target/domain-1-TEST-VM-A/
>
> [root at kvmserver6 ~]# cat /var/log/libvirt/qemu/TEST-VM-A.log
> 2016-04-20 20:01:36.714+0000: starting up libvirt version: 1.3.3,
> package: 1.fc23 (Unknown, 2016-04-06-15:17:39, thinkpad2), qemu version:
> 2.4.1 (qemu-2.4.1-8.fc23), hostname: kvmserver6.univ-brest.fr
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name TEST-VM-A,debug-threads=on
> -S -machine pc-i440fx-2.4,accel=kvm,usb=off -m 1024 -realtime mlock=off
> -smp 1,sockets=1,cores=1,threads=1 -uuid
> 1e4c27e4-123e-719a-9fdf-f783d34cbb40 -no-user-config -nodefaults
> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-TEST-VM-A/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> file=/var/lib/libvirt/images/POOL_PROD4/TEST-VM-A.img,format=raw,if=none,id=drive-virtio-disk0
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive if=none,id=drive-ide0-1-0,readonly=on -device
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
> tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:11:11,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:0,password -k fr
> -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
> ES1370,id=sound0,bus=pci.0,addr=0x4 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
> char device redirected to /dev/pts/1 (label charserial0)
> qemu: terminating on signal 15 from pid 5541
>   Michel Villeneuve
> Tel 02 98 01 71 61
-- 
Michel Villeneuve
Tel 02 98 01 71 61