[libvirt] blkio cgroup
Dominik Klein
dk at in-telegence.net
Mon Feb 21 08:30:32 UTC 2011
On 02/21/2011 09:19 AM, Dominik Klein wrote:
>>> - Is it possible to capture 10-15 second blktrace on your underlying
>>> physical device. That should give me some idea what's happening.
>>
>> Will do, read on.
>
> Just realized I missed this one ... should have done it right away.
>
> So here goes.
>
> Setup as in the first email: 8 machines, 2 important, 6 unimportant ones
> with a throttle of ~10 MB/s. group_isolation=1. Each VM dd'ing zeroes.
>
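[For reference, a throttle like the one described above would typically be set through the cgroup-v1 blkio controller. A minimal sketch, assuming /dev/sdb is device 8:16, cgroups are mounted at /sys/fs/cgroup/blkio, and a group name of "unimportant" (all of these are illustrative, not taken from the original setup):

```shell
# Sketch: ~10 MB/s write throttle for one group of VMs (cgroup v1).
# 8:16 is the usual major:minor for /dev/sdb; adjust for your device.
mkdir -p /sys/fs/cgroup/blkio/unimportant
echo "8:16 10485760" > /sys/fs/cgroup/blkio/unimportant/blkio.throttle.write_bps_device

# Move a VM's qemu-kvm process into the throttled group
# ($VM_PID is a placeholder for the actual process id):
echo "$VM_PID" > /sys/fs/cgroup/blkio/unimportant/tasks
```
]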
> blktrace -d /dev/sdb -w 30
> === sdb ===
> CPU 0: 4769 events, 224 KiB data
> CPU 1: 28079 events, 1317 KiB data
> CPU 2: 1179 events, 56 KiB data
> CPU 3: 5529 events, 260 KiB data
> CPU 4: 295 events, 14 KiB data
> CPU 5: 649 events, 31 KiB data
> CPU 6: 185 events, 9 KiB data
> CPU 7: 180 events, 9 KiB data
> CPU 8: 17 events, 1 KiB data
> CPU 9: 12 events, 1 KiB data
> CPU 10: 6 events, 1 KiB data
> CPU 11: 55 events, 3 KiB data
> CPU 12: 28005 events, 1313 KiB data
> CPU 13: 1542 events, 73 KiB data
> CPU 14: 4814 events, 226 KiB data
> CPU 15: 389 events, 19 KiB data
> CPU 16: 1545 events, 73 KiB data
> CPU 17: 119 events, 6 KiB data
> CPU 18: 3019 events, 142 KiB data
> CPU 19: 62 events, 3 KiB data
> CPU 20: 800 events, 38 KiB data
> CPU 21: 17 events, 1 KiB data
> CPU 22: 243 events, 12 KiB data
> CPU 23: 1 events, 1 KiB data
> Total: 81511 events (dropped 0), 3822 KiB data
>
> A very constant 296 blocked processes in vmstat during this run. But...
> apparently no data is being written at all (see the "bo" column).
>
> vmstat 2
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r   b swpd      free  buff  cache si so bi   bo    in    cs us sy id wa
>  0 296    0 125254224 21432 142016  0  0 16  633   181   331  0  0 93  7
>  0 296    0 125253728 21432 142016  0  0  0    0 17115 33794  0  0 25 75
>  0 296    0 125254112 21432 142016  0  0  0    0 17084 33721  0  0 25 74
>  1 296    0 125254352 21440 142012  0  0  0   18 17047 33736  0  0 25 75
>  0 296    0 125304224 21440 131060  0  0  0    0 17630 33989  0  1 23 76
>  1 296    0 125306496 21440 130260  0  0  0    0 16810 33401  0  0 20 80
>  4 296    0 125307208 21440 129856  0  0  0    0 17169 33744  0  0 26 74
>  0 296    0 125307496 21448 129508  0  0  0   14 17105 33650  0  0 36 64
>  0 296    0 125307712 21452 129672  0  0  2 1340 17117 33674  0  0 22 78
>  1 296    0 125307752 21452 129520  0  0  0    0 16875 33438  0  0 29 70
>  1 296    0 125307776 21452 129520  0  0  0    0 16959 33560  0  0 21 79
>  1 296    0 125307792 21460 129520  0  0  0   12 16700 33239  0  0 15 85
>  1 296    0 125307808 21460 129520  0  0  0    0 16750 33274  0  0 25 74
>  1 296    0 125307808 21460 129520  0  0  0    0 17020 33601  0  0 26 74
>  1 296    0 125308272 21460 129520  0  0  0    0 17080 33616  0  0 20 80
>  1 296    0 125308408 21460 129520  0  0  0    0 16428 32972  0  0 42 58
>  1 296    0 125308016 21460 129524  0  0  0    0 17021 33624  0  0 22 77
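[The blocked-process count being watched above is the second ("b") column of vmstat. A quick way to pull out just that column; here one captured sample line is parsed instead of running vmstat live:

```shell
# Extract the "b" (uninterruptibly blocked processes) column from a
# vmstat data line. $2 is the second whitespace-separated field.
line="0 296 0 125254224 21432 142016 0 0 16 633 181 331 0 0 93 7"
echo "$line" | awk '{print $2}'   # prints 296
```

For live monitoring, the same awk filter can be piped from `vmstat 2`.]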
While we're on that: it is now impossible for me to recover from this
state without pulling the power plug.
On the VMs console I see messages like
INFO: task (kjournald|flush-254|dd|rs:main|...) blocked for more than
120 seconds.
While the ssh sessions through which the dd was started seem intact
(pressing enter gives a new line), it is impossible to cancel the dd
command. Logging in on the VMs' consoles is also impossible.
Opening a new ssh session to the host does not work either. Killing the
qemu-kvm processes from a session opened earlier leaves zombie processes.
Moving the VMs back to the root cgroup makes no difference either.
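[Moving the VMs back to the root cgroup, as attempted above, usually amounts to writing each qemu-kvm PID into the root group's tasks file. A minimal sketch, assuming a cgroup-v1 blkio hierarchy mounted at /sys/fs/cgroup/blkio:

```shell
# Sketch: return all qemu-kvm processes to the root blkio cgroup.
# Writing a PID to the root tasks file removes it from any child group.
for pid in $(pidof qemu-kvm); do
    echo "$pid" > /sys/fs/cgroup/blkio/tasks
done
```

Note that, as observed here, tasks already stuck in uninterruptible D state may not respond to this until the blocked I/O completes.]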
Regards
Dominik