[libvirt] blkio cgroup

Gui Jianfeng guijianfeng at cn.fujitsu.com
Mon Feb 21 07:36:14 UTC 2011


Dominik,

Would you try "oflag=direct" when you run the tests in the guests? And make sure
/sys/block/xxx/queue/iosched/group_isolation is set to 1.

I guess that with these settings your tests should go well.
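
For example, a minimal sketch (sda here stands in for your actual disk,
and testfile for whatever path you write to in the guest):

# host: enable CFQ group isolation for the disk
echo 1 > /sys/block/sda/queue/iosched/group_isolation

# guest: redo the write test with the guest page cache bypassed
dd if=/dev/zero of=testfile bs=1M count=1500 oflag=direct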

Thanks,
Gui

Vivek Goyal wrote:
> On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
>> Hi Vivek
>>
>> I don't know whether you follow the libvirt list; I assume you don't. So
>> I thought I'd forward you an e-mail involving the blkio controller and a
>> terrible situation arising from using it (maybe in a wrong way).
>>
>> I'd truly appreciate it if you read it and commented on it. Maybe I did
>> something wrong, but maybe I also found a bug of some kind.
> 
> Hi Dominik, 
> 
> Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have
> just now subscribed.
> 
> Few questions inline.
> 
>> -------- Original Message --------
>> Subject: Re: [libvirt] [PATCH 0/6 v3] Add blkio cgroup support
>> Date: Fri, 18 Feb 2011 14:42:51 +0100
>> From: Dominik Klein <dk at in-telegence.net>
>> To: libvir-list at redhat.com
>>
>> Hi
>>
>> back with some testing results.
>>
>>>> how about starting the Guest with the option "cache=none" to bypass the
>>>> pagecache? This should help, I think.
>>> I will read up on where to set that and give it a try. Thanks for the hint.
>> So here's what I did and found out:
>>
>> The host system has two 12-core CPUs and 128 GB of RAM.
>>
>> I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
>> RAM and one disk, which is an LV on the host. Cache mode is "none":
> 
> So you have only one root SATA disk and set up a linear logical volume on
> that? If not, can you give more info about the storage configuration?
> 
> - I am assuming you are using CFQ on your underlying physical disk.
> 
> - What kernel version are you testing with?
> 
> - Cache=none mode is good; it should make all the IO O_DIRECT on the host
>   and it should show up as SYNC IO in CFQ without losing io context info.
>   The only problem is the intermediate dm layer and whether it is changing
>   the io context somehow. I am not sure at this point in time.
> 
> - Is it possible to capture a 10-15 second blktrace on your underlying
>   physical device (see the sketch after this list)? That should give me
>   some idea of what's happening.
> 
> - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
>   on your underlying physical device where CFQ is running and see if it makes
>   any difference.
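>
>   For example, a minimal sketch (assumes the physical disk is sda and
>   that debugfs is available; blktrace reads its data from there):
>
>   mount -t debugfs none /sys/kernel/debug   # if not already mounted
>   blktrace -d /dev/sda -w 15 -o mytrace     # capture 15 seconds of events
>   blkparse -i mytrace > mytrace.txt         # decode the trace for reading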
> 
>> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7
>> kernel8; do virsh dumpxml $vm|grep cache; done
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>       <driver name='qemu' type='raw' cache='none'/>
>>
>> My goal is to give more I/O time to kernel1 and kernel2 than to the rest
>> of the VMs.
>>
>> mount -t cgroup -o blkio none /mnt
>> cd /mnt
>> mkdir important
>> mkdir notimportant
>>
>> echo 1000 > important/blkio.weight
>> echo 100 > notimportant/blkio.weight
>> for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
>> cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
>> for task in *; do
>> /bin/echo $task > /mnt/notimportant/tasks
>> done
>> done
>>
>> for vm in kernel1 kernel2; do
>> cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
>> for task in *; do
>> /bin/echo $task > /mnt/important/tasks
>> done
>> done
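>>
>> To double-check the moves, the cgroup task lists can be read back, e.g.:
>>
>> wc -l /mnt/important/tasks /mnt/notimportant/tasks
>>
>> The counts should match the qemu-kvm thread totals under /proc/<pid>/task.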
>>
>> Then I used cssh to connect to all 8 VMs and execute
>> dd if=/dev/zero of=testfile bs=1M count=1500
>> in all VMs simultaneously.
>>
>> Results are:
>> kernel1: 47.5593 s, 33.1 MB/s
>> kernel2: 60.1464 s, 26.2 MB/s
>> kernel3: 74.204 s, 21.2 MB/s
>> kernel4: 77.0759 s, 20.4 MB/s
>> kernel5: 65.6309 s, 24.0 MB/s
>> kernel6: 81.1402 s, 19.4 MB/s
>> kernel7: 70.3881 s, 22.3 MB/s
>> kernel8: 77.4475 s, 20.3 MB/s
>>
>> Results vary a little from run to run, but the difference is nowhere near
>> as large as weights of 1000 vs. 100 would suggest.
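>>
>> (For perspective: kernel1 and kernel2 average ~29.7 MB/s while kernel3-8
>> average ~21.3 MB/s, a ratio of roughly 1.4:1, whereas the 1000:100
>> weights would suggest something closer to 10:1.)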
>>
>> So I went and tried to throttle I/O of kernel3-8 to 10 MB/s instead of
>> weighting I/O. First I rebooted everything so that no old cgroup
>> configuration was left in place, and then set up everything except the
>> 100 and 1000 weight configuration.
>>
>> quote from blkio.txt:
>> ------------
>> - blkio.throttle.write_bps_device
>>         - Specifies upper limit on WRITE rate to the device. IO rate is
>>           specified in bytes per second. Rules are per device. Following is
>>           the format.
>>
>>   echo "<major>:<minor>  <rate_bytes_per_second>" >
>> /cgrp/blkio.throttle.write_bps_device
>> -------------
>>
>> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7
>> kernel8; do ls -lH /dev/vdisks/$vm; done
>> brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
>> brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
>> brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
>> brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
>> brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
>> brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
>> brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
>> brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
>>
>> /bin/echo 254:25 10000000 >
>> /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:26 10000000 >
>> /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:27 10000000 >
>> /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:28 10000000 >
>> /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:29 10000000 >
>> /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:30 10000000 >
>> /mnt/notimportant/blkio.throttle.write_bps_device
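>>
>> (As a sanity check, the active rules can be read back:
>>
>> cat /mnt/notimportant/blkio.throttle.write_bps_device
>>
>> which should list one "254:<minor> 10000000" entry per device.)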
>>
>> Then I ran the previous test again. This resulted in an ever-increasing
>> load on the host system (the last value I checked was ~300). This is
>> perfectly reproducible.
>>
>> uptime
>> Fri Feb 18 14:42:17 2011
>> 14:42:17 up 12 min,  9 users,  load average: 286.51, 142.22, 56.71
> 
> Have you run top or something to figure out why the load average is shooting
> up? I suspect that because of the throttling limit, IO threads have been
> blocked and qemu is forking more IO threads. Can you just run top/ps
> and figure out what's happening?
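>
> For example (a sketch; the [q] keeps grep from matching its own process):
>
> ps -eLf | grep [q]emu-kvm | wc -l   # total qemu-kvm thread count
> top -b -H -n 1 | head -40           # one snapshot with threads visible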
> 
> Again, is it some kind of linear volume group from which you have carved
> out logical volumes for each virtual machine?
> 
> For throttling, to begin with, can we do a simple test first? That is,
> run a single virtual machine, put some throttling limit on its logical
> volume and try to do READs. Once READs work, let's test WRITEs and check
> why the system load goes up.
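>
> A minimal sketch (reusing kernel1's 254:23 device numbers from your
> listing; /dev/vda as the guest disk name is an assumption, and the VM's
> qemu-kvm threads must sit in the throttled cgroup):
>
> # host: limit READs on the LV to 10 MB/s
> echo "254:23 10000000" > /mnt/notimportant/blkio.throttle.read_bps_device
>
> # guest: direct reads from the virtual disk
> dd if=/dev/vda of=/dev/null bs=1M count=1500 iflag=direct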
> 
> Thanks
> Vivek
> 
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 

-- 
Regards
Gui Jianfeng



