[libvirt] blkio cgroup

Sat Feb 19 01:33:14 UTC 2011

On Fri, Feb 18, 2011 at 11:31:37AM -0500, Vivek Goyal wrote:
> On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
> > Hi Vivek
> > 
> > I don't know whether you follow the libvirt list, I assume you don't. So
> > I thought I'd forward you an E-Mail involving the blkio controller and a
> > terrible situation arising from using it (maybe in a wrong way).
> > 
> > I'd truly appreciate it if you read it and commented on it. Maybe I did
> > something wrong, but maybe also I found a bug in some way.
> 
> Hi Dominik, 
> 
> Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have
> just now subscribed.
> 
> Few questions inline.
> 
> > -------- Original Message --------
> > Subject: Re: [libvirt] [PATCH 0/6 v3] Add blkio cgroup support
> > Date: Fri, 18 Feb 2011 14:42:51 +0100
> > From: Dominik Klein <dk at in-telegence.net>
> > To: libvir-list at redhat.com
> > 
> > Hi
> > 
> > back with some testing results.
> > 
> > >> how about the start Guest with option "cache=none" to bypass pagecache?
> > >> This should help i think.
> > > 
> > > I will read up on where to set that and give it a try. Thanks for the hint.
> > 
> > So here's what I did and found out:
> > 
> > The host system has 2 12 core CPUs and 128 GB of Ram.
> > 
> > I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
> > RAM and one disk, which is an lv on the host. Cache mode is "none":
> 
> So you have only one root SATA disk and set up a linear logical volume on
> that? If not, can you give more info about the storage configuration?
> 
> - I am assuming you are using CFQ on your underlying physical disk.
> 
> - What kernel version are you testing with?
> 
> - Cache=none mode is good; it should make all the IO O_DIRECT on the host
>   and it should show up as SYNC IO on CFQ without losing io context info.
>   The only problem is the intermediate dm layer and whether it is changing
>   the io context somehow. I am not sure at this point.
> 
> - Is it possible to capture a 10-15 second blktrace on your underlying
>   physical device? That should give me some idea of what's happening.
> 
> - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
>   on your underlying physical device where CFQ is running and see if it makes
>   any difference.

Dominik,

Apart from setting group_isolation=1, I would also recommend running some
tests on READS. Service differentiation is much more visible there.
Why? Because in the case of writes I am seeing extended periods where
there is no IO on the underlying device from the higher weight virtual
machine. I am not sure what that virtual machine is doing during that
time, but that's what blktrace shows.
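
For reference, setting the tunable and capturing the trace requested above
could look roughly like the sketch below. The device name "sdb" and the
helper function names are placeholders for illustration, not something
taken from this thread; the commands need root and assume CFQ is the
active scheduler on that disk.

```shell
#!/bin/sh
# Sketch only: the disk name passed in is a placeholder, not from this thread.

# Build the sysfs path of the CFQ group_isolation tunable for a disk.
group_isolation_path() {
    printf '/sys/block/%s/queue/iosched/group_isolation\n' "$1"
}

# Enable group isolation (needs root and CFQ as the active scheduler).
enable_group_isolation() {
    path=$(group_isolation_path "$1")
    if [ -w "$path" ]; then
        echo 1 > "$path"
    else
        echo "cannot write $path (not root, or CFQ not active?)" >&2
        return 1
    fi
}

# Capture a blktrace of the given length in seconds (needs root).
capture_trace() {
    blktrace -d "/dev/$1" -w "$2" -o "trace-$1"
}

# Example, run as root on the host:
#   enable_group_isolation sdb && capture_trace sdb 15
#   blkparse -i trace-sdb | less
```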

First I ran READS. Two partitions exported to two virtual machines.

I started "time dd if=/mnt/vdb/testfile of=/dev/zero" in both, and as soon
as it finished in the first virtual machine, I stopped the job in the
second virtual machine as well (manually; a better test script or the fio
tool, which can run timed tests, could be used instead).
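
A timed variant of this test might look like the sketch below. It is not
what was run for the numbers in this mail: it assumes GNU coreutils
(timeout, dd) in the guest, uses bs=1M and /dev/null as a matter of
choice, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (timeout, dd); paths are placeholders.

# Read FILE for at most SECONDS, then interrupt dd so it reports its totals.
timed_read() {
    timeout -s INT "$1" dd if="$2" of=/dev/null bs=1M
}

# Example, started in both guests at about the same time:
#   timed_read 12 /mnt/vdb/testfile
#
# A roughly equivalent fio job (assumes fio is installed in the guest):
#   fio --name=timedread --filename=/mnt/vdb/testfile --rw=read --bs=1M \
#       --direct=1 --time_based --runtime=12
```

With a fixed run time in both guests, the bytes copied can be compared
directly without stopping the second job by hand.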

[vm1 ~]# time dd if=/mnt/vdb/testfile of=/dev/zero
3072000+0 records in
3072000+0 records out
1572864000 bytes (1.6 GB) copied, 12.35 s, 127 MB/s

real	0m12.503s
user	0m0.527s
sys	0m2.318s

[vm2 ~]# time dd if=/mnt/vdb/testfile of=/dev/zero
420853+0 records in
420852+0 records out
215476224 bytes (215 MB) copied, 12.331 s, 17.5 MB/s

real	0m12.342s
user	0m0.082s
sys	0m0.307s

Here, over roughly 12 seconds, the first VM (weight 1000) did 1.6 GB of
READS and the second VM (weight 100) did 215 MB of READS.

Then I ran some WRITE tests with two virtual machines after setting
group_isolation=1. The results follow.

[vm1 ~]# time dd if=/dev/zero of=/mnt/vdb/testfile bs=1M count=1500
1500+0 records in
1500+0 records out
1572864000 bytes (1.6 GB) copied, 6.47411 s, 243 MB/s

real	0m6.711s
user	0m0.002s
sys	0m2.233s

[vm2 ~]# time dd if=/dev/zero of=/mnt/vdb/testfile bs=1M count=1500
388+0 records in
388+0 records out
406847488 bytes (407 MB) copied, 6.68171 s, 60.9 MB/s

real	0m6.739s
user	0m0.002s
sys	0m0.697s

The first machine wrote 1.6 GB while the second machine wrote about 400 MB,
and some of the latter could still be sitting in the second virtual
machine's cache, never having made it to disk. So I would say this is
significant service differentiation.
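
To put numbers on the differentiation, the byte counts from the dd output
above can be turned into ratios with a small sketch like this one. The
helper name is made up for illustration. The 1000:100 weights are the ones
stated for the READ test; the mail does not restate the weights for the
WRITE test.

```shell
#!/bin/sh
# Byte counts are copied from the dd output above.

# Print a/b rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "read  ratio vm1:vm2 = $(ratio 1572864000 215476224)"   # 7.30
echo "write ratio vm1:vm2 = $(ratio 1572864000 406847488)"   # 3.87
echo "weight ratio        = $(ratio 1000 100)"               # 10.00
```

So READS come out at about 7.3:1 against a 10:1 weight ratio, and WRITES
at about 3.9:1.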

Thanks
Vivek