[libvirt] blkio cgroup

Dominik Klein dk at in-telegence.net
Mon Feb 21 08:00:27 UTC 2011


Hi

first of all: Thanks for taking the time.

On 02/18/2011 05:31 PM, Vivek Goyal wrote:

> So you have only one root SATA disk and set up a linear logical volume on
> that? If not, can you give more info about the storage configuration?

The logical volumes reside on a JBOD-device [1] connected with a Dell
PERC H800 adapter.

I could also test this on the internal SAS disks, which are connected
with a Dell PERC H700 adapter.

> - I am assuming you are using CFQ on your underlying physical disk.

Yes.

cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]

> - What kernel version are you testing with?

2.6.37

> - Cache=none mode is good which should make all the IO O_DIRECT on host
>   and should show up as SYNC IO on CFQ without losing io context info.
>   The only problem is the intermediate dm layer and whether it is changing
>   the io context somehow. I am not sure at this point of time.
> 
> - Is it possible to capture 10-15 second blktrace on your underlying
>   physical device. That should give me some idea what's happening.

Will do, read on.
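
For reference, I'd capture it roughly like this (just a sketch; the
15-second window and the output base name are placeholders, and I assume
blktrace/blkparse are installed on the host):

# capture ~15 seconds of events on the underlying physical device
blktrace -d /dev/sdb -w 15 -o sdb-trace
# merge the per-CPU binary files into a readable text trace
blkparse -i sdb-trace -o sdb-trace.txt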

> - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
>   on your underlying physical device where CFQ is running and see if it makes
>   any difference.

echo 1 > /sys/block/sdb/queue/iosched/group_isolation
cat /sys/block/sdb/queue/iosched/group_isolation
1

<snip>

>> Then I ran the previous test again. This resulted in an ever increasing
>> load (last I checked was ~ 300) on the host system. (This is perfectly
>> reproducible).
>>
>> uptime
>> Fri Feb 18 14:42:17 2011
>> 14:42:17 up 12 min,  9 users,  load average: 286.51, 142.22, 56.71
> 
> Have you run top or something to figure out why load average is shooting
> up. 

During the tests described in the first email, it was impossible to run
anything on that machine. You couldn't even log in to the console; the
machine had to be disconnected from power to reboot it.

I now just started one machine (kernel3) and tried to throttle it as
described earlier.
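
For completeness, by "throttle" I mean roughly the following (a sketch
only; the /cgroup/blkio mount point, the 253:2 major:minor of the dm
device backing kernel3's logical volume, the pid and the 10 MB/s limit
are placeholders, not the exact values I used):

# assuming the blkio controller is already mounted at /cgroup/blkio,
# put the guest's qemu-kvm process into its own blkio cgroup
mkdir /cgroup/blkio/kernel3
echo <pid of qemu-kvm for kernel3> > /cgroup/blkio/kernel3/tasks
# throttle writes on the backing device to ~10 MB/s
echo "253:2 10485760" > /cgroup/blkio/kernel3/blkio.throttle.write_bps_device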

> I suspect that because of throttling limit, IO threads have been
> blocked and qemu is forking more IO threads. Can you just run top/ps
> and figure out what's happening.

Kind of looks like that.

If I run
ps -eL -o pid,lwp,%cpu,blocked,time,args|grep kernel3
I see 48 qemu-kvm threads while that machine is dd'ing zeroes.

On the other hand: without throttling, I see 70, so I am not sure
whether 48 is a lot here ...
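
For what it's worth, the count is simply the number of thread lines,
roughly:

ps -eL -o pid,lwp,%cpu,blocked,time,args | grep [k]ernel3 | wc -l

(the [k]ernel3 pattern just keeps grep from counting its own command line).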

> Again, is it some kind of linear volume group from which you have carved
> out logical volumes for each virtual machine?

Setup is:

pv = /dev/sdb
vg = /dev/vdisks
lv = /dev/vdisks/kernel1...kernel8

The VM is then given /dev/vdisks/kernelX as a /dev/vda virtio device.
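
In case the details matter, the volumes were created in the usual way,
roughly like this (the 20G size is just an example, not the real value):

pvcreate /dev/sdb
vgcreate vdisks /dev/sdb
lvcreate -L 20G -n kernel1 vdisks
# ... and so on up to
lvcreate -L 20G -n kernel8 vdisks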

> For throttling to begin with, can we do a simple test first. That is,
> run a single virtual machine, put some throttling limit on the logical
> volume and try to do READs. Once READs work, let's test WRITES and check
> why the system load goes up.

Both reads and writes work, without having to give [io]flag=direct to the
dd command, by the way (so cache=none seems to be working).
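
Concretely, the in-guest tests were along these lines (file name, block
size and count are just examples):

# write test inside the guest
dd if=/dev/zero of=/root/testfile bs=1M count=1024
# read test inside the guest
dd if=/root/testfile of=/dev/null bs=1M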

Regards
Dominik

ps. I will now answer your two other emails.

pps. We could also talk on IRC. I'm usually in the #virt channel on
OFTC (nick: kleind).

[1]
http://www.thomas-krenn.com/en/storage-solutions/storage-systems/jbod-systems/3u-disk_expansion_unit-jbod-sc836.html



