[Linux-cluster] All VMs are "blocked", terrible performance

Pasi Kärkkäinen pasik at iki.fi
Mon Nov 9 10:41:38 UTC 2009


On Sat, Nov 07, 2009 at 12:09:18AM -0500, Madison Kelly wrote:
> Hi all,
> 
>   I've built up a handful of VMs on my 2-node cluster and all are 
> showing as being in a blocked state. The performance is terrible, too. 
> All VMs are currently on one node (another, possibly related, problem 
> is keeping me from migrating any of them).
> 
> My Setup:
> (each node)
> 2x Quad Core AMD Opteron 2347 HE
> 32GB RAM (16GB/CPU)
> 
> Nodes have a DRBD partition running cluster-aware LVM for all domU VMs. 
> Each VM has its own logical volume. The DRBD has a dedicated gigabit 
> link and DRBD is using 'Protocol C', as required. LVM is set to use 
> 'locking_type=3'.
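
[For reference, a minimal sketch of what such a setup typically looks
like; the resource name, device paths, hostnames and addresses below
are illustrative, not taken from the poster's configuration:

    # /etc/drbd.conf -- synchronous Protocol C over the dedicated link,
    # dual-primary so cluster-aware LVM can activate LVs on both nodes
    resource r0 {
      protocol C;
      net { allow-two-primaries; }
      on node1 { device /dev/drbd0; disk /dev/sda3; address 10.0.0.1:7789; meta-disk internal; }
      on node2 { device /dev/drbd0; disk /dev/sda3; address 10.0.0.2:7789; meta-disk internal; }
    }

    # /etc/lvm/lvm.conf -- cluster-wide locking through clvmd
    locking_type = 3
]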
> 
> Here's what I see:
> 
> # xm list
> Name       ID Mem(MiB) VCPUs State   Time(s)
> Domain-0    0    32544     8 r-----  22400.8
> auth01     10     1023     1 -b----   3659.7
> dev01      22     8191     1 -b----    830.2
> fw01       11     1023     1 -b----   1046.9
> res01      23     2047     1 -b----    812.1
> sql01      24    16383     1 -b----    817.0
> web01      20     2047     1 -b----   1156.3
> web02      21     1023     1 -b----    931.1
> 
>   When I ran that, all VMs were running yum update (all but two were 
> fresh installs).
> 
> Any idea what's causing this and/or why my performance is so bad? Each 
> VM is taking minutes to install each updated RPM.
> 
> In case it's related, when I tried to do a live migration of a VM from 
> one node to the other, I got an error saying that the VM's partition 
> couldn't be seen on the other node. However, '/proc/drbd' shows both 
> nodes are in sync and in Primary/Primary mode. Also, both nodes have 
> identical output from 'lvdisplay' (all LVs are 'active'), and all LVs 
> were created during provisioning on the first node.
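
[The checks described above, as commands; the exact /proc/drbd field
names vary between DRBD versions ('st:' in 8.0/8.2, 'ro:' in 8.3), so
treat the expected output as indicative:

    # on each node: DRBD connection state and roles
    cat /proc/drbd            # expect cs:Connected and Primary/Primary

    # on each node: every domU LV should show as available
    lvdisplay | grep -E 'LV Name|LV Status'
]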
> 
> Kinda stuck here, so any input will be greatly appreciated! Let me know 
> if I can post anything else useful.
>

Have you configured domain weights? On a busy host it's essential to
make sure dom0 always gets enough CPU time to process the important
work (I/O etc.).
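
For example, with the credit scheduler you could raise dom0's weight
above the default of 256 (512 below is just an illustrative value):

    # show the current weight/cap of every domain
    xm sched-credit

    # give dom0 twice the default weight so it wins CPU contention
    xm sched-credit -d Domain-0 -w 512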

- Give dom0 more weight than the domUs.
- You could also dedicate a single core to dom0 alone
  (in grub.conf, add for xen.gz: dom0_max_vcpus=1 dom0_vcpus_pin)
  and after that make sure domU vcpus are pinned to pcpus other than 0,
  using the cpus parameter in each /etc/xen/<guest> cfgfile. You can
  verify the pinning with "xm vcpu-list". (A sketch follows this list.)

-- Pasi



