[Linux-cluster] Successful installation on CentOS 5.3 with live KVM migration

Tue Aug 11 12:59:13 UTC 2009

Getting cluster software working, including KVM virtual machines with
live migration, can be a very difficult task with many obstacles.

But I would like to let the mailing list know that I just had some
success. And because nobody is around to hear the wonderful news,
I would like to share my happiness here ;)

My setup:
2 NAS servers and 1 Supermicro blade server with 5 blades.

The NAS servers are running Openfiler 2.3.
Both NAS servers have:
1 Transcend 4 GB IDE flash card (on the IDE port of the mainboard)
3 x Transcend 4 GB USB sticks
8 SATA disks

The IDE flash card is set up as a RAID-1 mirror (md0) together with one
USB stick, providing the root FS for Openfiler.
The other 2 USB sticks each have 5 partitions: 4 x 500 MB and 1 x 2 GB.
These are mirrored with RAID-1 across the two sticks (md5 through md8
are the 500 MB partitions, and md9 is the 2 GB partition).
The 8 hard disks are also paired up as RAID-1 mirrors (md1 through md4).
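For anyone who wants to rebuild this layout, here is a minimal mdadm
sketch. All device names (/dev/hda1, /dev/sda..sdk, and the USB
partition numbers, assuming p1-p3 primary, p4 extended, p5/p6 logical)
are hypothetical; check yours with `cat /proc/partitions`. It defaults
to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the md layout described above. Device names are assumptions.
# Dry run by default: commands are only printed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# raid1 MD DEV1 DEV2: create a two-way RAID-1 mirror
raid1() {
    run mdadm --create "$1" --level=1 --raid-devices=2 "$2" "$3"
}

create_md_layout() {
    # md0: Openfiler root FS, IDE flash card + first USB stick
    raid1 /dev/md0 /dev/hda1 /dev/sda1

    # md1..md4: the 8 SATA disks mirrored in pairs
    raid1 /dev/md1 /dev/sdd /dev/sde
    raid1 /dev/md2 /dev/sdf /dev/sdg
    raid1 /dev/md3 /dev/sdh /dev/sdi
    raid1 /dev/md4 /dev/sdj /dev/sdk

    # md5..md8: 500 MB partitions on USB sticks 2 and 3 (DRBD metadata)
    n=5
    for p in 1 2 3 5; do
        raid1 "/dev/md$n" "/dev/sdb$p" "/dev/sdc$p"
        n=$((n + 1))
    done

    # md9: 2 GB partition on the same two sticks
    raid1 /dev/md9 /dev/sdb6 /dev/sdc6
}
```

After sourcing it, `create_md_layout` shows the plan and
`DRY_RUN=0 create_md_layout` would actually run it. Keep in mind that
`mdadm --create` destroys whatever is on those devices.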

Then I used DRBD (8.2.7) to mirror the 4 disk RAID-1s (md1 through md4)
and the 2 GB mirror (md9) over the network to the other NAS server
(drbd1 through drbd4 for the disk mirrors, drbd0 for the 2 GB mirror).
The 500 MB RAID-1s hold the external metadata of the 4 disk RAID-1s.
The 2 GB DRBD device (drbd0) uses internal metadata.
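As an illustration of how the metadata is split, this hypothetical
generator prints DRBD 8.x resource stanzas for drbd.conf: drbd0 on md9
with `meta-disk internal;`, and drbd1..drbd4 on md1..md4 with indexed
external metadata on md5..md8 (`meta-disk /dev/mdN[0];`, which needs
128 MB per index, so 500 MB is plenty). Host names, addresses and
ports are assumptions.

```shell
#!/bin/sh
# Hypothetical drbd.conf generator; nas1/nas2, IPs and ports are assumed.
NODE_A=nas1 ADDR_A=10.0.0.1
NODE_B=nas2 ADDR_B=10.0.0.2

# drbd_resource NAME MINOR DISK META
# META is "internal" or an external metadata device (used at index 0).
drbd_resource() {
    name=$1 minor=$2 disk=$3 meta=$4
    if [ "$meta" = internal ]; then
        meta_line="meta-disk internal;"
    else
        meta_line="meta-disk ${meta}[0];"
    fi
    port=$((7788 + minor))   # one TCP port per resource
    printf 'resource %s {\n  protocol C;\n' "$name"
    for host in "$NODE_A:$ADDR_A" "$NODE_B:$ADDR_B"; do
        printf '  on %s {\n' "${host%%:*}"
        printf '    device    /dev/drbd%s;\n' "$minor"
        printf '    disk      %s;\n' "$disk"
        printf '    address   %s:%s;\n' "${host#*:}" "$port"
        printf '    %s\n' "$meta_line"
        printf '  }\n'
    done
    printf '}\n'
}

generate_drbd_conf() {
    # drbd0: the 2 GB mirror (md9), internal metadata
    drbd_resource r0 0 /dev/md9 internal
    # drbd1..drbd4: disk mirrors md1..md4, metadata on md5..md8
    for i in 1 2 3 4; do
        drbd_resource "r$i" "$i" "/dev/md$i" "/dev/md$((i + 4))"
    done
}
```

`generate_drbd_conf > drbd.conf.new` gives a file to diff against the
real /etc/drbd.conf rather than overwrite it blindly.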

The 2 GB DRBD device (drbd0) is mounted as ext3 on only one server at a
time and is used to store all kinds of Openfiler information that is
needed on both NAS servers, like (most of) the Openfiler config, the
DHCP leases database and the OpenLDAP database.
Heartbeat makes sure that one NAS server runs all the software, and if
there are any problems, it can fail over very easily.

drbd1 through drbd4 are set up as LVM PVs and combined into one big VG.
From that VG, I created 5 x 5 GB LVs to be used as root devices for
blade1 through blade5.
These LVs are striped across 2 PVs for speed (although that's still my
only bottleneck at the moment; more about this later).
These LVs are exported over iSCSI.
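The LVM part can be sketched like this. The VG name vg0 matches the
device paths later in this post; the 64 KB stripe size (`-I 64`) and
the alternating PV pairs per blade are assumptions, since the post only
says each blade LV is striped over 2 PVs. Dry run by default.

```shell
#!/bin/sh
# Sketch of the VG and blade LVs. Stripe size and PV pairing are assumed.
DRY_RUN=${DRY_RUN:-1}

run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

VG=vg0
PVS="/dev/drbd1 /dev/drbd2 /dev/drbd3 /dev/drbd4"

create_vg() {
    for pv in $PVS; do run pvcreate "$pv"; done
    run vgcreate "$VG" $PVS   # $PVS unquoted on purpose: one arg per PV
}

# 5 x 5 GB blade root LVs, each striped over 2 PVs (-i 2).
# Odd blades go on drbd1+drbd2, even blades on drbd3+drbd4.
create_blade_lvs() {
    for n in 1 2 3 4 5; do
        if [ $((n % 2)) -eq 1 ]; then
            pair="/dev/drbd1 /dev/drbd2"
        else
            pair="/dev/drbd3 /dev/drbd4"
        fi
        run lvcreate -i 2 -I 64 -L 5G -n "blade$n" "$VG" $pair
    done
}
```

Naming the PVs explicitly at the end of `lvcreate` is what pins each
LV's stripes to a specific pair instead of letting LVM choose.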

I also created one big LV of around 600GB, which can be mounted through NFS.

Then a few more LVs are created (around 10 GB each, also iSCSI), one for
every VM I want.
For every iSCSI LV I create a separate target.
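Openfiler 2.3 uses the iSCSI Enterprise Target (IET), so "one target
per LV" ends up as one `Target` block per LV in ietd.conf. This
hypothetical helper prints such blocks; the IQN naming scheme is an
assumption, and Openfiler normally manages this file through its web
interface, so treat the output as a reference rather than something to
write over the live config.

```shell
#!/bin/sh
# Hypothetical ietd.conf generator: one target per LV, each as LUN 0.
IQN_PREFIX=iqn.2006-01.com.openfiler   # assumed naming scheme
VG=vg0

# ietd_target LV: print one IET target exporting /dev/VG/LV
ietd_target() {
    printf 'Target %s:%s.%s\n' "$IQN_PREFIX" "$VG" "$1"
    printf '    Lun 0 Path=/dev/%s/%s,Type=blockio\n' "$VG" "$1"
}

# generate_ietd_conf LV...: one target block per LV name given
generate_ietd_conf() {
    for lv in "$@"; do ietd_target "$lv"; done
}
```

For example `generate_ietd_conf blade1 blade2 blade3 blade4 blade5 vm1`
(vm1 being a hypothetical VM LV name).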

The Supermicro blades can boot from an iSCSI device.
The exact iSCSI device is given through a DHCP option; I only set up
an initiator name in the iSCSI BIOS of each blade.
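The post doesn't say which DHCP option is used. Many iSCSI boot
firmwares read the root-path option in the RFC 4173 format
`iscsi:<server>:<protocol>:<port>:<LUN>:<targetname>`, so here is a
sketch assuming that, which prints an ISC dhcpd host entry per blade.
The MAC, IPs and IQN are hypothetical.

```shell
#!/bin/sh
# Sketch assuming the blade firmware reads an RFC 4173 root-path option.

# root_path SERVER TARGET [PORT] [LUN]: protocol 6 = TCP
root_path() {
    printf 'iscsi:%s:6:%s:%s:%s\n' "$1" "${3:-3260}" "${4:-0}" "$2"
}

# dhcp_host NAME MAC IP SERVER TARGET: one ISC dhcpd host stanza
dhcp_host() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '    option root-path "%s";\n' "$(root_path "$4" "$5")"
    printf '}\n'
}
```

For example:
`dhcp_host blade1 00:30:48:00:00:01 10.0.0.21 10.0.0.10 iqn.2006-01.com.openfiler:vg0.blade1`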

On the blade LVs I installed CentOS 5.3 (latest updates),
but with a few modifications.

I changed a few things in the initrd to bind eth0 to br0 during the
Linux boot, before Linux takes over the iSCSI session from the BIOS.
When you have a Linux root on iSCSI and try to attach eth0 to br0, you
lose network connectivity for a moment, which can crash Linux, because
everything it uses comes from the network (the iSCSI root).
I also added a little script to the initrd that calls iscsiadm with a
fixed iSCSI target, because unfortunately iscsiadm can't read the iSCSI
settings from DHCP or the Supermicro firmware.
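The ordering is the important part: build br0 first, then log in to the
iSCSI target over br0, so the root session never sees the bridge being
created underneath it. A rough sketch, assuming a POSIX shell plus
brctl, ip and iscsiadm are available inside the initrd (the stock
CentOS 5 initrd uses nash, so this is not the author's actual script);
portal, target and address are hypothetical fixed values. Dry run by
default.

```shell
#!/bin/sh
# Hypothetical initrd fragment; values are hard-coded per blade.
DRY_RUN=${DRY_RUN:-1}

run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

PORTAL=10.0.0.10:3260
TARGET=iqn.2006-01.com.openfiler:vg0.blade1
IPADDR=10.0.0.21/24

# Put eth0 into br0 and give the IP to the bridge, not to eth0.
setup_bridge() {
    run brctl addbr br0
    run brctl addif br0 eth0
    run ip link set eth0 up
    run ip addr add "$IPADDR" dev br0
    run ip link set br0 up
}

# Log in to the fixed target (iscsid must already be running).
iscsi_login() {
    run iscsiadm -m discovery -t sendtargets -p "$PORTAL"
    run iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login
}

# Bridge first, then the iSCSI session over br0.
init_network_root() {
    setup_bridge
    iscsi_login
}
```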

When the blades have booted, they all join one Red Hat cluster that
needs 3 nodes for quorum.
Because I have 5 blades, two can fail before everything stops working.
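The "two can fail" figure is the usual majority rule with one vote per
node: quorum is floor(votes / 2) + 1. A tiny sketch of that arithmetic,
assuming every node has exactly one vote and there is no quorum disk:

```shell
#!/bin/sh
# quorum VOTES: votes needed for a majority (one vote per node assumed)
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated_failures VOTES: nodes that may fail while keeping quorum
tolerated_failures() { echo $(( $1 - $(quorum "$1") )); }
```

With 5 blades this gives a quorum of 3 and 2 tolerated failures; a
sixth blade would not buy any extra failure tolerance (quorum 4, still
2 failures).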

Then I compiled the following software myself, because the versions in
the CentOS repo and the testing repo didn't work correctly:
libvirt 0.7.0 (./configure --prefix=/usr)
kvm-88 (./configure --prefix=/usr --disable-xen)

The /usr/share/cluster/vm.sh from the default CentOS repo is still based
on Xen. I downloaded the latest version from
https://bugzilla.redhat.com/show_bug.cgi?id=412911
but it appears that one doesn't work correctly either,
so I made some changes myself.

And now it all works together very nicely.

I just ran a VM on blade1, and while this VM was running bonnie++ on an
NFS mount from the NAS server, I live-migrated it about 10 times to
blade2 and back.
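A back-and-forth test like this could be scripted with libvirt's
`virsh migrate --live`. The domain name vm1 and the qemu+ssh URIs are
assumptions; if rgmanager is managing the VM, migrating through the
cluster (`clusvcadm -M`) instead keeps rgmanager's view of where the VM
runs consistent. Dry run by default.

```shell
#!/bin/sh
# Sketch of a repeated live-migration test; names and URIs are assumed.
DRY_RUN=${DRY_RUN:-1}

run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# migrate_loop DOMAIN HOST_A HOST_B COUNT: A -> B -> A, COUNT round trips
migrate_loop() {
    dom=$1 a=$2 b=$3 count=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        run virsh -c "qemu+ssh://$a/system" migrate --live "$dom" "qemu+ssh://$b/system"
        run virsh -c "qemu+ssh://$b/system" migrate --live "$dom" "qemu+ssh://$a/system"
        i=$((i + 1))
    done
}
```

For example `DRY_RUN=0 migrate_loop vm1 blade1 blade2 10`, while
bonnie++ runs inside the guest and a ping runs from outside.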

During the bonnie++ run and the live migrations, I pinged the VM.
Normal ping times are around 20-35 ms (I pinged through a VPN line from
my home to the data center), and I only saw one or two pings of around
40-60 ms, right around the end of a live migration,
but no drops and no errors in bonnie++.

I will write up some more information about the complete setup and post
it on my blog or somewhere,
but I just wanted to let everybody know that it can be done ;)

If you have any questions, let me know.

The only 'problem' I still have is the speed to and from the disks.
When I update any settings on the blade server, I always do this on
blade1. Then I shut it down and, on the NAS server, copy the contents of
its iSCSI LV to an image file on the ext3 LV.
Then I power up blade1 again, wait until it rejoins the cluster,
and then one by one shut down each of the other blades, copy the image
from the ext3 LV to that blade's LV on the NAS, and start the blade
again.
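The copy steps of that routine can be wrapped in a couple of functions.
Paths follow the dd commands below; `bs=1M` is my own addition (dd
defaults to 512-byte blocks), and shutting down / starting the blades
and waiting for blade1 to rejoin the cluster are deliberately left as
manual steps. Dry run by default.

```shell
#!/bin/sh
# Sketch of the golden-image routine; bs=1M is an assumption.
DRY_RUN=${DRY_RUN:-1}

run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

IMAGE_DIR=/mnt/vg0/data0/vm/image

# save_image BLADE VERSION: copy a shut-down blade's root LV to a file
save_image() {
    run dd if="/dev/vg0/$1" of="$IMAGE_DIR/blade_$2.img" bs=1M
}

# restore_image BLADE VERSION: write the image back to a blade's LV
restore_image() {
    run dd if="$IMAGE_DIR/blade_$2.img" of="/dev/vg0/$1" bs=1M
}

# rollout VERSION: push the image to blades 2..5 in order.
# Each blade must already be shut down; this does not check that.
rollout() {
    for n in 2 3 4 5; do
        restore_image "blade$n" "$1"
    done
}
```

For example `save_image blade1 v2.7` after shutting down blade1, then
`restore_image blade2 v2.7` and so on, one blade at a time.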

I use drbd1 through drbd4 as 4 PVs for a VG.
The speed (hdparm -t on the NAS) of each PV is around 75 MB/sec
(except for one, which is 45 MB/sec).

The blade LV (/dev/vg0/blade1 for example) is striped over 2 PV's.
The speed (hdparm -t) of /dev/vg0/blade1 is 122 MB/sec.

The ext3 LV (/dev/vg0/data0) is striped over 4 PV's.
The speed (hdparm -t) of /dev/vg0/data0 is 227 MB/sec.

But when copying from the blade LV to the ext3 LV:
dd if=/dev/vg0/blade1 of=/mnt/vg0/data0/vm/image/blade_v2.7.img
it takes about 70 seconds, which is about 75 MB/sec.

But when copying back:
dd if=/mnt/vg0/data0/vm/image/blade_v2.7.img of=/dev/vg0/blade1
it takes about 390 seconds, which is about 13 MB/sec.
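Those rates follow from the 5 GB size of a blade LV (5120 MB): a quick
integer check in shell.

```shell
#!/bin/sh
# mb_per_sec SIZE_MB SECONDS: average throughput in MB/sec, rounded down
mb_per_sec() { echo $(( $1 / $2 )); }
```

`mb_per_sec 5120 70` gives 73 and `mb_per_sec 5120 390` gives 13, so
reads from the blade LV are roughly 5.5 times faster than writes to it.
Since GNU dd copies in 512-byte blocks by default, repeating the write
with `bs=1M` (and optionally `oflag=direct`) might help separate a
block-size effect from the striping question.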

I think it has something to do with the ext3 LV being striped over
4 PVs. So I will try to create a new ext3 LV striped across 2 PVs and
see if that is faster.
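That experiment could look like the sketch below. The LV name, size,
64 KB stripe size and PV choice are all hypothetical parameters; picking
a PV pair that the blade LV under test does not use may make the result
easier to interpret. Dry run by default.

```shell
#!/bin/sh
# Sketch of a 2-stripe ext3 test LV; all parameters are assumptions.
DRY_RUN=${DRY_RUN:-1}

run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# create_data_lv NAME SIZE PV1 PV2: ext3 LV striped over 2 PVs in vg0
create_data_lv() {
    run lvcreate -i 2 -I 64 -L "$2" -n "$1" vg0 "$3" "$4"
    run mkfs.ext3 "/dev/vg0/$1"
}
```

For example `create_data_lv data1 100G /dev/drbd3 /dev/drbd4`, then
repeat the two dd copies against a mount of /dev/vg0/data1.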

Robert Verspuy

-- 
*Exa-Omicron*
Patroonsweg 10
3892 DB Zeewolde
Tel.: 088-OMICRON (66 427 66)
http://www.exa-omicron.nl