[virt-tools-list] Live migration of iscsi based targets

Phil Meyer pmeyer at themeyerfarm.com
Wed Oct 27 17:09:36 UTC 2010


  On 10/27/2010 03:10 AM, Gildas Bayard wrote:
> Hello,
>
> I'm using libvirt and KVM for a dozen virtual servers. Each virtual
> server's disk is an iscsi LUN that is mounted by the physical host
> blade which runs KVM.
> Everything has worked fine with that setup for about a year. Both the
> servers and the blade are running Ubuntu Server 10.04 LTS.
> I've been trying live migration for a while, but it was not working,
> at least with my setup, on previous versions of Ubuntu (virt-manager
> showed the VM on the target host, but the machine became unreachable
> over the network).
>
> Anyway, for some reason it's working now. But there's a big problem:
> let's say I use 2 blades (A and B) to host my VMs. If I start a VM on
> blade A and live migrate it to blade B, everything is fine. But if I
> migrate it back to blade A, awful things happen: at first it's OK, but
> sooner or later the VM will complain about disk corruption and destroy
> itself more and more as time goes by. Oops!
>
> My understanding is that blade A still has its iscsi disk cache up and
> running, and that when the VM comes back, blade A has no way to know
> that the VM's disk was altered during its stay on blade B. Hence the
> corruption.
>
> Am I getting this correct? Should I switch to NFS "disk in a file" 
> instead of using iscsi?
>
> Sincerely,
> Gildas


This is how we do it here.

Prior to the live migrate:

Add permissions to the iscsi target for the New Host.
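
How the permissions are granted depends on the storage array.  As an
illustrative sketch only: on a Linux TGT target (one of the target types
mentioned below), allowing a new initiator might look like this, with
the tid and the address as placeholders:

    tgtadm --lld iscsi --mode target --op bind --tid <tid> -I <new-host-ip>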

New Host discovers and logs into the iscsi target.
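
With open-iscsi on the new host, that step might look like this (the
portal IP and target IQN are placeholders):

    iscsiadm -m discovery -t sendtargets -p <portal-ip>
    iscsiadm -m node -T <iscsi-target> -p <portal-ip> --login

If the domain XML refers to the LUN by a stable name (for example a
/dev/disk/by-path/ entry), check that the same path now exists on the
new host before starting the migration.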

Live migrate:
    virsh -c qemu+ssh://root@<old host>/system migrate --live <domain> \
        qemu+ssh://root@<new host>/system

On the new host (the migrated domain arrives transient, so dump its XML
and define it to make it persistent):
    virsh dumpxml <domain> > /tmp/<domain>.xml
    virsh define /tmp/<domain>.xml

On the old host:
    virsh undefine <domain>
    iscsiadm -m node --logout -T <iscsi-target>
    iscsiadm -m node -T <iscsi-target> -o delete

Remove permissions from the iscsi target for the old host.
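
(Again array-specific; on a Linux TGT target the counterpart to the bind
example above would be something like:)

    tgtadm --lld iscsi --mode target --op unbind --tid <tid> -I <old-host-ip>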

That seems to work reliably for us.


As for iscsi vs. NFS:

We tested this extensively and determined that a reasonably optimized
iscsi-based storage array, using one iscsi target per VM, gave vastly
better I/O performance than NFS file-backed disks.

We tested Linux-based iscsi target hosts using both TGT and
iscsi-target, as well as several vendor-provided iscsi storage arrays.

As a result of this testing, we settled on the Dell EqualLogic arrays.
They can support 512 connections per pool, which (barely) meets our very
specific requirements.

But the kicker for the EqualLogic is the dedicated 4GB cache.  Working
within the cache from up to 500 VMs, we were able to achieve a sustained
7000 iops using 16 ordinary SATA drives in RAID 6 (14 of them holding
data).

For comparison, the same drives on a pair of 3ware controllers (512MB of
cache per controller) maxed out at 1500 iops no matter how we configured
the targets.
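
For reference, numbers like these can be reproduced with a synthetic
load generator; a minimal sketch using fio (named here only as one
common tool, not necessarily what was used for the tests above) against
a scratch LUN would look something like:

    # sustained 4k random-read iops; point this at a scratch device,
    # never at a LUN that is in use
    fio --name=randread --filename=/dev/sdX --direct=1 --rw=randread \
        --bs=4k --ioengine=libaio --iodepth=32 --numjobs=4 \
        --runtime=60 --time_based --group_reporting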

We are keeping a close eye on the Isilon products, and we have an Isilon
cluster in production for NFS.  Their iscsi implementation is now fair
to good and is getting better with each release; version 6 may bring it
on par with (or ahead of) anyone else.  In any case, Isilon's ability to
grow nearly forever with no hassle is a major selling point.  Dropping a
new unit into the cluster and having those new resources available
within 60 seconds is a very long way from the 36-hour wait to VERIFY a
new EqualLogic array!

Good Luck!



