[libvirt] RFC: Migration with NPIV

Zou, Yi yi.zou at intel.com
Tue Nov 20 01:36:05 UTC 2012


> On 2012-11-19 17:30, Osier Yang wrote:
> > Hi,
> >
> > This proposal tries to work out a solution for migrating a
> > domain that uses a LUN behind a vHBA as a disk device (QEMU
> > emulated disks only at this stage), along with other NPIV
> > improvements that are not related to migration. I have not been
> > lucky enough to get an environment to test whether these ideas
> > are workable, but I'd like to hear ideas/suggestions early.
Glad to see this topic on the list.

> >
> > 1) Persistent vHBA support
> >
> > This is something useful that has been missing for a long time.
> > Assume that one has created a vHBA and done the masking/zoning, and
> > everything works as expected. After a system reboot, however,
> > everything is lost. To get things back, the user has to find out
> > the previous WWNN & WWPN and create the vHBA again.
> >
> > On the other hand, persistent vHBA support is actually required
> > for a domain which uses a LUN behind a vHBA. Otherwise the domain
> > could fail to start after a system reboot.
> >
> > To support persistent vHBAs, new APIs like virNodeDeviceDefineXML
> > and virNodeDeviceUndefine are required. It's also useful to introduce
> > "autostart" for vHBAs, so that a vHBA can be started automatically
> > after a system reboot.
> >
> > Proposed APIs:
> >
> > virNodeDevicePtr
> > virNodeDeviceDefineXML(virConnectPtr conn,
> >                        const char *xml,
> >                        unsigned int flags);
> >
> > int
> > virNodeDeviceUndefine(virConnectPtr conn,
> >                       virNodeDevicePtr dev,
> >                       unsigned int flags);
> >
> > int
> > virNodeDeviceSetAutostart(virNodeDevicePtr dev,
> >                           int autostart,
> >                           unsigned int flags);
> >
> > int
> > virNodeDeviceGetAutostart(virNodeDevicePtr dev,
> >                           int *autostart,
> >                           unsigned int flags);
> 
> One API missed is:
> 
>    int
>    virNodeDeviceCreate(virNodeDevicePtr dev,
>                        unsigned int flags);
> 
>    To create the vHBA.
> 
> >
> > 2) Associate vHBA with domain XML
> >
> > There are two ways to attach a LUN to a domain: as a QEMU emulated
> > device, or as a passthrough device. Since LUN passthrough is not
> > supported in libvirt yet, let's focus on the emulated LUN at this stage.
> >
> > New attributes "wwnn" and "wwpn" are introduced to indicate the
> > LUN behind the vHBA. E.g.
> >
> > <disk type='block' device='disk'>
> >   <driver name='qemu' type='raw'/>
> >   <source wwnn="2001001b32a9da4e" wwpn="2101001b32a90004"/>
> >   <target dev='vda' bus='virtio'/>
> >   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
> > </disk>
> >
> > Before the domain starts, we have to check whether a LUN is
> > assigned to the vHBA, and error out if not.
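The first half of that pre-start check could be sketched as below. This is only an illustration: it assumes the proposed (not yet existing) wwnn/wwpn attributes on <source>, and that a WWN is written as 16 hex digits as in the example. The helper name disk_wwns is hypothetical, and whether a LUN is really visible behind the vHBA would still have to be verified separately.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: WWNs are written as 16 hex digits, as in the example above.
WWN_RE = re.compile(r"^[0-9a-fA-F]{16}$")


def disk_wwns(disk_xml):
    """Return (wwnn, wwpn) from the proposed <source wwnn=.. wwpn=..> form.

    Hypothetical helper; raises ValueError if either attribute is
    missing or malformed, so the caller can error out before start.
    """
    source = ET.fromstring(disk_xml).find("source")
    if source is None:
        raise ValueError("disk has no <source> element")
    wwnn, wwpn = source.get("wwnn"), source.get("wwpn")
    for name, value in (("wwnn", wwnn), ("wwpn", wwpn)):
        if value is None or not WWN_RE.match(value):
            raise ValueError(f"missing or malformed {name}: {value!r}")
    return wwnn.lower(), wwpn.lower()


DISK_XML = """
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source wwnn="2001001b32a9da4e" wwpn="2101001b32a90004"/>
  <target dev='vda' bus='virtio'/>
</disk>
"""

if __name__ == "__main__":
    print(disk_wwns(DISK_XML))
```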
> >
> > Using the stable path of LUN also works, e.g.
> >
> > <source dev="/dev/disk/by-path/pci-0000\:00\:07.0-scsi-0\:0\:0\:0"/>
> >
> > But the disadvantage is that the user has to figure out the stable
> > path himself, and we have to check every stable path to see
> > whether it's behind a vHBA in the migration "Begin" stage. Or should
> > we add a new XML attribute to the "source" element to indicate that
> > it's behind a vHBA, such as:
> >
> > <source dev="disk-by-path" model="vport"/>
> >
> > 3) Migration with vHBA
> >
> > One possible solution for migration with vHBA is to use a pair of
> > vHBAs (each with its own WWNN & WWPN) on the source host: one is
> > used by the domain, the other is reserved for migration. This
> > requires the storage admin to map the same LUN to both vHBAs when
> > doing the masking and zoning.

Is the WWNN part of the migration? I mean, isn't the WWNN normally associated
w/ the underlying real vendor HBA, so that carrying it along means the target
of your migration has to match that WWNN? Just for the sake of getting
the LUN back after migration, the vHBA would only need the WWNN for
the zoning and LUN masking, so you can migrate the domain across different
vendor HBAs, as long as you make the WWPN naming non-vendor specific,
particularly since the guest VM you are migrating uses LUNs via an NPIV port
in the host.

> >
> > One of the two vHBAs is called the "primary vHBA", the other the
> > "secondary vHBA". To maintain the relationship between these two
> > vHBAs, we have to introduce new XML for vHBAs. E.g.
> >
> > In XML of primary vHBA:
> >
> > <secondary wwpn="2101001b32a90004"/>
> >
> > In XML of secondary vHBA:
> >
> > <primary wwpn="2101001b32a90002"/>
> >
> > The primary vHBA is guaranteed not to be used by any domain which
> > is driven by libvirt (we do some checking earlier, before the domain
> > starts). And it's also guaranteed that the LUN can't be used by
> > another domain, with sVirt or Sanlock. So it's safe to have two vHBAs
> > on the source host too.
Not familiar w/ sVirt or Sanlock, but could there be a race condition where
two domains starting migration end up getting the same secondary WWPN?
Or, I guess my question should be: how is that prevented? Unless some central
database keeps track of it, or the algorithm for generating the secondary vHBA
WWPN guarantees that.

> >
> > To prevent someone from using the LUN by creating a vHBA with the same
> > WWNN & WWPN on another host, we must create the secondary vHBA on the
> > source host, even if it's not being used.
> >
> > Both the primary and secondary vHBA must be defined and marked as
> > "autostart" so that the domain can be started after a system
> > reboot.
> >
> > When doing migration, we have to bake a bigger cookie with the secondary
> > vHBA's info (basically its WWNN and WWPN) in the migration "Begin"
> > stage, and eat it in the migration "Prepare" stage on the target host.
> >
> > In the "Begin" stage, the XML that represents the secondary vHBA is
> > constructed, and the secondary vHBA is destroyed on the source host,
> > though not undefined.
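One way to picture the extra cookie payload is the sketch below. The <vhba> element, its attributes and the function names are all hypothetical; the thread only says the cookie carries the secondary vHBA's WWNN and WWPN, not how they are encoded.

```python
import xml.etree.ElementTree as ET


def bake_vhba_cookie(wwnn, wwpn):
    """Build a cookie fragment on the source ("Begin" stage).

    Hypothetical format: a single <vhba> element carrying the
    secondary vHBA's WWNN and WWPN.
    """
    elem = ET.Element("vhba", {"role": "secondary", "wwnn": wwnn, "wwpn": wwpn})
    return ET.tostring(elem, encoding="unicode")


def eat_vhba_cookie(cookie):
    """Parse the fragment on the target ("Prepare" stage)."""
    elem = ET.fromstring(cookie)
    if elem.tag != "vhba" or elem.get("wwnn") is None or elem.get("wwpn") is None:
        raise ValueError("cookie does not describe a vHBA")
    return elem.get("wwnn"), elem.get("wwpn")


if __name__ == "__main__":
    cookie = bake_vhba_cookie("2001001b32a9da4e", "2101001b32a90004")
    print(cookie)
    print(eat_vhba_cookie(cookie))
```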
> >
> > In the "Prepare" stage, a new vHBA is created (defined and started)
> > on the target host with the same WWNN & WWPN as the secondary vHBA on
> > the source host. The LUN should then become visible to the target host
> > automatically(?), and thus migration can be performed. After migration
If zoning is correct, then yes.

> > is finished on the target host, the primary vHBA on the source host is
> > destroyed, not undefined.
> >
> > If migration fails, the new vHBA created on the target host will
> > be destroyed and undefined, and both the primary and secondary
> > vHBA on the source host will be started, so that the domain can
> > be resumed.
> >
> > Finally, if migration succeeds, the primary vHBA on the source host
> > will be transferred to the target host as the secondary vHBA (defined),
> > and both the primary and secondary vHBA on the source host will be
> > undefined.
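The sequence of vHBA state changes proposed above can be written down as a toy model. This is a sketch, not libvirt code: each host is just a dict mapping a WWPN to a state, "undefined" is modelled by removing the key, and the names State and migrate are made up for illustration.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"  # defined and started
    DEFINED = "defined"  # defined but destroyed (inactive)


def migrate(source, target, primary, secondary, transfer_ok):
    """Walk the proposed vHBA steps; return True if migration succeeded.

    source/target map WWPN -> State; a missing key means "undefined".
    transfer_ok stands in for the outcome of the actual migration.
    """
    # "Begin": destroy the secondary vHBA on the source, keep it defined.
    source[secondary] = State.DEFINED
    # "Prepare": define and start a vHBA with the same WWNs on the target.
    target[secondary] = State.RUNNING

    if not transfer_ok:
        # Failure: destroy and undefine on the target, restart both on source.
        del target[secondary]
        source[primary] = State.RUNNING
        source[secondary] = State.RUNNING
        return False

    # Migration finished: destroy the primary on the source.
    source[primary] = State.DEFINED
    # Success: the old primary becomes the (defined) secondary on the
    # target, and both vHBAs are undefined on the source.
    target[primary] = State.DEFINED
    del source[primary]
    del source[secondary]
    return True


if __name__ == "__main__":
    src = {"2101001b32a90002": State.RUNNING, "2101001b32a90004": State.RUNNING}
    dst = {}
    ok = migrate(src, dst, "2101001b32a90002", "2101001b32a90004", True)
    print(ok, src, dst)
```

Note that in this model the two WWPNs swap roles on the target after a successful migration, which is what the last paragraph above describes.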
Maybe you can get rid of keeping a 2nd vHBA per domain all the time just for
migration. I hope I understand you correctly, but maybe not; anyway, bear w/ me
below:

You need a transient vHBA for the target domain that is already zoned
to see the LUNs at the same time they are seen on the source via the original
(primary) vHBA. Once you have transferred the primary vHBA from the source to
the target, you don't need the secondary vHBA; you only need the transferred
primary, since it's already undefined on the source.

This cuts down the use of WWPN space, which is unique per fabric. You may also
reserve a pool of these transient WWPNs for migration only, i.e.,
whoever wants to do a migration sends a request to get one of these, and the
request automatically puts the transient WWPN in the zone of the requesting
domain's vHBA WWPN. When migration succeeds, the post-cleanup routine can
just reconfigure the zoning to take the transient WWPN out and then put it back
in the pool for another domain to use for migration.
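A rough sketch of such a pool is below. It assumes a single process hands out the WWPNs, so a lock is enough to avoid two domains getting the same one; a real deployment would need something shared across hosts. The class name is hypothetical and the zoning changes are left out entirely.

```python
import threading


class TransientWwpnPool:
    """Hypothetical pool of WWPNs reserved for migration only."""

    def __init__(self, wwpns):
        self._free = list(wwpns)
        self._leased = {}  # domain name -> leased WWPN
        self._lock = threading.Lock()

    def acquire(self, domain):
        """Lease one transient WWPN to a domain that starts migrating."""
        with self._lock:
            if domain in self._leased:
                raise RuntimeError(f"{domain} already holds a transient WWPN")
            if not self._free:
                raise RuntimeError("no transient WWPN available")
            wwpn = self._free.pop()
            self._leased[domain] = wwpn
            # Zoning the WWPN in with the domain's vHBA would happen here.
            return wwpn

    def release(self, domain):
        """Post-migration cleanup: return the WWPN to the pool."""
        with self._lock:
            # Taking the WWPN out of the zone would happen here.
            self._free.append(self._leased.pop(domain))


if __name__ == "__main__":
    # Placeholder WWPNs, reused from the examples in this thread.
    pool = TransientWwpnPool(["2101001b32a90004", "2101001b32a90005"])
    wwpn = pool.acquire("guest1")
    print("guest1 migrates with", wwpn)
    pool.release("guest1")
```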

> >
> > 4) Enrich HBA's XML
> >
> > It's hard to know which vHBAs were created from an HBA with the current
> > implementation. One has to dump the XML of each (v)HBA and find
> > the clue in the "parent" element of the vHBAs. It would be good to
> > introduce a new element for HBAs, like "vports", so that one can easily
> > see which (and how many) vHBAs were created from the HBA.
> >
> > It would also be good to expose the maximum number of vports the HBA
> > supports.
> >
> > Besides these, other useful information should be exposed too,
> > such as the vendor name, the HBA state, PCI address, etc.
> >
> > The new XML would look like:
> >
> > <vports num='2' max='64'>
> >   <vport name="scsi_host40" wwpn="2101001b32a90004"/>
> >   <vport name="scsi_host40" wwpn="2101001b32a90005"/>
> > </vports>
> > <online/>
> > <vendor>QLogic</vendor>
> > <address type="pci" domain="0" bus="0" slot="5" function="0"/>
> >
> > "online", "vendor", "address" make sense to vHBA too.
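To show how a management tool might consume the proposed elements, here is a small sketch. The enclosing <hba> element and the function name summarize_hba are invented for the example, since the thread does not say where in the node device XML the new elements would live.

```python
import xml.etree.ElementTree as ET

# The <hba> wrapper is hypothetical; only its children come from the proposal.
HBA_XML = """
<hba>
  <vports num='2' max='64'>
    <vport name="scsi_host40" wwpn="2101001b32a90004"/>
    <vport name="scsi_host40" wwpn="2101001b32a90005"/>
  </vports>
  <online/>
  <vendor>QLogic</vendor>
  <address type="pci" domain="0" bus="0" slot="5" function="0"/>
</hba>
"""


def summarize_hba(xml_text):
    """Collect the proposed HBA details into a plain dict."""
    root = ET.fromstring(xml_text)
    vports = root.find("vports")
    if vports is None:
        raise ValueError("no <vports> element")
    return {
        "vendor": root.findtext("vendor"),
        "online": root.find("online") is not None,
        "max_vports": int(vports.get("max")),
        "vport_wwpns": [v.get("wwpn") for v in vports.findall("vport")],
    }


if __name__ == "__main__":
    print(summarize_hba(HBA_XML))
```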
> >
> > 5) Improve the way to lookup LUN's stable path
> >
> > Currently, looking up a LUN's stable path by WWNN & WWPN
> > requires iterating over sysfs each time. Maintaining the
> > stable path in the vHBA's XML doesn't make sense, as the LUN assigned
> > to the vHBA can change at the storage admin's whim. I'm
> > wondering if there is a way to be notified of the change asynchronously;
> > if there is, then maintaining the stable path internally makes
> > sense.
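For reference, the first step of that sysfs iteration (finding the SCSI host that owns a given WWNN & WWPN) might look like the sketch below. It relies on the Linux FC transport class exposing node_name and port_name under /sys/class/fc_host/hostN; the function name is made up, the directory is a parameter so the code can be tried without FC hardware, and mapping the host to the LUN's by-path entry is not covered.

```python
import os


def find_fc_host(wwnn, wwpn, fc_host_dir="/sys/class/fc_host"):
    """Return the hostN entry whose node_name/port_name match, else None."""

    def read_wwn(path):
        # sysfs reports WWNs with a "0x" prefix, e.g. 0x2101001b32a90004.
        with open(path) as f:
            value = f.read().strip().lower()
        return value[2:] if value.startswith("0x") else value

    try:
        hosts = sorted(os.listdir(fc_host_dir))
    except FileNotFoundError:
        return None  # no FC transport class on this machine

    for host in hosts:
        base = os.path.join(fc_host_dir, host)
        try:
            if (read_wwn(os.path.join(base, "node_name")) == wwnn.lower()
                    and read_wwn(os.path.join(base, "port_name")) == wwpn.lower()):
                return host
        except OSError:
            continue  # entry vanished or lacks the attributes
    return None


if __name__ == "__main__":
    print(find_fc_host("2001001b32a9da4e", "2101001b32a90004"))
```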
> >
> > 6) Miscellaneous
> >
> > This only covers QEMU emulated devices; a passed-through scsi_host
> > with vHBA is still not covered, as we have to support vHBA passthrough
> > first. The good thing is that the solution should be similar.
I assume you meant PCI passthrough here? I am not sure you want to do migration
for that, since anyone using passthrough really wants to bind to the real
underlying HW. But maybe there is a use case, perhaps for passing through PCI
virtual functions?

Thanks, this is a great start on this issue; let me know if I can help.

yi
> >
> > Regards,
> > Osier
> >
> > --
> > libvir-list mailing list
> > libvir-list at redhat.com
> > https://www.redhat.com/mailman/listinfo/libvir-list