[libvirt] NPIV storage pools do not map to same LUN units across hosts.

John Ferlan jferlan at redhat.com
Wed Sep 7 12:47:41 UTC 2016



On 07/15/2016 05:59 AM, Nitesh Konkar wrote:
> Link:  http://wiki.libvirt.org/page/NPIV_in_libvirt
> Topic: Virtual machine configuration change to use vHBA LUN
> 

Sorry for the delay on this... trying to "page in" NPIV details and
configure environments is time consuming - time I haven't been able to
carve out...

> There is an NPIV storage pool defined on two hosts, and the pool contains a
> total of 8 volumes, allocated from a storage device.
> 
> Source:
> 
> # virsh vol-list poolvhba0
>  Name                 Path                                    
> ------------------------------------------------------------------------------
>  unit:0:0:0           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
>  unit:0:0:1           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
>  unit:0:0:2           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
>  unit:0:0:3           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
>  unit:0:0:4           /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
>  unit:0:0:5           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
>  unit:0:0:6           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
>  unit:0:0:7           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
> --------------------------------------------------------------------
> 
> Destination:
> --------------------------------------------------------------------
> # virsh vol-list poolvhba0
>  Name                 Path                                    
> ------------------------------------------------------------------------------
>  unit:0:0:0           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
>  unit:0:0:1           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
>  unit:0:0:2           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
>  unit:0:0:3           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
>  unit:0:0:4           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
>  unit:0:0:5           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
>  unit:0:0:6           /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
>  unit:0:0:7           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
> --------------------------------------------------------------------
> 
> As you can see in the above output, the same set of eight LUNs from the storage server has been mapped,
> but the order in which the LUNs are probed differs on each host, resulting in different unit names
> on the two hosts.

Yep - I see what you're pointing out.  As I think I've pointed out
before, either in private chat or email, libvirt relies on systemd (udev)
to populate the /dev/disk/by-{path|id|uuid} symlinks.  Furthermore, there
have been issues fixed in the "by-path" logic (as of systemd-219), but I
believe you asked about a by-id issue for which I forget the details.  I
did find an external bz for the by-path issue, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1032218 (although it's RHEL6
based).  There's also a github patch referenced in the private bz as:
https://github.com/lnykryn/systemd-rhel/commit/69fc7c636b2b962c386e835ca33c4e380e63dd18
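
The mismatch in the two vol-list listings quoted above can be checked
mechanically. What follows is only a minimal sketch, not anything libvirt
does itself: it parses the two quoted listings (pasted in as strings) and
reports which unit names point at a different WWN on the other host. The
helper names parse_vol_list and compare_pools are made up for illustration.

```python
import re

# Volume rows copied from the quoted `virsh vol-list poolvhba0` output.
SOURCE = """\
 unit:0:0:0           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
 unit:0:0:1           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
 unit:0:0:2           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
 unit:0:0:3           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
 unit:0:0:4           /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
 unit:0:0:5           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
 unit:0:0:6           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
 unit:0:0:7           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
"""

DESTINATION = """\
 unit:0:0:0           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
 unit:0:0:1           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
 unit:0:0:2           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
 unit:0:0:3           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
 unit:0:0:4           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
 unit:0:0:5           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
 unit:0:0:6           /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
 unit:0:0:7           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
"""


def parse_vol_list(text):
    """Map unit name -> WWN for every volume row in vol-list style text."""
    vols = {}
    for line in text.splitlines():
        m = re.match(r"\s*(unit:\S+)\s+\S*/wwn-(0x[0-9a-f]+)\s*$", line)
        if m:
            vols[m.group(1)] = m.group(2)
    return vols


def compare_pools(src, dst):
    """Return (same set of LUNs?, unit names whose WWN differs)."""
    same_luns = set(src.values()) == set(dst.values())
    mismatched = sorted(u for u in src if dst.get(u) != src[u])
    return same_luns, mismatched


src = parse_vol_list(SOURCE)
dst = parse_vol_list(DESTINATION)
same_luns, mismatched = compare_pools(src, dst)
print("same LUN set on both hosts:", same_luns)
print("unit names that differ:", mismatched)
```

For the listings in this thread the set of WWNs is identical on both hosts,
yet every one of the eight unit names resolves to a different WWN.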

> 
> If the guest XML references its storage by "unit" number, is it safe to
> migrate such guests? The "unit number" is assigned by the driver according
> to the specific way it probes the storage, so the same LUN can end up with a
> different unit name on the destination host. The migrated guest then gets
> mapped to the wrong LUNs and is given the wrong disks. The problem is that
> the LUN numbers on the source host and the destination host do not agree:
> LUN 0 on source_host, for example, may be LUN 5 on destination_host.
> When the guest is given the wrong disk, it suffers a fatal I/O error, since
> it has no idea that its disks just changed out from under it. The migration
> does not check whether the unit numbers match on the source and destination
> sides.
> 
> So, should libvirt make sure that the guest domains reference NPIV pool volumes by their
> globally-unique wwn instead of by "unit" numbers?
> 
> The guest XML references its storage by "unit" number.
> 
> For example:
> <disk type='volume' device='lun'>
>   <driver name='qemu' type='raw' cache='none'/>
>   <source pool='poolvhba0' volume='unit:0:0:0'/>
>   <backingStore/>
>   <target dev='vdb' bus='virtio'/>
>   <alias name='virtio-disk1'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
> </disk>
> 
> I am planning to write a patch for it. Any comments on the above observation/approach would be appreciated. 
> 
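
To make the failure mode concrete, here is a rough sketch of the check the
migration does not perform today. It reads the pool and volume names out of
a trimmed copy of the disk XML quoted above and compares what that unit name
resolves to on each host. The two lookup tables are hand-copied excerpts of
the quoted vol-list output, and volume_ref and same_lun_on_both_hosts are
hypothetical helper names, not libvirt APIs.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <disk> element quoted above.
DISK_XML = """
<disk type='volume' device='lun'>
  <driver name='qemu' type='raw' cache='none'/>
  <source pool='poolvhba0' volume='unit:0:0:0'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

BY_ID = "/dev/disk/by-id/wwn-0x6005076802818bda3000000000000"

# Excerpts of the quoted vol-list output (unit name -> path), per host.
SOURCE_POOL = {"unit:0:0:0": BY_ID + "366", "unit:0:0:5": BY_ID + "380"}
DEST_POOL = {"unit:0:0:0": BY_ID + "380", "unit:0:0:5": BY_ID + "366"}


def volume_ref(disk_xml):
    """Return the (pool, volume) pair named by the disk's <source> element."""
    source = ET.fromstring(disk_xml.strip()).find("source")
    return source.get("pool"), source.get("volume")


def same_lun_on_both_hosts(disk_xml, src_pool, dst_pool):
    """True only if the unit name resolves to the same path on both hosts."""
    _, volume = volume_ref(disk_xml)
    return volume in src_pool and src_pool[volume] == dst_pool.get(volume)


pool, volume = volume_ref(DISK_XML)
print(pool, volume, "->", SOURCE_POOL[volume], "vs", DEST_POOL[volume])
print("same LUN after migration:",
      same_lun_on_both_hosts(DISK_XML, SOURCE_POOL, DEST_POOL))
```

With the tables above, unit:0:0:0 is the ...366 LUN on the source but the
...380 LUN on the destination, so the check reports a mismatch.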

I suppose I'd have to see the patch(es) in order to fully understand
what your proposal is. It seems as though what you want to do is
allow/add a 'wwn' attribute instead of a 'volume' attribute so that
regardless of whether the pool is "by-id" or "by-path", a search could
be done on the 'wwn' and a path to the volume created.  I think that
would work, but seeing the details would certainly help.
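
As a strawman for what such a search might look like, the sketch below looks
a volume up by WWN in a per-host table of unit names and paths. It is an
assumption-laden illustration only: the 'wwn' attribute does not exist in
libvirt's disk XML today, find_volume_by_wwn is a made-up helper, and the
match relies on the path ending in "wwn-<id>", which holds for a "by-id"
pool but not for a "by-path" one (that case would need the WWN from some
other source, such as udev properties).

```python
BY_ID = "/dev/disk/by-id/wwn-"

# Excerpts of the vol-list output quoted earlier in the thread, per host.
SOURCE_POOL = {
    "unit:0:0:0": BY_ID + "0x6005076802818bda3000000000000366",
    "unit:0:0:5": BY_ID + "0x6005076802818bda3000000000000380",
}
DEST_POOL = {
    "unit:0:0:0": BY_ID + "0x6005076802818bda3000000000000380",
    "unit:0:0:5": BY_ID + "0x6005076802818bda3000000000000366",
}


def find_volume_by_wwn(pool_volumes, wwn):
    """Return (unit name, path) of the volume whose by-id link names wwn.

    Returns None when no volume in the pool matches.
    """
    wanted = "wwn-" + wwn.lower()
    for name, path in pool_volumes.items():
        if path.rsplit("/", 1)[-1] == wanted:
            return name, path
    return None


wwn = "0x6005076802818bda3000000000000366"
print("source:     ", find_volume_by_wwn(SOURCE_POOL, wwn))
print("destination:", find_volume_by_wwn(DEST_POOL, wwn))
```

The unit name returned differs per host (unit:0:0:0 versus unit:0:0:5 here),
but the path, and therefore the LUN handed to the guest, is the same.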

Interestingly there's a thread on libvirt-users:

https://www.redhat.com/archives/libvirt-users/2016-August/msg00063.html

which takes a slightly different approach to NPIV/vHBA. In that case,
the desire seems to be for multiple paths on a single host and the
ability to force failover between the paths. I'm not quite sure why that
failover doesn't just work behind the scenes, but it also seems to point
out a need/desire to have the vHBA exposed in the guest and let the
guest "find" the volumes it wants.


John
> Thanks,
> 
> Nitesh. 
> 
> 
> 