[libvirt-users] Ang: Ang: Re: Ang: Ang: Re: Ang: Re: attaching storage pool error

John Ferlan jferlan at redhat.com
Wed Sep 7 12:06:11 UTC 2016



On 09/03/2016 09:13 AM, Johan Kragsterman wrote:
> 
> Hi!
> 
> 
> Report from my multipath tests today.
> 
> 
> My test virtual machine, which runs from an NPIV pool, is not able to use multipath.
> 
> When I pulled the cable from one of the targets, it crashed.
> 
> But, strangely, it could boot up again on the other path, the one it had just crashed on.
> 
> That tells me it can use both paths, and is not limited to only one of them, but because the multipath layer isn't there, it cannot survive a path failure, though it can come up again on a reboot.
> 
> The question is, WHY doesn't an NPIV pool support multipath? It is sort of the idea behind FC to be redundant and to always have multiple paths to failover to. Why was the NPIV pool designed like this?
> 

Not having done the original design, I can surmise that the "goal" for
the design was to avoid forcing the user to provide the guest with
something that could change across reboots, as well as to provide a way
to "migrate" a guest to a different host. That has perhaps not proven
perfectly reliable, per this libvir-list posting:

http://www.redhat.com/archives/libvir-list/2016-July/msg00524.html

The "issue" (so to speak) is that there's no guarantee the vHBA
'scsi_hostM' remains constant between reboots. Having a pool based
upon a vHBA created from a vport-capable HBA lets you provide the
guest with a volume by unit number rather than by some path that
can/may change.

As you've noted, it's a complicated technology and there's more than
"one" design goal that needs to be considered. Your goal seemingly is to
have the storage available on either path for the same host, while
someone else may want to migrate between two hosts using the same
target, where each host "sees" a different path to the target.

BTW: It's not clear to me from your description how you have added the
volume to the pool.

Is the volume in a SCSI pool?

XML for the volume in the guest:

    <disk type='volume' device='lun'>
      <driver name='qemu' type='raw'/>
      <source pool='vhbapool_host3' volume='unit:0:4:0'/>
      <target dev='sda' bus='scsi'/>
    </disk>

# virsh vol-list vhbapool_host3
 Name                 Path
------------------------------------------------------------------------------
 unit:0:4:0           /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-0
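Since the unit name is the handle the guest XML uses while the by-path link is what the host actually exposes, it can be handy to turn that listing into a name-to-path map. A minimal Python sketch, assuming the plain two-column layout shown above (a header, a dashed rule, then one "name path" row per volume); parse_vol_list is a hypothetical helper, not part of libvirt or its bindings:

```python
def parse_vol_list(text: str) -> dict:
    """Map volume names to paths from `virsh vol-list <pool>` text output.

    Assumes the default two-column layout: a "Name  Path" header, a line
    of dashes, then one "name path" row per volume.
    """
    volumes = {}
    past_rule = False
    for line in text.splitlines():
        stripped = line.strip()
        if not past_rule:
            # Skip the header and the dashed separator line itself.
            past_rule = stripped.startswith("---")
            continue
        fields = stripped.split(None, 1)
        if len(fields) == 2:  # ignore blank lines and "..." placeholders
            volumes[fields[0]] = fields[1].strip()
    return volumes


sample = """\
 Name                 Path
------------------------------------------------------------------------------
 unit:0:4:0           /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-0
"""
print(parse_vol_list(sample))
```

The same parser should also cope with the mpath pool listing further down, since it uses the same layout; rows that are not "name path" pairs are skipped.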

# virsh pool-dumpxml vhbapool_host3
<pool type='scsi'>
  <name>vhbapool_host3</name>
...
  <source>
    <adapter type='fc_host' parent='scsi_host3' wwnn='5001a4a4d2f10190' wwpn='5001a4af287f9b40'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
...
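If you'd rather generate the pool definition than hand-edit it, the elements shown above (a 'scsi' pool, an fc_host adapter keyed by parent/wwnn/wwpn, and a by-path target) can be built with Python's standard ElementTree. This is only a sketch: build_vhba_pool_xml is a hypothetical helper, and anything elided with "..." above is simply omitted, so compare the result against formatstorage.html before relying on it:

```python
import xml.etree.ElementTree as ET


def build_vhba_pool_xml(name: str, parent: str, wwnn: str, wwpn: str,
                        target_path: str = "/dev/disk/by-path") -> str:
    """Build a <pool type='scsi'> definition backed by an fc_host vHBA.

    Only the elements visible in the pool-dumpxml output above are emitted.
    """
    pool = ET.Element("pool", type="scsi")
    ET.SubElement(pool, "name").text = name
    source = ET.SubElement(pool, "source")
    # The adapter is identified by its parent HBA and WWNN/WWPN pair,
    # not by the (possibly changing) scsi_hostM name of the vHBA.
    ET.SubElement(source, "adapter", type="fc_host", parent=parent,
                  wwnn=wwnn, wwpn=wwpn)
    target = ET.SubElement(pool, "target")
    ET.SubElement(target, "path").text = target_path
    ET.indent(pool)  # pretty-printing, Python 3.9+
    return ET.tostring(pool, encoding="unicode")


print(build_vhba_pool_xml("vhbapool_host3", "scsi_host3",
                          "5001a4a4d2f10190", "5001a4af287f9b40"))
```

The output could be saved to a file and handed to 'virsh pool-define', followed by 'virsh pool-start'.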


Or is it in an mpath pool?

XML for the volume in the guest:

    <disk type='block' device='lun' sgio='unfiltered'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/mapper/3600a0b80005ad1d700002dde4fa32ca8'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
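A similar hedged sketch for the block-device form above; build_mpath_lun_disk is a hypothetical helper, its defaults mirror the example (sda on controller 0, unit 0), and sgio='unfiltered' still depends on kernel support:

```python
import xml.etree.ElementTree as ET


def build_mpath_lun_disk(dev_path: str, target_dev: str = "sda",
                         controller: int = 0, unit: int = 0) -> str:
    """Build a <disk type='block' device='lun'> element for a /dev/mapper device."""
    disk = ET.Element("disk", type="block", device="lun", sgio="unfiltered")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    ET.SubElement(disk, "source", dev=dev_path)
    ET.SubElement(disk, "target", dev=target_dev, bus="scsi")
    # Place the LUN on the given SCSI controller index in the guest.
    ET.SubElement(disk, "address", type="drive", controller=str(controller),
                  bus="0", target="0", unit=str(unit))
    ET.indent(disk)  # Python 3.9+
    return ET.tostring(disk, encoding="unicode")


print(build_mpath_lun_disk("/dev/mapper/3600a0b80005ad1d700002dde4fa32ca8"))
```

The resulting <disk> element would go inside the guest's <devices> section, for example via 'virsh edit'.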


# virsh vol-list mpath
Name                 Path
-----------------------------------------
...
dm-5                 /dev/mapper/3600a0b80005ad1d700002dde4fa32ca8


# virsh pool-dumpxml mpath
<pool type='mpath'>
  <name>mpath</name>
...
  <source>
  </source>
  <target>
    <path>/dev/mapper</path>
...


Caveat: my knowledge of mpath details is limited. I know what the
technology is, but have limited hands-on usage.

Conceptually, I understand what you're trying to accomplish. The devil
of course is in the details and yes, we really need to do a better job
documenting the various usage models. Of course, I'd be remiss if I
didn't say "patches welcome"!

> If we could use the underlying devices and pass them directly to the guest, then we could implement multipath in the guest.
> 
> But I lean towards not using NPIV anymore, since it only seems to complicate things. In VMware they can attach the NPIV directly to the guest, which means that the NPIV, and with it the LUNs, are easily transferred across the SAN hosts. Here, with libvirt/qemu/kvm, we cannot attach an NPIV to the guest, which sort of makes the whole idea fall apart, especially if it is the case that there is no multipath support. Better then to map the LUNs directly to the host and use the multipath devices for the guests.
> 

A terminology thing - do you mean passing the vHBA through to the guest,
such as:

XML to pass a 'scsi_host' to the guest:

...
    <hostdev mode='subsystem' type='scsi' managed='no'>
      <source>
        <adapter name='scsi_host15'/>
        <address bus='0' target='0' unit='0'/>
      </source>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </hostdev>
...

where scsi_host15 is the vHBA created from the vport capable scsi_host3
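For reference, the <hostdev> above can be generated the same way. In this hedged sketch (build_scsi_host_hostdev is a hypothetical helper), the <source> pairs the host adapter name with a bus/target/unit address on that adapter, and the outer <address> places the device on a guest controller:

```python
import xml.etree.ElementTree as ET


def build_scsi_host_hostdev(adapter: str, bus: int = 0, target: int = 0,
                            unit: int = 0, controller: int = 0) -> str:
    """Build a SCSI <hostdev> for one device on a host adapter such as a vHBA."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="scsi", managed="no")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "adapter", name=adapter)
    # bus/target/unit of the device as seen on the host adapter.
    ET.SubElement(source, "address", bus=str(bus), target=str(target),
                  unit=str(unit))
    # Where the device shows up in the guest: drive 0 on the given controller.
    ET.SubElement(hostdev, "address", type="drive", controller=str(controller),
                  bus="0", target="0", unit="0")
    ET.indent(hostdev)  # Python 3.9+
    return ET.tostring(hostdev, encoding="unicode")


print(build_scsi_host_hostdev("scsi_host15"))
```

Note that the adapter is referenced by its scsi_hostN name, which, as mentioned earlier, is not guaranteed to stay the same across reboots.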

# virsh nodedev-dumpxml scsi_host15
<device>
  <name>scsi_host15</name>
  <path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3/vport-3:0-10/host15</path>
  <parent>scsi_host3</parent>
...
    <capability type='fc_host'>
      <wwnn>5001a4a4d2f10190</wwnn>
      <wwpn>5001a4af287f9b40</wwpn>
      <fabric_wwn>2002000573de9a81</fabric_wwn>
...

The wwnn/wwpn values are "fabricated" by libvirt (automagically created).

# virsh nodedev-dumpxml scsi_host3
<device>
  <name>scsi_host3</name>
  <path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3</path>
  <parent>pci_0000_10_00_0</parent>
  <capability type='scsi_host'>
...
    <capability type='fc_host'>
      <wwnn>20000000c9848140</wwnn>
      <wwpn>10000000c9848140</wwpn>
      <fabric_wwn>2002000573de9a81</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>127</max_vports>
      <vports>2</vports>
    </capability>
...

You'll note that the fabric_wwn is the same for both.
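To check this relationship from a script, here is a minimal sketch that reads 'virsh nodedev-dumpxml' output with ElementTree and compares the two hosts. The sample XML is trimmed to the fields shown above, fc_host_info and is_vhba_of are hypothetical helpers, and the free-vport count assumes <vports> is the number of vports already in use:

```python
import xml.etree.ElementTree as ET


def fc_host_info(nodedev_xml: str) -> dict:
    """Collect name, parent, FC identifiers and vport headroom from
    nodedev-dumpxml output; missing FC fields come back as None."""
    dev = ET.fromstring(nodedev_xml)
    info = {"name": dev.findtext("name"), "parent": dev.findtext("parent")}
    fc = dev.find(".//capability[@type='fc_host']")
    for field in ("wwnn", "wwpn", "fabric_wwn"):
        info[field] = fc.findtext(field) if fc is not None else None
    vport = dev.find(".//capability[@type='vport_ops']")
    if vport is not None:
        # Assumes <vports> counts the vports currently in use.
        info["free_vports"] = (int(vport.findtext("max_vports"))
                               - int(vport.findtext("vports")))
    return info


def is_vhba_of(vhba: dict, hba: dict) -> bool:
    """True if vhba is parented by hba and both report the same fabric."""
    return (vhba["parent"] == hba["name"]
            and vhba["fabric_wwn"] is not None
            and vhba["fabric_wwn"] == hba["fabric_wwn"])


HOST15 = """<device>
  <name>scsi_host15</name>
  <parent>scsi_host3</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>5001a4a4d2f10190</wwnn>
      <wwpn>5001a4af287f9b40</wwpn>
      <fabric_wwn>2002000573de9a81</fabric_wwn>
    </capability>
  </capability>
</device>"""

HOST3 = """<device>
  <name>scsi_host3</name>
  <parent>pci_0000_10_00_0</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>20000000c9848140</wwnn>
      <wwpn>10000000c9848140</wwpn>
      <fabric_wwn>2002000573de9a81</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>127</max_vports>
      <vports>2</vports>
    </capability>
  </capability>
</device>"""

vhba, hba = fc_host_info(HOST15), fc_host_info(HOST3)
print(is_vhba_of(vhba, hba), hba["free_vports"])
```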

I haven't used this method in my limited test environment. It's been a
"recent question" at KVM Forum that I have on my "todo" list to
investigate a bit more (competing with multiple other items).

John


> If anyone else has opinions on this, or ideas that are better than mine, I would very much like to hear them.
> 
> Regards Johan
> 
> 


> 
> 
> -----libvirt-users-bounces at redhat.com wrote: -----
> To: John Ferlan <jferlan at redhat.com>
> From: Johan Kragsterman 
> Sent by: libvirt-users-bounces at redhat.com
> Date: 2016-09-03 08:36
> Cc: libvirt-users at redhat.com, Yuan Dan <dyuan at redhat.com>
> Subject: [libvirt-users] Ang: Re: Ang: Ang: Re: Ang: Re: attaching storage pool error
> 
> Hi, John, and thank you!
> 
> This was a very thorough and welcome response; I was wondering where all the storage guys were...
> 
> I will get back to you with more details later, specifically about multipath, since this needs to be investigated thoroughly.
> 
> Through trial and error, I have in the meantime been able to attach the NPIV pool LUN to a virtio-scsi controller, and judging by the volumes on the host, it seems to already use multipath.
> 
> I find this multipath pool procedure a little confusing, since an NPIV vHBA is by nature always multipath. I will do a very simple test later today, the best test there is: just pulling a cable, first from one of the FC targets, then putting it back, and then doing the same with the other one. This will tell me whether it runs on multipath or not.
> 
> The consideration I had was whether to implement multipath in the guest or on the host, and I don't know which I would prefer. Simplicity is always preferable, so if it works fine on the host, I guess I'd prefer that.
> 
> Get back to you later...
> 
> /Johan
> 
> 
> -----John Ferlan <jferlan at redhat.com> wrote: -----
> To: Johan Kragsterman <johan.kragsterman at capvert.se>, Yuan Dan <dyuan at redhat.com>
> From: John Ferlan <jferlan at redhat.com>
> Date: 2016-09-02 20:51
> Cc: libvirt-users at redhat.com
> Subject: Re: [libvirt-users] Ang: Ang: Re: Ang: Re: attaching storage pool error
> 
> 
> On 08/24/2016 06:31 AM, Johan Kragsterman wrote:
>>
>> Hi again!
>>
> 
> I saw this last week while I was at KVM Forum, but just haven't had the
> time until now to start thinking about this stuff again... as you point
> out with your questions and replies, NPIV/vHBA is tricky and
> complicated. I always have to try to "clear the decks" of anything else
> before paging how this all works back into the frontal cortex.
> Once done, I quickly do a page flush.
> 
> It was also a bit confusing with respect to how the responses have been
> threaded - so I just took the most recent one and started there.
> 
>> -----libvirt-users-bounces at redhat.com wrote: -----
>> To: Yuan Dan <dyuan at redhat.com>
>> From: Johan Kragsterman 
>> Sent by: libvirt-users-bounces at redhat.com
>> Date: 2016-08-24 07:52
>> Cc: libvirt-users at redhat.com
>> Subject: [libvirt-users] Ang: Re: Ang: Re: attaching storage pool error
>>
>> Hi, and thanks for your important input, Dan!
>>
>>
>>>>
>>>>
>>>> System: CentOS 7, with the system default libvirt version.
>>>>
>>>> I've succeeded in creating an NPIV storage pool, which I could start without
>>>> problems. However, I couldn't attach it to the VM; it threw errors when
>>>> I tried. I want to boot from it, so I need it working from the start. I read
>>>> one of Daniel Berrange's old (2010) blog posts about attaching an iSCSI pool,
>>>> and drew my conclusions from that. I haven't found any other documentation.
>>>> Can someone point me to more recent documentation on this?
>>>>
>>>> Are there other mailing lists in the libvirt/KVM communities that are more
>>>> focused on storage? If so, I'd like to know about them, since I'm a storage
>>>> guy and fiddle around a lot with these things...
>>>>
>>>> There are quite a few things I'd like to know about that I doubt this list
>>>> cares about or has knowledge about, like multipath devices/pools,
>>>> virtio-scsi in combination with an NPIV storage pool, etc.
>>>>
>>>> So anyone that can point me further....?
>>>
>>> http://libvirt.org/formatstorage.html
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-NPIV_storage.html
>>>
> 
> The Red Hat documentation is most up-to-date - it was sourced (more or
> less) from:
> 
> http://wiki.libvirt.org/page/NPIV_in_libvirt
> 
> There's some old material in there, and it probably needs a cleanup to
> describe all the "supported" options.
> 
> 
>>> Hope it can help you get started with it.
>>>
>>>
>>> Unfortunately, I have already gone through these documents several times
>>> as well, but they only cover the creation of storage pools, not how you
>>> attach them to the guest.
>>
>> If the pool is ready, here are some examples: http://libvirt.org/formatdomain.html#elementsDisks
>>
>> You can use it in the guest like this:
>>     <disk type='volume' device='disk'>
>>       <driver name='qemu' type='raw'/>
>>       <source pool='iscsi-pool' volume='unit:0:0:1' mode='host'/>
>>       <auth username='myuser'>
>>         <secret type='iscsi' usage='libvirtiscsi'/>
>>       </auth>
>>       <target dev='vdb' bus='virtio'/>
>>     </disk>
>>
> 
> This is an 'iscsi' pool format, but something similar can be crafted for
> the 'scsi' pool used for fc_host devices.
> 
>>
>>
>> As I described above, I created an NPIV pool for my FC backend. I'd also like to get SCSI pass-through, which seems to be possible only if I use "device=lun". If I use an NPIV storage pool, does that mean I can NOT use "device=lun", and therefore can NOT get SCSI pass-through? Is the only way to get SCSI pass-through to skip the storage pool and use the host LUNs instead?
>>
> 
> So for the purposes of taking the right steps, I assume you used 'virsh
> nodedev-list --cap vports' in order to find FC capable scsi_host#'s.
> 
> Then you created your vHBA based on the FC capable fc_host, using XML
> such as:
> 
>    <device>
>      <parent>scsi_hostN</parent>
>      <capability type='scsi_host'>
>        <capability type='fc_host'>
>        </capability>
>      </capability>
>    </device>
> 
> where scsi_hostN (and 'N' in particular) is the FC-capable fc_host.
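The vHBA definition above can also be produced programmatically. A minimal sketch using Python's ElementTree; build_vhba_xml is a hypothetical helper, and, as in the XML above, no wwnn/wwpn are supplied so that libvirt generates them:

```python
import xml.etree.ElementTree as ET


def build_vhba_xml(parent: str) -> str:
    """Return a node-device definition for a vHBA on a vport-capable HBA.

    No wwnn/wwpn are included, leaving libvirt to generate them.
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi_host = ET.SubElement(device, "capability", type="scsi_host")
    # The nested fc_host capability is what makes this a vHBA request.
    ET.SubElement(scsi_host, "capability", type="fc_host")
    ET.indent(device)  # Python 3.9+
    return ET.tostring(device, encoding="unicode")


print(build_vhba_xml("scsi_host3"))
```

The string could be written to vhba.xml and passed to 'virsh nodedev-create' as in the next step.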
> 
> Then create the node device:
> 
> # virsh nodedev-create vhba.xml
> Node device scsi_hostM created from vhba.xml
> 
> where 'M' is whatever the next available scsi_host# is on your host.
> 
> If you 'virsh nodedev-dumpxml scsi_hostM' you'll see the wwnn/wwpn details.
> 
> You can then create a vHBA scsi pool from that in order to ensure the
> persistence of the vHBA. Although it's not required, the vHBA scsi
> pool allows you to provide a source pool and volume by unit # for
> your guest rather than having to edit guests between host reboots or
> other such conditions which cause the vHBA's scsi_hostM number to change.
>>
>> What do you think about this?:
>>
>> <disk type='volume' device='disk'>
>>   <driver name='qemu' type='raw'/>
>>   <source pool='vhbapool_host8' volume='unit:0:0:1'/>
>>   <target dev='hda' bus='ide'/>
>> </disk>
>>
>>
>> But I'd prefer to be able to use something like this instead:
>>
>>
>>
>> <disk type='volume' device='lun'>
>>   <driver name='qemu' type='raw'/>
>>   <source pool='vhbapool_host8' volume='unit:0:0:1'/>
>>   <target dev='vda' bus='scsi'/>
>> </disk>
>>
>> But that might not be possible...?
>>
> 
> The "volume" and "disk" or "volume" and "lun" syntax can be used
> somewhat interchangeably. As you point out, the features for disk and
> lun are slightly different. Usage of the 'lun' syntax allows addition
> of the attribute sgio='unfiltered'.
> 
>>
>>
>> A couple of additional questions here:
>>
>> * Since the target device is already defined in the pool, why define it here as well, as in your example with the iSCSI pool?
> 
> Let's forget about iSCSI.
> 
>> * I'd like to use virtio-scsi combined with the pool, is that possible?
> 
> It works on my test guest (OK, this is not the order from dumpxml):
> 
> ...
>     <controller type='scsi' index='0' model='virtio-scsi'>
>       <alias name='scsi0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
>     </controller>
> ...
>     <disk type='volume' device='lun'>
>       <driver name='qemu' type='raw'/>
>       <source pool='vhbapool_host3' volume='unit:0:4:0'/>
>       <backingStore/>
>       <target dev='sda' bus='scsi'/>
>       <shareable/>
>       <alias name='scsi0-0-0-0'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>
> ...
> 
>> * If so, how do I define that? I understand I can define a controller separately, but how do I tell the guest to use that specific controller in combination with that pool?
> 
> See above... The controller has a "type", "index", and "model". Then,
> when adding the disk, use <address type='drive' controller='#' .../>,
> where # is the index number of your virtio-scsi controller.
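If the guest XML is edited by hand, it is easy to point a disk at the wrong controller index. A hedged sketch that lists scsi-bus disks whose drive address does not refer to a virtio-scsi controller; disks_not_on_virtio_scsi is a hypothetical helper, and the sample is trimmed from the XML above plus a deliberately misaddressed second disk (unit:0:4:1 is made up):

```python
import xml.etree.ElementTree as ET


def disks_not_on_virtio_scsi(domain_xml: str) -> list:
    """List target devs of bus='scsi' disks whose <address type='drive'>
    does not point at a controller with model='virtio-scsi'."""
    dom = ET.fromstring(domain_xml)
    virtio_indexes = {c.get("index") for c in dom.iter("controller")
                      if c.get("type") == "scsi"
                      and c.get("model") == "virtio-scsi"}
    problems = []
    for disk in dom.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("bus") != "scsi":
            continue
        addr = disk.find("address[@type='drive']")
        # A disk with no explicit drive address is flagged as well, since
        # the XML alone does not say which controller it will land on.
        if addr is None or addr.get("controller") not in virtio_indexes:
            problems.append(target.get("dev"))
    return problems


SAMPLE = """<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <disk type='volume' device='lun'>
      <driver name='qemu' type='raw'/>
      <source pool='vhbapool_host3' volume='unit:0:4:0'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='volume' device='lun'>
      <driver name='qemu' type='raw'/>
      <source pool='vhbapool_host3' volume='unit:0:4:1'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>
    </disk>
  </devices>
</domain>"""

print(disks_not_on_virtio_scsi(SAMPLE))
```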
> 
>> * Since the NPIV pool obviously is based on an FC initiator, the FC target can/will provision more LUNs to that initiator. How will that affect the pool and the guest's access to new LUNs? In this example the volume says 'unit:0:0:1', and I guess that will change if more LUNs are added? Or is that "volume unit" the "SCSI target device", which can hold multiple LUNs?
>>
> 
> You can use 'virsh pool-refresh $poolname' - it will find new LUNs...
> Err, it *should* find new LUNs ;-)  Existing 'unit:#:#:#' values
> shouldn't change; they should be "tied to" the same wwnn. Use "virsh
> vol-list $poolname" to see the Path. When new LUNs are added, they are
> given new unit numbers. Reboots should find the same order.
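One way to confirm that after a 'virsh pool-refresh' is to compare name-to-path maps taken before and after. A hedged sketch; diff_refresh is a hypothetical helper, and the lun-1 path in the example is made up for illustration:

```python
def diff_refresh(before: dict, after: dict) -> tuple:
    """Compare volume name -> path maps from before and after a pool refresh.

    Returns (added, changed): names that only appear after the refresh, and
    names present in both whose path differs (expected to be empty if
    existing unit numbers stay stable).
    """
    added = sorted(set(after) - set(before))
    changed = sorted(name for name in before.keys() & after.keys()
                     if before[name] != after[name])
    return added, changed


# Hypothetical example: one new LUN appears after the refresh.
before = {"unit:0:4:0": "/dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-0"}
after = {**before,
         "unit:0:4:1": "/dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-1"}
print(diff_refresh(before, after))  # -> (['unit:0:4:1'], [])
```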
> 
>> ...more...
>>
>>
>> I've found something here in the RHEL7 virt guide:
>>
>>
>> <disk type='volume' device='lun' sgio='unfiltered'>
>>   <driver name='qemu' type='raw'/>
>>   <source pool='vhbapool_host3' volume='unit:0:0:0'/>
>>   <target dev='sda' bus='scsi'/>
>>   <shareable />
>> </disk>
>>
> 
> Fair warning: use of sgio='unfiltered' does require specific
> kernels... There were many "issues" with this, mostly related to kernel
> support. If it is not supported by the kernel, you will see:
> 
> error: Requested operation is not valid: unpriv_sgio is not supported by this kernel
> 
>>
>>
>>
>> A question that comes up here is multipath. Since this is Fibre Channel, it is of course multipath. The "target dev" says 'sda', but in a multipath device list it should say "/dev/mapper/mpatha".
>>
>> How should that be handled?
>>
> 
> Uhh... multipath... Not my strong suit... I'm taking an example from a
> bz that you won't be able to read because it's marked private.
> 
> Once you have your vHBA and scsi_hostM for that vHBA on the host you can
> use 'lsscsi' (you may have to yum/dnf install it - it's a very useful
> tool)...
> 
> 
> # lsscsi
> ...
> // assume scsi_host6 is the new vHBA, created as follows
> [6:0:0:0]    disk    IBM      1726-4xx  FAStT  0617  -
> [6:0:1:0]    disk    IBM      1726-4xx  FAStT  0617  -
> [6:0:2:0]    disk    IBM      1726-4xx  FAStT  0617  /dev/sdf
> [6:0:3:0]    disk    IBM      1726-4xx  FAStT  0617  /dev/sdg
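To pick out the entries that belong to the new vHBA programmatically, the lsscsi lines can be filtered on the host number in the [H:C:T:L] field. A minimal sketch; lsscsi_for_host is a hypothetical helper, and it assumes the last column is either a device node or '-' when lsscsi reports none:

```python
import re

# [host:channel:target:lun] followed by the peripheral type column.
HCTL_RE = re.compile(r"^\[(\d+):(\d+):(\d+):(\d+)\]\s+(\S+)")


def lsscsi_for_host(text: str, host: int) -> list:
    """Return (hctl, type, device_node) tuples for one SCSI host number.

    device_node is None when the last column is '-'.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        match = HCTL_RE.match(line)
        if not match or int(match.group(1)) != host:
            continue  # comments, "..." lines and other hosts
        hctl = tuple(int(g) for g in match.group(1, 2, 3, 4))
        node = line.split()[-1]
        rows.append((hctl, match.group(5), None if node == "-" else node))
    return rows


sample = """\
//assume scsi_host6 is the new vHBA
[6:0:0:0]    disk    IBM      1726-4xx  FAStT  0617  -
[6:0:1:0]    disk    IBM      1726-4xx  FAStT  0617  -
[6:0:2:0]    disk    IBM      1726-4xx  FAStT  0617  /dev/sdf
[6:0:3:0]    disk    IBM      1726-4xx  FAStT  0617  /dev/sdg
"""
for row in lsscsi_for_host(sample, 6):
    print(row)
```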
> 
> 
> You'll need an mpath pool:
> 
> # virsh pool-dumpxml mpath
> <pool type='mpath'>
>   <name>mpath</name>
>   <source>
>   </source>
>   <target>
>     <path>/dev/mapper</path>
>     <permissions>
>       <mode>0755</mode>
>       <owner>-1</owner>
>       <group>-1</group>
>     </permissions>
>   </target>
> </pool>
> 
> # virsh pool-define mpath.xml
> # virsh pool-start mpath
> 
> # virsh vol-list mpath
> Name                 Path
> -----------------------------------------
> dm-0                 /dev/mapper/3600a0b80005adb0b0000ab2d4cae9254
> dm-5                 /dev/mapper/3600a0b80005ad1d700002dde4fa32ca8  <=== this one is from vhba scsi_host6
> 
> Then use something like:
> 
>     <disk type='block' device='lun' sgio='unfiltered'>
>       <driver name='qemu' type='raw'/>
>       <source dev='/dev/mapper/3600a0b80005ad1d700002dde4fa32ca8'/>
>       <target dev='sda' bus='scsi'/>
>       <alias name='scsi0-0-0-0'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>
> 
> HTH,
> 
> John
> 
> (FWIW: I'm not sure how the leap of faith was taken that dm-5 is the
> vHBA for scsi_host6... Although I think it's from the wwnn for a volume
> in the vHBA as seen when using a virsh vol-list from a pool created
> using the vHBA within the bz).
> 
> 



