[libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

Eric Farman farman at linux.vnet.ibm.com
Tue Sep 13 21:48:15 UTC 2016



On 09/13/2016 04:49 PM, John Ferlan wrote:
> [...]
>
>>> I took a peek at the v1, but suffice to say didn't follow it that
>>> closely since there's a number of changes in v2 related to patch
>>> ordering. You could get 3 different opinions on that matter too!
>> Hopefully a consensus before too long.  :)
>>
> yeah - v1 was lots of little steps that you were asked to combine. OTOH
> there are those that like to combine multiple things in one patch which
> can make it really difficult to follow. The order here was "followable"
> - my suggestion was more to try and get the guts (qemu) done first, then
> the fluff (domain/xml).

Agreed.  v1 was perhaps overkill in breaking things down so small, 
because I was picking up an idea with a bit of dust on it.

>
> Consider your patch order to be making progress towards some sustainable
> feature/bug fix - each step along the way being able to pass "make check
> syntax-check" so that future attempts to git bisect can point the fickle
> finger of fate (at someone else!).  If a patch adds something that will
> be external that a followup patch will use - then great. If it's
> testable, even better. For example, adding virSCSIOpenVhost as a
> separate patch with the tests/qemuxml2argvmock.c is fine. One more
> tidbit here - it's a very faint memory, but check out
> tests/commandtest.c and the lengthy comment in mymain() regarding
> "expected" fd values. I think you're "safe" in what you've done -
> although your mock should mimic whatever changes you make to the primary
> function.
>
> [...]
>
>>> A wwpn doesn't generally have the "naa." prefix on it. So, why not just
>>> add that to the command line instead? That is a wwpn is a 16 hex digit
>>> value. I'm assuming that in order to support this feature a "naa." is
>>> prefixed and sent to qemu. Does that necessarily mean some other
>>> hypervisor would choose to place the "naa." prefix or would then code
>>> need to be added to strip it off for those other hypervisors?  Looking
>>> at the qemu code - it seems fairly specific.
>> Fairly certain that it's more a vhost thing of which the hypervisor
>> utilizes.  Per a recommendation made on this list years ago, I used
>> targetcli to do the configuration rather than direct manipulation of
>> sysfs, and according to its man page:
>>
>>      All WWNs are prefaced by their type, such as "iqn", "naa", or "ib",
>> and may be given with or without colons.
>>
> I don't use targetcli so I'm not familiar with it.
>
> Still not convinced using naa.# in the XML is the best way to go, but
> I'm not against it. I guess a lot depends on how the sysfs views things.
>
>> As to whether other hypervisors follow the same behavior, I do not
>> know.  I was just trying to conform to what it would allow in the case
>> of vhost-scsi.  Now, I suppose I'm confusing matters by saying "wwpn"
>> here since as you say that's 16 hex digits, and should probably say just
>> "wwn"  ?
>>
> I guess I've always thought of a wwn/wwpn/wwnn as that string of
> bytes...

I always thought WWPN and WWNN as types of WWN...

> Usually WWN/WWNN have been interchangeable, while WWPN is a
> WWN assigned to a port in a Fibre Channel fabric

...but have seen it this way also.  Either way makes sense to me.

>
> FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA
> will list a wwnn, wwpn, and fabric_wwn.  The fabric_wwn's will match.
> The wwnn/wwpn for the vport is what was used when running the
> HBA's/vport_create function.  It's a tangled web.  Wait until vGPU's
> show up (that's another black magic trick).
>
>
> [...]
>
>>> It would be nice to tell a user how to find that wwpn value via some
>>> command (virsh or otherwise) in order to fill in the wwpn attribute.
>>> Although there's always varying opinions on this since many times it's
>>> distro dependent. That's why I suggest virsh - at least if it's
>>> supportable we can display the wwpn there using nodedev-dumpxml.
>> Hrm...  nodedev-dumpxml is new to me, neat!  I'll go down that bunny
>> trail, but it seems like it'd be a bit of work to add the smarts for this.
>>
> Hours of all sorts of fun.... The 'nodedev-{list|dumpxml}' are
> essentially the libvirt mechanism to list/describe the various rabbit
> holes that is/are the sysfs tree and especially branches. The
> nodedev-list --tree will probably be of most interest, but so is virsh
> nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}
>
>>> What/where is the 'vhost_scsi_wwpn'?  IOW: It's not something defined
>>> yet - I'm assuming there's some output somewhere where that's pulled
>>> from.
>> As near as I can tell, it's not anything that can be autogenerated off
>> the bat.  Somebody somewhere needs to first establish the sysfs entries
>> for the vhost target, which can then be stored and loaded on a next
>> (host) boot.  The end result looks something like this:
>>
> AFAIK systemd handles the sysfs creation (ok at least for my purposes).
>
> This is what I was concerned about - we really need a way to tell
> someone how to "easily" find the wwpn that's to be used. Although I
> don't have vhost-scsi on my laptop ;-)

Me neither.  Of course, I'm using an s390, so that's a little different 
too.  :)

> - I have to figure that by using
> 'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that
> is an HBA will show a <path> that path would then be a starting point...
>
> Googling "scsi_host HBA wwn" shows me some interesting results including
> ones that seem to indicate 'fc_host' is involved in some way even for
> the HBA (I guess that makes sense - where the vHBA is just an
> abstraction off of that). That said, a '/sys/class/fc_host/' is the
> starting point then to find things and virsh nodedev-list --cap vports
> would be the way to get capable HBA's.

/sys/class/fc_host is just showing the individual SCSI disks, but 
nothing that looks to point to the vhost-scsi target that I'm passing to 
QEMU.  /sys/class/fc_vports is empty.  (Oh wait, is NPIV turned on 
here?  Ugh, pick this up later.)

>
>
>> [root at xxxxxxxx ~]# ls -l
>> /sys/kernel/config/target/vhost/naa.50014056e4794195/tpgt_1/lun/lun_0/
>> total 0
> Is the 'vhost' or 'vhost/naa.%s' linked from some
> /sys/class/{scsi|fc}_host/host%d directory?

No, I can't see anybody linking it from anywhere except the couple cross 
links in /sys/kernel/config/ shown below.

>   I think that becomes the
> key for traversal. Or maybe finding "5001...4195" in some other link
> from /sys/class/{scsi|fc}_host/....
>
>> lrwxrwxrwx 1 root root    0 Sep 13 15:44 752e150d11 ->
>> ../../../../../../target/core/iblock_0/disk0
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_gp
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_offline
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_status
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_write_md
>> drwxr-xr-x 5 root root    0 Sep 13 15:44 statistics
>> [root at xxxxxxxx ~]# cat /sys/kernel/config/target/core/iblock_0/disk0/info
>> Status: ACTIVATED  Max Queue Depth: 0  SectorSize: 512 HwMaxSectors: 4592
>>          iBlock device: dm-0  UDEV PATH: /dev/mapper/mpathb readonly: 0
>>          Major: 252 Minor: 0  CLAIMED: IBLOCK
>>
>> (Aside: note the folder name in /sys/kernel/config that contains the
>> wwn/wwpn name with the "naa." prefix.)
>>
>>
> I see... Then again "/sys/kernel" isn't the path used by nodedev driver,
> hence my question above.

Erg, so then the nodedev stuff becomes bigger still.

>
>>> One final question - what is this exposed as on the guest?  "Some"
>>> scsi_hostM? For a SCSI LUN it's some 'sgN' value. I see the qemu side
>>> seems to infer some amount of scsi_generic code (read_naa_id), so it's
>>> mostly curiosity/learning for me.
>> Anything that's put behind the target gets seen to the guest.  So that
>> can be one or more sgN LUNs and their sdX equivalents, and as near as I
>> can tell a scsi_hostM as you describe.  A quick test with two LUNs
>> behind one vhost-scsi target gives me the following in the guest:
>>
>> # ls -l /sys/bus/scsi/devices
>> total 0
>> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:0 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:0
>> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:1 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:1
>> lrwxrwxrwx 1 root root 0 Sep 13 16:10 host0 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0
>> lrwxrwxrwx 1 root root 0 Sep 13 16:11 target0:0:1 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1
>> # lsscsi
>> [0:0:1:0]    disk    LIO-ORG  disk0            4.0   /dev/sda
>> [0:0:1:1]    disk    LIO-ORG  disk1            4.0   /dev/sdb
>>
>> FYI, hotplugging/unplugging devices to a target works just fine, except
>> there's no notification that a guest can pick up on to know that a
>> device had been added.  Or worse, taken away.
>>
> Hmm. Maybe I'm just not asking the right question.
>
> I guess my perception was that if you expose the 'scsi_hostN' HBA then
> you are giving the guest the whole HBA to use as it sees fit. The host
> shouldn't be touching the luns afterwards.

Correct, it's just not a "scsi_hostN" HBA that exists as part of the 
normal fibre attachment, but a vHBA that's defined to the vhost-scsi 
driver (via targetcli or some other manual handshaking).

>
> What you've shown is 2 LUNS in the /sys/bus/scsi/ tree; however, is
> there a /sys/bus/scsi_host/hostN that corresponds to the passed through
> HBA?

Nope...  (s390 host)

# ls -l /sys/bus/
total 0
drwxr-xr-x 4 root root 0 Sep 13 23:12 ccw
drwxr-xr-x 4 root root 0 Sep 13 23:12 clockevents
drwxr-xr-x 4 root root 0 Sep 13 23:12 clocksource
drwxr-xr-x 4 root root 0 Sep 13 23:12 container
drwxr-xr-x 4 root root 0 Sep 13 23:12 cpu
drwxr-xr-x 4 root root 0 Sep 13 23:12 css
drwxr-xr-x 4 root root 0 Sep 13 23:12 etr
drwxr-xr-x 4 root root 0 Sep 13 23:12 event_source
drwxr-xr-x 4 root root 0 Sep 13 23:12 memory
drwxr-xr-x 5 root root 0 Sep 13 23:12 pci
drwxr-xr-x 4 root root 0 Sep 13 23:12 platform
drwxr-xr-x 4 root root 0 Sep 13 23:12 scm
drwxr-xr-x 4 root root 0 Sep 13 23:12 scsi
drwxr-xr-x 4 root root 0 Sep 13 23:12 stp
drwxr-xr-x 4 root root 0 Sep 13 23:12 virtio
drwxr-xr-x 4 root root 0 Sep 13 23:12 workqueue



>
> [...]
>
>>>> +          <attribute name="protocol">
>>>> +            <choice>
>>>> +              <value>vhost</value>     <!-- vhost, required -->
>>>> +            </choice>
>>>> +          </attribute>
>>>> +          <attribute name="wwpn">
>>>> +            <data type="string">
>>>> +              <param name="pattern">(naa\.)[0-9a-fA-F]{16}</param>
>>>> +            </data>
>>> If you go with the we'll add "naa." later, then you could just:
>>>
>>> <ref name='wwn'/>
>> This rings a bell with me, as to why I had the label be "wwpn" instead
>> of "wwn" above.  My attempt at trying to distinguish libvirt internals
>> with something user-facing, maybe?
>>
> I think wwpn will still be technically correct (for some strange reason
> after skimming google results).  The HBA will have a port which is how
> it's reached. I suppose the answer though is in how sysfs views things.
>
> [...]
>
>>>>                  "pci",
>>>> -              "scsi")
>>>> +              "scsi",
>>>> +              "scsi_host")
>>>>      VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend,
>>>>                  VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST,
>>>> @@ -660,6 +661,10 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol,
>>>>                  "adapter",
>>>>                  "iscsi")
>>>>    +VIR_ENUM_IMPL(virDomainHostdevSubsysHostProtocol,
>>>> +              VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST,
>>>> +              "vhost")
>>>> +
>>> Notice how the virDomainHostdevSubsysSCSIProtocol adds 'adapter' first -
>>> this should do something similar (although the preference is to use
>>> "none" - I cannot recall why that was different).
>> Because "none" == "adapter" in the case of SCSI?
>>
> Some sort of RNG magic in commit id '54ac483e6' - I forget the details
> (too long ago), but I think it had to do with being able to reuse the
> existing RNG without creating a whole new option.

Fair enough.

>
> [...]
>
>>> hmmm... If you're going to change this here, then
>>> qemuDomainAttachHostDevice would be changed at the same time.  Hence why
>>> this ordering stuff to patches is important.
>>>
>>> I cannot recall if removing would cause a build error in this case.
>> It does, because of the switch statement not having a "default" case.
>>
> Perhaps a different ordering of patches removes the need.... But I know
> it's tough with this new type... I guess it's odd the *RemoveHostDevice
> needed it, but the corollary Attach didn't... Ahhh - I see why - the
> switch in Attach doesn't have the typecast (virDomainHostdevSubsysType).
>
> I can send a patch for that or you can make the modification too...

I have a patch started, but kept it separate from this series because I 
got distracted/hoped to get v2 out before the last freeze.  I will dust 
that off.

>
> [...]
>
>> Thanks for the review.  (Silent ACK on a lot of the above comments.)
>> I'll try to get these all in place for a v3 with a little runway before
>> the next freeze.
>>
> OK - hopefully I won't be neck deep in some other firefight

Thanks for taking a few minutes away from those to look over this little 
pile.  I appreciate the fires that pop up and get in the way of the 
usual todo list.  :)

> and will be
> able to give them quicker attention. At the very least getting the
> capabilities changes in which is generally the most conflict...

Apologies for that.  I've already rebased/resolved the capability 
conflict that occurred between posting to the list and now, and moved 
that to the head of my series.  Taking a day's holiday tomorrow, so will 
get back to the v3 series Thursday.

  - Eric




More information about the libvir-list mailing list