[libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type

John Ferlan jferlan at redhat.com
Tue Sep 13 20:49:46 UTC 2016


[...]

>>>
>> I took a peek at the v1, but suffice to say didn't follow it that
>> closely since there's a number of changes in v2 related to patch
>> ordering. You could get 3 different opinions on that matter too!
> 
> Hopefully a consensus before too long.  :)
> 

yeah - v1 was lots of little steps that you were asked to combine. OTOH
there are those that like to combine multiple things in one patch which
can make it really difficult to follow. The order here was "followable"
- my suggestion was more to try and get the guts (qemu) done first, then
the fluff (domain/xml).

Consider your patch order to be making progress towards some sustainable
feature/bug fix - each step along the way being able to pass "make check
syntax-check" so that future attempts to git bisect can point the fickle
finger of fate (at someone else!).  If a patch adds something that will
be external that a followup patch will use - then great. If it's
testable, even better. For example, adding virSCSIOpenVhost as a
separate patch with the tests/qemuxml2argvmock.c is fine. One more
tidbit here - it's a very faint memory, but check out
tests/commandtest.c and the lengthy comment in mymain() regarding
"expected" fd values. I think you're "safe" in what you've done -
although your mock should mimic whatever changes you make to the primary
function.

[...]

>> A wwpn doesn't generally have the "naa." prefix on it. So, why not just
>> add that to the command line instead? That is a wwpn is a 16 hex digit
>> value. I'm assuming that in order to support this feature a "naa." is
>> prefixed and sent to qemu. Does that necessarily mean some other
>> hypervisor would choose to place the "naa." prefix or would then code
>> need to be added to strip it off for those other hypervisors?  Looking
>> at the qemu code - it seems fairly specific.
> 
> Fairly certain that it's more a vhost thing of which the hypervisor
> utilizes.  Per a recommendation made on this list years ago, I used
> targetcli to do the configuration rather than direct manipulation of
> sysfs, and according to its man page:
> 
>     All WWNs are prefaced by their type, such as "iqn", "naa", or "ib",
> and may be given with or without colons.
> 

I don't use targetcli so I'm not familiar with it.

Still not convinced using naa.# in the XML is the best way to go, but
I'm not against it. I guess a lot depends on how the sysfs views things.

> As to whether other hypervisors follow the same behavior, I do not
> know.  I was just trying to conform to what it would allow in the case
> of vhost-scsi.  Now, I suppose I'm confusing matters by saying "wwpn"
> here since as you say that's 16 hex digits, and should probably say just
> "wwn"  ?
> 

I guess I've always thought of a wwn/wwpn/wwnn as that string of
bytes...  Usually WWN/WWNN have been interchangeable, while WWPN is a
WWN assigned to a port in a Fibre Channel fabric

FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA
will list a wwnn, wwpn, and fabric_wwn.  The fabric_wwn's will match.
The wwnn/wwpn for the vport is what was used when running the
HBA's/vport_create function.  It's a tangled web.  Wait until vGPU's
show up (that's another black magic trick).


[...]

>> It would be nice to tell a user how to find that wwpn value via some
>> command (virsh or otherwise) in order to fill in the wwpn attribute.
>> Although there's always varying opinions on this since many times it's
>> distro dependent. That's why I suggest virsh - at least if it's
>> supportable we can display the wwpn there using nodedev-dumpxml.
> 
> Hrm...  nodedev-dumpxml is new to me, neat!  I'll go down that bunny
> trail, but it seems like it'd be a bit of work to add the smarts for this.
> 

Hours of all sorts of fun.... The 'nodedev-{list|dumpxml}' are
essentially the libvirt mechanism to list/describe the various rabbit
holes that is/are the sysfs tree and especially branches. The
nodedev-list --tree will probably be of most interest, but so is virsh
nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}

>>
>> What/where is the 'vhost_scsi_wwpn'?  IOW: It's not something defined
>> yet - I'm assuming there's some output somewhere where that's pulled
>> from.
> 
> As near as I can tell, it's not anything that can be autogenerated off
> the bat.  Somebody somewhere needs to first establish the sysfs entries
> for the vhost target, which can then be stored and loaded on a next
> (host) boot.  The end result looks something like this:
> 

AFAIK systemd handles the sysfs creation (ok at least for my purposes).

This is what I was concerned about - we really need a way to tell
someone how to "easily" find the wwpn that's to be used. Although I
don't have vhost-scsi on my laptop ;-) - I have to figure that by using
'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that
is an HBA will show a <path> that path would then be a starting point...

Googling "scsi_host HBA wwn" shows me some interesting results including
ones that seem to indicate 'fc_host' is involved in some way even for
the HBA (I guess that makes sense - where the vHBA is just an
abstraction off of that). That said, a '/sys/class/fc_host/' is the
starting point then to find things and virsh nodedev-list --cap vports
would be the way to get capable HBA's.


> [root at xxxxxxxx ~]# ls -l
> /sys/kernel/config/target/vhost/naa.50014056e4794195/tpgt_1/lun/lun_0/
> total 0

Is the 'vhost' or 'vhost/naa.%s' linked from some
/sys/class/{scsi|fc}_host/host%d directory?  I think that becomes the
key for traversal. Or maybe finding "5001...4195" in some other link
from /sys/class/{scsi|fc}_host/....

> lrwxrwxrwx 1 root root    0 Sep 13 15:44 752e150d11 ->
> ../../../../../../target/core/iblock_0/disk0
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_gp
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_offline
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_status
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_write_md
> drwxr-xr-x 5 root root    0 Sep 13 15:44 statistics
> [root at xxxxxxxx ~]# cat /sys/kernel/config/target/core/iblock_0/disk0/info
> Status: ACTIVATED  Max Queue Depth: 0  SectorSize: 512 HwMaxSectors: 4592
>         iBlock device: dm-0  UDEV PATH: /dev/mapper/mpathb readonly: 0
>         Major: 252 Minor: 0  CLAIMED: IBLOCK
> 
> (Aside: note the folder name in /sys/kernel/config that contains the
> wwn/wwpn name with the "naa." prefix.)
> 
> 

I see... Then again "/sys/kernel" isn't the path used by nodedev driver,
hence my question above.

>>
>> One final question - what is this exposed as on the guest?  "Some"
>> scsi_hostM? For a SCSI LUN it's some 'sgN' value. I see the qemu side
>> seems to infer some amount of scsi_generic code (read_naa_id), so it's
>> mostly curiosity/learning for me.
> 
> Anything that's put behind the target gets seen to the guest.  So that
> can be one or more sgN LUNs and their sdX equivalents, and as near as I
> can tell a scsi_hostM as you describe.  A quick test with two LUNs
> behind one vhost-scsi target gives me the following in the guest:
> 
> # ls -l /sys/bus/scsi/devices
> total 0
> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:0 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:0
> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:1 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:1
> lrwxrwxrwx 1 root root 0 Sep 13 16:10 host0 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0
> lrwxrwxrwx 1 root root 0 Sep 13 16:11 target0:0:1 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1
> # lsscsi
> [0:0:1:0]    disk    LIO-ORG  disk0            4.0   /dev/sda
> [0:0:1:1]    disk    LIO-ORG  disk1            4.0   /dev/sdb
> 
> FYI, hotplugging/unplugging devices to a target works just fine, except
> there's no notification that a guest can pick up on to know that a
> device had been added.  Or worse, taken away.
> 

Hmm. Maybe I'm just not asking the right question.

I guess my perception was that if you expose the 'scsi_hostN' HBA then
you are giving the guest the whole HBA to use as it sees fit. The host
shouldn't be touching the luns afterwards.

What you've shown is 2 LUNS in the /sys/bus/scsi/ tree; however, is
there a /sys/bus/scsi_host/hostN that corresponds to the passed through
HBA?

[...]

>>> +          <attribute name="protocol">
>>> +            <choice>
>>> +              <value>vhost</value>     <!-- vhost, required -->
>>> +            </choice>
>>> +          </attribute>
>>> +          <attribute name="wwpn">
>>> +            <data type="string">
>>> +              <param name="pattern">(naa\.)[0-9a-fA-F]{16}</param>
>>> +            </data>
>> If you go with the we'll add "naa." later, then you could just:
>>
>> <ref name='wwn'/>
> 
> This rings a bell with me, as to why I had the label be "wwpn" instead
> of "wwn" above.  My attempt at trying to distinguish libvirt internals
> with something user-facing, maybe?
> 

I think wwpn will still be technically correct (for some strange reason
after skimming google results).  The HBA will have a port which is how
it's reached. I suppose the answer though is in how sysfs views things.

[...]

>>>                 "pci",
>>> -              "scsi")
>>> +              "scsi",
>>> +              "scsi_host")
>>>     VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend,
>>>                 VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST,
>>> @@ -660,6 +661,10 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol,
>>>                 "adapter",
>>>                 "iscsi")
>>>   +VIR_ENUM_IMPL(virDomainHostdevSubsysHostProtocol,
>>> +              VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST,
>>> +              "vhost")
>>> +
>> Notice how the virDomainHostdevSubsysSCSIProtocol adds 'adapter' first -
>> this should do something similar (although the preference is to use
>> "none" - I cannot recall why that was different).
> 
> Because "none" == "adapter" in the case of SCSI?
> 

Some sort of RNG magic in commit id '54ac483e6' - I forget the details
(too long ago), but I think it had to do with being able to reuse the
existing RNG without creating a whole new option.

[...]

>> hmmm... If you're going to change this here, then
>> qemuDomainAttachHostDevice would be changed at the same time.  Hence why
>> this ordering stuff to patches is important.
>>
>> I cannot recall if removing would cause a build error in this case.
> 
> It does, because of the switch statement not having a "default" case.
> 

Perhaps a different ordering of patches removes the need.... But I know
it's tough with this new type... I guess it's odd the *RemoveHostDevice
needed it, but the corollary Attach didn't... Ahhh - I see why - the
switch in Attach doesn't have the typecast (virDomainHostdevSubsysType).

I can send a patch for that or you can make the modification too...

[...]

> Thanks for the review.  (Silent ACK on a lot of the above comments.) 
> I'll try to get these all in place for a v3 with a little runway before
> the next freeze.
> 

OK - hopefully I won't be neck deep in some other firefight and will be
able to give them quicker attention. At the very least getting the
capabilities changes in which is generally the most conflict...

John




More information about the libvir-list mailing list