[libvirt] [PATCH v2 1/5] Introduce a "scsi_host" hostdev type
Eric Farman
farman at linux.vnet.ibm.com
Tue Sep 13 21:48:15 UTC 2016
On 09/13/2016 04:49 PM, John Ferlan wrote:
> [...]
>
>>> I took a peek at the v1, but suffice to say didn't follow it that
>>> closely since there's a number of changes in v2 related to patch
>>> ordering. You could get 3 different opinions on that matter too!
>> Hopefully a consensus before too long. :)
>>
> yeah - v1 was lots of little steps that you were asked to combine. OTOH
> there are those that like to combine multiple things in one patch which
> can make it really difficult to follow. The order here was "followable"
> - my suggestion was more to try and get the guts (qemu) done first, then
> the fluff (domain/xml).
Agreed. v1 was perhaps overkill in breaking things down so small,
because I was picking up an idea with a bit of dust on it.
>
> Consider your patch order to be making progress towards some sustainable
> feature/bug fix - each step along the way being able to pass "make check
> syntax-check" so that future attempts to git bisect can point the fickle
> finger of fate (at someone else!). If a patch adds something that will
> be external that a followup patch will use - then great. If it's
> testable, even better. For example, adding virSCSIOpenVhost as a
> separate patch with the tests/qemuxml2argvmock.c is fine. One more
> tidbit here - it's a very faint memory, but check out
> tests/commandtest.c and the lengthy comment in mymain() regarding
> "expected" fd values. I think you're "safe" in what you've done -
> although your mock should mimic whatever changes you make to the primary
> function.
>
> [...]
>
>>> A wwpn doesn't generally have the "naa." prefix on it. So, why not just
>>> add that to the command line instead? That is a wwpn is a 16 hex digit
>>> value. I'm assuming that in order to support this feature a "naa." is
>>> prefixed and sent to qemu. Does that necessarily mean some other
>>> hypervisor would choose to place the "naa." prefix or would then code
>>> need to be added to strip it off for those other hypervisors? Looking
>>> at the qemu code - it seems fairly specific.
>> Fairly certain that it's more a vhost thing of which the hypervisor
>> utilizes. Per a recommendation made on this list years ago, I used
>> targetcli to do the configuration rather than direct manipulation of
>> sysfs, and according to its man page:
>>
>> All WWNs are prefaced by their type, such as "iqn", "naa", or "ib",
>> and may be given with or without colons.
>>
> I don't use targetcli so I'm not familiar with it.
>
> Still not convinced using naa.# in the XML is the best way to go, but
> I'm not against it. I guess a lot depends on how the sysfs views things.
>
>> As to whether other hypervisors follow the same behavior, I do not
>> know. I was just trying to conform to what it would allow in the case
>> of vhost-scsi. Now, I suppose I'm confusing matters by saying "wwpn"
>> here since as you say that's 16 hex digits, and should probably say just
>> "wwn" ?
>>
> I guess I've always thought of a wwn/wwpn/wwnn as that string of
> bytes...
I always thought WWPN and WWNN as types of WWN...
> Usually WWN/WWNN have been interchangeable, while WWPN is a
> WWN assigned to a port in a Fibre Channel fabric
...but have seen it this way also. Either way makes sense to me.
>
> FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA
> will list a wwnn, wwpn, and fabric_wwn. The fabric_wwn's will match.
> The wwnn/wwpn for the vport is what was used when running the
> HBA's/vport_create function. It's a tangled web. Wait until vGPU's
> show up (that's another black magic trick).
>
>
> [...]
>
>>> It would be nice to tell a user how to find that wwpn value via some
>>> command (virsh or otherwise) in order to fill in the wwpn attribute.
>>> Although there's always varying opinions on this since many times it's
>>> distro dependent. That's why I suggest virsh - at least if it's
>>> supportable we can display the wwpn there using nodedev-dumpxml.
>> Hrm... nodedev-dumpxml is new to me, neat! I'll go down that bunny
>> trail, but it seems like it'd be a bit of work to add the smarts for this.
>>
> Hours of all sorts of fun.... The 'nodedev-{list|dumpxml}' are
> essentially the libvirt mechanism to list/describe the various rabbit
> holes that is/are the sysfs tree and especially branches. The
> nodedev-list --tree will probably be of most interest, but so is virsh
> nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}
>
>>> What/where is the 'vhost_scsi_wwpn'? IOW: It's not something defined
>>> yet - I'm assuming there's some output somewhere where that's pulled
>>> from.
>> As near as I can tell, it's not anything that can be autogenerated off
>> the bat. Somebody somewhere needs to first establish the sysfs entries
>> for the vhost target, which can then be stored and loaded on a next
>> (host) boot. The end result looks something like this:
>>
> AFAIK systemd handles the sysfs creation (ok at least for my purposes).
>
> This is what I was concerned about - we really need a way to tell
> someone how to "easily" find the wwpn that's to be used. Although I
> don't have vhost-scsi on my laptop ;-)
Me neither. Of course, I'm using an s390, so that's a little different
too. :)
> - I have to figure that by using
> 'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that
> is an HBA will show a <path> that path would then be a starting point...
>
> Googling "scsi_host HBA wwn" shows me some interesting results including
> ones that seem to indicate 'fc_host' is involved in some way even for
> the HBA (I guess that makes sense - where the vHBA is just an
> abstraction off of that). That said, a '/sys/class/fc_host/' is the
> starting point then to find things and virsh nodedev-list --cap vports
> would be the way to get capable HBA's.
/sys/class/fc_host is just showing the individual SCSI disks, but
nothing that looks to point to the vhost-scsi target that I'm passing to
QEMU. /sys/class/fc_vports is empty. (Oh wait, is NPIV turned on
here? Ugh, pick this up later.)
>
>
>> [root at xxxxxxxx ~]# ls -l
>> /sys/kernel/config/target/vhost/naa.50014056e4794195/tpgt_1/lun/lun_0/
>> total 0
> Is the 'vhost' or 'vhost/naa.%s' linked from some
> /sys/class/{scsi|fc}_host/host%d directory?
No, I can't see anybody linking it from anywhere except the couple cross
links in /sys/kernel/config/ shown below.
> I think that becomes the
> key for traversal. Or maybe finding "5001...4195" in some other link
> from /sys/class/{scsi|fc}_host/....
>
>> lrwxrwxrwx 1 root root 0 Sep 13 15:44 752e150d11 ->
>> ../../../../../../target/core/iblock_0/disk0
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_gp
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_offline
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_status
>> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_write_md
>> drwxr-xr-x 5 root root 0 Sep 13 15:44 statistics
>> [root at xxxxxxxx ~]# cat /sys/kernel/config/target/core/iblock_0/disk0/info
>> Status: ACTIVATED Max Queue Depth: 0 SectorSize: 512 HwMaxSectors: 4592
>> iBlock device: dm-0 UDEV PATH: /dev/mapper/mpathb readonly: 0
>> Major: 252 Minor: 0 CLAIMED: IBLOCK
>>
>> (Aside: note the folder name in /sys/kernel/config that contains the
>> wwn/wwpn name with the "naa." prefix.)
>>
>>
> I see... Then again "/sys/kernel" isn't the path used by nodedev driver,
> hence my question above.
Erg, so then the nodedev stuff becomes bigger still.
>
>>> One final question - what is this exposed as on the guest? "Some"
>>> scsi_hostM? For a SCSI LUN it's some 'sgN' value. I see the qemu side
>>> seems to infer some amount of scsi_generic code (read_naa_id), so it's
>>> mostly curiosity/learning for me.
>> Anything that's put behind the target gets seen to the guest. So that
>> can be one or more sgN LUNs and their sdX equivalents, and as near as I
>> can tell a scsi_hostM as you describe. A quick test with two LUNs
>> behind one vhost-scsi target gives me the following in the guest:
>>
>> # ls -l /sys/bus/scsi/devices
>> total 0
>> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:0 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:0
>> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:1 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:1
>> lrwxrwxrwx 1 root root 0 Sep 13 16:10 host0 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0
>> lrwxrwxrwx 1 root root 0 Sep 13 16:11 target0:0:1 ->
>> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1
>> # lsscsi
>> [0:0:1:0] disk LIO-ORG disk0 4.0 /dev/sda
>> [0:0:1:1] disk LIO-ORG disk1 4.0 /dev/sdb
>>
>> FYI, hotplugging/unplugging devices to a target works just fine, except
>> there's no notification that a guest can pick up on to know that a
>> device had been added. Or worse, taken away.
>>
> Hmm. Maybe I'm just not asking the right question.
>
> I guess my perception was that if you expose the 'scsi_hostN' HBA then
> you are giving the guest the whole HBA to use as it sees fit. The host
> shouldn't be touching the luns afterwards.
Correct, it's just not a "scsi_hostN" HBA that exists as part of the
normal fibre attachment, but a vHBA that's defined to the vhost-scsi
driver (via targetcli or some other manual handshaking).
>
> What you've shown is 2 LUNS in the /sys/bus/scsi/ tree; however, is
> there a /sys/bus/scsi_host/hostN that corresponds to the passed through
> HBA?
Nope... (s390 host)
# ls -l /sys/bus/
total 0
drwxr-xr-x 4 root root 0 Sep 13 23:12 ccw
drwxr-xr-x 4 root root 0 Sep 13 23:12 clockevents
drwxr-xr-x 4 root root 0 Sep 13 23:12 clocksource
drwxr-xr-x 4 root root 0 Sep 13 23:12 container
drwxr-xr-x 4 root root 0 Sep 13 23:12 cpu
drwxr-xr-x 4 root root 0 Sep 13 23:12 css
drwxr-xr-x 4 root root 0 Sep 13 23:12 etr
drwxr-xr-x 4 root root 0 Sep 13 23:12 event_source
drwxr-xr-x 4 root root 0 Sep 13 23:12 memory
drwxr-xr-x 5 root root 0 Sep 13 23:12 pci
drwxr-xr-x 4 root root 0 Sep 13 23:12 platform
drwxr-xr-x 4 root root 0 Sep 13 23:12 scm
drwxr-xr-x 4 root root 0 Sep 13 23:12 scsi
drwxr-xr-x 4 root root 0 Sep 13 23:12 stp
drwxr-xr-x 4 root root 0 Sep 13 23:12 virtio
drwxr-xr-x 4 root root 0 Sep 13 23:12 workqueue
>
> [...]
>
>>>> + <attribute name="protocol">
>>>> + <choice>
>>>> + <value>vhost</value> <!-- vhost, required -->
>>>> + </choice>
>>>> + </attribute>
>>>> + <attribute name="wwpn">
>>>> + <data type="string">
>>>> + <param name="pattern">(naa\.)[0-9a-fA-F]{16}</param>
>>>> + </data>
>>> If you go with the we'll add "naa." later, then you could just:
>>>
>>> <ref name='wwn'/>
>> This rings a bell with me, as to why I had the label be "wwpn" instead
>> of "wwn" above. My attempt at trying to distinguish libvirt internals
>> with something user-facing, maybe?
>>
> I think wwpn will still be technically correct (for some strange reason
> after skimming google results). The HBA will have a port which is how
> it's reached. I suppose the answer though is in how sysfs views things.
>
> [...]
>
>>>> "pci",
>>>> - "scsi")
>>>> + "scsi",
>>>> + "scsi_host")
>>>> VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend,
>>>> VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST,
>>>> @@ -660,6 +661,10 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol,
>>>> "adapter",
>>>> "iscsi")
>>>> +VIR_ENUM_IMPL(virDomainHostdevSubsysHostProtocol,
>>>> + VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST,
>>>> + "vhost")
>>>> +
>>> Notice how the virDomainHostdevSubsysSCSIProtocol adds 'adapter' first -
>>> this should do something similar (although the preference is to use
>>> "none" - I cannot recall why that was different).
>> Because "none" == "adapter" in the case of SCSI?
>>
> Some sort of RNG magic in commit id '54ac483e6' - I forget the details
> (too long ago), but I think it had to do with being able to reuse the
> existing RNG without creating a whole new option.
Fair enough.
>
> [...]
>
>>> hmmm... If you're going to change this here, then
>>> qemuDomainAttachHostDevice would be changed at the same time. Hence why
>>> this ordering stuff to patches is important.
>>>
>>> I cannot recall if removing would cause a build error in this case.
>> It does, because of the switch statement not having a "default" case.
>>
> Perhaps a different ordering of patches removes the need.... But I know
> it's tough with this new type... I guess it's odd the *RemoveHostDevice
> needed it, but the corollary Attach didn't... Ahhh - I see why - the
> switch in Attach doesn't have the typecast (virDomainHostdevSubsysType).
>
> I can send a patch for that or you can make the modification too...
I have a patch started, but kept it separate from this series because I
got distracted/hoped to get v2 out before the last freeze. I will dust
that off.
>
> [...]
>
>> Thanks for the review. (Silent ACK on a lot of the above comments.)
>> I'll try to get these all in place for a v3 with a little runway before
>> the next freeze.
>>
> OK - hopefully I won't be neck deep in some other firefight
Thanks for taking a few minutes away from those to look over this little
pile. I appreciate the fires that pop up and get in the way of the
usual todo list. :)
> and will be
> able to give them quicker attention. At the very least getting the
> capabilities changes in which is generally the most conflict...
Apologies for that. I've already rebased/resolved the capability
conflict that occurred between posting to the list and now, and moved
that to the head of my series. Taking a day's holiday tomorrow, so will
get back to the v3 series Thursday.
- Eric
More information about the libvir-list
mailing list