[libvirt] changes to domain XML for SCSI support
Dave Allan
dallan at redhat.com
Wed Oct 26 16:15:12 UTC 2011
On Wed, Oct 26, 2011 at 05:51:55PM +0200, Paolo Bonzini wrote:
> Hi all,
>
> let's kick off the discussion on what changes are needed in domain
> XML for more complete SCSI support.
>
> There are three relevant topics:
>
> 1) providing channel/target/lun addresses for SCSI disks;
>
> 2) supporting LUN passthrough;
>
> 3) supporting SCSI host passthrough.
>
>
> A fourth topic is supporting NPIV. It is a special case of SCSI
> host passthrough, and it is important that any extension to the
> domain XML can cover it.
>
>
> Enhanced addressing for SCSI devices
> ====================================
>
> This is the simplest part. The proposal is to add a new address type
>
> <address type='scsi' host='...'
> bus='...' target='...' lun='...'/>
>
> where host selects the qdev parent device, while bus/target/lun
> are passed as qdev properties (the QEMU names are respectively
> channel, scsi-id, lun).
>
> Libvirt should check for QEMU 1.0 and, for older versions, only
> allow channel=lun=0 and 0<=target<=7.
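To make sure I read the restriction correctly, here is a rough sketch of the check. The function name, the (major, minor) version tuple and the error strings are invented for illustration and are not libvirt code; only the limits themselves (channel=lun=0 and 0<=target<=7 before QEMU 1.0) come from the proposal.

```python
def check_scsi_address(qemu_version, channel, target, lun):
    """Return None if the address is acceptable, else an error string.

    qemu_version is a (major, minor) tuple; this representation is an
    assumption made for the sketch.
    """
    if min(channel, target, lun) < 0:
        return "address components must be non-negative"
    if qemu_version >= (1, 0):
        # QEMU 1.0 and newer take channel/scsi-id/lun as qdev properties.
        return None
    if channel != 0 or lun != 0:
        return "QEMU older than 1.0 requires channel=0 and lun=0"
    if not 0 <= target <= 7:
        return "QEMU older than 1.0 requires 0 <= target <= 7"
    return None
```

No upper bounds are enforced for QEMU 1.0 and newer because the proposal does not state any.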
>
>
> LUN passthrough
> ===============
>
> A SCSI block device from the host can be attached to a domain in two
> ways: as an emulated LUN with SCSI commands implemented within QEMU,
> or by passing SCSI commands down to the block device. The former is
> handled by the existing <disk device='disk'> and <disk device='cdrom'>
> XML syntax. The latter is not yet supported.
>
> On the QEMU side, LUN passthrough is implemented by one of the
> scsi-generic and scsi-block devices. scsi-generic requires a
> /dev/sg device name, and can be applied to any device. scsi-block
> is only available in QEMU 1.0 or newer, can be applied only to
> block devices (sd/sr), and has better performance.
> The choice between one and the other should be as transparent as
> possible.
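A transparent choice between the two could be as simple as the sketch below. It is only an illustration under stated assumptions: the function name is hypothetical, the device kind is guessed from the path (real code would presumably inspect the device node instead), and the fallback for a block device on an older QEMU is left as an error because the proposal does not say how to map sd/sr names to /dev/sg names.

```python
import re


def pick_passthrough_device(path, qemu_version):
    """Pick 'scsi-generic' or 'scsi-block' for a host device path.

    qemu_version is a (major, minor) tuple (an assumed representation).
    """
    if re.fullmatch(r"/dev/sg\d+", path):
        # /dev/sg nodes can only be driven by scsi-generic.
        return "scsi-generic"
    if qemu_version >= (1, 0):
        # Block devices (sd/sr) get the faster scsi-block on QEMU 1.0+.
        return "scsi-block"
    raise ValueError(
        "scsi-block needs QEMU 1.0 or newer; "
        "a /dev/sg name is required for %s" % path
    )
```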
>
> Currently, using a block device as the backend for a virtio disk
> implements a kind of LUN passthrough, since the guest can execute
>
> There are two possible choices here:
>
> 1) add a new <hostdev> tag.
>
> <hostdev mode='subsystem' type='scsi'>
> <source>
> <address type='scsi' host='...' bus='...' target='...' lun='...'/>
> </source>
> <address type='scsi' host='...' bus='...' target='...' lun='...'/>
> </hostdev>
>
> Advantages:
>
> - allows using the same XML for all SCSI devices (i.e. scsi-generic
> vs. scsi-block is an internal detail of libvirt);
>
> Disadvantages:
>
> - does not make it clear which device is being passed through;
>
> - completely different from the syntax that virtio is using for the
> same purpose; perhaps virtio could be covered by
>
> <hostdev mode='subsystem' type='scsi'>
> <source>
> <address type='scsi' host='...' bus='...' target='...' lun='...'/>
> </source>
> <target dev='vda' bus='virtio'/>
> <address type='pci' domain='...' bus='...' slot='...' function='...'/>
> </hostdev>
>
> - <address> specifies the address to a <capability type='scsi'>
> device, but the device to be passed to scsi-block is its
> block_sdXX_* child (aside: it would be nice if the /dev/sgNN name
> was placed somewhere in the nodedev XML for <capability type='scsi'>
> devices);
>
> - emulated and passthrough LUNs have a completely different XML;
>
> - host numbers are not stable when hotplugging.
>
> 2) add a new device='lun' value for the <disk> element.
>
> <disk type='block' device='lun'>
> <driver name='qemu' type='raw'/>
> <source dev='/dev/sda'/>
> <target dev='sda' bus='scsi'/>
> <address type='scsi' host='...' bus='...' target='...' lun='...'/>
> </disk>
>
> Advantages:
>
> - allows using the same syntax for virtio and SCSI. virtio could be
> changed to accept device='lun' too.
>
> - the passed-through device is immediately visible
>
> - a stable addressing is available via /dev/disk/by-id and /dev/disk/by-path
>
> - can easily switch a disk between emulated and passthrough modes;
>
> Disadvantages:
>
> - does not extend to scsi-generic and to host passthrough;
>
>
> 3) something between (1) and (2). If I understand correctly
> http://www.redhat.com/archives/libvir-list/2008-July/msg00429.html
> this would use <hostdev mode='capability'>. More on this below.
>
>
> SCSI target/host passthrough: rethinking <hostdev mode='capability'>
> ====================================================================
>
> SCSI target/host passthrough passes the entire set of LUNs attached
> to a SCSI target or host. On the QEMU side, this is done manually
> by adding a scsi-block or scsi-generic device for each LUN.
>
> This can be realized using something like:
>
> <hostdev mode='subsystem' type='scsi_host'>
> <source>
> <address type='scsi' host='...'/>
> </source>
> <address type='scsi' host='...'/>
> </hostdev>
>
> <hostdev mode='subsystem' type='scsi_target'>
> <source>
> <address type='scsi' host='...' bus='...' target='...'/>
> </source>
> <address type='scsi' host='...' bus='...' target='...'/>
> </hostdev>
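The per-LUN expansion that libvirt would have to do for these elements might be sketched like this. Everything here is a hypothetical illustration: the function name, the tuple layout for host LUNs and the dict output are my own; only the qdev property names (channel, scsi-id, lun) are taken from the addressing section above.

```python
def expand_host_passthrough(host_luns, source_host, driver="scsi-block"):
    """Build one guest device description per LUN found on source_host.

    host_luns is an iterable of (host, bus, target, lun) tuples as they
    might be enumerated on the host; this layout is an assumption.
    """
    devices = []
    for host, bus, target, lun in sorted(host_luns):
        if host != source_host:
            continue
        devices.append({
            "driver": driver,
            "channel": bus,      # QEMU name for the bus number
            "scsi-id": target,   # QEMU name for the target number
            "lun": lun,
        })
    return devices
```

Restricting the loop to one bus and target as well would give the scsi_target variant.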
>
> However, as for LUN passthrough, the main problem is that Linux host
> indices are not stable. Thus, in this case using <hostdev
> mode='capability'> seems like the only reasonable possibility.
>
> That said, <hostdev mode='capability'> has never been documented and
> never even implemented. For this reason, I'm proposing to redo its
> functionality in a different way. The two examples given in
> http://www.redhat.com/archives/libvir-list/2008-July/msg00429.html
> were the following:
>
> >A network card by name (ie for OpenVZ)
> >
> > <hostdev mode='capability'>
> > <source name='eth0'/>
> > </hostdev>
> >
> >A SCSI device by name (eg, SCSI PV passthrough), also specifying
> >the target address
> >
> > <hostdev mode='capability' type='scsi'>
> > <source name='sg3'/>
> > <target address='0:0:0:0'/>
> > </hostdev>
>
> In my proposal:
>
> 1) the "mode" attribute is dropped (more precisely, only "subsystem"
> is allowed and never printed; everything else is rejected);
>
> 2) the "type" attribute can in principle get any value that is valid
> for a nodedev capability---more or less: for example the usb type
> maps to the usb_device capability; :(
>
> 3) the "source" element can get a "name" attribute pointing to a
> nodedev name, and a "rel" attribute that is "child" or "parent".
> "child" instructs libvirt to search for a device possessing the
> given capability, and that is a child of the named device; "parent"
> instructs libvirt to pick the parent of the indicated device. When
> the "name" attribute is included, the element must be empty.
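To check my understanding of the "rel" semantics in (3), here is a small sketch of the lookup. The data structure (a dict mapping nodedev names to their parent and capability set) and the function name are hypothetical, and I am assuming that "child" means a direct child and that the capability is also checked in the "parent" case; neither point is spelled out in the proposal.

```python
def resolve_source(nodedevs, name, capability, rel=None):
    """Return the nodedev name that a <source> element refers to.

    nodedevs maps a name to {"parent": str or None, "caps": set of str}.
    """
    if name not in nodedevs:
        raise KeyError("unknown nodedev: %s" % name)
    if rel is None:
        candidates = [name]
    elif rel == "parent":
        parent = nodedevs[name]["parent"]
        candidates = [parent] if parent is not None else []
    elif rel == "child":
        candidates = sorted(
            n for n, dev in nodedevs.items() if dev["parent"] == name
        )
    else:
        raise ValueError("rel must be 'child' or 'parent'")
    # Keep only known devices that carry the requested capability.
    matches = [
        n for n in candidates
        if n in nodedevs and capability in nodedevs[n]["caps"]
    ]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one %s device, found %d"
            % (capability, len(matches))
        )
    return matches[0]
```

Raising an error when more than one child matches is a design choice of the sketch; a multi-port card would need some extra way to pick one.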
>
> Given this, here is what the two examples above would look like:
>
> A network card for OpenVZ:
>
> - by name (has adding aliases for nodedevs ever been considered,
> such as simply "eth0" in this case?):
>
> <hostdev type='net'>
> <source name='net_eth0_00_22_68_0b_dc_ac'/>
> </hostdev>
>
> - by position:
>
> <hostdev type='net'>
> <source rel='child' name='pci_0000_00_19_0'/>
> </hostdev>
>
>
> A SCSI device:
>
> - by name:
>
> <hostdev type='scsi'>
> <source name='scsi_0_0_0_0'/>
> <address type='scsi' host='...' bus='...' target='...' lun='...'/>
> </hostdev>
>
> - by position (aliases would also allow specifying /dev/sda easily):
>
> <hostdev type='scsi'>
> <source rel='parent' name='block_sda_ST9160411AS_5TG11QWL'/>
> <address type='scsi' host='...' bus='...' target='...' lun='...'/>
> </hostdev>
>
>
> A SCSI host:
>
> - by name:
>
> <hostdev type='scsi_host'>
> <source name='scsi_host0'/>
> <address type='scsi' host='...'/>
> </hostdev>
>
> - by position:
>
> <hostdev type='scsi_host'>
> <source rel='child' name='pci_0000_00_1f_2'/>
> <address type='scsi' host='...'/>
> </hostdev>
>
>
> NPIV support: generalizing hostdev source addresses
> ===================================================
>
> In NPIV, a virtual HBA is created using "virsh nodedev-create" and
> passed to the guest. Such a virtual adapter does have a stable
> address, namely its WWN. As such, it can be addressed simply by
> generalizing the kind of source address that can be passed to
> <hostdev type='scsi_host'/>:
>
> <hostdev type='scsi_host'>
> <source>
> <address type='wwn' wwpn='...' wwnn='...'/>
> </source>
> </hostdev>
>
> (Note that this doesn't use <source name='...'/> and, as such, it
> does not rely on the ideas above).

How do you envision migration working with NPIV?

> Ideas and opinions are welcome!
>
> Paolo
>