Libvirt NVME support

Daniel P. Berrangé berrange at redhat.com
Mon Nov 23 15:01:31 UTC 2020


On Mon, Nov 23, 2020 at 03:36:42PM +0100, Peter Krempa wrote:
> On Mon, Nov 23, 2020 at 15:32:20 +0100, Michal Privoznik wrote:
> > On 11/23/20 3:03 PM, Daniel P. Berrangé wrote:
> > > On Wed, Nov 18, 2020 at 11:24:30AM +0100, Peter Krempa wrote:
> > > > On Wed, Nov 18, 2020 at 09:57:14 +0000, Thanos Makatos wrote:
> > > > > > As a separate question, is there any performance benefit of emulating an
> > > > > > NVMe controller compared to e.g. virtio-scsi?
> > > > > 
> > > > > We haven't measured that yet; I would expect it to be slightly faster and/or more
> > > > > CPU efficient but wouldn't be surprised if it isn't. The main benefit of using
> > > > > NVMe is that we don't have to install VirtIO drivers in the guest.
> > > > 
> > > > Okay, I'm not sold on the drivers bit but that is definitely not a
> > > > problem with regard to adding support for emulating NVMe controllers to
> > > > libvirt.
> > > > 
> > > > As a starting point a trivial way to model this in the XML will be:
> > > > 
> > > >      <controller type='nvme' index='1' model='nvme'>
> > > > 
> > > > And then add the storage into it as:
> > > > 
> > > >      <disk type='file' device='disk'>
> > > >        <source file='/Host/QEMUGuest1.qcow2'/>
> > > >        <target dev='sda' bus='nvme'/>
> > > >        <address type='drive' controller='1' bus='0' target='0' unit='0'/>
> > > >      </disk>
> > > > 
> > > >      <disk type='file' device='disk'>
> > > >        <source file='/Host/QEMUGuest2.qcow2'/>
> > > >        <target dev='sdb' bus='nvme'/>
> > > >        <address type='drive' controller='1' bus='0' target='0' unit='1'/>
> > > >      </disk>
> > > > 
> > > > The 'drive' address here maps the disk to the controller. This example
> > > > uses unit= as the way to specify the namespace ID. Both 'bus' and 'target'
> > > > must be 0.
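
(Purely for illustration: assuming QEMU's emulated 'nvme' controller plus
the recently added 'nvme-ns' namespace device, the above could plausibly
translate into something along these lines on the QEMU command line - the
node names, ids and serial below are made up, and this is a sketch rather
than a committed mapping:

     # one blockdev chain per namespace, all attached to a single emulated
     # NVMe controller; nsid= corresponds to the unit= value in the XML
     -blockdev driver=file,filename=/Host/QEMUGuest1.qcow2,node-name=store1 \
     -blockdev driver=qcow2,file=store1,node-name=ns-img-1 \
     -blockdev driver=file,filename=/Host/QEMUGuest2.qcow2,node-name=store2 \
     -blockdev driver=qcow2,file=store2,node-name=ns-img-2 \
     -device nvme,id=nvme1,serial=NVME0001 \
     -device nvme-ns,bus=nvme1,drive=ns-img-1,nsid=1 \
     -device nvme-ns,bus=nvme1,drive=ns-img-2,nsid=2

with each 'nvme-ns' contributing one namespace backed by its own blockdev
chain.)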
> > > 
> > > FWIW, I think that our overloading of type=drive for FDC, IDE, and SCSI
> > > was a mistake in retrospect. We should have had type=fdc, type=ide, type=scsi,
> > > since each uses a different subset of the attributes.
> > > 
> > > Let's not continue this mistake with NVMe - create a type=nvme address
> > > type.
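
(Just to illustrate what that might look like in the <disk> XML - the
attribute names below are hypothetical, not any kind of agreed schema:

     <address type='nvme' controller='1' namespace='1'/>

i.e. the namespace ID gets its own attribute rather than being squeezed
into unit=.)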
> > 
> > Don't NVMes live on a PCI(e) bus? Can't we just treat NVMes as PCI devices?
> > Or are we targeting SATA too? Because we also have that type of address.
> 
> No, the NVMe controller lives on PCIe. Here we are trying to emulate an
> NVMe controller (as a <controller>, if you look elsewhere in the other
> subthread). The <disk> elements here map to individual emulated
> namespaces for the emulated NVMe controller.
> 
> If we tried to map one <disk> per PCIe device, that would prevent us
> from emulating multiple namespaces.

The odd thing here is that we're trying to expose a different host backing
store for each namespace, hence the need for multiple <disk> elements.

Does it even make sense if you expose a namespace "2" without first
exposing a namespace "1"?

It makes me a little uneasy, as it feels like trying to export a
regular disk where we have a different host backing store for each
partition. The difference, I guess, is that partition tables are a purely
software construct, whereas namespaces are a hardware construct.
Exposing individual partitions of a disk was done in Xen, but most
people think it was kind of a mistake, as you could get a partition
without any containing disk. At least in this case we do have an
NVMe controller present, so the namespace isn't orphaned like the
old Xen partitions.



The alternative is to allow only a single host backing store, and then
either let the guest dynamically carve it up into namespaces, or have
some data format in the host backing store to represent the namespaces,
or have an XML element to specify the regions of host backing that
correspond to namespaces, e.g.

  <disk type="file" device="nvme">
     <source file="/some/file.qcow"/>
     <target bus="nvme"/>
     <namespaces>
        <region offset="0" size="1024000"/>
        <region offset="1024000" size="2024000"/>
        <region offset="2024000" size="4024000"/>
     </namespaces>
     <address type="pci" .../>
  </disk>

This is of course less flexible, and I'm not entirely serious about
suggesting it, but it's an option that exists nonetheless.
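
(For what it's worth, if we ever did go down that route, one conceivable
way to realise the regions on the QEMU side - assuming the raw format
driver's 'offset' and 'size' options, with all node names and ids made
up - would be to layer one raw blockdev per region over the single image
node, e.g.:

     # one raw node per namespace region, all sharing the same image node;
     # several writable users of one node would also need the block layer
     # permissions relaxed, e.g. share-rw=on on the guest devices
     -blockdev driver=file,filename=/some/file.qcow,node-name=backing \
     -blockdev driver=qcow2,file=backing,node-name=image \
     -blockdev driver=raw,file=image,offset=0,size=1024000,node-name=region1 \
     -blockdev driver=raw,file=image,offset=1024000,size=2024000,node-name=region2 \
     -device nvme,id=nvme1,serial=NVME0001 \
     -device nvme-ns,bus=nvme1,drive=region1,nsid=1,share-rw=on \
     -device nvme-ns,bus=nvme1,drive=region2,nsid=2,share-rw=on

but again that is just a sketch of one possibility.)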

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



