[libvirt] [PATCH v3 10/30] schemas: Introduce disk type NVMe

Michal Privoznik mprivozn at redhat.com
Tue Dec 10 09:02:21 UTC 2019


On 12/2/19 3:26 PM, Michal Privoznik wrote:
> There is this class of PCI devices that act like disks: NVMe.
> Therefore, they are both PCI devices and disks. While we already
> have <hostdev/> (and can assign a NVMe device to a domain
> successfully) we don't have disk representation. There are three
> problems with PCI assignment in case of a NVMe device:
> 
> 1) domains with <hostdev/> can't be migrated
> 
> 2) NVMe device is assigned whole, there's no way to assign only a
>     namespace
> 
> 3) Because hypervisors see <hostdev/> they don't put block layer
>     on top of it - users don't get all the fancy features like
>     snapshots
> 
> NVMe namespaces are a way of splitting one continuous NVDIMM memory
> into smaller ones, effectively creating smaller NVMe-s (which can
> then be partitioned, LVMed, etc.)
> 
> Because of all of this the following XML was chosen to model a
> NVMe device:
> 
>    <disk type='nvme' device='disk'>
>      <driver name='qemu' type='raw'/>
>      <source type='pci' managed='yes' namespace='1'>
>        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>      </source>
>      <target dev='vda' bus='virtio'/>
>    </disk>


Last week I discussed this on IRC with Dan and Maxim (both CC'ed), and
there was a suggestion to accept a /dev/nvmeXXX path instead of a PCI
address. The reasoning was that Maxim wrote a tool (alas, not merged
into qemu/kvm yet) that acts as a standalone daemon: it does the VFIO
magic and then serves the qemus connecting to it. This allows an NVMe
disk to be shared between multiple qemus, which is currently not
possible due to a VFIO restriction. If we accepted /dev/nvmeXXX here,
we could change the backend less invasively: we could use either
qemu's -drive nvme://XXXX or the new tool.

On the other hand, /dev/nvmeXXX (even though it may be a bit more user
friendly) wouldn't work if the host kernel doesn't have an NVMe driver
or if the disk is already detached. A PCI address, as I have it here,
works in both cases.

Note that sysfs offers translations both ways [PCI address, namespace] 
<-> /dev/nvmeXXX so that shouldn't be a limitation.
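
To illustrate the sysfs translation mentioned above, here is a rough
sketch in Python. It is not libvirt code. The function names and the
sysfs_root parameter are made up for illustration, and it assumes the
usual non-multipath layout where the host nvme driver is bound and the
namespace directories (nvme0n1, ...) sit directly under the controller
directory with an "nsid" attribute.

```python
import os
import re


def pci_to_nvme_path(pci_addr, nsid, sysfs_root="/sys"):
    """Map [PCI address, namespace ID] to /dev/nvmeXnY, or None.

    Hypothetical helper; assumes the host nvme driver is bound to the
    device, otherwise the "nvme" directory does not exist.
    """
    ctrl_dir = os.path.join(sysfs_root, "bus", "pci", "devices",
                            pci_addr, "nvme")
    if not os.path.isdir(ctrl_dir):
        return None  # e.g. no NVMe driver, or device already detached
    for ctrl in sorted(os.listdir(ctrl_dir)):
        ctrl_path = os.path.join(ctrl_dir, ctrl)
        for entry in sorted(os.listdir(ctrl_path)):
            # Only namespace directories carry an "nsid" attribute.
            nsid_file = os.path.join(ctrl_path, entry, "nsid")
            if not os.path.isfile(nsid_file):
                continue
            with open(nsid_file) as f:
                if int(f.read().strip()) == nsid:
                    return "/dev/" + entry
    return None


def nvme_path_to_pci(dev_path, sysfs_root="/sys"):
    """Map /dev/nvmeXnY to a (PCI address, namespace ID) tuple, or None."""
    name = os.path.basename(dev_path)
    match = re.fullmatch(r"(nvme\d+)n\d+", name)
    if match is None:
        return None
    ctrl_path = os.path.join(sysfs_root, "class", "nvme", match.group(1))
    nsid_file = os.path.join(ctrl_path, name, "nsid")
    device_link = os.path.join(ctrl_path, "device")
    if not (os.path.isfile(nsid_file) and os.path.exists(device_link)):
        return None
    # The controller's "device" symlink points at the PCI device directory.
    pci_addr = os.path.basename(os.path.realpath(device_link))
    with open(nsid_file) as f:
        return pci_addr, int(f.read().strip())
```

For the address in the XML above, pci_to_nvme_path("0000:01:00.0", 1)
would be the call to make. Note that both directions return None once
the device is bound to vfio-pci instead of the host nvme driver, which
is exactly the case where only the PCI address remains usable.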

Thoughts?

Michal



