[libvirt] [PATCH v3 10/30] schemas: Introduce disk type NVMe
Michal Privoznik
mprivozn at redhat.com
Tue Dec 10 09:02:21 UTC 2019
On 12/2/19 3:26 PM, Michal Privoznik wrote:
> There is this class of PCI devices that act like disks: NVMe.
> Therefore, they are both PCI devices and disks. While we already
> have <hostdev/> (and can assign a NVMe device to a domain
> successfully) we don't have disk representation. There are three
> problems with PCI assignment in case of a NVMe device:
>
> 1) domains with <hostdev/> can't be migrated
>
> 2) NVMe device is assigned whole, there's no way to assign only a
> namespace
>
> 3) Because hypervisors see <hostdev/> they don't put block layer
> on top of it - users don't get all the fancy features like
> snapshots
>
> NVMe namespaces are a way of splitting one continuous NVDIMM memory
> into smaller ones, effectively creating smaller NVMe-s (which can
> then be partitioned, LVMed, etc.)
>
> Because of all of this the following XML was chosen to model a
> NVMe device:
>
>   <disk type='nvme' device='disk'>
>     <driver name='qemu' type='raw'/>
>     <source type='pci' managed='yes' namespace='1'>
>       <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>     </source>
>     <target dev='vda' bus='virtio'/>
>   </disk>
Last week I discussed this on IRC with Dan and Maxim (both CC'ed), and
there was a suggestion to accept a /dev/nvmeXXX path instead of a PCI
address. The reasoning was that there is a tool that Maxim wrote (alas,
not merged into qemu/kvm yet) that acts as a standalone daemon which
does the VFIO magic and then serves the qemus connecting to it. This
allows an NVMe disk to be shared between multiple qemus, which is
currently not possible due to a VFIO restriction. And if we accepted
/dev/nvmeXXX here, we could change the backend less invasively: we could
use either qemu's -drive nvme://XXXX or the new tool.
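
To make the first backend option concrete, here is a rough sketch of how
the proposed <disk/> XML could map onto qemu's nvme:// URL. It assumes the
nvme://<domain>:<bus>:<slot>.<function>/<namespace> form that qemu's
userspace NVMe driver documents; the function name
nvme_url_from_disk_xml() is made up for illustration and is not part of
libvirt or of this series.

```python
import xml.etree.ElementTree as ET

# The disk XML proposed in the patch, copied verbatim.
DISK_XML = """
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def nvme_url_from_disk_xml(xml_text):
    """Build an nvme:// URL from a <disk type='nvme'> element (hypothetical helper)."""
    disk = ET.fromstring(xml_text)
    source = disk.find("source")
    addr = source.find("address")
    # Attribute values are hex strings such as '0x01'; base 0 honours the prefix.
    domain, bus, slot, func = (
        int(addr.get(key), 0) for key in ("domain", "bus", "slot", "function")
    )
    pci = f"{domain:04x}:{bus:02x}:{slot:02x}.{func:x}"
    namespace = int(source.get("namespace"))
    return f"nvme://{pci}/{namespace}"


print(nvme_url_from_disk_xml(DISK_XML))  # nvme://0000:01:00.0/1
```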
On the other hand, /dev/nvmeXXX (even though it may be a bit more user
friendly) wouldn't work if the host kernel doesn't have the NVMe driver
or if the disk is already detached from the host. A PCI address, as I
have it here, works in both cases.
Note that sysfs offers translations both ways [PCI address, namespace]
<-> /dev/nvmeXXX so that shouldn't be a limitation.
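
A minimal sketch of that two-way translation, assuming the sysfs layout
seen on typical Linux hosts with the nvme driver bound and without NVMe
multipath: /sys/bus/pci/devices/<addr>/nvme/<ctrl>/<ns>/nsid for the
forward lookup, and /sys/block/<ns>/device/device resolving to the PCI
device for the reverse. Both function names are hypothetical, and the
sysfs_root parameter exists only so the lookup can be pointed at a test
tree.

```python
import os


def pci_to_nvme_dev(pci_addr, nsid, sysfs_root="/sys"):
    """Return '/dev/<ns>' for a (PCI address, namespace ID) pair, or None."""
    ctrl_dir = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "nvme")
    if not os.path.isdir(ctrl_dir):
        # No such device, or the nvme driver is not bound to it
        # (for example, it is already detached from the host).
        return None
    for ctrl in sorted(os.listdir(ctrl_dir)):
        ctrl_path = os.path.join(ctrl_dir, ctrl)
        if not os.path.isdir(ctrl_path):
            continue
        for entry in sorted(os.listdir(ctrl_path)):
            # Only namespace directories carry an 'nsid' attribute.
            nsid_file = os.path.join(ctrl_path, entry, "nsid")
            if not os.path.isfile(nsid_file):
                continue
            with open(nsid_file) as f:
                if int(f.read().strip()) == nsid:
                    return "/dev/" + entry
    return None


def nvme_dev_to_pci(dev_path, sysfs_root="/sys"):
    """Return (PCI address, namespace ID) for '/dev/<ns>', or None."""
    block = os.path.join(sysfs_root, "block", os.path.basename(dev_path))
    nsid_file = os.path.join(block, "nsid")
    if not os.path.isfile(nsid_file):
        return None
    with open(nsid_file) as f:
        nsid = int(f.read().strip())
    # <ns>/device points at the controller, <ctrl>/device at the PCI device.
    pci_dir = os.path.realpath(os.path.join(block, "device", "device"))
    return os.path.basename(pci_dir), nsid
```

Note that the forward lookup returns None once the device is detached
from the host nvme driver, which is the limitation of /dev/nvmeXXX
described above.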
Thoughts?
Michal