[libvirt] [PATCH v3 10/30] schemas: Introduce disk type NVMe
Cole Robinson
crobinso at redhat.com
Mon Dec 9 22:55:12 UTC 2019
On 12/2/19 9:26 AM, Michal Privoznik wrote:
> There is this class of PCI devices that act like disks: NVMe.
> Therefore, they are both PCI devices and disks. While we already
> have <hostdev/> (and can assign a NVMe device to a domain
> successfully) we don't have disk representation. There are three
> problems with PCI assignment in case of a NVMe device:
>
> 1) domains with <hostdev/> can't be migrated
>
> 2) NVMe device is assigned whole, there's no way to assign only a
> namespace
>
> 3) Because hypervisors see <hostdev/> they don't put block layer
> on top of it - users don't get all the fancy features like
> snapshots
>
> NVMe namespaces are a way of splitting one continuous NVDIMM memory
> into smaller ones, effectively creating smaller NVMe-s (which can
> then be partitioned, LVMed, etc.)
>
> Because of all of this the following XML was chosen to model a
> NVMe device:
>
> <disk type='nvme' device='disk'>
> <driver name='qemu' type='raw'/>
> <source type='pci' managed='yes' namespace='1'>
> <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> </source>
> <target dev='vda' bus='virtio'/>
> </disk>
>
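For readers following along, here is a minimal sketch of how a consumer of this XML might read the proposed <source> element. It is not taken from the patch or from libvirt's parser: the helper name describe_nvme_source and its error messages are hypothetical, and it only encodes what the documentation hunk below states (only type='pci' is accepted, namespace IDs start at 1).

```python
import xml.etree.ElementTree as ET

# The disk XML proposed in the commit message above.
DISK_XML = """
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def describe_nvme_source(disk_xml):
    """Hypothetical helper, not libvirt code.

    Returns (pci_address, namespace, managed) for a <disk type='nvme'>.
    """
    disk = ET.fromstring(disk_xml)
    if disk.get("type") != "nvme":
        raise ValueError("not an NVMe disk")

    source = disk.find("source")
    if source is None or source.get("type") != "pci":
        raise ValueError("only type='pci' is accepted for the source")

    # Namespace IDs start at 1; a missing attribute is treated as invalid.
    namespace = int(source.get("namespace", "0"))
    if namespace < 1:
        raise ValueError("NVMe namespace IDs start at 1")

    addr = source.find("address")
    if addr is None:
        raise ValueError("missing <address> sub-element")

    # int(value, 0) honors the 0x prefix used in the XML.
    domain, bus, slot, function = (
        int(addr.get(key, "0"), 0)
        for key in ("domain", "bus", "slot", "function")
    )
    pci_address = f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"

    return pci_address, namespace, source.get("managed") == "yes"


if __name__ == "__main__":
    print(describe_nvme_source(DISK_XML))
```

The address is formatted in the usual domain:bus:slot.function notation, so the controller and namespace the domain would get can be read off at a glance.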
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
> docs/formatdomain.html.in | 57 +++++++++++++++++++++++--
> docs/schemas/domaincommon.rng | 32 ++++++++++++++
> tests/qemuxml2argvdata/disk-nvme.xml | 63 ++++++++++++++++++++++++++++
> 3 files changed, 149 insertions(+), 3 deletions(-)
> create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
>
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 6df4a8b26e..fe871d933f 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -2944,6 +2944,13 @@
> </backingStore>
> <target dev='vdd' bus='virtio'/>
> </disk>
> + <disk type='nvme' device='disk'>
> + <driver name='qemu' type='raw'/>
> + <source type='pci' managed='yes' namespace='1'>
> + <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> + </source>
> + <target dev='vde' bus='virtio'/>
> + </disk>
> </devices>
> ...</pre>
>
> @@ -2957,7 +2964,8 @@
> Valid values are "file", "block",
> "dir" (<span class="since">since 0.7.5</span>),
> "network" (<span class="since">since 0.8.7</span>), or
> - "volume" (<span class="since">since 1.0.5</span>)
> + "volume" (<span class="since">since 1.0.5</span>), or
> + "nvme" (<span class="since">since 5.6.0</span>)
6.0.0 or whatever version this will land in
> and refer to the underlying source for the disk.
> <span class="since">Since 0.0.3</span>
> </dd>
> @@ -3140,6 +3148,43 @@
> <span class="since">Since 1.0.5</span>
> </p>
> </dd>
> + <dt><code>nvme</code></dt>
> + <dd>
> + To specify disk source for NVMe disk the <code>source</code>
> + element has the following attributes:
> + <dl>
> + <dt><code>type</code></dt>
> + <dd>The type of address specified in <code>address</code>
> + sub-element. Currently, only <code>pci</code> value is
> + accepted.
> + </dd>
> +
> + <dt><code>managed</code></dt>
> + <dd>This attribute instructs libvirt to detach NVMe
> + controller automatically on domain startup (<code>yes</code>)
> + or expect the controller to be detached by system
> + administrator (<code>no</code>).
> + </dd>
> +
> + <dt><code>namespace</code></dt>
> + <dd>The namespace ID which should be assigned to the domain.
> + According to NVMe standard, namespace numbers start from 1,
> + including.
> + </dd>
> + </dl>
> +
> + The difference between <code><disk type='nvme'></code>
> + and <code><hostdev/></code> is that the latter is plain
> + host device assignment with all its limitations (e.g. no live
> + migration), while the former makes hypervisor to run the NVMe
> + disk through hypervisor's block layer thus enabling all
> + features provided by the layer (e.g. snapshots, domain
> + migration, etc.). Moreover, since the NVMe disk is unbinded
> + from its PCI driver, the host kernel storage stack is not
> + involved (compared to passing say <code>/dev/nvme0n1</code> via
> + <code><disk type='block'></code> and therefore lower
> + latencies can be achieved.
> + </dd>
> </dl>
> With "file", "block", and "volume", one or more optional
> sub-elements <code>seclabel</code>, <a href="#seclabel">described
> @@ -3302,11 +3347,17 @@
> initiator IQN needed to access the source via mandatory
> attribute <code>name</code>.
> </dd>
> + <dt><code>address</code></dt>
> + <dd>For disk of type <code>nvme</code> this element
> + specifies the PCI address of the host NVMe
> + controller.
> + <span class="since">Since 5.6.0</span>
Same
> + </dd>
> </dl>
>
> <p>
> - For a "file" or "volume" disk type which represents a cdrom or floppy
> - (the <code>device</code> attribute), it is possible to define
> + For a "file" or "volume" disk type which represents a cdrom or
> + floppy (the <code>device</code> attribute), it is possible to define
Stray change?
Also, in the test XML you need to "s/qemu-system-i686/qemu-system-i386/"
or you'll hit a weird error. And VIR_TEST_REGENERATE_OUTPUT is also
busted, see my patches elsewhere on this list.
Reviewed-by: Cole Robinson <crobinso at redhat.com>
- Cole