[libvirt] [PATCH v1 10/31] schemas: Introduce disk type NVMe

Michal Privoznik mprivozn at redhat.com
Tue Jul 16 16:23:32 UTC 2019


On 7/16/19 2:35 PM, Peter Krempa wrote:
> On Thu, Jul 11, 2019 at 17:53:57 +0200, Michal Privoznik wrote:
>> There is this class of PCI devices that act like disks: NVMe.
>> Therefore, they are both PCI devices and disks. While we already
>> have <hostdev/> (and can assign a NVMe device to a domain
>> successfully) we don't have disk representation. There are three
>> problems with PCI assignment in case of a NVMe device:
>>
>> 1) domains with <hostdev/> can't be migrated
>>
>> 2) NVMe device is assigned whole, there's no way to assign only a
>>     namespace
>>
>> 3) Because hypervisors see <hostdev/> they don't put block layer
>>     on top of it - users don't get all the fancy features like
>>     snapshots
>>
>> NVMe namespaces are way of splitting one continuous NVDIMM memory
> 
> s/NVDIMM memory/NVMe device/
> 
>> into smaller ones, effectively creating smaller NVMe-s (which can
>> then be partitioned, LVMed, etc.)
>>
>> Because of all of this the following XML was chosen to model a
>> NVMe device:
>>
>>    <disk type='nvme' device='disk'>
>>      <driver name='qemu' type='raw'/>
>>      <source type='pci' managed='yes' namespace='1'>
>>        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>>      </source>
>>      <target dev='vda' bus='virtio'/>
>>    </disk>
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>   docs/formatdomain.html.in            | 45 +++++++++++++++++++++--
>>   docs/schemas/domaincommon.rng        | 32 ++++++++++++++++
>>   tests/qemuxml2argvdata/disk-nvme.xml | 55 ++++++++++++++++++++++++++++
>>   3 files changed, 129 insertions(+), 3 deletions(-)
>>   create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
>>
>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>> index a7a6ec32a5..545578076d 100644
>> --- a/docs/formatdomain.html.in
>> +++ b/docs/formatdomain.html.in
>> @@ -2922,6 +2922,13 @@
>>       </backingStore>
>>       <target dev='vdd' bus='virtio'/>
>>     </disk>
>> +  <disk type='nvme' device='disk'>
>> +    <driver name='qemu' type='raw'/>
>> +    <source type='pci' managed='yes' namespace='1'>
> 
> The 'type' filed may get confusing a bit as it is supposed to be stored
> in virStorageSource->nvme->type, while virStorageSource has it's own
> type.

Well, I'm trygin to mix two things here: <disk/> and <hostdev/> (well, 
PCI devices). There are other types of NVMe than PCIe attached ones. 
They are called NVMe-oF (NVMe over Fabrics), so I've figured that we 
want to specifically tell that NVMe we are dealing with here is the PCIe 
one. But I guess it can also be added later, if we ever decide to 
support NVMe-oF (disks without @type would have 'pci' autofilled in a 
post parse callback).

> 
> Also I'm pondering whether managed='yes' belongs as a top-level
> attribute under 'source' as it's possibly specific to the 'pci' setting
> of type.

Yeah, designing future proof XML is hard. I'm open to any suggestion.

> 
>> +      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>> +    </source>
>> +    <target dev='vde' bus='virtio'/>
>> +  </disk>
>>   </devices>
>>   ...</pre>
>>   
> 
> [...]
> 
>> @@ -3118,6 +3126,31 @@
>>                 <span class="since">Since 1.0.5</span>
>>                 </p>
>>                 </dd>
>> +            <dt><code>nvme</code></dt>
>> +              <dd>
>> +              To specify disk source for NVMe disk the <code>source</code>
>> +              element has the following attributes:
>> +              <dl>
>> +                <dt><code>type</code></dt>
>> +                <dd>The type of address specified in <code>address</code>
>> +                sub-element. Currently, only <code>pci</code> value is
>> +                accepted.
>> +                </dd>
>> +
>> +                <dt><code>managed</code></dt>
>> +                <dd>This attribute instructs libvirt to detach NVMe
>> +                controller automatically on domain startup (<code>yes</code>)
>> +                or expect the controller to be detached by system
>> +                administrator (<code>no</code>).
>> +                </dd>
>> +
>> +                <dt><code>namespace</code></dt>
>> +                <dd>The namespace ID which should be assigned to the domain.
>> +                According to NVMe standard, namespace numbers start from 1,
>> +                including.
>> +                </dd>
>> +              </dl>
>> +              </dd>
>>             </dl>
>>           With "file", "block", and "volume", one or more optional
>>           sub-elements <code>seclabel</code>, <a href="#seclabel">described
>> @@ -3280,11 +3313,17 @@
>>               initiator IQN needed to access the source via mandatory
>>               attribute <code>name</code>.
>>             </dd>
>> +          <dt><code>address</code></dt>
>> +          <dd>For disk of type <code>nvme</code> this element
>> +            specifies the PCI address of the host NVMe
>> +            controller.
>> +            <span class="since">Since 5.5.0</span>
>> +          </dd>
>>           </dl>
>>   
>>           <p>
>> -        For a "file" or "volume" disk type which represents a cdrom or floppy
>> -        (the <code>device</code> attribute), it is possible to define
>> +        For a "file", "volume" or "nvme" disk type which represents a cdrom or
>> +        floppy (the <code>device</code> attribute), it is possible to define
> 
> You specifically forbid startup policy in the next commit, so what's the
> point of documenting it here?

Ah, good point. Back in the day I still wanted to make startupPolicy 
work, but then decided not to.

> 
> 
>>           policy what to do with the disk if the source file is not accessible.
>>           (NB, <code>startupPolicy</code> is not valid for "volume" disk unless
>>            the specified storage volume is of "file" type). This is done by the
>> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
>> index 31db599ab9..f367e8f6fd 100644
>> --- a/docs/schemas/domaincommon.rng
>> +++ b/docs/schemas/domaincommon.rng
> 
> [...]
> 
>> @@ -1918,6 +1919,37 @@
>>       </optional>
>>     </define>
>>   
>> +  <define name="diskSourceNvme">
>> +    <attribute name="type">
>> +      <value>nvme</value>
>> +    </attribute>
>> +    <optional>
>> +      <element name="source">
>> +        <attribute name="type">
>> +          <value>pci</value>
>> +        </attribute>
>> +        <attribute name="namespace">
>> +          <ref name="uint32"/>
>> +        </attribute>
>> +        <optional>
>> +          <attribute name="managed">
>> +            <ref name="virYesNo"/>
>> +          </attribute>
>> +        </optional>
>> +        <element name="address">
>> +          <ref name="pciaddress"/>
>> +        </element>
>> +        <ref name="diskSourceCommon"/>
>> +        <optional>
>> +          <ref name="storageStartupPolicy"/>
>> +        </optional>
>> +        <optional>
>> +          <ref name="encryption"/>
>> +        </optional>
>> +      </element>
>> +    </optional>
>> +  </define>
>> +
>>     <define name="diskTarget">
>>       <data type="string">
>>         <param name="pattern">(ioemu:)?(fd|hd|sd|vd|xvd|ubd)[a-zA-Z0-9_]+</param>
>> diff --git a/tests/qemuxml2argvdata/disk-nvme.xml b/tests/qemuxml2argvdata/disk-nvme.xml
>> new file mode 100644
>> index 0000000000..0b3dbad4eb
>> --- /dev/null
>> +++ b/tests/qemuxml2argvdata/disk-nvme.xml
>> @@ -0,0 +1,55 @@
>> +<domain type='qemu'>
>> +  <name>QEMUGuest1</name>
>> +  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
>> +  <memory unit='KiB'>219136</memory>
>> +  <currentMemory unit='KiB'>219136</currentMemory>
>> +  <vcpu placement='static'>1</vcpu>
>> +  <os>
>> +    <type arch='i686' machine='pc'>hvm</type>
>> +    <boot dev='hd'/>
>> +  </os>
>> +  <clock offset='utc'/>
>> +  <on_poweroff>destroy</on_poweroff>
>> +  <on_reboot>restart</on_reboot>
>> +  <on_crash>destroy</on_crash>
>> +  <devices>
>> +    <emulator>/usr/bin/qemu-system-i686</emulator>
>> +    <disk type='nvme' device='disk'>
>> +      <driver name='qemu' type='raw'/>
>> +      <source type='pci' managed='yes' namespace='1'>
>> +        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>> +      </source>
>> +      <target dev='vda' bus='virtio'/>
>> +    </disk>
>> +    <disk type='nvme' device='disk'>
>> +      <driver name='qemu' type='raw'/>
>> +      <source type='pci' managed='yes' namespace='2'>
>> +        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> 
> Make at heast one of them use qcow2 as format.

Okay.

> 
>> +      </source>
>> +      <target dev='vdb' bus='virtio'/>
>> +    </disk>
>> +    <disk type='nvme' device='disk'>
>> +      <driver name='qemu' type='raw'/>
>> +      <source type='pci' managed='no' namespace='1'>
>> +        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>> +      </source>
>> +      <target dev='vdc' bus='virtio'/>
>> +    </disk>
>> +    <disk type='nvme' device='disk'>
>> +      <driver name='qemu' type='raw'/>
>> +      <source type='pci' managed='no' namespace='2'>
>> +        <address domain='0x0001' bus='0x02' slot='0x00' function='0x0'/>
>> +        <encryption format='luks'>
>> +          <secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
>> +        </encryption>
>> +      </source>
>> +      <target dev='vdd' bus='virtio'/>
>> +    </disk>
>> +    <controller type='usb' index='0'/>
>> +    <controller type='pci' index='0' model='pci-root'/>
>> +    <controller type='scsi' index='0' model='virtio-scsi'/>
>> +    <input type='mouse' bus='ps2'/>
>> +    <input type='keyboard' bus='ps2'/>
>> +    <memballoon model='none'/>
>> +  </devices>
>> +</domain>
> 
> I'm also missing any form of documentation describing the caveats (e.g.
> users should not pass in a NVMe disk the host is using)

Since I'm using virHostdevManager to track used NVMe-s, any attempt to 
start a domain with a taken NVMe disk will fail. So users will see that 
immediately :-) But okay, I will enhance the docs.

Michal




More information about the libvir-list mailing list