[libvirt] [PATCH v2 01/10] Introduce NVDIMM memory model

John Ferlan jferlan at redhat.com
Wed Aug 31 22:51:12 UTC 2016



On 08/11/2016 09:26 AM, Michal Privoznik wrote:
> NVDIMM is new type of memory introduced in qemu. The idea is that

s/in qemu/into QEMU 2.6/

> we have a DIMM module that keeps the data persistent across

s/DIMM/Non Volatile memory/

> domain reboots.

Perhaps a word of two about what is the usefulness of such a beast. I
think (from a former project) one usage is to store parameters for
firmware such as OVMF

Add extra line between paragraphs...

> At the domain XML level, we already have some representation of
> 'dimm' modules. Long story short, we have <memory/> element that
> lives under <devices/>. Now, the element even has @model
> attribute which we can use to introduce new memory type:
> 
>     <memory model='nvdimm'>
>       <source>
>         <path>/tmp/nvdimm</path>
>       </source>
>       <target>
>         <size unit='KiB'>523264</size>
>         <node>0</node>
>       </target>
>     </memory>
> 
> So far, this is just a XML parser/formatter extension. QEMU
> driver implementation is in the next commit.
> 
> For more info on NVDIMM visit the following web page:
> 
>     http://pmem.io/
> 

One could also google it ;-)... In any case, the actual details are
found the Documents link from that page, subject to move over time of
course.

> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
>  docs/formatdomain.html.in                          | 26 ++++--
>  docs/schemas/domaincommon.rng                      | 32 ++++---
>  src/conf/domain_conf.c                             | 97 ++++++++++++++++------
>  src/conf/domain_conf.h                             |  2 +
>  src/qemu/qemu_command.c                            |  6 ++
>  src/qemu/qemu_domain.c                             |  1 +
>  .../qemuxml2argv-memory-hotplug-nvdimm.xml         | 49 +++++++++++
>  7 files changed, 171 insertions(+), 42 deletions(-)
>  create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml
> 
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index bfbb0f2..657df8f 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -6740,6 +6740,15 @@ qemu-kvm -net nic,model=? /dev/null
>          <node>1</node>
>        </target>
>      </memory>
> +    <memory model='nvdimm'>
> +      <source>
> +        <path>/tmp/nvdimm</path>
> +      </source>
> +      <target>
> +        <size unit='KiB'>524287</size>
> +        <node>1</node>
> +      </target>
> +    </memory>
>    </devices>
>    ...
>  </pre>
> @@ -6747,17 +6756,19 @@ qemu-kvm -net nic,model=? /dev/null
>        <dt><code>model</code></dt>
>        <dd>
>          <p>
> -          Currently only the <code>dimm</code> model is supported in order to
> -          add a virtual DIMM module to the guest.
> +          Select <code>dimm</code> to add a virtual DIMM module to the guest.

Ugh... The 'dimm' Since tag is in the "Memory devices" section...  Which
just make the following awkward.  I think you should grab that 1.2.14
from above and place it here.

Rather than Select go with "Use" or "Provide"

> +          Alternatively, <code>nvdimm</code> model adds a Non-Volatile DIMM

Use the same format - e.g.

"Use/Provide <code>nvdimm</code> to add a Non-Volatile DIMM module."

> +          module. <span class="since">Since 2.2.0</span>

Well we won't hit 2.2.0....

>          </p>
>        </dd>
>  
>        <dt><code>source</code></dt>
>        <dd>
>          <p>
> -          The optional source element allows to fine tune the source of the
> -          memory used for the given memory device. If the element is not
> -          provided defaults configured via <code>numatune</code> are used.
> +          For model <code>dimm</code> this element is optional and allows to
> +          fine tune the source of the memory used for the given memory device.
> +          If the element is not provided defaults configured via
> +          <code>numatune</code> are used.
>          </p>
>          <p>
>            <code>pagesize</code> can optionally be used to override the default
> @@ -6770,6 +6781,11 @@ qemu-kvm -net nic,model=? /dev/null
>            <code>nodemask</code> can optionally be used to override the default
>            set of NUMA nodes where the memory would be allocated.
>          </p>

Since pagesize and nodemask are for DIMM only, so they probably need to
be converted to some sort of list or in some way indented to ensure the
visual cue is there that they are meant for dimm.  Perhaps the end of
the dimm paragraph needs a "If <code>source</code> is provided, then at
least one of the following values would be provided:".

I think the only way to get the right formatting look is :

<dd>
  <dt><code>pagesize</code></dt>
    <p> ...
    </p>
  <dt><code>nodemask</code></dt>
    <p> ...
    </p>
</dd>

> +        <p>
> +          For model <code>nvdimm</code> this element is mandatory and has a
> +          single child element <code>path</code> which value represents a path

s/which value/that/ ?

> +          in host that back the nvdimm module in the guest.

s/in host that back/in the host that backs/

Should there be any mention that this also requires "<maxMemory
slots='#'...>" to be set?

So something that isn't quite clear... For 'dimm', these are 'extra'
memory, while for 'nvdimm 'it seems to be one hunk that's really not
extra memory - rather it's memory that can be used for a specific
purpose. What's not clear to me - is the existing "physical" memory in
the guest (e.g. the numa node page) is used to map to the file or does
this appear as extra memory that is only accessible for this purpose?

Also, does the numa node have to be properly sized?  Your example shows
a 214Mb numa cell and a 511MB nvdimm.  Doesn't seem to me that fits
quite right.

BTW: That video you point to in your v1 indicates two types of NVDIMM -
"persistent memory namespace" (accessed using loads/stores, mapped to
physical memory) and "block mode namespace" (accessed via block
operations, atomicity at block level, indirect mapping thru a block
window, no mapping entier memory.

So which is KVM? Since libvirt must be the master to more than one
driver - should there be some sort of subtype="block|persistent" as
well?  Or we could just indicate KVM only for now...

> +        </p>
>        </dd>
>  
>        <dt><code>target</code></dt>
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 052f28c..e6ad452 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -4722,6 +4722,7 @@
>        <attribute name="model">
>          <choice>
>            <value>dimm</value>
> +          <value>nvdimm</value>
>          </choice>
>        </attribute>
>        <interleave>
> @@ -4741,18 +4742,27 @@
>  
>    <define name="memorydev-source">
>      <element name="source">
> -      <interleave>
> -        <optional>
> -          <element name="pagesize">
> -            <ref name="scaledInteger"/>
> +      <choice>
> +        <group>
> +          <interleave>
> +            <optional>
> +              <element name="pagesize">
> +                <ref name="scaledInteger"/>
> +              </element>
> +            </optional>
> +            <optional>
> +              <element name="nodemask">
> +                <ref name="cpuset"/>
> +              </element>
> +            </optional>
> +          </interleave>
> +        </group>
> +        <group>
> +          <element name="path">
> +            <text/>
>            </element>
> -        </optional>
> -        <optional>
> -          <element name="nodemask">
> -            <ref name="cpuset"/>
> -          </element>
> -        </optional>
> -      </interleave>
> +        </group>
> +      </choice>
>      </element>
>    </define>
>  
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 82876f3..cb5a053 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -840,8 +840,11 @@ VIR_ENUM_DECL(virDomainBlockJob)
>  VIR_ENUM_IMPL(virDomainBlockJob, VIR_DOMAIN_BLOCK_JOB_TYPE_LAST,
>                "", "", "copy", "", "active-commit")
>  
> -VIR_ENUM_IMPL(virDomainMemoryModel, VIR_DOMAIN_MEMORY_MODEL_LAST,
> -              "", "dimm")
> +VIR_ENUM_IMPL(virDomainMemoryModel,
> +              VIR_DOMAIN_MEMORY_MODEL_LAST,
> +              "",
> +              "dimm",
> +              "nvdimm")
>  
>  static virClassPtr virDomainObjClass;
>  static virClassPtr virDomainXMLOptionClass;
> @@ -2345,6 +2348,7 @@ void virDomainMemoryDefFree(virDomainMemoryDefPtr def)
>      if (!def)
>          return;
>  
> +    VIR_FREE(def->path);
>      virBitmapFree(def->sourceNodes);
>      virDomainDeviceInfoClear(&def->info);
>      VIR_FREE(def);
> @@ -13140,20 +13144,36 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node,
>      xmlNodePtr save = ctxt->node;
>      ctxt->node = node;
>  
> -    if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt,
> -                             &def->pagesize, false, false) < 0)
> -        goto cleanup;
> -
> -    if ((nodemask = virXPathString("string(./nodemask)", ctxt))) {
> -        if (virBitmapParse(nodemask, &def->sourceNodes,
> -                           VIR_DOMAIN_CPUMASK_LEN) < 0)
> +    switch ((virDomainMemoryModel) def->model) {
> +    case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> +        if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt,
> +                                 &def->pagesize, false, false) < 0)
>              goto cleanup;
>  
> -        if (virBitmapIsAllClear(def->sourceNodes)) {
> -            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> -                           _("Invalid value of 'nodemask': %s"), nodemask);
> +        if ((nodemask = virXPathString("string(./nodemask)", ctxt))) {
> +            if (virBitmapParse(nodemask, &def->sourceNodes,
> +                               VIR_DOMAIN_CPUMASK_LEN) < 0)
> +                goto cleanup;
> +
> +            if (virBitmapIsAllClear(def->sourceNodes)) {
> +                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> +                               _("Invalid value of 'nodemask': %s"), nodemask);
> +                goto cleanup;
> +            }
> +        }
> +        break;
> +
> +    case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
> +        if (!(def->path = virXPathString("string(./path)", ctxt))) {
> +            virReportError(VIR_ERR_XML_DETAIL, "%s",
> +                           _("path is required for model nvdimm'"));
>              goto cleanup;
>          }
> +        break;
> +
> +    case VIR_DOMAIN_MEMORY_MODEL_NONE:
> +    case VIR_DOMAIN_MEMORY_MODEL_LAST:
> +        break;
>      }
>  
>      ret = 0;
> @@ -14541,12 +14561,25 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def,
>              tmp->size != mem->size)
>              continue;
>  
> -        /* source stuff -> match with device */
> -        if (tmp->pagesize != mem->pagesize)
> -            continue;
> +        switch ((virDomainMemoryModel) mem->model) {
> +        case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> +            /* source stuff -> match with device */
> +            if (tmp->pagesize != mem->pagesize)
> +                continue;
>  
> -        if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes))
> -            continue;
> +            if (!virBitmapEqual(tmp->sourceNodes, mem->sourceNodes))
> +                continue;
> +            break;
> +
> +        case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
> +            if (STRNEQ(tmp->path, mem->path))
> +                continue;
> +            break;
> +
> +        case VIR_DOMAIN_MEMORY_MODEL_NONE:
> +        case VIR_DOMAIN_MEMORY_MODEL_LAST:
> +            break;
> +        }
>  
>          break;
>      }
> @@ -21705,23 +21738,35 @@ virDomainMemorySourceDefFormat(virBufferPtr buf,
>      char *bitmap = NULL;
>      int ret = -1;
>  
> -    if (!def->pagesize && !def->sourceNodes)
> +    if (!def->pagesize && !def->sourceNodes && !def->path)
>          return 0;
>  
>      virBufferAddLit(buf, "<source>\n");
>      virBufferAdjustIndent(buf, 2);
>  
> -    if (def->sourceNodes) {
> -        if (!(bitmap = virBitmapFormat(def->sourceNodes)))
> -            goto cleanup;
> +    switch ((virDomainMemoryModel) def->model) {
> +    case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> +        if (def->sourceNodes) {
> +            if (!(bitmap = virBitmapFormat(def->sourceNodes)))
> +                goto cleanup;
>  
> -        virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap);
> +            virBufferAsprintf(buf, "<nodemask>%s</nodemask>\n", bitmap);
> +        }
> +
> +        if (def->pagesize)
> +            virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n",
> +                              def->pagesize);
> +        break;
> +
> +    case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
> +        virBufferAsprintf(buf, "<path>%s</path>\n", def->path);
> +        break;
> +
> +    case VIR_DOMAIN_MEMORY_MODEL_NONE:
> +    case VIR_DOMAIN_MEMORY_MODEL_LAST:
> +        break;
>      }
>  
> -    if (def->pagesize)
> -        virBufferAsprintf(buf, "<pagesize unit='KiB'>%llu</pagesize>\n",
> -                          def->pagesize);
> -
>      virBufferAdjustIndent(buf, -2);
>      virBufferAddLit(buf, "</source>\n");
>  


So do you feel there is a need for a check for different path in
virDomainMemoryDefCheckABIStability? I know it's not a target thing, but
the target/guest uses it by mapping directly...

Should there be checks for order?  That is src has dimm, dimm, dimm,
nvdim while dst has nvdimm, dimm, dimm, dimm?  Would that matter?

After going through patch 3, I'm beginning to think it wouldn't be such
a good idea to keep nvdimm's and dimm's in the same nmems list.  I think
they might just be too different.  I also recall some algorithm
somewhere which made sure addresses and sizes used fit "properly"...
Searching on DIMM in cscope brings up a few more references and questions.

And well - what about migration?


> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
> index 8b26724..4e6a9bf 100644
> --- a/src/conf/domain_conf.h
> +++ b/src/conf/domain_conf.h
> @@ -1939,6 +1939,7 @@ struct _virDomainRNGDef {
>  typedef enum {
>      VIR_DOMAIN_MEMORY_MODEL_NONE,
>      VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */
> +    VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */
>  
>      VIR_DOMAIN_MEMORY_MODEL_LAST
>  } virDomainMemoryModel;
> @@ -1947,6 +1948,7 @@ struct _virDomainMemoryDef {
>      /* source */
>      virBitmapPtr sourceNodes;
>      unsigned long long pagesize; /* kibibytes */
> +    char *path;
>  
>      /* target */
>      int model; /* virDomainMemoryModel */
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 55df23d..a4cc87f 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -3393,6 +3393,12 @@ qemuBuildMemoryDeviceStr(virDomainMemoryDefPtr mem)
>  
>          break;
>  
> +    case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
> +        virReportError(VIR_ERR_NO_SUPPORT, "%s",
> +                       _("nvdimm not supported yet"));
> +        return NULL;
> +        break;
> +
>      case VIR_DOMAIN_MEMORY_MODEL_NONE:
>      case VIR_DOMAIN_MEMORY_MODEL_LAST:
>          break;
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index efc46f9..28ee81d 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -5203,6 +5203,7 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem,
>  {
>      switch ((virDomainMemoryModel) mem->model) {
>      case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> +    case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:

So the assumption is that NVDIMM will fail the following check?

I would think it would want it's own error message like above "not
supported yet".  Something fortified by the patch 3 code which alters
the MemoryDeviceStr to be "pc-dimm" or "nvdimm"...

>          if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM &&
>              mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) {
>              virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml
> new file mode 100644
> index 0000000..e932241
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-nvdimm.xml
> @@ -0,0 +1,49 @@
> +<domain type='qemu'>
> +  <name>QEMUGuest1</name>
> +  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
> +  <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory>
> +  <memory unit='KiB'>1267710</memory>
> +  <currentMemory unit='KiB'>1267710</currentMemory>
> +  <vcpu placement='static' cpuset='0-1'>2</vcpu>
> +  <os>
> +    <type arch='i686' machine='pc'>hvm</type>
> +    <boot dev='hd'/>
> +  </os>
> +  <idmap>
> +    <uid start='0' target='1000' count='10'/>
> +    <gid start='0' target='1000' count='10'/>
> +  </idmap>
> +  <cpu>
> +    <topology sockets='2' cores='1' threads='1'/>
> +    <numa>
> +      <cell id='0' cpus='0-1' memory='219136' unit='KiB'/>
> +    </numa>
> +  </cpu>
> +  <clock offset='utc'/>
> +  <on_poweroff>destroy</on_poweroff>
> +  <on_reboot>restart</on_reboot>
> +  <on_crash>destroy</on_crash>
> +  <devices>
> +    <emulator>/usr/bin/qemu</emulator>
> +    <disk type='block' device='disk'>
> +      <source dev='/dev/HostVG/QEMUGuest1'/>
> +      <target dev='hda' bus='ide'/>
> +      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
> +    </disk>
> +    <controller type='ide' index='0'/>
> +    <controller type='usb' index='0'/>
> +    <controller type='pci' index='0' model='pci-root'/>
> +    <input type='mouse' bus='ps2'/>
> +    <input type='keyboard' bus='ps2'/>
> +    <memballoon model='virtio'/>
> +    <memory model='nvdimm'>
> +      <source>
> +        <path>/tmp/nvdimm</path>
> +      </source>
> +      <target>
> +        <size unit='KiB'>523264</size>
> +        <node>0</node>
> +      </target>
> +    </memory>
> +  </devices>
> +</domain>
> 

No xml2xml tests?

I know there's some "newer" way to use file links from
tests/qemuxml2xmloutdata to ../qemuxml2argvdata/.

IIRC when I did something similar I had to create the link and git add
the link... Ah yes, look at the '*luks-disks* - commit id '9bbf0d7e'.

But now you've already done it for the rxqsz patch too, so you're the
new expert!


John




More information about the libvir-list mailing list