[libvirt] [PATCH v2 1/4] xml: add disk driver metadata_cache_size option

John Ferlan jferlan at redhat.com
Mon Dec 10 16:56:08 UTC 2018


On 11/8/18 8:02 AM, Nikolay Shirokovskiy wrote:
> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy at virtuozzo.com>
> ---
>  docs/formatdomain.html.in                          |  8 ++++
>  docs/schemas/domaincommon.rng                      | 11 +++++
>  src/conf/domain_conf.c                             | 17 ++++++++
>  src/conf/domain_conf.h                             |  9 ++++
>  .../qemuxml2argvdata/disk-metadata_cache_size.xml  | 42 +++++++++++++++++++
>  .../disk-metadata_cache_size.xml                   | 48 ++++++++++++++++++++++
>  tests/qemuxml2xmltest.c                            |  2 +
>  7 files changed, 137 insertions(+)
>  create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml
>  create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
> 

<sigh> looks like a forgotten thread...  It seems reviewer bandwidth is
limited and won't get much better during the last couple weeks of the
month when many if not most Red Hat employees are out to due company
shutdown period.

You need to add a few words to the commit message to describe what's
being changed.  Oh and you'll also need a "last" patch to docs/news.xml
to describe the new feature.

> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 269741a..1d186ab 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -3556,6 +3556,14 @@
>              virt queues for virtio-blk. (<span class="since">Since 3.9.0</span>)
>            </li>
>            <li>
> +            The optional <code>metadata_cache_size</code> attribute specifies
> +            metadata cache size policy, possible values are "default" and "maximum".

s/policy, possible/policy. Possible/

> +            "default" leaves setting cache size to hypervisor, "maximum" makes

s/"default"/Using "default"/

s/to hypervisor,/to the hypervisor./

s/"maximum"/Using "maximum"

> +            cache size large enough to keep all metadata, this will help if workload


"Using maximum assigns the largest value possible for the cache size.
This ensures the entire disk cache remains in memory for faster I/O at
the expense of utilizing more memory."

[Editorial comment: it's really not 100% clear what all the tradeoffs
someone is making here. The phrase "large enough" just sticks out, but
since you use INT64_MAX in patch 3, I suppose that *is* the maximum.
Still in some way indicating that this allows QEMU to grow the cache and
keep everything in memory, but has the side effect that disks configured
this way will cause guest memory requirements to grow albeit limited by
the QEMU algorithms. It seems from my read the max is 32MB, so perhaps
not a huge deal, but could be useful to note. Whether QEMU shrinks the
cache when not in use wasn't 100% clear to me.]

It's too bad it's not possible to utilize some dynamic value via
qcow2_update_options_prepare. If dynamic adjustment were possible, then
saving the value in the XML wouldn't be necessary - we could just allow
dynamic adjustment similar to what I did for of a couple of IOThread
params (see commit 11ceedcda and the patches before it).

> +            needs access to whole disk all the time. (<span class="since">Since
> +            4.10.0, QEMU 3.1</span>)

This will be at least 5.0.0 now.

> +          </li>
> +          <li>
>            For virtio disks,
>            <a href="#elementsVirtio">Virtio-specific options</a> can also be
>            set. (<span class="since">Since 3.5.0</span>)
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index b9ac5df..3e406fc 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -1990,6 +1990,9 @@
>          <ref name="detect_zeroes"/>
>        </optional>
>        <optional>
> +        <ref name="metadata_cache_size"/>
> +      </optional>
> +      <optional>
>          <attribute name='queues'>
>            <ref name="positiveInteger"/>
>          </attribute>
> @@ -2090,6 +2093,14 @@
>        </choice>
>      </attribute>
>    </define>
> +  <define name="metadata_cache_size">
> +    <attribute name='metadata_cache_size'>
> +      <choice>
> +        <value>default</value>
> +        <value>maximum</value>

I didn't go back and read the previous reviews nor was I present for the
KVM Forum discussion on this, but default really means "minimum" I
think. Even that just doesn't feel "right". There's 3 QEMU knobs, but
this changes 1 knob allowing one direction. And yes, changing 3 knobs is
also confusing. I "assume" that it's been determined that this one knob
has the greatest affect on I/O performance...

Reading https://github.com/qemu/qemu/blob/master/docs/qcow2-cache.txt
some phrases stick out:

  1. "setting the right cache sizes is not a straightforward operation".
  2. "In order to choose the cache sizes we need to know how they relate
to the amount of allocated space."
  3. The limit of the l2 cache size is "32 MB by default on Linux
platforms (enough for full coverage of 256 GB images, with the default
cluster size)".
  4. "QEMU will not use more memory than needed to hold all of the
image's L2 tables, regardless of this max. value."

So at least providing INT_MAX won't hurt, although given the math in
qcow2_update_options_prepare:

    l2_cache_size /= l2_cache_entry_size;
    if (l2_cache_size < MIN_L2_CACHE_SIZE) {
        l2_cache_size = MIN_L2_CACHE_SIZE;
    }
    if (l2_cache_size > INT_MAX) {

wouldn't that mean we could pass up to "l2_cache_size *
l2_cache_entry_size"?

Still using minimum/maximum for what amounts to a boolean operation is I
think unnecessary. Rather than a list of values, wouldn't having
something like "max_metadata_cache='yes'" be just as effective?

Alternatively, reading through the above document and thinking about how
"useful" it could be to use a more specific value, why not go with
"metadata_cache_size=N", where N is in MB and describes the size of the
cache to be used. Being in MB ascribes to the need to be a power of 2
and quite frankly keeps a reasonable range assigned to the value. I'd
find it hard to fathom a "large enough" disk would improve that much I/O
performance in KB adjustments, but I haven't done any analysis of that
theory either.

I think the above document also gives a "fair" example that could be
"reworded" into the libvirt docs in order to help "size" things based on
the default QEMU values we're not allowing to be changed and then
showing the math what it means to have a 1MB cache, a 2MB cache, an 8MB,
etc. vis-a-vis the size of the disk for which the cache is being
created. For example, for a disk of up to 8G, the "default" cache would
be XXX and setting a value of YYY would help I/O.  Similarly for a 16GB,
64GB, etc. Essentially a bit of a guide - although that's starting to
border on what should go into the wiki instead.

> +      </choice>
> +    </attribute>
> +  </define>
>    <define name="controller">
>      <element name="controller">
>        <optional>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 237540b..b2185f8 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -885,6 +885,11 @@ VIR_ENUM_IMPL(virDomainDiskDetectZeroes, VIR_DOMAIN_DISK_DETECT_ZEROES_LAST,
>                "on",
>                "unmap")
>  
> +VIR_ENUM_IMPL(virDomainDiskMetadataCacheSize,
> +              VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST,
> +              "default",
> +              "maximum")
> +
>  VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST,
>                "none",
>                "yes",
> @@ -9375,6 +9380,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDefPtr def,
>      }
>      VIR_FREE(tmp);
>  
> +    if ((tmp = virXMLPropString(cur, "metadata_cache_size")) &&
> +        (def->metadata_cache_size = virDomainDiskMetadataCacheSizeTypeFromString(tmp)) < 0) {
> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> +                       _("unknown driver metadata_cache_size value '%s'"), tmp);
> +        goto cleanup;
> +    }
> +    VIR_FREE(tmp);
> +

This would just use the YesNo type logic instead or read in a numeric
value.  BTW: Not sure I'd bother with worrying about checking the QEMU
maximum as that could change and then we're stuck with a lower maximum
check. Just pass along the value to QEMU and let it fail.

>      ret = 0;
>  
>   cleanup:
> @@ -23902,6 +23915,10 @@ virDomainDiskDefFormatDriver(virBufferPtr buf,
>      if (disk->queues)
>          virBufferAsprintf(&driverBuf, " queues='%u'", disk->queues);
>  
> +    if (disk->metadata_cache_size)
> +        virBufferAsprintf(&driverBuf, " metadata_cache_size='%s'",
> +                          virDomainDiskMetadataCacheSizeTypeToString(disk->metadata_cache_size));
> +

My personal crusade... metadata_cache_size >
VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT (yes, > 0).

Of course this all depends on your final solution - a boolean would only
be written if set and an int value only written if > 0 (since 0 would be
the "optional" viewpoint).

>      virDomainVirtioOptionsFormat(&driverBuf, disk->virtio);
>  
>      ret = virXMLFormatElement(buf, "driver", &driverBuf, NULL);
> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
> index e30a4b2..b155058 100644
> --- a/src/conf/domain_conf.h
> +++ b/src/conf/domain_conf.h
> @@ -568,6 +568,13 @@ typedef enum {
>      VIR_DOMAIN_DISK_DETECT_ZEROES_LAST
>  } virDomainDiskDetectZeroes;
>  
> +typedef enum {
> +    VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT = 0,
> +    VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM,
> +
> +    VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST
> +} virDomainDiskMetadataCacheSize;
> +
>  typedef struct _virDomainBlockIoTuneInfo virDomainBlockIoTuneInfo;
>  struct _virDomainBlockIoTuneInfo {
>      unsigned long long total_bytes_sec;
> @@ -672,6 +679,7 @@ struct _virDomainDiskDef {
>      int discard; /* enum virDomainDiskDiscard */
>      unsigned int iothread; /* unused = 0, > 0 specific thread # */
>      int detect_zeroes; /* enum virDomainDiskDetectZeroes */
> +    int metadata_cache_size; /* enum virDomainDiskMetadataCacheSize */
>      char *domain_name; /* backend domain name */
>      unsigned int queues;
>      virDomainVirtioOptionsPtr virtio;
> @@ -3388,6 +3396,7 @@ VIR_ENUM_DECL(virDomainDeviceSGIO)
>  VIR_ENUM_DECL(virDomainDiskTray)
>  VIR_ENUM_DECL(virDomainDiskDiscard)
>  VIR_ENUM_DECL(virDomainDiskDetectZeroes)
> +VIR_ENUM_DECL(virDomainDiskMetadataCacheSize)
>  VIR_ENUM_DECL(virDomainDiskMirrorState)
>  VIR_ENUM_DECL(virDomainController)
>  VIR_ENUM_DECL(virDomainControllerModelPCI)
> diff --git a/tests/qemuxml2argvdata/disk-metadata_cache_size.xml b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml
> new file mode 100644
> index 0000000..8ac2599
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/disk-metadata_cache_size.xml
> @@ -0,0 +1,42 @@
> +<domain type='qemu'>
> +  <name>test</name>
> +  <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid>
> +  <memory unit='KiB'>1048576</memory>
> +  <currentMemory unit='KiB'>1048576</currentMemory>
> +  <vcpu placement='static'>1</vcpu>
> +  <os>
> +    <type arch='x86_64' machine='pc-0.13'>hvm</type>

Nothing more recent than pc-0.13 ;-) for this copy-pasta.

> +    <boot dev='cdrom'/>
> +    <boot dev='hd'/>
> +    <bootmenu enable='yes'/>
> +  </os>
> +  <clock offset='utc'/>
> +  <on_poweroff>destroy</on_poweroff>
> +  <on_reboot>restart</on_reboot>
> +  <on_crash>restart</on_crash>
> +  <devices>
> +    <emulator>/usr/bin/qemu-system-x86_64</emulator>
> +    <disk type='file' device='disk'>
> +      <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
> +      <source file='/var/lib/libvirt/images/f14.img'/>
> +      <target dev='vda' bus='virtio'/>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> +    </disk>
> +    <disk type='file' device='cdrom'>
> +      <driver name='qemu' type='raw'/>
> +      <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/>
> +      <target dev='hdc' bus='ide'/>
> +      <readonly/>
> +      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
> +    </disk>

This second disk isn't necessary

John

> +    <controller type='usb' index='0'/>
> +    <controller type='virtio-serial' index='0'>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
> +    </controller>
> +    <controller type='ide' index='0'/>
> +    <controller type='pci' index='0' model='pci-root'/>
> +    <input type='mouse' bus='ps2'/>
> +    <input type='keyboard' bus='ps2'/>
> +    <memballoon model='virtio'/>
> +  </devices>
> +</domain>
> diff --git a/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
> new file mode 100644
> index 0000000..5fed22b
> --- /dev/null
> +++ b/tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml
> @@ -0,0 +1,48 @@
> +<domain type='qemu'>
> +  <name>test</name>
> +  <uuid>468404ad-d49c-40f2-9e14-02294f9c1be3</uuid>
> +  <memory unit='KiB'>1048576</memory>
> +  <currentMemory unit='KiB'>1048576</currentMemory>
> +  <vcpu placement='static'>1</vcpu>
> +  <os>
> +    <type arch='x86_64' machine='pc-0.13'>hvm</type>
> +    <boot dev='cdrom'/>
> +    <boot dev='hd'/>
> +    <bootmenu enable='yes'/>
> +  </os>
> +  <clock offset='utc'/>
> +  <on_poweroff>destroy</on_poweroff>
> +  <on_reboot>restart</on_reboot>
> +  <on_crash>restart</on_crash>
> +  <devices>
> +    <emulator>/usr/bin/qemu-system-x86_64</emulator>
> +    <disk type='file' device='disk'>
> +      <driver name='qemu' type='qcow2' metadata_cache_size='maximum'/>
> +      <source file='/var/lib/libvirt/images/f14.img'/>
> +      <target dev='vda' bus='virtio'/>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> +    </disk>
> +    <disk type='file' device='cdrom'>
> +      <driver name='qemu' type='raw'/>
> +      <source file='/var/lib/libvirt/Fedora-14-x86_64-Live-KDE.iso'/>
> +      <target dev='hdc' bus='ide'/>
> +      <readonly/>
> +      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
> +    </disk>
> +    <controller type='usb' index='0'>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
> +    </controller>
> +    <controller type='virtio-serial' index='0'>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
> +    </controller>
> +    <controller type='ide' index='0'>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
> +    </controller>
> +    <controller type='pci' index='0' model='pci-root'/>
> +    <input type='mouse' bus='ps2'/>
> +    <input type='keyboard' bus='ps2'/>
> +    <memballoon model='virtio'>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
> +    </memballoon>
> +  </devices>
> +</domain>
> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
> index 89640f6..c44e0fe 100644
> --- a/tests/qemuxml2xmltest.c
> +++ b/tests/qemuxml2xmltest.c
> @@ -1235,6 +1235,8 @@ mymain(void)
>      DO_TEST("riscv64-virt",
>              QEMU_CAPS_DEVICE_VIRTIO_MMIO);
>  
> +    DO_TEST("disk-metadata_cache_size", NONE);
> +
>      if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL)
>          virFileDeleteTree(fakerootdir);
>  
> 




More information about the libvir-list mailing list