[libvirt] [PATCH 6/8] Docs: Add description and validation for blkdeviotune

Eric Blake eblake at redhat.com
Wed Nov 23 05:28:42 UTC 2011


On 11/15/2011 02:02 AM, Lei Li wrote:
> Signed-off-by: Lei Li <lilei at linux.vnet.ibm.com>
> Signed-off-by: Zhi Yong Wu <wuzhy at linux.vnet.ibm.com>
> ---
>  docs/formatdomain.html.in     |   31 +++++++++++++++++++++++++++++++
>  docs/schemas/domaincommon.rng |   24 ++++++++++++++++++++++++
>  2 files changed, 55 insertions(+), 0 deletions(-)

I plan to squash this in with 4/8, and apply it prior to 3/8, although
I'll review it independently since you submitted it alone.  I find it
slightly easier to read domain_conf.c parser/printer changes if you have
.rng changes to compare them to (both now during series review, and
later in git researching to see when a feature was added).

> 
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index cbad196..733062d 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -893,6 +893,14 @@
>        <driver name="tap" type="aio" cache="default"/>
>        <source file='/var/lib/xen/images/fv0'/ startupPolicy='optional'>
>        <target dev='hda' bus='ide'/>
> +      <iotune>
> +        <total_bytes_sec>n</total_bytes_sec>

These should be typical numbers, not a placeholder "n", so that someone
can get an idea of what to copy-and-paste into their XML.

> +        <read_bytes_sec>n</read_bytes_sec>
> +        <write_bytes_sec>n</write_bytes_sec>
> +        <total_iops_sec>n</total_iops_sec>
> +        <read_bytes_sec>n</read_bytes_sec>
> +        <write_bytes_sec>n</write_bytes_sec>

You said here:

https://www.redhat.com/archives/libvir-list/2011-November/msg00442.html

that the code enforces some additional constraints: total cannot be set
at the same time as read or write limits.  This example violates those
constraints.

> +      </iotune>
>        <boot order='2'/>
>        <encryption type='...'>
>          ...
> @@ -1010,6 +1018,29 @@
>          <span class="since">Since 0.0.3; <code>bus</code> attribute since 0.4.3;
>          "usb" attribute value since after 0.4.4; "sata" attribute value since
>          0.9.7</span></dd>
> +      <dt><code>iotune</code></dt>
> +      <dd>The optional <code>iotune</code> element provides the ability
> +        to set or get block I/O throttling for the device. Block I/O
> +        throtting be implemented by qemu, is specified per-disk and can

s/throtting/throttling/

> +        vary across multiple disks.</dd>

This needs to be reworded to reflect the generic nature of the API (IO
throttling is currently the only tuning parameter, but we may expand to
other tuning parameters in the future).

> +      <dt><code>total_bytes_sec</code></dt>
> +      <dd>The optinal <code>total_bytes_sec</code> element is the total throughput

s/optinal/optional/

> +        limit in bytes per second.</dd>
> +      <dt><code>read_bytes_sec</code></dt>
> +      <dd>The optinal <code>read_bytes_sec</code> element is the read throughput
> +        limit in bytes per second.</dd>
> +      <dt><code>write_bytes_sec</code</dt>

Missing a >.

> +      <dd>The optinal <code>write_bytes_sec</code> element is the write throughput
> +        limit in bytes per second.</dd>
> +      <dt><code>total_iops_sec</code></dt>
> +      <dd>The optional <code>total_iops_sec</code> element is the total I/O operations
> +        per second.</dd>
> +      <dt><code>read_iops_sec</code></dt>
> +      <dd>The optional <code>read_iops_sec</code> element is the read I/O operations
> +        per second.</dd>
> +      <dt><code>write_iops_sec</code></dt>
> +      <dd>The optional <code>write_iops_sec</code> element is the write I/O operations
> +        per second.</dd>

The additional constraints in the code should also be documented here.
I'm assuming that '0' has a special meaning of no limit; and
back-compatibility demands that an omitted tuning has the same effect,
which means if the user inputs '0' it is easier if we just omit it on
output (and only output the non-zero values).  Handling of '0' should be
documented.

>        <dt><code>driver</code></dt>
>        <dd>
>          The optional driver element allows specifying further details
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index b6f858e..c6873a0 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -698,6 +698,30 @@
>        <optional>
>          <ref name="snapshot"/>
>        </optional>
> +      <optional>
> +        <element name="iotune">

Wrong location; it is better to list this among other sub-elements,
rather than among the attributes of <disk>.

> +          <choice>
> +            <element name="total_bytes_sec">
> +              <ref name="unsignedLongLong">

There is no <define name='unsignedLongLong'> for this <ref> to resolve
to; however,
http://www.cafeconleche.org/slides/sd2006west/relaxng/RELAX__Schemas_Don%27t_Have_to_be_Hard.html
documents unsignedLong as an 8-byte primitive, which matches our use case.

> +            </element>
> +            <element name="read_bytes_sec">
> +              <ref name="unsignedLongLong">
> +            </element>

Won't work.  <choice> means that you can have exactly one tuning
element, when in reality you want <interleave><oneOrMore> to allow
combinations of elements.  We can also make the XML enforce the
constraints (total and read/write are mutually exclusive).

> +            <element name="write_bytes_sec">
> +              <ref name="unsignedLongLong">
> +            </element>
> +            <element name="total_iops_sec">
> +              <ref name="unsignedLongLong">
> +            </element>
> +            <element name="read_iops_sec">
> +              <ref name="unsignedLongLong">
> +            </element>
> +            <element name="write_iopw_sec">

Another typo.

> +              <ref name="unsignedLongLong">
> +            </emement>

This typo here, along with several missing /,...

> +          </choice>
> +        </element>
> +      </optional>
>        <choice>
>          <group>
>            <attribute name="type">

completely breaks 'make check' on every test in domainschematest and
domainsnapshotschematest.  A sample of 'make -C tests check
TESTS=domainsnapshotschematest VIR_TEST_DEBUG=1' shows:

 ref: Relax-NG parser error : Internal found no define for ref domain
Relax-NG schema
/home/remote/eblake/libvirt/tests/../docs/schemas/domainsnapshot.rng
failed to compile

It's hard to trust this series if basic things like 'make check' and
'make syntax-check' fail.

Here's what I'm squashing in, before I merge this with 4/8 (hmm, I
practically rewrote this entire patch):

diff --git i/docs/formatdomain.html.in w/docs/formatdomain.html.in
index 1bd263e..f45d0ce 100644
--- i/docs/formatdomain.html.in
+++ w/docs/formatdomain.html.in
@@ -923,12 +923,9 @@
       <source file='/var/lib/xen/images/fv0'/
startupPolicy='optional'>
       <target dev='hda' bus='ide'/>
       <iotune>
-        <total_bytes_sec>n</total_bytes_sec>
-        <read_bytes_sec>n</read_bytes_sec>
-        <write_bytes_sec>n</write_bytes_sec>
-        <total_iops_sec>n</total_iops_sec>
-        <read_bytes_sec>n</read_bytes_sec>
-        <write_bytes_sec>n</write_bytes_sec>
+        <total_bytes_sec>10000000</total_bytes_sec>
+        <read_iops_sec>400000</read_iops_sec>
+        <write_iops_sec>100000</write_iops_sec>
       </iotune>
       <boot order='2'/>
       <encryption type='...'>
@@ -1048,28 +1045,39 @@
         "usb" attribute value since after 0.4.4; "sata" attribute value
since
         0.9.7</span></dd>
       <dt><code>iotune</code></dt>
-      <dd>The optional <code>iotune</code> element provides the ability
-        to set or get block I/O throttling for the device. Block I/O
-        throtting be implemented by qemu, is specified per-disk and can
-        vary across multiple disks.</dd>
-      <dt><code>total_bytes_sec</code></dt>
-      <dd>The optinal <code>total_bytes_sec</code> element is the total
throughput
-        limit in bytes per second.</dd>
-      <dt><code>read_bytes_sec</code></dt>
-      <dd>The optinal <code>read_bytes_sec</code> element is the read
throughput
-        limit in bytes per second.</dd>
-      <dt><code>write_bytes_sec</code</dt>
-      <dd>The optinal <code>write_bytes_sec</code> element is the write
throughput
-        limit in bytes per second.</dd>
-      <dt><code>total_iops_sec</code></dt>
-      <dd>The optional <code>total_iops_sec</code> element is the total
I/O operations
-        per second.</dd>
-      <dt><code>read_iops_sec</code></dt>
-      <dd>The optional <code>read_iops_sec</code> element is the read
I/O operations
-        per second.</dd>
-      <dt><code>write_iops_sec</code></dt>
-      <dd>The optional <code>write_iops_sec</code> element is the write
I/O operations
-        per second.</dd>
+      <dd>The optional <code>iotune</code> element provides the
+        ability to provide additional per-device I/O tuning, with
+        values that can vary for each device (contrast this to
+        the <a
href="#elementsBlockTuning"><code><blkiotune></code></a>
+        element, which applies globally to the domain).  Currently,
+        the only tuning available is Block I/O throttling for qemu.
+        This element has optional sub-elements; any sub-element not
+        specified or given with a value of 0 implies no
+        limit.  <span class="since">Since 0.9.8</span>
+        <dl>
+          <dt><code>total_bytes_sec</code></dt>
+          <dd>The optional <code>total_bytes_sec</code> element is the
+            total throughput limit in bytes per second.  This cannot
+            appear with <code>read_bytes_sec</code>
+            or <code>write_bytes_sec</code>.</dd>
+          <dt><code>read_bytes_sec</code></dt>
+          <dd>The optional <code>read_bytes_sec</code> element is the
+            read throughput limit in bytes per second.</dd>
+          <dt><code>write_bytes_sec</code></dt>
+          <dd>The optional <code>write_bytes_sec</code> element is the
+            write throughput limit in bytes per second.</dd>
+          <dt><code>total_iops_sec</code></dt>
+          <dd>The optional <code>total_iops_sec</code> element is the
+            total I/O operations per second.  This cannot
+            appear with <code>read_iops_sec</code>
+            or <code>write_iops_sec</code>.</dd>
+          <dt><code>read_iops_sec</code></dt>
+          <dd>The optional <code>read_iops_sec</code> element is the
+            read I/O operations per second.</dd>
+          <dt><code>write_iops_sec</code></dt>
+          <dd>The optional <code>write_iops_sec</code> element is the
+            write I/O operations per second.</dd>
+        </dl>
       <dt><code>driver</code></dt>
       <dd>
         The optional driver element allows specifying further details
diff --git i/docs/schemas/domaincommon.rng w/docs/schemas/domaincommon.rng
index 965e033..7adf9a8 100644
--- i/docs/schemas/domaincommon.rng
+++ w/docs/schemas/domaincommon.rng
@@ -635,6 +635,9 @@
     <optional>
       <ref name="address"/>
     </optional>
+    <optional>
+      <ref name="diskIoTune"/>
+    </optional>
   </define>
   <define name="snapshot">
     <attribute name="snapshot">
@@ -698,30 +701,6 @@
       <optional>
         <ref name="snapshot"/>
       </optional>
-      <optional>
-        <element name="iotune">
-          <choice>
-            <element name="total_bytes_sec">
-              <ref name="unsignedLongLong">
-            </element>
-            <element name="read_bytes_sec">
-              <ref name="unsignedLongLong">
-            </element>
-            <element name="write_bytes_sec">
-              <ref name="unsignedLongLong">
-            </element>
-            <element name="total_iops_sec">
-              <ref name="unsignedLongLong">
-            </element>
-            <element name="read_iops_sec">
-              <ref name="unsignedLongLong">
-            </element>
-            <element name="write_iopw_sec">
-              <ref name="unsignedLongLong">
-            </emement>
-          </choice>
-        </element>
-      </optional>
       <choice>
         <group>
           <attribute name="type">
@@ -809,7 +788,11 @@
             <ref name="diskspec"/>
           </interleave>
         </group>
-        <ref name="diskspec"/>
+        <group>
+          <interleave>
+            <ref name="diskspec"/>
+          </interleave>
+        </group>
       </choice>
     </element>
   </define>
@@ -2620,6 +2603,47 @@
     </element>
   </define>

+  <define name='diskIoTune'>
+    <element name="iotune">
+      <interleave>
+        <choice>
+          <element name="total_bytes_sec">
+            <data type="unsignedLong"/>
+          </element>
+          <group>
+            <optional>
+              <element name="read_bytes_sec">
+                <data type="unsignedLong"/>
+              </element>
+            </optional>
+            <optional>
+              <element name="write_bytes_sec">
+                <data type="unsignedLong"/>
+              </element>
+            </optional>
+          </group>
+        </choice>
+        <choice>
+          <element name="total_iops_sec">
+            <data type="unsignedLong"/>
+          </element>
+          <group>
+            <optional>
+              <element name="read_iops_sec">
+                <data type="unsignedLong"/>
+              </element>
+            </optional>
+            <optional>
+              <element name="write_iops_sec">
+                <data type="unsignedLong"/>
+              </element>
+            </optional>
+          </group>
+        </choice>
+      </interleave>
+    </element>
+  </define>
+
   <!--
        Optional hypervisor extensions in their own namespace:
          QEmu


-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 620 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20111122/3040fdcf/attachment-0001.sig>


More information about the libvir-list mailing list