[libvirt] RFC: exposing backing store allocation in domain xml

Adam Litke alitke at redhat.com
Wed Aug 6 17:31:04 UTC 2014


On 06/08/14 10:36 -0600, Eric Blake wrote:
>Adam Litke has been asking if I can expose watermark information from
>qemu when doing block commit.  Qemu still doesn't expose that
>information when doing 'virsh blockcopy' (QMP drive-mirror), but DOES
>expose it for regular and active 'virsh blockcommit'.  The idea is that
>when you are writing to more than one file at a time, management needs
>to know if the file is nearing a watermark for usage that necessitates
>growing the storage volume before hitting an ENOSPC error.  In
>particular, Adam's use is running qcow2 format on top of block devices,
>where it is easy to enlarge the block device.
>
>The current libvirt API virDomainBlockInfo() can only get watermark
>information for the active image in a disk chain.  It shows three numbers:
> capacity: the disk size seen by the guest (can be grown via
>virt-resize) - usually larger than the host block device if the guest
>has not used the complete disk, but can also be smaller than the host
>block device due to qcow2 overhead when the disk is mostly in use
> allocation: the known usage of the host file/block device, should never
>be larger than the physical size (other than rounding up to file sector
>sizing). For sparse files, this number is smaller than the total size by
>the amount of holes in the file. For block devices with qcow2 format,
>this number is reported by qemu as the maximum offset in use by the
>qcow2 file (without regard to whether earlier offsets are holes that
>could be reused). Compare this to what 'du' would report.
> physical: the total size of the host file/block device. Compare this to
>what 'ls' would report.
>
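To make the 'du' vs. 'ls' comparison concrete, here is a minimal sketch for
regular files only. It assumes a POSIX host where st_blocks counts 512-byte
units; the helper name file_sizes is made up for illustration. It does not
cover qcow2 on a block device, where the allocation figure has to come from
qemu, and it does not report capacity, which needs the image header.

```python
import os
import tempfile


def file_sizes(path):
    """Return the 'physical' (ls-style) and 'allocation' (du-style) sizes in bytes."""
    st = os.stat(path)
    return {
        "physical": st.st_size,            # apparent size, what 'ls -l' shows
        "allocation": st.st_blocks * 512,  # blocks actually backed by storage
    }


def demo_sparse(size=1 << 20):
    """Create a temporary file with a hole of `size` bytes and report its sizes."""
    with tempfile.NamedTemporaryFile() as f:
        f.truncate(size)  # extends the file without writing data
        f.flush()
        return file_sizes(f.name)


if __name__ == "__main__":
    print(demo_sparse())
```

On a filesystem that supports sparse files, the allocation value should come
out well below the physical value for the demo file.
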
>Also, the libvirt API virStorageVolGetXMLDesc reports two of those
>numbers for a top-level image: <capacity> and <allocation> are listed as
>siblings of <target>.  But it is not present for a <backingStore>; you
>have to use the API twice.
>
>Now that we have a common virStorageSourcePtr type in the C code, we
>could do a better job of exposing full information for the entire chain
>in a single API call.
>
>I've got a couple ideas of where we can extend existing APIs (and the
>extensions do not involve bumping the .so versioning, so it can also be
>backported, although it gets MUCH harder to backport without
>virStorageSourcePtr).
>
>First, I think the virStorageVolGetXMLDesc should show all three
>numbers, by adding a <physical unit='bytes'>...</physical> element
>alongside the existing <capacity> and <allocation> elements.  Also, I

This seems like an obvious improvement, especially since it makes the
APIs more symmetric.

>think it might be nice if we could enhance the API to do a full chain
>recursion (probably requires an explicit flag to turn on) where it shows
>details on the full backing chain, rather than just partial details on
>the immediate backing file; in doing that, the <backingStore> element
>would gain recursive <backingStore> (similar to what we recently did in
><domain> XML).  In that mode, each layer of <backingStore> would also
>report <capacity>, <allocation>, and <physical>.  Something like:

+1

># virsh vol-dumpxml --pool default f20.snap2
><volume type='file'>
>  <name>f20.snap2</name>
>  <key>/var/lib/libvirt/images/f20.snap2</key>
>  <source>
>  </source>
>  <capacity unit='bytes'>12884901888</capacity>
>  <allocation unit='bytes'>2503548928</allocation>
>  <physical unit='bytes'>2503548928</physical>
>  <target>
>    <path>/var/lib/libvirt/images/f20.snap2</path>
>    <format type='qcow2'/>
>    <permissions>
>      <mode>0600</mode>
>      <owner>0</owner>
>      <group>0</group>
>      <label>system_u:object_r:virt_image_t:s0</label>
>    </permissions>
>    <timestamps>
>      <atime>1407295598.583411967</atime>
>      <mtime>1403064822.622766566</mtime>
>      <ctime>1404318525.899951254</ctime>
>    </timestamps>
>    <compat>1.1</compat>
>    <features/>
>  </target>
>  <backingStore>
>    <path>/var/lib/libvirt/images/f20.snap1</path>
>    <capacity unit='bytes'>12884901888</capacity>
>    <allocation unit='bytes'>2503548928</allocation>
>    <physical unit='bytes'>2503548928</physical>
>    <format type='qcow2'/>
>    <permissions>
>      <mode>0600</mode>
>      <owner>107</owner>
>      <group>107</group>
>      <label>system_u:object_r:virt_content_t:s0</label>
>    </permissions>
>    <timestamps>
>      <atime>1407295598.623411816</atime>
>      <mtime>1402005765.810488875</mtime>
>      <ctime>1404318523.313955796</ctime>
>    </timestamps>
>    <compat>1.1</compat>
>    <features/>
>    <backingStore>
>      <path>/var/lib/libvirt/images/f20.base</path>
>      <capacity unit='bytes'>10737418240</capacity>
>      <allocation unit='bytes'>2503548928</allocation>
>      <physical unit='bytes'>10737418240</physical>
>      <format type='raw'/>
>      <permissions>
>        <mode>0600</mode>
>        <owner>107</owner>
>        <group>107</group>
>        <label>system_u:object_r:virt_content_t:s0</label>
>      </permissions>
>      <timestamps>
>        <atime>1407295598.623411816</atime>
>        <mtime>1402005765.810488875</mtime>
>        <ctime>1404318523.313955796</ctime>
>      </timestamps>
>      <backingStore/>
>    </backingStore>
>  </backingStore>
></volume>
>
>Also, the current storage volume API is rather hard-coded to assume that
>backing elements are in the same storage pool, which is not always true.
> It may be time to introduce <backingStore type='file'> or <backingStore
>type='network'> to allow better details of cross-pool backing elements,

Would you also need to add a <pool>default</pool> element in each one
so that you can name the pool explicitly?

>while leaving plain <backingStore> as a back-compat synonym for
><backingStore type='volume'> for the current hard-coded layout that
>assumes the backing element is in the same storage pool.
>
>The other idea I've had is to expand the <domain> XML to expose more
>information about backing chains, including details that are redundant
>with virDomainBlockInfo() for the top level, or maybe even what
>virDomainBlockStatsFlags() reports.  Here, we have a bit of a
>choice - storage volume XML was inconsistent on which attributes were
>siblings to <target> (such as <capacity>) vs. children (such as
><timestamps>); it might be nicer to stick all per-file elements at the
>same level in <disk> XML (probably as siblings to <source>).  On the
>other hand, I strongly feel that <compat> is a feature of the <format>,
>so it should have been a child rather than a sibling.  So, as an example
>of what the XML might look like:
>
>    <disk type='file' device='disk'>
>      <driver name='qemu' type='qcow2'>
>        <compat>1.1</compat>
>        <features/>
>      </driver>
>      <source file='/tmp/snap2.img'/>
>      <capacity unit='bytes'>12884901888</capacity>
>      <allocation unit='bytes'>2503548928</allocation>
>      <physical unit='bytes'>2503548928</physical>
>      <permissions>
>        <mode>0600</mode>
>        <owner>107</owner>
>        <group>107</group>
>        <label>system_u:object_r:virt_content_t:s0</label>
>      </permissions>
>      <timestamps>
>        <atime>1407295598.623411816</atime>
>        <mtime>1402005765.810488875</mtime>
>        <ctime>1404318523.313955796</ctime>
>      </timestamps>
>      <backingStore type='file' index='1'>
>        <format type='qcow2'>
>          <compat>1.1</compat>
>          <features/>
>        </format>
>        <source file='/tmp/snap1.img'/>
>        <capacity unit='bytes'>12884901888</capacity>
>        <allocation unit='bytes'>2503548928</allocation>
>        <physical unit='bytes'>2503548928</physical>
>        <permissions>
>          <mode>0600</mode>
>          <owner>0</owner>
>          <group>0</group>
>          <label>system_u:object_r:virt_image_t:s0</label>
>        </permissions>
>        <timestamps>
>          <atime>1407295598.583411967</atime>
>          <mtime>1403064822.622766566</mtime>
>          <ctime>1404318525.899951254</ctime>
>        </timestamps>
>        <backingStore type='file' index='2'>
>          <format type='raw'/>
>          <capacity unit='bytes'>10737418240</capacity>
>          <allocation unit='bytes'>2503548928</allocation>
>          <physical unit='bytes'>10737418240</physical>
>          <source file='/tmp/base.img'/>
>          <permissions>
>            <mode>0600</mode>
>            <owner>107</owner>
>            <group>107</group>
>            <label>system_u:object_r:virt_content_t:s0</label>
>          </permissions>
>          <timestamps>
>            <atime>1407295598.623411816</atime>
>            <mtime>1402005765.810488875</mtime>
>            <ctime>1404318523.313955796</ctime>
>          </timestamps>
>          <backingStore/>
>        </backingStore>
>      </backingStore>
>      <target dev='vda' bus='virtio'/>
>      <alias name='virtio-disk0'/>
>      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>    </disk>
>
>Again, this is a lot of new information, so it may be wise to add a new
>flag that must be turned on to request the information.  But adding this
>information would allow watermark tracking for a blockcommit operation -
>when collapsing 'base <- snap1 <- snap2' into 'base <- snap2' by
>committing snap1 into base, the <allocation> subelement of the
>appropriate <backingStore> level will do live tracking of the qemu
>values as more data is being written into base, and thus be usable to
>determine if the block device behind base needs to be externally
>expanded before hitting an ENOSPC situation.

Yes, +1 and this would satisfy my use case.
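
For what it's worth, here is a rough sketch of how a management application
might consume this. It assumes the <disk> XML ends up looking like the
proposal above (nested <backingStore> elements carrying <allocation> and
<physical>, with an empty <backingStore/> ending the chain), which libvirt
does not emit today. The function names and the headroom threshold are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the proposed <disk> XML; not current libvirt output.
SAMPLE = """
<disk type='file' device='disk'>
  <source file='/tmp/snap2.img'/>
  <allocation unit='bytes'>2503548928</allocation>
  <physical unit='bytes'>2503548928</physical>
  <backingStore type='file' index='1'>
    <source file='/tmp/snap1.img'/>
    <allocation unit='bytes'>2503548928</allocation>
    <physical unit='bytes'>2503548928</physical>
    <backingStore type='file' index='2'>
      <source file='/tmp/base.img'/>
      <allocation unit='bytes'>2503548928</allocation>
      <physical unit='bytes'>10737418240</physical>
      <backingStore/>
    </backingStore>
  </backingStore>
</disk>
"""


def chain_layers(disk):
    """Yield (path, allocation, physical) for each layer, top image first."""
    layer = disk
    while layer is not None:
        source = layer.find("source")
        if source is None:  # an empty <backingStore/> terminates the chain
            break
        yield (
            source.get("file"),
            int(layer.findtext("allocation")),
            int(layer.findtext("physical")),
        )
        layer = layer.find("backingStore")


def needs_growth(disk, headroom):
    """Return the paths of layers with less than `headroom` bytes left."""
    return [
        path
        for path, alloc, phys in chain_layers(disk)
        if phys - alloc < headroom
    ]


if __name__ == "__main__":
    disk = ET.fromstring(SAMPLE)
    print(needs_growth(disk, 1 << 30))
```

During a blockcommit of snap1 into base, we would poll this and extend the
block device behind whichever layer is reported before it fills up.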

-- 
Adam Litke



