[libvirt] RFC: exposing backing store allocation in domain xml

Peter Krempa pkrempa at redhat.com
Thu Aug 7 09:57:13 UTC 2014


On 08/06/14 18:36, Eric Blake wrote:
> Adam Litke has been asking if I can expose watermark information from

<bikeshedding>
I'd be glad if we stopped calling this watermark. The wiki
disambiguation article states:

<citation>
A watermark is a recognizable image or pattern in paper used to identify
authenticity.

Watermark or watermarking can also refer to:

In digital watermarks and digital security
Watermark (data file), a method for ensuring data integrity which
combines aspects of data hashing and digital watermarking
Watermark (data synchronization), directory synchronization related
programming terminology
High-water mark (computer security), network security terminology
Audio watermark, techniques for detecting hidden information from
watermarked signal
Digital watermarking, a technique to embed data in digital audio, images
or video
Watermarking attack, an attack on disk encryption methods
</citation>

As this usage is none of those, I always have to translate it to
something more sane when discussing this topic. I actually like the
wording in the subject of this mail for what's being discussed here. I'm
not sure, though, whether we can come up with a shorter name that will not
be ambiguous with something else.
</bikeshedding>


> qemu when doing block commit.  Qemu still doesn't expose that
> information when doing 'virsh blockcopy' (QMP drive-mirror), but DOES
> expose it for regular and active 'virsh blockcommit'.  The idea is that
> when you are writing to more than one file at a time, management needs
> to know if the file is nearing a watermark for usage that necessitates
> growing the storage volume before hitting an ENOSPC error.  In
> particular, Adam's use is running qcow2 format on top of block devices,
> where it is easy to enlarge the block device.
> 
> The current libvirt API virDomainBlockInfo() can only get watermark
> information for the active image in a disk chain.  It shows three numbers:
>  capacity: the disk size seen by the guest (can be grown via
> virt-resize) - usually larger than the host block device if the guest
> has not used the complete disk, but can also be smaller than the host
> block device due to qcow2 overhead when the disk is mostly in use
>  allocation: the known usage of the host file/block device, should never
> be larger than the physical size (other than rounding up to file sector
> sizing). For sparse files, this number is smaller than the total size,
> depending on the number of holes in the file. For block devices with qcow2
> format, this number is reported by qemu as the maximum offset in use by the
> qcow2 file (without regard to whether earlier offsets are holes that
> could be reused). Compare this to what 'du' would report.
>  physical: the total size of the host file/block device. Compare this to
> what 'ls' would report.
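
To make the relationship between the three numbers concrete, here is a minimal sketch of the check a management application would presumably run. The class and function names are made up for illustration and are not libvirt API; the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """The three numbers described above, all in bytes."""
    capacity: int    # disk size seen by the guest
    allocation: int  # highest offset in use on the host file/block device
    physical: int    # total size of the host file/block device


def headroom(info: BlockInfo) -> int:
    """Bytes left before writes would run past the end of the host storage."""
    return max(info.physical - info.allocation, 0)


def needs_extension(info: BlockInfo, min_free: int) -> bool:
    """True when the remaining headroom has dropped below min_free bytes."""
    return headroom(info) < min_free


GiB = 1024 ** 3
# Hypothetical sample: a 12 GiB guest disk stored on a 3 GiB block device.
info = BlockInfo(capacity=12 * GiB, allocation=2503548928, physical=3 * GiB)
print(headroom(info), needs_extension(info, min_free=1 * GiB))
```

Note that capacity plays no part in the check: only allocation and physical decide whether the host storage has to grow.
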
> 
> Also, the libvirt API virStorageVolGetXMLDesc reports two of those
> numbers for a top-level image: <capacity> and <allocation> are listed as
> siblings of <target>.  But they are not present for a <backingStore>; you
> have to use the API twice.
> 
> Now that we have a common virStorageSourcePtr type in the C code, we
> could do a better job of exposing full information for the entire chain
> in a single API call.
> 
> I've got a couple ideas of where we can extend existing APIs (and the
> extensions do not involve bumping the .so versioning, so it can also be
> backported, although it gets MUCH harder to backport without
> virStorageSourcePtr).
> 
> First, I think the virStorageVolGetXMLDesc should show all three
> numbers, by adding a <physical unit='bytes'>...</physical> element
> alongside the existing <capacity> and <allocation> elements.  Also, I
> think it might be nice if we could enhance the API to do a full chain
> recursion (probably requires an explicit flag to turn on) where it shows
> details on the full backing chain, rather than just partial details on
> the immediate backing file; in doing that, the <backingStore> element
> would gain recursive <backingStore> (similar to what we recently did in
> <domain> XML).  In that mode, each layer of <backingStore> would also
> report <capacity>, <allocation>, and <physical>.  Something like:

While this is certainly an improvement to the storage volume API, it will
not help Adam much, as oVirt isn't actually using the storage driver.

> 
> # virsh vol-dumpxml --pool default f20.snap2
> <volume type='file'>

...

> 
> Also, the current storage volume API is rather hard-coded to assume that
> backing elements are in the same storage pool, which is not always true.
>  It may be time to introduce <backingStore type='file'> or <backingStore
> type='network'> to allow better details of cross-pool backing elements,
> while leaving plain <backingStore> as a back-compat synonym for
> <backingStore type='volume'> for the current hard-coded layout that
> assumes the backing element is in the same storage pool.

That would certainly improve the usability, but, as I said, it would not
help oVirt that much.
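
For what it's worth, the back-compat rule itself would be cheap for a consumer of the XML to honour. A rough sketch, assuming the proposed type attribute is added as described (the helper name is hypothetical, and the sample documents are cut down to the bare minimum):

```python
from typing import Optional
import xml.etree.ElementTree as ET


def backing_store_type(volume_xml: str) -> Optional[str]:
    """Return the backing store type, or None if there is no backing store."""
    backing = ET.fromstring(volume_xml).find("backingStore")
    if backing is None:
        return None
    # A plain <backingStore> keeps its old meaning: a volume in the same pool.
    return backing.get("type", "volume")


old_style = "<volume><backingStore><path>/tmp/base.img</path></backingStore></volume>"
new_style = "<volume><backingStore type='network'/></volume>"
print(backing_store_type(old_style), backing_store_type(new_style))
```
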

> 
> The other idea I've had is to expand the <domain> XML to expose more
> information about backing chains, including to make it expose details
> that are redundant with virDomainBlockInfo() for the top level, or maybe
> even what virDomainBlockStatsFlags() reports.  Here, we have a bit of a
> choice - storage volume XML was inconsistent on which attributes were
> siblings to <target> (such as <capacity>) vs. children (such as
> <timestamps>); it might be nicer to stick all per-file elements at the
> same level in <disk> XML (probably as siblings to <source>).  On the
> other hand, I strongly feel that <compat> is a feature of the <format>,
> so it should have been a child rather than a sibling.  So, as an example
> of what the XML might look like:
> 
>     <disk type='file' device='disk'>
>       <driver name='qemu' type='qcow2'>
>         <compat>1.1</compat>
>         <features/>
>       </driver>
>       <source file='/tmp/snap2.img'/>
>       <capacity unit='bytes'>12884901888</capacity>
>       <allocation unit='bytes'>2503548928</allocation>
>       <physical unit='bytes'>2503548928</physical>
>       <permissions>
>         <mode>0600</mode>
>         <owner>107</owner>
>         <group>107</group>
>         <label>system_u:object_r:virt_content_t:s0</label>
>       </permissions>
>       <timestamps>
>         <atime>1407295598.623411816</atime>
>         <mtime>1402005765.810488875</mtime>
>         <ctime>1404318523.313955796</ctime>
>       </timestamps>

Neither <permissions> nor <timestamps> is particularly useful information
at runtime.

>       <backingStore type='file' index='1'>
>         <format type='qcow2'>
>           <compat>1.1</compat>
>           <features/>
>         </format>
>         <source file='/tmp/snap1.img'/>
>         <capacity unit='bytes'>12884901888</capacity>
>         <allocation unit='bytes'>2503548928</allocation>
>         <physical unit='bytes'>2503548928</physical>
>         <permissions>
>           <mode>0600</mode>
>           <owner>0</owner>
>           <group>0</group>
>           <label>system_u:object_r:virt_image_t:s0</label>
>         </permissions>
>         <timestamps>
>           <atime>1407295598.583411967</atime>
>           <mtime>1403064822.622766566</mtime>
>           <ctime>1404318525.899951254</ctime>
>         </timestamps>
>         <backingStore type='file' index='2'>
>           <format type='raw'/>
>           <capacity unit='bytes'>10737418240</capacity>
>           <allocation unit='bytes'>2503548928</allocation>
>           <physical unit='bytes'>10737418240</physical>
>           <source file='/tmp/base.img'/>
>           <permissions>
>             <mode>0600</mode>
>             <owner>107</owner>
>             <group>107</group>
>             <label>system_u:object_r:virt_content_t:s0</label>
>           </permissions>
>           <timestamps>
>             <atime>1407295598.623411816</atime>
>             <mtime>1402005765.810488875</mtime>
>             <ctime>1404318523.313955796</ctime>
>           </timestamps>
>           <backingStore/>
>         </backingStore>
>       </backingStore>
>       <target dev='vda' bus='virtio'/>
>       <alias name='virtio-disk0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
> function='0x0'/>
>     </disk>
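
As a sketch of how a client might consume such a layout, the snippet below walks the nested <backingStore> elements and pulls out the per-layer numbers. It uses a trimmed, well-formed copy of the example; the layout is only the one proposed in this RFC, not XML that libvirt is known to emit, and the function name is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the proposed <disk> layout (an RFC suggestion only).
DISK_XML = """
<disk type='file' device='disk'>
  <source file='/tmp/snap2.img'/>
  <capacity unit='bytes'>12884901888</capacity>
  <allocation unit='bytes'>2503548928</allocation>
  <physical unit='bytes'>2503548928</physical>
  <backingStore type='file' index='1'>
    <source file='/tmp/snap1.img'/>
    <capacity unit='bytes'>12884901888</capacity>
    <allocation unit='bytes'>2503548928</allocation>
    <physical unit='bytes'>2503548928</physical>
    <backingStore type='file' index='2'>
      <capacity unit='bytes'>10737418240</capacity>
      <allocation unit='bytes'>2503548928</allocation>
      <physical unit='bytes'>10737418240</physical>
      <source file='/tmp/base.img'/>
      <backingStore/>
    </backingStore>
  </backingStore>
</disk>
"""


def chain_usage(disk):
    """Yield (file, allocation, physical) for every layer, top image first."""
    layer = disk
    # find() only looks at direct children, so each step sees one layer.
    while layer is not None and layer.find("source") is not None:
        yield (
            layer.find("source").get("file"),
            int(layer.findtext("allocation")),
            int(layer.findtext("physical")),
        )
        # The empty <backingStore/> has no <source>, which ends the walk.
        layer = layer.find("backingStore")


for path, allocation, physical in chain_usage(ET.fromstring(DISK_XML)):
    print(path, allocation, physical)
```

The walk assumes every size is reported in bytes, as in the example; a real consumer would have to honour the unit attribute.
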
> 
> Again, this is a lot of new information, so it may be wise to add a new
> flag that must be turned on to request the information.  But adding this

This definitely needs a flag. We are already polluting the XML enough with
the backing chain.

> information would allow watermark tracking for a blockcommit operation -
> when collapsing 'base <- snap1 <- snap2' into 'base <- snap2' by
> committing snap1 into base, the <allocation> subelement of the
> appropriate <backingStore> level will do live tracking of the qemu
> values as more data is being written into base, and thus be usable to
> determine if the block device behind base needs to be externally
> expanded before hitting an ENOSPC situation.
> 

Peter

