[libvirt] RFC: exposing backing store allocation in domain xml
Peter Krempa
pkrempa at redhat.com
Thu Aug 7 09:57:13 UTC 2014
On 08/06/14 18:36, Eric Blake wrote:
> Adam Litke has been asking if I can expose watermark information from
<bikeshedding>
I'd be glad if we stopped calling this watermark. The wiki
disambiguation article states:
<citation>
A watermark is a recognizable image or pattern in paper used to identify
authenticity.
Watermark or watermarking can also refer to:
In digital watermarks and digital security
Watermark (data file), a method for ensuring data integrity which
combines aspects of data hashing and digital watermarking
Watermark (data synchronization), directory synchronization related
programming terminology
High-water mark (computer security), network security terminology
Audio watermark, techniques for detecting hidden information from
watermarked signal
Digital watermarking, a technique to embed data in digital audio, images
or video
Watermarking attack, an attack on disk encryption methods
</citation>
As this usage is none of those, I always have to translate it to
something more sane when discussing this topic. I actually like how the
subject of this mail refers to what's discussed here. I'm not sure,
though, whether we can come up with a shorter name that will not be
ambiguous with something else.
</bikeshedding>
> qemu when doing block commit. Qemu still doesn't expose that
> information when doing 'virsh blockcopy' (QMP drive-mirror), but DOES
> expose it for regular and active 'virsh blockcommit'. The idea is that
> when you are writing to more than one file at a time, management needs
> to know if the file is nearing a watermark for usage that necessitates
> growing the storage volume before hitting an ENOSPC error. In
> particular, Adam's use is running qcow2 format on top of block devices,
> where it is easy to enlarge the block device.
>
> The current libvirt API virDomainBlockInfo() can only get watermark
> information for the active image in a disk chain. It shows three numbers:
> capacity: the disk size seen by the guest (can be grown via
> virt-resize) - usually larger than the host block device if the guest
> has not used the complete disk, but can also be smaller than the host
> block device due to qcow2 overhead when the disk is mostly in use
> allocation: the known usage of the host file/block device, should never
> be larger than the physical size (other than rounding up to file sector
> sizing). For sparse files, this number is smaller than total size based
> on the amount of holes in the file. For block devices with qcow2 format,
> this number is reported by qemu as the maximum offset in use by the
> qcow2 file (without regard to whether earlier offsets are holes that
> could be reused). Compare this to what 'du' would report.
> physical: the total size of the host file/block device. Compare this to
> what 'ls' would report.
>
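To make the relationship between the three numbers concrete, here is a
minimal sketch of the check a management application might run on them.
The function names and the 512 MiB default threshold are hypothetical
choices for illustration; nothing here is libvirt API.

```python
def headroom(allocation, physical):
    """Bytes still free in the host file/block device.

    Clamped at zero, since allocation may exceed physical by a rounding
    amount (file sector sizing).
    """
    return max(physical - allocation, 0)


def needs_extension(allocation, physical, min_free=512 * 1024 * 1024):
    """True if the device should be grown before the guest hits ENOSPC.

    min_free is an assumed policy value, not something libvirt defines.
    """
    return headroom(allocation, physical) < min_free
```

With the numbers from the example further down, a layer whose
allocation equals its physical size has no headroom left and would be
flagged, while the raw base image would not.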
> Also, the libvirt API virStorageVolGetXMLDesc reports two of those
> numbers for a top-level image: <capacity> and <allocation> are listed as
> siblings of <target>. But it is not present for a <backingStore>; you
> have to use the API twice.
>
> Now that we have a common virStorageSourcePtr type in the C code, we
> could do a better job of exposing full information for the entire chain
> in a single API call.
>
> I've got a couple ideas of where we can extend existing APIs (and the
> extensions do not involve bumping the .so versioning, so it can also be
> backported, although it gets MUCH harder to backport without
> virStorageSourcePtr).
>
> First, I think the virStorageVolGetXMLDesc should show all three
> numbers, by adding a <physical unit='bytes'>...</physical> element
> alongside the existing <capacity> and <allocation> elements. Also, I
> think it might be nice if we could enhance the API to do a full chain
> recursion (probably requires an explicit flag to turn on) where it shows
> details on the full backing chain, rather than just partial details on
> the immediate backing file; in doing that, the <backingStore> element
> would gain recursive <backingStore> (similar to what we recently did in
> <domain> XML). In that mode, each layer of <backingStore> would also
> report <capacity>, <allocation>, and <physical>. Something like:
While this is certainly an improvement to the storage volume API, it will
not help Adam much as oVirt isn't actually using the storage driver.
>
> # virsh vol-dumpxml --pool default f20.snap2
> <volume type='file'>
...
>
> Also, the current storage volume API is rather hard-coded to assume that
> backing elements are in the same storage pool, which is not always true.
> It may be time to introduce <backingStore type='file'> or <backingStore
> type='network'> to allow better details of cross-pool backing elements,
> while leaving plain <backingStore> as a back-compat synonym for
> <backingStore type='volume'> for the current hard-coded layout that
> assumes the backing element is in the same storage pool.
That would certainly improve the usability but, as I said, it would not
help oVirt that much.
>
> The other idea I've had is to expand the <domain> XML to expose more
> information about backing chains, including making it expose details
> that are redundant with virDomainBlockInfo() for the top level, or maybe
> even what virDomainBlockStatsFlags() reports. Here, we have a bit of a
> choice - storage volume XML was inconsistent on which attributes were
> siblings to <target> (such as <capacity>) vs. children (such as
> <timestamps>); it might be nicer to stick all per-file elements at the
> same level in <disk> XML (probably as siblings to <source>). On the
> other hand, I strongly feel that <compat> is a feature of the <format>,
> so it should have been a child rather than a sibling. So, as an example
> of what the XML might look like:
>
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'>
> <compat>1.1</compat>
> <features/>
> </driver>
> <source file='/tmp/snap2.img'/>
> <capacity unit='bytes'>12884901888</capacity>
> <allocation unit='bytes'>2503548928</allocation>
> <physical unit='bytes'>2503548928</physical>
> <permissions>
> <mode>0600</mode>
> <owner>107</owner>
> <group>107</group>
> <label>system_u:object_r:virt_content_t:s0</label>
> </permissions>
> <timestamps>
> <atime>1407295598.623411816</atime>
> <mtime>1402005765.810488875</mtime>
> <ctime>1404318523.313955796</ctime>
> </timestamps>
Neither <permissions> nor <timestamps> is particularly useful
information at runtime.
> <backingStore type='file' index='1'>
> <format type='qcow2'>
> <compat>1.1</compat>
> <features/>
> </format>
> <source file='/tmp/snap1.img'/>
> <capacity unit='bytes'>12884901888</capacity>
> <allocation unit='bytes'>2503548928</allocation>
> <physical unit='bytes'>2503548928</physical>
> <permissions>
> <mode>0600</mode>
> <owner>0</owner>
> <group>0</group>
> <label>system_u:object_r:virt_image_t:s0</label>
> </permissions>
> <timestamps>
> <atime>1407295598.583411967</atime>
> <mtime>1403064822.622766566</mtime>
> <ctime>1404318525.899951254</ctime>
> </timestamps>
> <backingStore type='file' index='2'>
> <format type='raw'/>
> <capacity unit='bytes'>10737418240</capacity>
> <allocation unit='bytes'>2503548928</allocation>
> <physical unit='bytes'>10737418240</physical>
> <source file='/tmp/base.img'/>
> <permissions>
> <mode>0600</mode>
> <owner>107</owner>
> <group>107</group>
> <label>system_u:object_r:virt_content_t:s0</label>
> </permissions>
> <timestamps>
> <atime>1407295598.623411816</atime>
> <mtime>1402005765.810488875</mtime>
> <ctime>1404318523.313955796</ctime>
> </timestamps>
> <backingStore/>
> </backingStore>
> </backingStore>
> <target dev='vda' bus='virtio'/>
> <alias name='virtio-disk0'/>
> <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
> function='0x0'/>
> </disk>
>
> Again, this is a lot of new information, so it may be wise to add a new
> flag that must be turned on to request the information. But adding this
This definitely needs a flag. We are already polluting the XML enough
with the backing chain.
> information would allow watermark tracking for a blockcommit operation -
> when collapsing 'base <- snap1 <- snap2' into 'base <- snap2' by
> committing snap1 into base, the <allocation> subelement of the
> appropriate <backingStore> level will do live tracking of the qemu
> values as more data is being written into base, and thus be usable to
> determine if the block device behind base needs to be externally
> expanded before hitting an ENOSPC situation.
>
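For illustration, this is roughly how a client could walk such a chain
if the proposed layout were adopted. It is only a sketch against the XML
proposed in this thread (trimmed to the relevant elements), not against
any existing libvirt schema; walk_chain() is a hypothetical helper, and
reporting the top-level image as index 0 is my assumption since the
proposal gives it no index.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the proposed <disk> XML; layout is the proposal's,
# not something libvirt currently emits.
SAMPLE = """
<disk type='file' device='disk'>
  <source file='/tmp/snap2.img'/>
  <allocation unit='bytes'>2503548928</allocation>
  <physical unit='bytes'>2503548928</physical>
  <backingStore type='file' index='1'>
    <source file='/tmp/snap1.img'/>
    <allocation unit='bytes'>2503548928</allocation>
    <physical unit='bytes'>2503548928</physical>
    <backingStore type='file' index='2'>
      <source file='/tmp/base.img'/>
      <allocation unit='bytes'>2503548928</allocation>
      <physical unit='bytes'>10737418240</physical>
      <backingStore/>
    </backingStore>
  </backingStore>
</disk>
"""


def walk_chain(disk_xml):
    """Yield (index, path, allocation, physical) per layer, top first."""
    node = ET.fromstring(disk_xml)
    # An empty <backingStore/> has no <source> child and ends the chain.
    while node is not None and node.find("source") is not None:
        yield (
            int(node.get("index", "0")),  # assumed: top-level image is 0
            node.find("source").get("file"),
            int(node.findtext("allocation")),
            int(node.findtext("physical")),
        )
        node = node.find("backingStore")


layers = list(walk_chain(SAMPLE))
for index, path, allocation, physical in layers:
    print(index, path, physical - allocation)
```

During a commit of snap1 into base, the caller would re-fetch the XML
periodically and watch the layer with index 2, growing the underlying
block device when the remaining space drops below its own threshold.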
Peter