[libvirt] [RFC v2] external (pull) backup API

Wed Apr 25 18:27:53 UTC 2018

On 04/25/2018 03:19 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 24.04.2018 23:02, John Snow wrote:
>>
>>
>> On 04/23/2018 06:38 AM, Nikolay Shirokovskiy wrote:
>>>
>>>
>>> On 21.04.2018 00:26, Eric Blake wrote:
>>>> On 04/20/2018 01:24 PM, John Snow wrote:
>>>>
>>>>>>> Why is option 3 unworkable, exactly?:
>>>>>>>
>>>>>>> (3) Checkpoints exist as structures only with libvirt. They are saved
>>>>>>> and remembered in the XML entirely.
>>>>>>>
>>>>>>> Or put another way:
>>>>>>>
>>>>>>> Can you explain to me why it's important for libvirt to be able to
>>>>>>> reconstruct checkpoint information from a qcow2 file?
>>>>>>>
>>>>>>
>>>>>> In short, it takes extra effort to keep the metadata consistent when
>>>>>> libvirtd crashes. See [1] for a more detailed explanation,
>>>>>> starting from the words "Yes it is possible".
>>>>>>
>>>>>> [1] https://www.redhat.com/archives/libvir-list/2018-April/msg01001.html
>>>>
>>>> I'd argue the converse. Libvirt already knows how to do atomic updates
>>>> of XML files that it tracks.  If libvirtd crashes/restarts in the middle
>>>> of an API call, you already have indeterminate results of whether the
>>>> API worked or failed; once libvirtd is restarted, you'll have to
>>>> probably retry the command.  For all other cases, the API call
>>>> completes, and either no XML changes were made (the command failed and
>>>> reports the failure properly), or all XML changes were made (the command
>>>> created the appropriate changes to track the new checkpoint, including
>>>> whatever bitmap names have to be recorded to map the relation between
>>>> checkpoints and bitmaps).
>>>
>>> We can fail to save the XML... Suppose we have B1 and B2, and create bitmap B3
>>> in the process of creating checkpoint C3. Qemu creates the snapshot
>>> and the bitmap successfully, then libvirt fails to update the XML, and some
>>> time later libvirt restarts (it need not even crash). Now libvirt knows of B1 and B2 but not B3.
>>> What are the consequences? For example, if we ask for the bitmap from C2 we
>>> miss all changes from C3, as we don't know of B3. This will lead to corrupted
>>> backups.
>>>
>>> This can be fixed:
>>>
>>> - in qemu. If bitmaps have a child/parent relationship, then on libvirt restart
>>>   we can recover (we ask qemu for bitmaps, discover B3, and then discover that
>>>   B3 is a child of B2). This is basically how the implementation with a naming
>>>   scheme works. Going this way, we don't need special metadata in
>>>   libvirt (besides maybe the domain XML attached to a checkpoint, etc.)
>>>
>>> - in libvirt. We save the XML before creating a snapshot with a checkpoint.
>>>   This fixes the case where the operation succeeds but saving the XML fails.
>>>   But now we have another issue :) We can save the XML successfully, but then the operation
>>>   itself can fail and we can fail to revert the XML. Well, in this case we can recover
>>>   even without child/parent metadata in qemu. Just ask
>>>   qemu for bitmaps on libvirt restart, and if a bitmap is missing, kick
>>>   it out, as it is the case described above (XML saved successfully, then
>>>   the qemu operation failed)
>>>
>>
>> This option seems perfectly workable to me...
>>
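For what it's worth, here is a minimal sketch of that restart-time
recovery, assuming libvirt writes the checkpoint XML *before* asking qemu
to create the bitmap. Nothing in it is a real libvirt or qemu call:
reconcile_checkpoints and its arguments are hypothetical, and the set of
bitmap names is assumed to have been queried from qemu by some other means.

```python
def reconcile_checkpoints(recorded, qemu_bitmaps):
    """Compare checkpoints recorded in XML against bitmaps qemu reports.

    recorded:     list of (checkpoint_name, bitmap_name), oldest first
                  (hypothetical view of what libvirt saved in XML).
    qemu_bitmaps: set of bitmap names qemu actually has.

    Returns (kept, dropped, unknown):
      kept    - recorded entries whose bitmap exists in qemu
      dropped - recorded entries whose bitmap is missing (XML was saved,
                then the qemu operation failed); safe to kick out
      unknown - bitmaps qemu has that the XML does not mention; with
                XML-first ordering these are not ours to track
    """
    kept, dropped = [], []
    for checkpoint, bitmap in recorded:
        if bitmap in qemu_bitmaps:
            kept.append((checkpoint, bitmap))
        else:
            dropped.append((checkpoint, bitmap))
    recorded_bitmaps = {bitmap for _, bitmap in recorded}
    unknown = sorted(set(qemu_bitmaps) - recorded_bitmaps)
    return kept, dropped, unknown


# XML already lists C3/B3, but qemu never created B3.
xml_state = [("C1", "B1"), ("C2", "B2"), ("C3", "B3")]
kept, dropped, unknown = reconcile_checkpoints(xml_state, {"B1", "B2"})
print(kept, dropped, unknown)
```

The point of the ordering is that a stale XML entry is detectable and
harmless, whereas a bitmap libvirt never recorded is the case that leads
to corrupted backups.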
>>> So it is possible to track bitmaps in libvirt. We just need to be extra careful
>>> not to produce invalid backups.
>>>
>>>>
>>>> Consider the case of internal snapshots.  Already, we have the case
>>>> where qemu itself does not track enough useful metadata about internal
>>>> snapshots (right now, just a name and timestamp of creation); so libvirt
>>>> additionally tracks further information in <domainsnapshot>: the name,
>>>> timestamp, relationship to any previous snapshot (libvirt can then
>>>> reconstruct a tree relationship between all snapshots; where a parent
>>>> can have more than one child if you roll back to a snapshot and then
>>>> execute the guest differently), the set of disks participating in the
>>>> snapshot, and the <domain> description at the time of the snapshot (if
>>>> you hotplug devices, or even the fact that creating external snapshots
>>>> changes which file is the active qcow2 in a backing chain, you'll need
>>>> to know how to roll back to the prior domain state as part of
>>>> reverting).  This is approximately the same set of information that a
>>>> <domaincheckpoint> will need to track.
>>>
>>> I would differentiate checkpoints and backups. For example, in the case
>>> of push backups we can store additional metadata in <domainbackup>
>>> so that later we can revert to a previous state. But checkpoints
>>> (technically, bitmaps) exist only to make incremental backups (restores?).
>>> We can attach extra metadata to checkpoints, but that looks accidental: it is only because
>>> bitmaps and backups relate to the same point in time. To me, a (push) backup
>>> can carry all the metadata, and a backup may or may not have an
>>> associated checkpoint. For example, if we choose to always
>>> make full backups, we don't need checkpoints at all (at least if we are
>>> not going to use them for restore).
>>>
>>
>> Well ... if we create checkpoints alongside full backups, then you have
>> points to reference to create future incremental backups. You don't need
>> checkpoints if you *NEVER* use an incremental backup. If we want the
>> feature enabled, so to speak, you likely need to be making checkpoints
>> alongside full backups.
>>
>> I'd say the cases in which we don't want them -- once the feature is
>> enabled -- are hard to find.
>>
>>>>
>>>> I'm slightly tempted to just overload <domainsnapshot> to track three
>>>> modes instead of two (internal, external, and now checkpoint); but think
>>>> that will probably be a bit too confusing, so more likely I will create
>>>> <domaincheckpoint> as a new object, but copy a lot of coding paradigms
>>>> from <domainsnapshot>.
>>>
>>> I wonder if you are going to use a tree or a list structure for backups.
>>> To me it is much easier to think of backups just as a sequence of states
>>> in time. For example, consider the Grandfather-Father-Son scheme of Acronis backups [1].
>>> A typical backup sequence can look like:
>>>
>>> F - I - I - I - I - D - I - I - I - I - D
>>>
>>> Where F is a full monthly backup, I is an incremental daily backup, and D is a
>>> differential weekly backup (no backups on Saturday and Sunday).
>>> This is the representation from a time POV. From a backup dependencies POV it looks like this:
>>>
>>> F - I - I - I - I   D - I - I - I - I   D 
>>> \-------------------|                   |
>>>  \--------------------------------------|
>>>
>>> or, in a more common representation:
>>>
>>> F - I - I - I - I 
>>>  \- D - I - I - I - I   
>>>   \- D - I - I - I - I
>>>
>>> To me, using a tree structure for snapshots is appropriate because each branching
>>> point is some semantic state ("basic OS installed") and the branches are different
>>> trials from that point. In the backup case, I guess we don't want to branch on recovery
>>> to some backup; we just want to keep the selected backup scheme going. So, for example,
>>> if on Wednesday we recover to the previous week's Friday, then later on Wednesday we
>>> will take the regular Wednesday backup as if no recovery had happened. This keeps
>>> things simple for the client; otherwise they will drown in dependencies (especially after
>>> a couple of recoveries).
>>>
>>
>> But your representation is itself a tree -- is this a good argument
>> against hierarchical information ... ?
>>
>> If you don't utilize the hierarchy, the degenerate form is indeed just a
>> list:
>>
>> F - I - I - I - I - I - I - I - I - I ...
>>
>> everything has just one successor.
>>
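To make the "it is itself a tree" point concrete, here is a rough sketch
that derives parent links from a time-ordered F/I/D sequence like the one
quoted above. The rules are my reading of the diagram (a differential
depends on the last full backup, an incremental on whatever backup came
just before it); backup_parents and restore_chain are hypothetical names,
not anything in libvirt.

```python
def backup_parents(kinds):
    """Return, for each backup in time order, the index of its parent.

    kinds is a sequence of "F" (full), "I" (incremental), "D" (differential).
    A full backup has no parent (None).
    """
    parents = []
    last_full = None
    for index, kind in enumerate(kinds):
        if kind == "F":
            parents.append(None)
            last_full = index
        elif kind == "D":
            if last_full is None:
                raise ValueError("differential backup without a full backup")
            parents.append(last_full)
        elif kind == "I":
            if index == 0:
                raise ValueError("incremental backup without a predecessor")
            parents.append(index - 1)
        else:
            raise ValueError(f"unknown backup kind: {kind!r}")
    return parents


def restore_chain(parents, index):
    """Backups needed to restore backup `index`, oldest first."""
    chain = []
    while index is not None:
        chain.append(index)
        index = parents[index]
    return chain[::-1]


schedule = "F I I I I D I I I I D".split()
parents = backup_parents(schedule)
print(parents)
print(restore_chain(parents, 7))
```

With only F and I in the schedule, every entry has exactly one successor
and the parent links collapse into the plain list above.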
>> I think Eric just feels he can get good code re-use out of the
>> <domainsnapshot> element -- since each <snapshot> element itself
>> references a parent ID; there's no real "cost" to tracking a tree
>> instead of a list.
>>
>> There's nothing stopping you from adding three checkpoints that have the
>> same parent, so to speak.
>>
>> I think this is just something that might wind up happening "for free"
>> due to the nature of how libvirt stores relational data at all.
> 
> I mean we have to store tree structure for backups of course. I suggest
> 
> - not to expose the tree structure through the API in the first place. For example,
>   we can have an API like
> 
>   - virDomainBackupList(time_t from, time_t to,
>                         virDomainBackupPtr **backups,
>                         unsigned int flags)
> 
>     to list backups in some period of time with flags like
>       - 'only full backups',
>       - 'include parent backups if they don't fit into interval'
>       - 'include children backups if they don't fit into interval'
> 
>   - virDomainBackupListChildren(virDomainBackupPtr parent,
>                                 virDomainBackupPtr **backups,
>                                 unsigned int flags)
> 
>     to list the children of a backup
> 
> - in case of restore, don't branch from the restored state; instead just continue
>   to back up as if the changes brought by the restore were produced by the guest
> 
> So the API eventually has the means to explore the tree structure (virDomainBackupListChildren),
> but I suggest we think of backups, and provide the means to work with them, primarily as a
> sequence in time, not as a tree.
> 

Oh, sure. That might be reasonable, but I'll probably defer to Eric's
opinion here. The XML storage can be tree-based (as a natural
occurrence) but I don't know if we need to make the API tree-based, right.

I don't have a really strong stance here -- I'd say whatever makes the
most sense with the implementation that best facilitates code re-use in
libvirt.
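
A minimal sketch of that split, with parent-linked storage underneath and
a flat, time-ordered listing on top. This is only a model of the proposed
semantics, not libvirt code: the Backup record, list_backups,
list_children and the two keyword flags are assumptions standing in for
the virDomainBackupList / virDomainBackupListChildren idea quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backup:
    name: str
    created: int            # creation time, seconds since the epoch
    parent: Optional[str]   # name of the parent backup, None for a full one


def list_backups(backups, start, end, only_full=False, include_parents=False):
    """List backups created in [start, end], oldest first.

    only_full       - keep only backups without a parent
    include_parents - also pull in ancestors that fall outside the interval
    """
    by_name = {b.name: b for b in backups}
    selected = {
        b.name for b in backups
        if start <= b.created <= end and (not only_full or b.parent is None)
    }
    if include_parents:
        for name in list(selected):
            parent = by_name[name].parent
            # Walk up until we hit a full backup or an already-selected one.
            while parent is not None and parent not in selected:
                selected.add(parent)
                parent = by_name[parent].parent
    return sorted((by_name[n] for n in selected), key=lambda b: b.created)


def list_children(backups, parent_name):
    """List direct children of one backup, oldest first."""
    children = [b for b in backups if b.parent == parent_name]
    return sorted(children, key=lambda b: b.created)


store = [
    Backup("full", 0, None),
    Backup("inc1", 10, "full"),
    Backup("inc2", 20, "inc1"),
    Backup("diff", 30, "full"),
]
print([b.name for b in list_backups(store, 15, 35)])
print([b.name for b in list_backups(store, 15, 35, include_parents=True)])
```

The caller only ever sees a sequence in time; the tree is still there for
whoever needs it, via list_children.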

--js

>>
>>> Of course internally we need to track backup dependencies in order to
>>> properly delete backups or recover from them.
>>>
>>> [1] https://www.acronis.com/en-us/support/documentation/AcronisBackup_11.5/index.html#760.html
>>>>
>>>> So, from that point of view, libvirt tracking the relationship between
>>>> qcow2 bitmaps in order to form checkpoint information can be done ALL
>>>> with libvirt, and without NEEDING the qcow2 file to track any relations
>>>> between bitmaps.  BUT, libvirt's job can probably be made easier if
>>>> qcow2 would, at the least, allow bitmaps to track their parent, and/or
>>>> provide APIs to easily merge a parent..intermediate..child chain of
>>>> related bitmaps into a single bitmap, for easy runtime
>>>> creation of the temporary bitmap used to express the delta between two
>>>> checkpoints.
>>>>
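As a toy model of that merge: treat each bitmap as the set of clusters
dirtied between its checkpoint and the next one, so the delta since a
checkpoint is the union of every bitmap from that point on. The set
representation, delta_since, and the sample data are assumptions for
illustration only; this does not use any qemu or libvirt interface.

```python
def delta_since(checkpoint, order, bitmaps):
    """Union of dirty clusters recorded from `checkpoint` up to now.

    order:   checkpoint names, oldest first
    bitmaps: checkpoint name -> set of dirty cluster indices recorded
             between that checkpoint and the next one
    """
    start = order.index(checkpoint)
    dirty = set()
    for name in order[start:]:
        dirty |= bitmaps[name]
    return dirty


bitmaps = {"C1": {0, 1}, "C2": {4}, "C3": {7, 8}}

# Full knowledge of the chain: everything written since C2.
print(sorted(delta_since("C2", ["C1", "C2", "C3"], bitmaps)))

# If the tracker never learned about C3, its writes silently drop out.
print(sorted(delta_since("C2", ["C1", "C2"], bitmaps)))
```

The second call is the failure mode discussed earlier in the thread: a
missing link in the chain yields a delta that looks valid but is
incomplete.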
>>>>
>>>
>>> [snip]
>>>
>>> Nikolay
>>>