[libvirt] [PATCH v4 11/20] wip: backup: Parse and output checkpoint XML

Eric Blake eblake at redhat.com
Mon Feb 25 21:06:17 UTC 2019


On 2/12/19 11:23 AM, John Ferlan wrote:
> 
> 
> On 2/6/19 2:18 PM, Eric Blake wrote:
>> Work in progress - the checkpoint code is not quite passing
>> tests (part of that is figuring out the minimal XML that is
>> still valid as a <domain> element, or just use --no-domain flag).
>>
>> Signed-off-by: Eric Blake <eblake at redhat.com>
>> ---
>>  src/conf/checkpoint_conf.h          |  150 ++++
>>  src/conf/domain_conf.h              |   11 +-
>>  po/POTFILES                         |    1 +
>>  src/conf/Makefile.inc.am            |    2 +
>>  src/conf/checkpoint_conf.c          | 1030 +++++++++++++++++++++++++++
>>  src/conf/domain_conf.c              |    7 +-
>>  src/libvirt_private.syms            |   21 +
>>  tests/Makefile.am                   |    9 +-
>>  tests/domaincheckpointxml2xmltest.c |  231 ++++++
>>  9 files changed, 1458 insertions(+), 4 deletions(-)
>>  create mode 100644 src/conf/checkpoint_conf.h
>>  create mode 100644 src/conf/checkpoint_conf.c
>>  create mode 100644 tests/domaincheckpointxml2xmltest.c
>>
> 
> Starting to lose some steam - seeing wip means I don't want to spend too
> much time on some algorithms for fear they'll change, but then again I
> know you're looking for feedback...

I understand. The 'wip' tags should be gone for v5.

> 
>> diff --git a/src/conf/checkpoint_conf.h b/src/conf/checkpoint_conf.h
>> new file mode 100644
>> index 0000000000..994a8bd083
>> --- /dev/null
>> +++ b/src/conf/checkpoint_conf.h
>> @@ -0,0 +1,150 @@
>> +/*
>> + * checkpoint_conf.h: domain checkpoint XML processing
>> + *
>> + * Copyright (C) 2006-2019 Red Hat, Inc.
>> + * Copyright (C) 2006-2008 Daniel P. Berrange
> 
> This is new, right?  Not a copy of...

New file, but so heavily copied from snapshot_conf.h that I felt safer
preserving all existing copyrights.


>> +struct _virDomainCheckpointDef {
>> +    /* Public XML.  */
>> +    char *name;
>> +    char *description;
>> +    char *parent;
>> +    long long creationTime; /* in seconds */
>> +
>> +    size_t ndisks; /* should not exceed dom->ndisks */
>> +    virDomainCheckpointDiskDef *disks;
>> +
>> +    virDomainDefPtr dom;
>> +
>> +    /* Internal use.  */
>> +    bool current; /* At most one checkpoint in the list should have this set */
> 
> Typically these are then in the *Obj...

Copy-and-paste from virDomainSnapshotDef, but my recent work there has
made me want to reconsider where the current object is tracked.

The problem is that the current way we store things on disk is via a
separate <domainsnapshot> XML per snapshot, with no other obvious place
for the internal representation of a <domain> to track which snapshot is
current.  BUT, as I have just posted patches to add a bulk dump/redefine
with VIR_DOMAIN_XML_SNAPSHOTS, we COULD just ditch our current
separate-file storage of snapshots, and instead store the snapshots
directly with the <domain> (for all new saves; when loading libvirtd,
we'd still have to keep the code to load from older separate snapshot
files for back-compat reasons, even if we no longer track separate files
after the one-time conversion).  And if I make snapshots cleaner like
that, then checkpoints can just start life with the saner approach,
rather than being stuck with copying the older one-file-per-checkpoint
approach that I implemented using copy-and-paste.

> 
>> +};
>> +
>> +struct _virDomainCheckpointObj {
>> +    virDomainCheckpointDefPtr def; /* non-NULL except for metaroot */
>> +
>> +    virDomainCheckpointObjPtr parent; /* non-NULL except for metaroot, before
>> +                                         virDomainCheckpointUpdateRelations, or
>> +                                         after virDomainCheckpointDropParent */
>> +    virDomainCheckpointObjPtr sibling; /* NULL if last child of parent */
>> +    size_t nchildren;
>> +    virDomainCheckpointObjPtr first_child; /* NULL if no children */
> 
> The whole relationship thing is I think overly complex and a bit
> difficult to follow. Having @sibling and @first_child seems to make the
> code even more complex. I would think you have a parent and maybe a
> child. If this is the "first" checkpoint, then there is no parent. I
> would think that means it's the root.
> 
> There's just much more to be see when it comes to how these
> relationships play out as checkpoints are made and removed. I guess I
> just have a very linear model in mind, but reading the code seems to go
> beyond linearality.
> 
> The use of hash tables just makes it easier to ensure no def->name is
> reused. Not having locks in the object means some parent lock is
> managing to make sure two checkpoint operations cannot occur at the same
> time from different threads (e.g. a domain object lock), but that is not
> 100% clear.

Again, heavy copy-and-paste from snapshots, which DO lend themselves to
non-linear DAGs of related snapshots. (If you create snapshot A, then
run the guest, then create B, then revert to A, then run the guest and
create C, then both B and C are children of A).  I don't readily know if
checkpoints lend themselves as easily to non-linear relations, or if my
copy-and-paste is an instance of YAGNI.  And it also makes me wonder if
I would be better served by factoring out the snapshot non-linear
relationships into some more reusable helper functions rather than just
open-coding things in both files.

The short answer is that there is a hash table (for O(1) name lookup)
AND a linked-list relationship listing (for tracking DAG relationships,
where one child is common, but more than one child is possible); and
that the domain tracks a metaroot (a static instance with name==NULL) so
that all other snapshots/checkpoints have nice properties of always
having a parent (either another real snapshot as the parent, or the
metaroot as the parent when the presentation to the user is that the
snapshot is the root of a DAG with no user-created parent).  If I
improve anything here with better comments, I should also apply similar
improvements to the snapshot code.

> 
> 
>> +};
> 
> When objectifying module to to avoid leaking *obj entries into other
> modules, this struct would go inside .c file w/ accessors to whatever is
> needed externally. I purposefully avoided domain & domain snapshot as
> there was just too much rework.  Whether or not that can be done here -
> not sure yet, still reading, but figured I'd point it out at least ;-)

Indeed, since there is such heavy copy-and-paste between the two, first
refactoring virDomainSnapshot to be Object-ified with better helper
routines, and then making virDomainCheckpoint share a common parent
class with virDomainSnapshot to reuse those helper routines, rather than
open-coded copy-and-paste is probably worth the effort.  (I didn't do it
initially in order to get a working demo out the door, but I also admit
that adds technical debt, and I should really be trying to reduce it
rather than adding to it).


>> +int virDomainCheckpointUpdateRelations(virDomainCheckpointObjListPtr checkpoints);
>> +void virDomainCheckpointDropParent(virDomainCheckpointObjPtr checkpoint);
> 
> More recently in the .h prototypes, been trying to follow the .c entry
> as well... where it's
> 
> type
> function(args...);
> 
> makes copy-pasta easier...

Yeah, I'll have to double-check what improvements have been made to
snapshot in the meantime, and make sure I'm not regressing back to
things the way they were before cleanups (since my copy-and-paste point
is mostly from sometime around Sep-Oct 2018).


> 
> Add a:
> 
> VIR_DEFINE_AUTOPTR_FUNC(virDomainCheckpointDef, virDomainCheckpointDefFree);
> 
> details coming...

Yeah, I know it's coming.  From what I've seen, I'll like it, too.


>> @@ -2631,6 +2637,9 @@ struct _virDomainObj {
>>
>>      bool hasManagedSave;
>>
>> +    virDomainCheckpointObjListPtr checkpoints;
>> +    virDomainCheckpointObjPtr current_checkpoint;
>> +
> 
> Note to self or you since I could forget by the time I find this
> usage... wouldn't update to current_checkpoint need a lock?

Same question for snapshots. So far, all modification of snapshots vs.
current_snapshot have been under an appropriate API lock, but you do
raise the point that I should be careful about it. In fact, I'm
half-tempted to first fix virDomainSnapshotObjListPtr to subsume the
current_snapshot designation rather than letting virDomainPtr track it
independently, in which case this would need the same treatment.

> 
> What's really not clear is whether the domain object lock is in place or
> not when we get here.  I think the domainobj's are just plain write
> locks and not the domainobjlist fancy rwlocks.

All the more reason to put the current object as part of the ObjListPtr,
so that locking for adding/removing members from the list is guaranteed
to be the same as locking for updating which member is current.


>> +#include <config.h>
>> +
>> +#include <fcntl.h>
>> +#include <sys/stat.h>
>> +#include <sys/time.h>
>> +#include <unistd.h>
>> +
>> +#include "internal.h"
>> +#include "virbitmap.h"
>> +#include "virbuffer.h"
>> +#include "count-one-bits.h"
>> +#include "datatypes.h"
>> +#include "domain_conf.h"
>> +#include "virlog.h"
>> +#include "viralloc.h"
>> +#include "netdev_bandwidth_conf.h"
>> +#include "netdev_vport_profile_conf.h"
>> +#include "nwfilter_conf.h"
>> +#include "secret_conf.h"
>> +#include "checkpoint_conf.h"
>> +#include "virstoragefile.h"
>> +#include "viruuid.h"
>> +#include "virfile.h"
>> +#include "virerror.h"
>> +#include "virxml.h"
>> +#include "virstring.h"
> 
> Are all of these really needed?

Possibly not. I'll play the trimming game.

> 
>> +
>> +#define VIR_FROM_THIS VIR_FROM_DOMAIN_CHECKPOINT
>> +
>> +VIR_LOG_INIT("conf.checkpoint_conf");
>> +
>> +VIR_ENUM_IMPL(virDomainCheckpoint, VIR_DOMAIN_CHECKPOINT_TYPE_LAST,
>> +              "default", "no", "bitmap");
>> +
>> +struct _virDomainCheckpointObjList {
>> +    /* name string -> virDomainCheckpointObj mapping
>> +     * for O(1), lockless lookup-by-name */
>> +    virHashTable *objs;
>> +
>> +    virDomainCheckpointObj metaroot; /* Special parent of all root checkpoints */'
> 
> So not a pointer, but copied object from somewhere?

No, statically allocated right here.

It's a trick that let's me convert a user's possible multi-root forest:

A
 - B
 - C
D
 - E

into a single-root DAG:

metarooot
 - A
    - B
    - C
 - D
    - E

and various operations that need to quickly find all the user-visible
roots can just traverse the children of the metaroot.

Again, heavily copy-pasted from snapshots, and probably worth first
refactoring it into a common parent-class object with better comments,
so that both snapshots and checkpoints can reuse the same parent, rather
than open-coding things twice.

> 
>> +};
>> +
>> +/* Checkpoint Def functions */
>> +static void
>> +virDomainCheckpointDiskDefClear(virDomainCheckpointDiskDefPtr disk)
>> +{
>> +    VIR_FREE(disk->name);
>> +    VIR_FREE(disk->bitmap);
> 
> Should we clear idx, size, and type too? Depends on consumer usage which
> could clear and reuse, thus using something old.

I'll have to double-check if memset(,0) as a final step in clear would
make any difference.


>> +    bitmap = virXMLPropString(node, "bitmap");
>> +    if (bitmap) {
>> +        if (def->type != VIR_DOMAIN_CHECKPOINT_TYPE_BITMAP) {
>> +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> +                           _("disk checkpoint bitmap '%s' requires "
>> +                             "type='bitmap'"),
>> +                           bitmap);
>> +            goto cleanup;
>> +        }
>> +        VIR_STEAL_PTR(def->bitmap, bitmap);
>> +    }
> 
> Size is not parsed.  Restore from restart will lose it.

Intentional. Size is not persistent, but dynamically computed every time
you pass the VIR_DOMAIN_CHECKPOINT_XML_SIZE flag (and potentially
constantly changing, for a running domain).  In other words, it is an
output-only flag, and not tied to the checkpoint itself, so much as the
checkpoint XML being the easiest way to query that particular piece of
per-disk runtime information in bulk.


>> +static virDomainCheckpointDefPtr
>> +virDomainCheckpointDefParse(xmlXPathContextPtr ctxt,
>> +                            virCapsPtr caps,
>> +                            virDomainXMLOptionPtr xmlopt,
>> +                            unsigned int flags)
>> +{
>> +    virDomainCheckpointDefPtr def = NULL;
>> +    virDomainCheckpointDefPtr ret = NULL;
>> +    xmlNodePtr *nodes = NULL;
>> +    size_t i;
>> +    int n;
>> +    char *creation = NULL;
> 
> Never used.
> 
>> +    struct timeval tv;
>> +    int active;
>> +    char *tmp;
> 
>     VIR_AUTOPTR(virDomainCheckpointDef) def = NULL;
>     VIR_AUTOFREE(xmlNodePtr *) nodes = NULL;
>     VIR_AUTOFREE(char *) tmp = NULL;
> 
> Erik likes 'em at the bottom too.
> 
> With @ret autofree'd, the goto cleanup is unnecessary and replaced by
> return NULL
> 
> Use of the AUTOFREE stuff really cleans up a lot.

Yeah, snapshot should get it first, and then I should make sure my code
here gets the same improvements.


>> +/* Align def->disks to def->domain.  Sort the list of def->disks,
>> + * filling in any missing disks with appropriate default.  Convert
>> + * paths to disk targets for uniformity.  Issue an error and return -1
>> + * if any def->disks[n]->name appears more than once or does not map
>> + * to dom->disks. */
> 
> Things start to get more complex and confusing. What doesn't make sense
> here is why/how things/order could be different. If this is a checkpoint
> in time, then wouldn't the order saved (somewhere) be the same? So what
> causes something to get out of order?

The user's XML for creating the checkpoint can do:

<domaincheckpoint>
  <disks>
    <disk name='vdb'/>
    <disk name='vda'/>
  </disk>
</domaincheckpoint>

even though the corresponding <domain> at the same time declared things
in the other order (vda before vdb); also, like snapshots, you may only
want to take checkpoints on a subset of your disks (for example, if you
don't care about changes to the OS disk because you can recreate the OS
configuration, but DO care about changes to the user's database disk);
if the <domaincheckpoint> lists only a subset of all disk names, this
function exists to fill in the remaining array elements with sane
defaults, so that the rest of the code can blindly process an entire
array of disks (and just skip the disks that are not participating in
the snapshot), rather than having to repeat checks for which disks are
being manipulated.

> 
> I guess this is part of the "problem" with posting code that's been in
> development for a long time. I'm sure this solves some issue, but what
> that is, I'm not sure.

It's also a drawback of copying heavily from snapshots, which has the
same sort of align function.

> 
> I'm beginning to think splitting things up to get the basic add objects,
> list objects, etc. and then adding in the more complex well we need to
> output a list in this manner or that manner or we have discovered an
> issue with order as a result of some operation and this fixes it would
> be more prudent. Means quite a bit of patch reordering to not expose the
> external API's, but I think will make things easier to review and I
> would think easier to debug and finish up the work.

So, back to that argument of having a common base class rather than
open-coding duplicated efforts :)

> 
>> +int
>> +virDomainCheckpointAlignDisks(virDomainCheckpointDefPtr def)
>> +{
>> +    int ret = -1;
>> +    virBitmapPtr map = NULL;
>> +    size_t i;
>> +    int ndisks;
>> +    int checkpoint_default = VIR_DOMAIN_CHECKPOINT_TYPE_NONE;
>> +
> 
> I assume we'd enter here with sort of lock, right? to prevent some other
> unexpected action from undoing us.

Yes (and same for snapshots), but a comment wouldn't hurt.

> 
>> +    if (!def->dom) {
>> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                       _("missing domain in checkpoint"));
>> +        goto cleanup;
>> +    }
>> +
>> +    if (def->ndisks > def->dom->ndisks) {
>> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> +                       _("too many disk checkpoint requests for domain"));
>> +        goto cleanup;
>> +    }
>> +
>> +    /* Unlikely to have a guest without disks but technically possible.  */
>> +    if (!def->dom->ndisks) {
>> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
>> +                       _("domain must have at least one disk to perform "
>> +                         "checkpoints"));
>> +        goto cleanup;
>> +    }
> 
> I'm still trying to come to grips with how def->ndisks != def->dom->ndisks.

Users wanting a checkpoint on only a subset of their disks, or a user
messing with XML output and then feeding it back in through the REDEFINE
flag.

> 
>> +
>> +    /* If <disks> omitted, do bitmap on all disks; otherwise, do nothing
>> +     * for omitted disks */
>> +    if (!def->ndisks)
>> +        checkpoint_default = VIR_DOMAIN_CHECKPOINT_TYPE_BITMAP;
>> +
>> +    if (!(map = virBitmapNew(def->dom->ndisks)))
>> +        goto cleanup;
>> +
>> +    /* Double check requested disks.  */
>> +    for (i = 0; i < def->ndisks; i++) {
>> +        virDomainCheckpointDiskDefPtr disk = &def->disks[i];
>> +        int idx = virDomainDiskIndexByName(def->dom, disk->name, false);
>> +
>> +        if (idx < 0) {
>> +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> +                           _("no disk named '%s'"), disk->name);
>> +            goto cleanup;
>> +        }
>> +
>> +        if (virBitmapIsBitSet(map, idx)) {
>> +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> +                           _("disk '%s' specified twice"),
>> +                           disk->name);
>> +            goto cleanup;
>> +        }
>> +        ignore_value(virBitmapSetBit(map, idx));
>> +        disk->idx = idx;
>> +
>> +        if (STRNEQ(disk->name, def->dom->disks[idx]->dst)) {
>> +            VIR_FREE(disk->name);
>> +            if (VIR_STRDUP(disk->name, def->dom->disks[idx]->dst) < 0)
>> +                goto cleanup;
>> +        }
>> +    }
> 
> There's something about this remapping of disk->name that doesn't feel
> right or is ripe for something going wrong.  I would think we'd want to
> correllate where we are/were in a previous list with now, but I guess
> I'm not 100% of how this is/will be used.

That's what the bitmap is supposed to track. The first pass sees which
disk names the user passed, and figures out which index value those
disks are in the corresponding domain definition (or errors out if the
same disk is specified more than once or is not present in the domain);
the second pass fills in array entries for all other disks that the user
omitted.

> 
>> +
>> +    /* Provide defaults for all remaining disks.  */
>> +    ndisks = def->ndisks;
>> +    if (VIR_EXPAND_N(def->disks, def->ndisks,
>> +                     def->dom->ndisks - def->ndisks) < 0)
>> +        goto cleanup;
>> +
>> +    for (i = 0; i < def->dom->ndisks; i++) {
>> +        virDomainCheckpointDiskDefPtr disk;
>> +
>> +        if (virBitmapIsBitSet(map, i))
>> +            continue;
>> +        disk = &def->disks[ndisks++];
>> +        if (VIR_STRDUP(disk->name, def->dom->disks[i]->dst) < 0)
>> +            goto cleanup;
>> +        disk->idx = i;
>> +
>> +        /* Don't checkpoint empty drives */
>> +        if (virStorageSourceIsEmpty(def->dom->disks[i]->src))
>> +            disk->type = VIR_DOMAIN_CHECKPOINT_TYPE_NONE;
>> +        else
>> +            disk->type = checkpoint_default;
>> +    }
>> +
>> +    qsort(&def->disks[0], def->ndisks, sizeof(def->disks[0]),
>> +          virDomainCheckpointCompareDiskIndex);

You are right that things might not be correlated until after the qsort,
but after this point, iterating through def->disks[] should visit disks
in the same order as iterating through def->dom->disks[].

>> +
>> +    virBufferEscapeString(buf, "<disk name='%s'", disk->name);
>> +    if (disk->type)
>> +        virBufferAsprintf(buf, " checkpoint='%s'",
>> +                          virDomainCheckpointTypeToString(disk->type));
>> +    if (disk->bitmap) {
>> +        virBufferEscapeString(buf, " bitmap='%s'", disk->bitmap);
>> +        if (flags & VIR_DOMAIN_CHECKPOINT_XML_SIZE)
>> +            virBufferAsprintf(buf, " size='%llu'", disk->size);
> 
> Recall my earlier point in RNG format w/ unsignedLong

Which I hope I addressed.


>> +    virBufferEscapeString(&buf, "<name>%s</name>\n", def->name);
>> +    if (def->description)
>> +        virBufferEscapeString(&buf, "<description>%s</description>\n",
>> +                              def->description);
>> +
>> +    if (def->parent) {
>> +        virBufferAddLit(&buf, "<parent>\n");
>> +        virBufferAdjustIndent(&buf, 2);
>> +        virBufferEscapeString(&buf, "<name>%s</name>\n", def->parent);
>> +        virBufferAdjustIndent(&buf, -2);
>> +        virBufferAddLit(&buf, "</parent>\n");
> 
> Interesting any reason why it's not <parent>%s</parent> - are you
> planning to add to parent. Guess I should have asked earlier, but it
> only dawns on me now seeing this.

Copy-paste from <domainsnapshot>. I don't know what else we might add to
<parent>, but as snapshot left it open like that, it's easier to do the
same here (and then things like virsh can reuse the same XPath queries
for more code reuse when converting a listing from array form into tree
form).


>> +
>> +    if (!(flags & VIR_DOMAIN_CHECKPOINT_XML_NO_DOMAIN) &&
>> +        virDomainDefFormatInternal(def->dom, caps, domflags, &buf, xmlopt) < 0)
>> +        goto error;
>> +
>> +    if (internal)
>> +        virBufferAsprintf(&buf, "<active>%d</active>\n", def->current);
> 
> So this is only valid for ACTIVE guest and not INACTIVE parse/format?
> Perhaps no need for bool, but use @flags...

Here, I've already improved the situation somewhat for snapshots, and so
intend to flow those improvements into checkpoints for v5. But you've
already got me wondering earlier in the series if we still want one file
per checkpoint, or if I'm better off using VIR_DOMAIN_XML_CHECKPOINTS
(where I don't need an internal <active> field after all) for how
libvirt stores checkpoints persistently.

> 
>> +
>> +    virBufferAdjustIndent(&buf, -2);
>> +    virBufferAddLit(&buf, "</domaincheckpoint>\n");
>> +
>> +    if (virBufferCheckError(&buf) < 0)
>> +        return NULL;
>> +
>> +    return virBufferContentAndReset(&buf);
>> +
>> + error:
>> +    virBufferFreeAndReset(&buf);
>> +    return NULL;
>> +}
>> +
> 
> Keeping the *Def logic separate from the *Obj logic is the new norm. The
> snapshot_conf was left alone by me when I went through the others...
> 
> 
> Anything object related should be virdomaincheckpointobj.{c,h}.  That
> means stuff above when separated out could be added sooner in order to
> make sure review cycles shorter and the pile of code smaller. Then to
> start the object code should only need a subset of what's here. Adding
> in the more complex stuff later. There's probably a couple of patches
> worth of changes and functionality all being rolled into one here. I
> think it'll be easier overall to review and get accepted if a more
> logical (and slow) progression to the desired functionality is posted.

Yeah, especially if I refactor the object code to make
virDomainSnapshotObj and virDomainCheckpointObj have a common parent
class, since the two are very similar (both are tied to a point-in-time
<domain> capture, have a notion of one current object amid a set of more
objects, and have a DAG relationship tracking parent/child pointers,
with a possibility of more than one child depending on how reverts play
with checkpoints in the future).

> 
>> +/* Checkpoint Obj functions */
>> +static virDomainCheckpointObjPtr virDomainCheckpointObjNew(void)
> 
> Two lines
> 
>> +{
>> +    virDomainCheckpointObjPtr checkpoint;
>> +
>> +    if (VIR_ALLOC(checkpoint) < 0)
>> +        return NULL;
>> +
>> +    VIR_DEBUG("obj=%p", checkpoint);
>> +
>> +    return checkpoint;
>> +}
>> +
>> +static void virDomainCheckpointObjFree(virDomainCheckpointObjPtr checkpoint)
>> +{
>> +    if (!checkpoint)
>> +        return;
>> +
>> +    VIR_DEBUG("obj=%p", checkpoint);
>> +
>> +    virDomainCheckpointDefFree(checkpoint->def);
>> +    VIR_FREE(checkpoint);
>> +}
> 
> Hmm, so not a real virObject...  The domain and domain snapshot code was
> just too inter-related for me to attempt to modify back when I
> objectified other driver -> object -> def code.
> 
> I think if you follow examples in virstorageobj, virsecretobj,
> virnodedevobj, virinterfaceobj, etc. you'll get a better mechanism than
> what exists in the domain and domain snapshot code for for how
> vir*Object can be used with hash tables.

Indeed, improving virDomainSnapshot first may be a wiser use of my time,
while still getting this checkpoint code ready for inclusion.

> 
> 
> There's no comments here... most important perhaps is on success @def is
> consume in the object and thus cannot be free'd by the caller.

I blame the lack of comments in snapshot code.  (Again, anything I fix
here as a result of your comments, I plan to do to snapshots as well).

> 
>> +
>> +virDomainCheckpointObjPtr virDomainCheckpointAssignDef(virDomainCheckpointObjListPtr checkpoints,
>> +                                                       virDomainCheckpointDefPtr def)
> 
> 
> virDomainCheckpointObjPtr
> virDomainCheckpointAssignDef(...)
> 
> I think "virDomainCheckpointObjListAdd" would be what's happening here.
> 
>> +{
>> +    virDomainCheckpointObjPtr chk;
>> +
>> +    if (virHashLookup(checkpoints->objs, def->name) != NULL) {
> 
> s/ != NULL//
> 
>> +        virReportError(VIR_ERR_OPERATION_INVALID,
>> +                       _("domain checkpoint %s already exists"),
>> +                       def->name);
>> +        return NULL;
>> +    }
>> +
>> +    if (!(chk = virDomainCheckpointObjNew()))
>> +        return NULL;
>> +    chk->def = def;
>> +
>> +    if (virHashAddEntry(checkpoints->objs, chk->def->name, chk) < 0) {
> 
> Hopefully we can never generate two checkpoints by name in the same
> clocktick ;-)

Or two snapshots.


>> +virDomainCheckpointObjListPtr
>> +virDomainCheckpointObjListNew(void)
>> +{
>> +    virDomainCheckpointObjListPtr checkpoints;
>> +    if (VIR_ALLOC(checkpoints) < 0)
>> +        return NULL;
>> +    checkpoints->objs = virHashCreate(50, virDomainCheckpointObjListDataFree);
>> +    if (!checkpoints->objs) {
>> +        VIR_FREE(checkpoints);
>> +        return NULL;
>> +    }
> 
> How is metaroot handled? At this point it's just an empty struct. Guess
> my first inclination was that it was going to be the first checkpoint
> added. That would mean some amount of management when it comes to that
> checkpoint being free'd on us.

The metaroot will ALWAYS be an empty placeholder object. It allows the
rest of the code to have more than one independent user-visible root
(depending on how reverting to an older snapshot ends up interplaying
with checkpoints).


>> +static int
>> +virDomainCheckpointObjListGetNames(virDomainCheckpointObjListPtr checkpoints,
>> +                                   virDomainCheckpointObjPtr from,
>> +                                   char **const names, int maxnames,
> 
> args...
> 
>> +                                   unsigned int flags)
>> +{
>> +    struct virDomainCheckpointNameData data = { names, maxnames, flags, 0,
>> +                                              false };
>> +    size_t i;
>> +
>> +    if (!from) {
>> +        /* LIST_ROOTS and LIST_DESCENDANTS have the same bit value,
>> +         * but opposite semantics.  Toggle here to get the correct
>> +         * traversal on the metaroot.  */
>> +        flags ^= VIR_DOMAIN_CHECKPOINT_LIST_ROOTS;
>> +        from = &checkpoints->metaroot;
> 
> I don't see checkpoints->metaroot being set anywhere yet.  I'd expect a
> Set/Get type API for external callers.
> 

virDomainCheckpointLookupByName(objlist, NULL) returns the metaroot, for
callers that directly manipulate its first_child/nchildren fields. But
as you point out elsewhere, refactoring that into common helper code for
both snapshots and checkpoints to reuse via accessor functions (rather
than direct field access) may be even smarter.

>> +    }
>> +
>> +    /* We handle LIST_ROOT/LIST_DESCENDANTS directly, mask that bit
>> +     * out to determine when we must use the filter callback.  */
>> +    data.flags &= ~VIR_DOMAIN_CHECKPOINT_LIST_DESCENDANTS;
>> +
>> +    /* If this common code is being used, we assume that all checkpoints
>> +     * have metadata, and thus can handle METADATA up front as an
>> +     * all-or-none filter.  XXX This might not always be true, if we
>                                ^^^
> Need to address the XXX...
> 
>> +     * add the ability to track qcow2 bitmaps without the
>> +     * use of metadata.  */

The same XXX exists for snapshots (if we were to ever reconstruct qcow2
snapshots without relying on libvirt metadata).  I don't see it as a
showstopper for the initial implementation.

>> +    if ((data.flags & VIR_DOMAIN_CHECKPOINT_FILTERS_METADATA) ==
>> +        VIR_DOMAIN_CHECKPOINT_LIST_NO_METADATA)
>> +        return 0;
>> +    data.flags &= ~VIR_DOMAIN_CHECKPOINT_FILTERS_METADATA;
>> +
>> +    /* For ease of coding the visitor, it is easier to zero each group
>> +     * where all of the bits are set.  */
>> +    if ((data.flags & VIR_DOMAIN_CHECKPOINT_FILTERS_LEAVES) ==
>> +        VIR_DOMAIN_CHECKPOINT_FILTERS_LEAVES)
>> +        data.flags &= ~VIR_DOMAIN_CHECKPOINT_FILTERS_LEAVES;
>> +
> 
> oh, my head, my eyes, my favorite deity. It seems we're building a
> mostrously complex pile here. I have to wonder whether leaves and
> filters are really worth all this trouble and whether/how this will be
> used.

Sadly, the virsh 'snapshot-list --tree' code DOES use it, and therefore
so does the new 'checkpoint-list --tree' code.  (On the bright side, it
also means that if I refactor things into a common base class for both
snapshots and checkpoints to reuse, I won't be open-coding the stuff twice).

> 
>> +    if (flags & VIR_DOMAIN_CHECKPOINT_LIST_DESCENDANTS) {
>> +        if (from->def)
>> +            virDomainCheckpointForEachDescendant(from,
>> +                                                 virDomainCheckpointObjListCopyNames,
>> +                                                 &data);
>> +        else if (names || data.flags)
>> +            virHashForEach(checkpoints->objs,
>> +                           virDomainCheckpointObjListCopyNames,
>> +                           &data);
>> +        else
>> +            data.count = virHashSize(checkpoints->objs);
>> +    } else if (names || data.flags) {
>> +        virDomainCheckpointForEachChild(from,
>> +                                        virDomainCheckpointObjListCopyNames,
>> +                                        &data);
>> +    } else {
>> +        data.count = from->nchildren;
> 
> Well this seems to be some sort of magic...

Yes, all faithfully copied from snapshots.  But some of it might not be
necessary - while snapshots HAS to cater to the older public API that
returns a count or list of names (rather than the list of objects
directly), the checkpoint code does not have that older API.  Then
again, hiding the common code into a common parent class rather than
re-coding it can't hurt, other than the time it takes for a good
refactoring. (But less technical debt now may be worth it, compared to
the pain of having to revisit this down the road).

>> +virDomainCheckpointObjPtr
>> +virDomainCheckpointFindByName(virDomainCheckpointObjListPtr checkpoints,
>> +                              const char *name)
> 
> Usually have this closer to or above the *Add function and not in the
> midst of these functions that use Hash callbacks...
> 
>> +{
>> +    return name ? virHashLookup(checkpoints->objs, name) :
>> +        &checkpoints->metaroot;
> 
> Would a caller really expect this? That is obj->def == NULL?

NULL is the special case for getting at the metaroot (idea copied from
snapshots), for the external code that manipulates the metaroot's
nchildren/first_child directly.  (And as you've pointed out above,
hiding that in saner accessor functions in a common base class would be
a worthwhile refactoring).

> 
>> +}
>> +
>> +void virDomainCheckpointObjListRemove(virDomainCheckpointObjListPtr checkpoints,
>> +                                      virDomainCheckpointObjPtr checkpoint)
>> +{
> 
> I think when I first read things, it felt like metaroot was the first
> checkpoint ever created...  If that does come to pass this is where the
> well we're about to remove metaroot would occur so we have to replace it
> with it's direct child/descendant.

No, metaroot is a placeholder which can never be created or deleted.
Rather, the user's first checkpoint is a child of metaroot.

> 
>> +    virHashRemoveEntry(checkpoints->objs, checkpoint->def->name);
>> +}
>> +
>> +int
>> +virDomainCheckpointForEach(virDomainCheckpointObjListPtr checkpoints,
>> +                           virHashIterator iter,
>> +                           void *data)
>> +{
>> +    return virHashForEach(checkpoints->objs, iter, data);
>> +}
>> +
>> +/* Run iter(data) on all direct children of checkpoint, while ignoring all
>> + * other entries in checkpoints.  Return the number of children
>> + * visited.  No particular ordering is guaranteed.  */
>> +int
>> +virDomainCheckpointForEachChild(virDomainCheckpointObjPtr checkpoint,
>> +                                virHashIterator iter,
>> +                                void *data)
>> +{
>> +    virDomainCheckpointObjPtr child = checkpoint->first_child;
>> +
>> +    while (child) {
>> +        virDomainCheckpointObjPtr next = child->sibling;
>> +        (iter)(child, child->def->name, data);
>> +        child = next;
>> +    }
>> +
>> +    return checkpoint->nchildren;
>> +}
>> +
>> +struct checkpoint_act_on_descendant {
>> +    int number;
>> +    virHashIterator iter;
>> +    void *data;
>> +};
>> +
>> +static int
>> +virDomainCheckpointActOnDescendant(void *payload,
>> +                                   const void *name,
>> +                                   void *data)
>> +{
>> +    virDomainCheckpointObjPtr obj = payload;
>> +    struct checkpoint_act_on_descendant *curr = data;
>> +
>> +    curr->number += 1 + virDomainCheckpointForEachDescendant(obj,
>> +                                                             curr->iter,
>> +                                                             curr->data);
> 
> Does this double count with the 1 +.
> 
> Having a really hard time understanding what's being done ... What's
> really the difference between a descendant and a child?

For the DAG:

A
  - B
    - C
  - D

The descendends of A are B, C, D; the children of A are just B, D.  The
function returns 1 + 3 (for a total of 4 nodes visited), because:

ActOnDescendent(A) =>
1 + ForEachDescendent(A) =>
1 + ForEachChild(A, ActOnDescendent) =>
1 + ActOnDescendent(B) + ActOnDescendent(D) =>
1 + 1 + ForEachDescendent(B) + 1 + ForEachDescendent(D) =>
1 + 1 + ForEachChild(B, ActOnDescendent) + 1 + ForEachChild(D,
ActOnDescendent) =>
1 + 1 + ActOnDescendent(C) + 1 =>
1 + 1 + 1 + ForEachDescendent(C) + 1 =>
1 + 1 + 1 + ForEachChild(C, ActOnDescendent) + 1 =>
1 + 1 + 1 + 1
the ForEachDescendent

Yeah, it's hairy, but it's a straight copy from snapshots, so I should
really refactor that first into a common base class.

> 
>> +    (curr->iter)(payload, name, curr->data);
>> +    return 0;
>> +}
>> +
>> +/* Run iter(data) on all descendants of checkpoint, while ignoring all
>> + * other entries in checkpoints.  Return the number of descendants
>> + * visited.  No particular ordering is guaranteed.  */
>> +int
>> +virDomainCheckpointForEachDescendant(virDomainCheckpointObjPtr checkpoint,
>> +                                     virHashIterator iter,
>> +                                     void *data)
>> +{
>> +    struct checkpoint_act_on_descendant act;
>> +
>> +    act.number = 0;
>> +    act.iter = iter;
>> +    act.data = data;
>> +    virDomainCheckpointForEachChild(checkpoint,
>> +                                    virDomainCheckpointActOnDescendant, &act);
> 
> So many levels of indirection in multiple directions.  We call
> ForEachChild, which calls an @iter function to act on the descendant
> which calls ForEachDescendant (IOW, this function) again.
> 
> ForEachDescendant and ForEachChild use ActOnDescendant, but have this
> relationship with each other which is hard to think about without some
> sort of picture.
> 
> Maybe I'm thinking of a much simpler model where
> 
> parent <- child1 <- child2 <- child3 [etc.]
> 
> One takes a checkpoint and it is the parent.  Then at some point in the
> future another checkpoint is taken. So it's now the child1 with it's
> parent.  When the next checkpoint is taken, it's grandparent is parent
> and child1 is it's parent.

The simplest implementation is that you are indeed creating a linear
link of checkpoints. But I don't know how snapshot reversion will play
into the future (for qemu, or more importantly, for other domains that
may track checkpoints differently via tracking generation ids on every
cluster rather than a bitmap of which clusters were changed since a
point in time).  And since reverting to a snapshot IS a very likely
action, I'd rather leave the door open to having a forest of DAGs for
checkpoints just as we permit a forest of DAGs for snapshots (especially
if I enhance virDomainSnapshotCreateXML to also capture a checkpoint at
that time).

> 
> Getting a desendant of some "child#" I thought was a linear operation.
> If it's more of a tree operation then the model used by the hash table
> to bucket and link same hashed values into a bucket would then be more
> useful.

The snapshot operation is definitely tree-based, but the linked list
approach of first_child/next_sibling was sufficient rather than having a
full-on hash table of children.

> 
>> +
>> +    return act.number;
>> +}
>> +
>> +/* Struct and callback function used as a hash table callback; each call
>> + * inspects the pre-existing checkpoint->def->parent field, and adjusts
>> + * the checkpoint->parent field as well as the parent's child fields to
>> + * wire up the hierarchical relations for the given checkpoint.  The error
>> + * indicator gets set if a parent is missing or a requested parent would
>> + * cause a circular parent chain.  */
>> +struct checkpoint_set_relation {
>> +    virDomainCheckpointObjListPtr checkpoints;
>> +    int err;
>> +};
> 
> Having a hard time picturing the use...

Easy - see my recent patch on VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE_LIST.
When we read in multiple checkpoints at once (and not necessarily in
topological order), we then do a post-parse pass over the list of
checkpoints to reconstruct the parent/child relationship pointers.

> 
>> +static int
>> +virDomainCheckpointSetRelations(void *payload,
>> +                                const void *name ATTRIBUTE_UNUSED,
>> +                                void *data)> +{
>> +    virDomainCheckpointObjPtr obj = payload;
>> +    struct checkpoint_set_relation *curr = data;
>> +    virDomainCheckpointObjPtr tmp;
>> +
>> +    obj->parent = virDomainCheckpointFindByName(curr->checkpoints,
>> +                                                obj->def->parent);
>> +    if (!obj->parent) {
>> +        curr->err = -1;
>> +        obj->parent = &curr->checkpoints->metaroot;
>> +        VIR_WARN("checkpoint %s lacks parent", obj->def->name);
>> +    } else {
>> +        tmp = obj->parent;
>> +        while (tmp && tmp->def) {
>> +            if (tmp == obj) {
>> +                curr->err = -1;
>> +                obj->parent = &curr->checkpoints->metaroot;
>> +                VIR_WARN("checkpoint %s in circular chain", obj->def->name);
>> +                break;
>> +            }
>> +            tmp = tmp->parent;
>> +        }
>> +    }
>> +    obj->parent->nchildren++;
>> +    obj->sibling = obj->parent->first_child;
>> +    obj->parent->first_child = obj;
> 
> Ah so I see where nchildren comes from... Not sure I'm clear yet on the
> need for sibling/first_child. This would appear to be inserting
> something into the middle of a list.

I track:

A
 - B
   - C
   - D
 - E
 - F
   - G
   - H

by having B->nchildren == 2, B->first_child = C, C->sibling = D,
D->sibling = NULL, B->sibling = E, E->sibling = F, F->first_child = G,
G->sibling = H, H->sibling = NULL, F->sibling = NULL

Not so much inserting in the middle of the list, but at the front (any
time I associate a child with its parent, the child's sibling pointer
takes over the parent's original first_child, the parent's nchildren
increments, and the parent's first_child becomes the newly inserted
child).  The lists are generally short (in fact, never more than one
element long if you don't branch into having multiple children), so it
isn't worth keeping a pointer to the tail of a list.

> 
>> +    return 0;
>> +}
>> +
>> +/* Populate parent link and child count of all checkpoints, with all
>> + * relations starting as 0/NULL.  Return 0 on success, -1 if a parent
>> + * is missing or if a circular relationship was requested.  */
>> +int
>> +virDomainCheckpointUpdateRelations(virDomainCheckpointObjListPtr checkpoints)
>> +{
>> +    struct checkpoint_set_relation act = { checkpoints, 0 };
>> +
> 
> Updating the relations for everyone? Having a hard time thinking about
> the usage model that would need this.  If an objet is removed, the
> update is simple - alter the parent/child.  If for some reason we're
> rebuilding the tree - what is that reason? What causes this to be
> necessary. I know, you've been at this for a long time.

Rebuilding the tree is needed for when libvirtd is first started
(reading multiple files of <domainsnapshot>/<domaincheckpoint>), as well
as when doing REDEFINE_LIST (which has a precondition of no existing
snapshots because it is about to read in multiple and then rebuild the
relationships all at once).

> 
>> +    virHashForEach(checkpoints->objs, virDomainCheckpointSetRelations, &act);
>> +    return act.err;
>> +}
>> +
>> +/* Prepare to reparent or delete checkpoint, by removing it from its
>> + * current listed parent.  Note that when bulk removing all children
>> + * of a parent, it is faster to just 0 the count rather than calling
>> + * this function on each child.  */
>> +void
>> +virDomainCheckpointDropParent(virDomainCheckpointObjPtr checkpoint)
>> +{
>> +    virDomainCheckpointObjPtr prev = NULL;
>> +    virDomainCheckpointObjPtr curr = NULL;
>> +
>> +    checkpoint->parent->nchildren--;
> 
> Decrementing the child count before actually dropping the child?

Copied from snapshots; if there's a latent bug there, it should be a
separate fix.  Meanwhile, refactoring to a common base class should get
rid of the worst of understanding this code (it's easier to check if
code motion is correct than it is to see if a duplicated open-code
repeat is correct).


>> +int
>> +virDomainListAllCheckpoints(virDomainCheckpointObjListPtr checkpoints,
>> +                            virDomainCheckpointObjPtr from,
>> +                            virDomainPtr dom,
>> +                            virDomainCheckpointPtr **chks,
>> +                            unsigned int flags)
>> +{
> 
> There's better examples than what *snapshots do in order to build/export
> a list of virDomainCheckpointPtr's - usually in *Export type functions.
>

Except for the fact that the public APIs have nice filters for listing
roots only vs. roots and all their descendents, for both the domain and
for a subtree rooted in a particular checkpoint, and that virsh already
takes advantage of those filters in implementing its --tree output.

>> +    int count = virDomainCheckpointObjListNum(checkpoints, from, flags);
> 
> Go direct virHashSize

Works if we aren't filtering to a particular subtree, but if any
filtering is involved, then you don't want the return value to include
the objects that were filtered out.

> 
>> +    virDomainCheckpointPtr *list = NULL;
>> +    char **names;
>> +    int ret = -1;
>> +    size_t i;
>> +
>> +    if (!chks || count < 0)
> 
> <= 0?

No, because when the answer is 0,...

> 
>> +        return count;
>> +    if (VIR_ALLOC_N(names, count) < 0 ||
>> +        VIR_ALLOC_N(list, count + 1) < 0)
>> +        goto cleanup;

...we still return a valid NULL-terminated list.

>> +
>> +    if (virDomainCheckpointObjListGetNames(checkpoints, from, names, count,
>> +                                         flags) < 0)
> 
> indent is off
> 
>> +        goto cleanup;
>> +    for (i = 0; i < count; i++)
>> +        if ((list[i] = virGetDomainCheckpoint(dom, names[i])) == NULL)
> 
> Other examples will show that this is done in the callback function. I
> know you've found it easier to use the same callback function for
> several different kinds of operations. I tried that too when I did the
> common object changes and got told/reminded during review that one
> function should serve 3 or 4 purposes as it makes each callback function
> more unnecessarily complicated especially w/r/t knowing what each caller
> may expect.  So take that advice as something to follow for the round.
> 
>> +            goto cleanup;
>> +
>> +    ret = count;
>> +    *chks = list;
>> +
>> + cleanup:
>> +    for (i = 0; i < count; i++)
>> +        VIR_FREE(names[i]);
>> +    VIR_FREE(names);
>> +    if (ret < 0 && list) {
>> +        for (i = 0; i < count; i++)
>> +            virObjectUnref(list[i]);
>> +        VIR_FREE(list);
>> +    }
>> +    return ret;
>> +}
>> +
>> +
> 
> This is like the vir*ObjList*Export* function from other code.
> 
> Only a very sparse look/scan of anything below this point.
> 
> John

Thanks for the feedback so far. (And yes, I was totally expecting your
point that refactoring the snapshot code to be cleaner, rather than just
blindly copying its existing code into checkpoints, is a better
reduction of overall technical debt, even if it will cost more time to
get the patches polished).


>> +    /* Parsing XML does not populate the domain definition, so add a
>> +     * canned bare-bones fallback */
>> +    if (!def->dom) {
>> +        // HACK
> 
> yep.
> 
>> +        ret = 77;
>> +        goto cleanup;

Yeah, and now you see why I have 'wip' because testing is not finalized :)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




More information about the libvir-list mailing list