[libvirt] [PATCH 3/3] introduce pull backup

John Ferlan jferlan at redhat.com
Thu Sep 29 22:02:08 UTC 2016



[...]

>>
>> Because it's also dependent upon an x-blockdev-del, it cannot be pushed
>> upstream to libvirt. I know qemu work continues w/r/t blockdev-add and
>> backups, but I don't follow it in detail (not enough time in the day!).
> 
> Ok, at least the patch can be some kind of candidate to be pushed upstream
> as soon as blockdev-del drops the x- prefix.
> 

Not as a single patch though - I have a feeling you're going to have a
git branch committed to this for a while...  Consider my recent response
to your 1/3 patch w/r/t qemu's statement about the blockdev-add command...

My *guess* is when 'blockdev-del' is officially supported - that's when
libvirt can "assume" support for blockdev-add (since the x- was taken
off of it - something that we can document when that happens).

[...]


>>
>> BTW: Using QUIESCE would then rely on the guest agent being available
>> for the domain...  Just making a mental note, but yet something you have
>> a dependency upon.
> 
> Well it is just a flag, just as in the case of snapshots. As to coding conventions
> etc., a lot of the stuff is rooted in the similar snapshot code and xml.
> 

Something that just needs to be remembered - oh and documented in
virsh.pod...
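
Roughly what I'd expect the qemu driver to check, borrowing from what the
snapshot code does today (the flag name is made up since the backup API
isn't settled yet):

    /* Hypothetical flag name; the agent availability check is the real
     * point - without the guest agent a QUIESCE request has to fail. */
    if ((flags & VIR_DOMAIN_BACKUP_BEGIN_QUIESCE) &&
        !qemuDomainAgentAvailable(vm, true))
        goto cleanup;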

[...]

>>> diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
>>> index 2ec560e..c1f8c6c 100644
>>> --- a/include/libvirt/virterror.h
>>> +++ b/include/libvirt/virterror.h
>>> @@ -131,6 +131,7 @@ typedef enum {
>>>      VIR_FROM_XENXL = 64,        /* Error from Xen xl config code */
>>>  
>>>      VIR_FROM_PERF = 65,         /* Error from perf */
>>> +    VIR_FROM_DOMAIN_BACKUP = 66,/* Error from domain backup */
>>
>> Use a shorter name, suggest "DOMBKUP"
>>
> 
> but there is VIR_FROM_DOMAIN_SNAPSHOT... and it is easier
> on the eyes.
> 

OK - now that I know SNAPSHOT is your basis...  BTW: IIRC the snapshot
code didn't make it all in one patch...

>>>  
>>>  # ifdef VIR_ENUM_SENTINELS
>>>      VIR_ERR_DOMAIN_LAST
>>> diff --git a/po/POTFILES.in b/po/POTFILES.in
>>> index 25dbc84..4cdeb2f 100644
>>> --- a/po/POTFILES.in
>>> +++ b/po/POTFILES.in
>>> @@ -18,6 +18,7 @@ src/bhyve/bhyve_driver.c
>>>  src/bhyve/bhyve_monitor.c
>>>  src/bhyve/bhyve_parse_command.c
>>>  src/bhyve/bhyve_process.c
>>> +src/conf/backup_conf.c
>>
>> why not domain_backup_conf.{c,h}
> 
> similar to snapshot_conf.{c,h}
> 

It's just a file name in the long run; however, what if in the future
the storage driver added some sort of backup? Then we'd have
'storage_backup_conf.c' to clearly describe that it was meant for
storage.  I think the domain_<function>_conf.c naming should be used...  And
yes, the same argument can be made for snapshot_conf.c - perhaps it
should be renamed.

[...]


>>> +VIR_ENUM_IMPL(virDomainBackupAddress, VIR_DOMAIN_BACKUP_ADDRESS_LAST,
>>> +              "ip",
>>
>> IPv4, IPv6, IPnotinventedyet
>>
>> why not "tcp", "tls", etc.? That would seemingly allow a bit of code
>> reuse for client transport. There's also virStorageNetHostTransport
> 
> Name 'tcp' is more appropriate indeed )) I would not use virStorageNetHostTransport
> and the associated structures and xml because they have different semantics.
> In the case of backups we have the address a client can connect to; in the case of
> network disks we have the address that qemu, as a client, can connect to.
> 

Creating a specific name is fine - I was just pointing at the
StorageNet names as something to consider.
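
Nothing stops the backup code from reusing the existing strings even with
its own enum though - something like the following (purely illustrative,
not from your patch):

    typedef enum {
        VIR_DOMAIN_BACKUP_ADDRESS_TCP,
        VIR_DOMAIN_BACKUP_ADDRESS_UNIX,

        VIR_DOMAIN_BACKUP_ADDRESS_LAST
    } virDomainBackupAddressType;

    VIR_ENUM_IMPL(virDomainBackupAddress, VIR_DOMAIN_BACKUP_ADDRESS_LAST,
                  "tcp",
                  "unix");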

>>
>> FYI: also see qemuMonitorGraphicsAddressFamily
>>
> 
> And this one has the right semantics. I remember I evaluated it as a reuse
> candidate. I thought it was awkward to have the port in the upper-level element:
> 
>     <graphics type='vnc' port='5904' sharePolicy='allow-exclusive'>
>       <listen type='address' address='1.2.3.4'/>
>     </graphics>
> 
> if we could declare it outdated and recommend using the new style...
> 
>     <graphics type='vnc' sharePolicy='allow-exclusive'>
>       <listen type='address' address='1.2.3.4' port='5904' />              (autoport, tlsPort etc goes here too...)
>     </graphics>
> 
> Actually we already do this for the address attribute:
> "The address attribute is duplicated as listen attribute in graphics element for backward compatibility. If both are provided they must be equal."
> 
> So if you are ok with this, I would definitely reuse this element and
> associated structures.
> 

Again - just pointed out different things to consider. My knowledge of
blockdev-backup is miniscule (I think I spelled it right).

[...]

>>
>> Why differ from existing transport definitions? Seems like there would
>> be code reuse opportunities.
>>
>> BTW: If you're going to have a network protocol, then shouldn't you also
>> have an authentication mechanism?
> 
> AFAIK there is not any yet for pull backups. The situation is the same as with
> migration: it is not secure when moving state and disks. This is what
> Daniel is working on, AFAIK.
>

Just something that may have to be considered - although who knows. I've
recently added LUKS support and now am being told about all those corner
cases that weren't considered and where things didn't work quite right -
so I'm just trying to consider various things as they come to mind. With
any luck the end result will be better.

[...]


>>> +
>>> +    if ((n = virXPathNodeSet("./disks/*", ctxt, &nodes)) < 0)
>>> +        goto cleanup;
>>
>> So the purpose of disks is what?
> 
> At least to specify where temporary backup files should be.
> 

There's nothing specific in the <disks> xml - you could just list 'n'
<disk> elements and use the XML APIs to get a count of them.
Thus the <disks> wrapper just seemed superfluous.
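
IOW something along these lines instead of the wrapper (just a sketch of
what I mean, not tested):

    /* Count <disk> children of the top-level element directly rather
     * than requiring a <disks> wrapper; the exact XPath is illustrative. */
    if ((n = virXPathNodeSet("./disk", ctxt, &nodes)) < 0)
        goto cleanup;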

[...]


>>> --- a/src/qemu/qemu_domain.h
>>> +++ b/src/qemu/qemu_domain.h
>>> @@ -284,6 +284,10 @@ struct _qemuDomainDiskPrivate {
>>>       * single disk */
>>>      bool blockjob;
>>>  
>>> +    bool backuping;
>>
>> Is this like hiccuping ;-)
>>
>> Maybe consider an enum of sorts for a state of a backup... none,
>> possible, in process, failed, ?? others... Haven't got that far yet.
> 
> I'll check, but it looks like a bool is enough. There is 'migrating',
> by the way; I can name it 'backup_active' if you prefer.
> 

It just looked funny to read...

>>
>> If it's possible to backup, then there's perhaps some sanity checks that
>> need to be made or at least synchronizing with migration. Would be
>> horrible to start a migration and then start a backup. Just thinking off
>> the cuff without looking at the rest of the code.
>>
>>> +    bool backupdev;
>>
>> bool ?  cannot wait to see this!
> 
> ok, backupdev_created
> 

Sounds like you need an enum that can go through states "not desired",
"desired", "selected", "in process", "succeeded", etc.

>>
>>> +    bool backupFailed;
>>> +

And this is the one that is the kicker... Once you have failed, you'd
need to decide how to handle any others that were "in process" somehow
or had been processed successfully.  Error recovery could be a full-time
job - similar to error recovery for snapshot failures (internal,
external, make my head spin)
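
Pulling those three bools together into a single state value is what I had
in mind - very roughly (the names are invented, not from your patch):

    /* Hypothetical per-disk backup state instead of separate bools. */
    typedef enum {
        QEMU_DISK_BACKUP_NONE = 0,    /* backup not requested for this disk */
        QEMU_DISK_BACKUP_REQUESTED,   /* selected, nothing created yet */
        QEMU_DISK_BACKUP_DEV_CREATED, /* scratch/backup device created */
        QEMU_DISK_BACKUP_ACTIVE,      /* backup job running */
        QEMU_DISK_BACKUP_FAILED       /* job failed, needs cleanup */
    } qemuDomainDiskBackupState;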

[...]

>>> +        if (virAsprintf(&disk->src->path, "%s.backup", tmppath) < 0) {
>>
>> A name selection problem...   Seems to me some way to use a temporary
>> file name could be devised.  Consider using ".XXXXXX" and then mkostemp
>> (there are other examples).
>>
>> Once you've successfully backed things up the name then changes to
>> whatever user supplied name is provided.  And of course similar issues
>> exist there - do you overwrite existing files?
> 
> It is a pull backup. The user doesn't really need this temporary file. It
> is kept only for the duration of the backup, as an inverse delta that
> preserves the disk state at the moment the backup started. So there is no
> reason to rename.
> 
> As to randomly generated names... Do we really need this? We can afford
> only one backup at a time, one backup delta per disk...
> 

I thought there was a whole naming thing with snapshots - not sure.
Certainly synergies can be made.
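
To be clearer about the mkostemp idea, something along these lines (error
paths trimmed, and 'origpath' is just a stand-in for however the base path
gets built):

    /* Sketch only: generate a unique scratch name instead of a fixed
     * ".backup" suffix; mkostemp() fills in the XXXXXX in place. */
    char *tmppath = NULL;
    int fd = -1;

    if (virAsprintf(&tmppath, "%s.backup.XXXXXX", origpath) < 0)
        return -1;

    if ((fd = mkostemp(tmppath, O_CLOEXEC)) < 0) {
        virReportSystemError(errno, _("unable to create file '%s'"), tmppath);
        VIR_FREE(tmppath);
        return -1;
    }
    VIR_FORCE_CLOSE(fd);
    disk->src->path = tmppath;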

>>
>> You may also end up with a space problem. That is - does the size of
>> your backup exceed the size of the space to be written. Again, I didn't
>> check following code to see if/whether that's true - it just came to
>> mind as another one of those areas we need to consider.
> 
> You cannot check space at all. The size of the backup's inverse delta heavily
> depends on guest activity. It can be 0, it can be the full disk size.
> 

Ok - again, something that, having done backups before, I'd have to
consider... It's more of a general comment...

>>
>> We also have to consider the security aspects of creating a file and
>> using security manager for "domain" files (the selinux, apparmor, etc)
>> setup for labels and the such.
> 
> While this is true, I thought this could be done in later patches, which
> could also address other issues like disk cleanup, handling daemon
> restart, etc.
> 

If history has taught us anything it's that future patches sometimes
aren't written... Leaving that task to someone else to pick up.
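
Just so the labeling piece doesn't get lost, the call I'd expect to see is
roughly this (a sketch only - where exactly it lands depends on where the
scratch file is created):

    /* Let the security drivers (selinux/apparmor/dac) label the new
     * scratch file so the qemu process can actually open it. */
    if (virSecurityManagerSetImageLabel(driver->securityManager,
                                        vm->def, disk->src) < 0)
        goto cleanup;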

>>
>>
>>> +            VIR_FREE(tmppath);
>>> +            return -1;
>>> +        }
>>> +
>>> +        VIR_FREE(tmppath);
>>> +
>>> +        /* verify that we didn't generate a duplicate name */
>>> +        for (j = 0; j < i; j++) {
>>> +            if (STREQ_NULLABLE(disk->src->path, def->disks[j].src->path)) {
>>
>> Wouldn't this only work if we used the same name from within the domain?
>>  What if another domain used the ".backup" file name?  IOW: This isn't
>> checking if "on disk" right now there isn't a file with the name you
>> just attempted to create.
> 
> Well I have a copy-paste argument ) IOW snapshots use this already.
> It comes from b60af444 and is here to handle the special case when
> image paths differ only in suffix. OK, for really bulletproof protection
> we would need to make functions like virStorageFileBackendFileCreate
> include O_EXCL on open.
> 

Again - if the snapshot code is your basis then we should consider more code
sharing options. Not sure how to accomplish that yet, but
that's the suggestion at this point in time.
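
FWIW the O_EXCL idea is probably the right one regardless of how the code
sharing works out - conceptually (not the actual storage backend code):

    /* Fail instead of silently reusing a path that already exists on
     * disk; mode/ownership handling omitted. */
    int fd = open(disk->src->path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        virReportSystemError(errno, _("cannot create file '%s'"),
                             disk->src->path);
        return -1;
    }
    VIR_FORCE_CLOSE(fd);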

[...]

>>> +    /* Double check requested disks.  */
>>> +    for (i = 0; i < def->ndisks; i++) {
>>
>> I cannot believe the number of times we traverse this array. For 1 or 2
>> disks - easy... For 100-200 disks, that could be brutal, especially in the
>> error paths...
> 
> At least we stay at O(n) complexity. '200 disks' is for real?
> 

In a former company, 1000 was the test case.  People use guests for all
different kinds of things. I don't have that many disks, but there is a
scaling problem. Recently it was pointed out on list that going
through ndisks in the migration code should be done in one place, and let's
try to limit the number of times we need to run through the list.

Works great in my test environment of 1 or 2 disks, can't understand why
it's failing for (name your favorite large company) with many more disks...
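
The general shape I'd hope for is one validation/prep pass rather than
several - loosely like this (the helper names are invented, only to show
the structure):

    /* Single pass over the disks doing all the per-disk checks and
     * setup; hypothetical helpers, illustrative only. */
    for (i = 0; i < def->ndisks; i++) {
        virDomainBackupDiskDefPtr disk = &def->disks[i];

        if (qemuDomainBackupDiskValidate(vm, disk) < 0 ||
            qemuDomainBackupDiskPrepare(driver, vm, disk) < 0)
            goto cleanup;
    }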

[...]

>>> +static int
>>> +qemuDomainBackupExportDisks(virQEMUDriverPtr driver,
>>> +                            virDomainObjPtr vm,
>>> +                            virDomainBackupDefPtr def)
>>> +{
>>> +    qemuDomainObjPrivatePtr priv = vm->privateData;
>>> +    char *dev = NULL;
>>> +    char *dev_backup = NULL;
>>> +    int ret = -1, rc;
>>> +    virJSONValuePtr actions = NULL;
>>> +    size_t i;
>>> +    virHashTablePtr block_info = NULL;
>>> +
>>> +    qemuDomainObjEnterMonitor(driver, vm);
>>> +    rc = qemuMonitorNBDServerStart(priv->mon, def->address.data.ip.host,
>>> +                                   def->address.data.ip.port);
>>
>> Well it may have been really good to note that usage of the NBD server
>> was taking place somewhere earlier.
> 
> You mean track nbd server usage? To give better errors or something?
> 

Like you have a reliance on the guest agent for quiesce - you have a
reliance on the nbd server to do something...

I think this is where I went back and wrote that you need to consider
networked disks in your XML (the auth portion).

I have a bz that's looking to add/allow TLS for NBD server starts...

>>
>>> +    if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0)
>>> +        return -1;
>>> +
>>> +    if (!(actions = virJSONValueNewArray()))
>>> +        goto cleanup;
>>> +
>>> +    qemuDomainObjEnterMonitor(driver, vm);
>>> +    block_info = qemuMonitorGetBlockInfo(priv->mon);
>>> +    if (qemuDomainObjExitMonitor(driver, vm) < 0 || !block_info)
>>> +        goto cleanup;
>>> +
>>
>> Each of the following goes through the ndisks array and does
>> multiple/different things, but it's not clear what error recovery looks
>> like if any one fails - the cleanup path is, shall we say, sparse while
>> the setup is multiple steps.  I haven't compared it to the migration code...
> 
> I tried to address failure handling in the code, so that flags are set after
> each specific action in the loop, like 'blockdev'. I doubt that splitting
> the loop will make the code more comprehensible. On cleanup you need to undo
> the setup actions in reverse order, starting from the last successful one, which
> means using the 'i' counter after the goto, and that is more complicated than
> per-disk setup flags, in my opinion.
> 

At this point the brain is already going to mush and the eyes are tired,
but you get the point - there's something "not quite right" with the number
of times we go through the loop.

[...]

>>
>> and realistically my time has run out.  I really didn't look too much in
>> depth at the qemu_driver code. My head was spinning from the number of
>> times going through the ndisks and the inlined code...
>>
>>
>> John
> 
> Thank you for review!
> 
> Nikolay
> 

Thank you for starting to think about this. It's a tricky and complex
issue based upon code that isn't quite baked in QEMU yet. I think, given
more cycles in my day and perhaps smaller piles of patches, eventually
we'll hammer out some working code.  I'm sure I'll learn a lot about
blockdev during the next few months too...


John



