<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">12.04.2018 15:57, Nikolay Shirokovskiy wrote: </div> <blockquote type="cite" cite="mid:820d3ac8-9c22-4b00-b3b1-f6046ac30454@virtuozzo.com"> <pre wrap=""> On 12.04.2018 07:14, John Snow wrote: </pre> <blockquote type="cite"> <pre wrap=""> On 04/11/2018 12:32 PM, Eric Blake wrote: </pre> <blockquote type="cite"> <pre wrap="">On 04/03/2018 07:01 AM, Nikolay Shirokovskiy wrote: </pre> <blockquote type="cite"> <pre wrap="">Hi, all. This is another RFC on pull backup API. This API provides means to read domain disks in a snapshotted state so that client can back them up as well as means to write domain disks to revert them to backed up state. The previous version of RFC is [1]. I'll also describe the API implementation details to shed light on misc qemu dirty bitmap commands usage. </pre> </blockquote> <pre wrap=""> </pre> </blockquote> </blockquote> <pre wrap=""> [snip] </pre> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> Qemu can track what disk's blocks are changed from snapshotted state so on next backup client can backup only changed blocks. virDomainBlockSnapshotCreateXML accepts VIR_DOMAIN_BLOCK_SNAPSHOT_CREATE_CHECKPOINT flag to turn this option for snapshot which means to track changes from this particular snapshot. I used checkpoint term and not [dirty] bitmap because many qemu dirty bitmaps are used to provide changed blocks from the given checkpoint to current snapshot in current implementation (see *Implemenation* section for more details). Also bitmap keeps block changes and thus itself changes in time and checkpoint is a more statical terms means you can query changes from that moment in time. Checkpoints are visible in active domain xml: <disk type='file' device='disk'> .. <target dev='sda' bus='scsi'/> <alias name='scsi0-0-0-0'/> <checkpoint name="93a5c045-6457-2c09-e56c-927cdf34e178"> <checkpoint name="5768a388-c1c4-414c-ac4e-eab216ba7c0c"> .. </disk> </pre> </blockquote> </blockquote> <pre wrap=""> It makes sense to avoid the bitmap name in libvirt, but do these indeed correlate 1:1 with bitmaps? I assume each bitmap will have name=%%UUID%% ? </pre> </blockquote> <pre wrap=""> There is 1:1 correlation but names are different. Checkout checkpoints subsection of *implementation details* section below for naming scheme. </pre> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">Every checkpoint requires qemu dirty bitmap which eats 16MiB of RAM with default dirty block size of 64KiB for 1TiB disk and the same amount of disk space is used. So client need to manage checkpoints and delete unused. Thus next API function: </pre> </blockquote> </blockquote> </blockquote> <pre wrap=""> [snip] </pre> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">First a few facts about qemu dirty bitmaps. Bitmap can be either in active or disable state. In disabled state it does not get changed on guest writes. And oppositely in active state it tracks guest writes. This implementation uses approach with only one active bitmap at a time. This should reduce guest write penalties in the presence of checkpoints. So on first snapshot we create bitmap B_1. Now it tracks changes from the snapshot 1. On second snapshot we create bitmap B_2 and disable bitmap B1 and so on. Now bitmap B1 keep changes from snaphost 1 to snapshot 2, B2 - changes from snaphot 2 to snapshot 3 and so on. Last bitmap is active and gets most disk change after latest snapshot. </pre> </blockquote> </blockquote> <pre wrap=""> So you are trying to optimize away write penalties if you have, say, ten bitmaps representing checkpoints so we don't have to record all new writes to all ten. This makes sense, and I would have liked to formalize the concept in QEMU, but response to that idea was very poor at the time. Also my design was bad :) </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> Getting changed blocks bitmap from some checkpoint in past till current snapshot is quite simple in this scheme. For example if the last snapshot is 7 then to get changes from snapshot 3 to latest snapshot we need to merge bitmaps B3, B4, B4 and B6. Merge is just logical OR on bitmap bits. Deleting a checkpoint somewhere in the middle of checkpoint sequence requires merging correspondent bitmap to the previous bitmap in this scheme. </pre> </blockquote> </blockquote> <pre wrap=""> Previous, or next? </pre> </blockquote> <pre wrap=""> In short previous. </pre> <blockquote type="cite"> <pre wrap=""> Say we've got bitmaps (in chronological order from oldest to newest) A B C D E F G H and we want to delete bitmap (or "checkpoint") 'C': A B D E F G H the bitmap representing checkpoint 'D' should now contain the bits that used to be in 'C', right? That way all the checkpoints still represent their appropriate points in time. </pre> </blockquote> <pre wrap=""> I merge to previous due to definition above. "A" contains changes from point in time A to point in time B an so no. So if you delete C in order for B to keep changes from point in time B to point in time D (next in checkpoint chain) you need merge C to B. </pre> <blockquote type="cite"> <pre wrap=""> The only problem comes when you delete a checkpoint on the end and the bits have nowhere to go: A B C A B _ In this case you really do lose a checkpoint -- but depending on how we annotate this, it may or may not be possible to delete the most recent checkpoint. Let's assume that the currently active bitmap that doesn't represent *any* point in time yet (because it's still active and recording new writes) is noted as 'X': A B C X If we delete C now, then, that bitmap can get re-merged into the *active bitmap* X: A B _ X </pre> </blockquote> <pre wrap=""> You can delete any bitmap (and accordingly any checkpoint). If checkpoint is last we just merge last bitmap to previous and additioanlly make the previous bitmap active.</pre> </blockquote> I propose, not to say that bitmap represents a checkpoint. It is simpler to say (and it reflects the reality) that bitmap is a difference between two consecutive checkpoints. And we can say, that active state is some kind of a checkpoint, current point in time. So, we have checkpoints (5* is an active state) which are points in time: 1 2 3 4 5* And bitmaps, first three are disabled, last is enabled: "1->2", "2->3", "3->4", "4->5*" So, remove first checkpoint: just remove bitmap "A->B". Remove any other checkpoint N: create new bitmap "(N-1)->(N+1)" = merge("(N-1)->N", "N->(N+1)"), drop bitmaps "(N-1)->N" and "N->(N+1)". If the latter was active, the new one becomes active. And we cant remove 5* checkpoint, as it is an active state, not an actual checkpoint. <blockquote type="cite" cite="mid:820d3ac8-9c22-4b00-b3b1-f6046ac30454@virtuozzo.com"> <pre wrap=""> </pre> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">We use persitent bitmaps in the implementation. This means upon qemu process termination bitmaps are saved in disks images metadata and restored back on qemu process start. This makes checkpoint a persistent property that is we keep them across domain start/stops. Qemu does not try hard to keep bitmaps. If upon save something goes wrong bitmap is dropped. The same is applied to the migration process too. For backup process it is not critical. If we don't discover a checkpoint we always can make a full backup. Also qemu provides no special means to track order of bitmaps. These facts are critical for implementation with one active bitmap at a time. We need right order of bitmaps upon merge - for snapshot N and block changes from snanpshot K, K < N to N we need to merge bitmaps B_{K}, ..., B_{N-1}. Also if one of the bitmaps to be merged is missing we can't calculate desired block changes too. </pre> </blockquote> </blockquote> <pre wrap=""> Right. A missing bitmap anywhere in the sequence invalidates the entire sequence. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">So the implementation encode bitmap order in their names. For snapshot A1, bitmap name will be A1, for snapshot A2 bitmap name will be A2^A1 and so on. Using this naming encoding upon domain start we can find out bitmap order and check for missing ones. This complicates a bit bitmap removing though. For example removing a bitmap somewhere in the middle looks like this: - removing bitmap K (bitmap name is NAME_{K}^NAME_{K-1} - create new bitmap named NAME_{K+1}^NAME_{K-1} ---. - disable new bitmap | This is effectively renaming - merge bitmap NAME_{K+1}^NAME_{K} to the new bitmap | of bitmap K+1 to comply the naming scheme - remove bitmap NAME_{K+1}^NAME_{K} ___/ - merge bitmap NAME_{K}^NAME_{K-1} to NAME_{K-1}^NAME_{K-2} - remove bitmap NAME_{K}^NAME_{K-1} As you can see we need to change name for bitmap K+1 to keep our bitmap naming scheme. This is done creating new K+1 bitmap with appropriate name and copying old K+1 bitmap into new. </pre> </blockquote> </blockquote> <pre wrap=""> That seems... unfortunate. A record could be kept in libvirt instead, couldn't it? A : Bitmap A, Time 12:34:56, Child of (None), Parent of B B : Bitmap B, Time 23:15:46, Child of A, Parent of (None) </pre> </blockquote> <pre wrap=""> Yes it is possible. I was reluctant to implement this way for a couple of reasons: - if bitmap metadata is in libvirt we need carefully design it for things like libvirtd crashes. If metadata is out of sync with qemu then we can get broken incremental backups. One possible design is: - on bitmap deletion save metadata after deletion bitmap in qemu; in case of libvirtd crash in between upon libvirtd restart we can drop bitmaps that are in metadata but not in qemu as already deleted - on bitmap add (creating new snapshot with checkpoint) save metadata with bitmap before creating bitmap in qemu; then again we have a way to handle libvirtd crashes in between So this approach has tricky parts too. The suggested approach uses qemu transactions to keep bitmap consistent. - I don't like another metadata which looks like belongs to disks and not a domain. It is like keeping disk size in domain xml. </pre> <blockquote type="cite"> <pre wrap=""> I suppose in this case you can't *reconstruct* this information from the bitmap stored in the qcow2, which necessitates your naming scheme... ...Still, if you forego this requirement, deleting bitmaps in the middle becomes fairly easy. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">So while it is possible to have only one active bitmap at a time it costs some exersices at managment layer. To me it looks like qemu itself is a better place to track bitmaps chain order and consistency. </pre> </blockquote> <pre wrap=""> </pre> </blockquote> <pre wrap=""> If this is a hard requirement, it's certainly *easier* to track the relationship in QEMU ...</pre> </blockquote> </blockquote> It's not easier, as we'll have to implement either separate of bitmaps concept of checkpoints, which will be based on bitmaps, and we'll have to negotiate and implement storing these objects to qcow2 and migrate them. Or we'll go through proposed by Kevin (If I remember correctly) way of adding something like "backing" or "parent" pointer to BdrvDirtyBitmap, and anyway store to qcow2, migrate and expose qapi for them. The other (more hard way) is move to multi-bit bitmaps (like in vmware), where for each granularity-chunk we store a number, representing "before which checkpoint was the latest change of this chunk", and the same, qapi+qcow2+migration. It's all not easier than call several simple qmp commands. <blockquote type="cite" cite="mid:820d3ac8-9c22-4b00-b3b1-f6046ac30454@virtuozzo.com"> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <pre wrap="">Libvirt is already tracking a tree relationship between internal snapshots (the virDomainSnapshotCreateXML), because qemu does NOT track that (true, internal snapshots don't get as much attention as external snapshots) - but the fact remains that qemu is probably not the best place to track relationship between multiple persistent bitmaps, any more than it tracks relationships between internal snapshots. So having libvirt track relations between persistent bitmaps is just fine. Do we really have to rename bitmaps in the qcow2 file, or can libvirt track it all on its own? </pre> </blockquote> <pre wrap=""> This is a way, really, of storing extra metadata by using the bitmap name as arbitrary data storage. I'd say either we promote QEMU to understanding checkpoints, or enhance libvirt to track what it needs independent of QEMU -- but having to rename bitmaps smells fishy to me. </pre> <blockquote type="cite"> <pre wrap="">Earlier, you said that the new virDomainBlockSnapshotPtr are independent, with no relations between them. But here, you are wanting to keep incremental backups related to one another. </pre> </blockquote> <pre wrap=""> I think the *snapshots*, as temporary objects, are independent and don't carry a relation to each other. The *checkpoints* here, however, are persistent and interrelated. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> Now how exporting bitmaps looks like. - add to export disk snapshot N with changes from checkpoint K - add fleece blockdev to NBD exports - create new bitmap T - disable bitmap T - merge bitmaps K, K+1, .. N-1 into T </pre> </blockquote> </blockquote> <pre wrap=""> I see; so we compute a new slice based on previous bitmaps and backup arbitrary from that arbitrary slice. So "T" is a temporary bitmap meant to be discarded at the conclusion of the operation, making it much more like a consumable object. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> - add bitmap to T to nbd export - remove disk snapshot from export - remove fleece blockdev from NBD exports - remove bitmap T </pre> </blockquote> </blockquote> <pre wrap=""> Aha. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">Here is qemu commands examples for operation with checkpoints, I'll make several snapshots with checkpoints for purpuse of better illustration. - create snapshot d068765e-8b50-4d74-9b72-1e55c663cbf8 with checkpoint - same as without checkpoint but additionally add bitmap on fleece blockjob start ... { "execute": "transaction" "arguments": { "actions": [ { "type": "blockdev-backup" "data": { "device": "drive-scsi0-0-0-0", "sync": "none", "target": "snapshot-scsi0-0-0-0" }, }, { "type": "block-dirty-bitmap-add" "data": { "name": "libvirt-d068765e-8b50-4d74-9b72-1e55c663cbf8", "node": "drive-scsi0-0-0-0", "persistent": true }, } </pre> </blockquote> <pre wrap=""> </pre> </blockquote> <pre wrap=""> So a checkpoint creates a reference point, but NOT a backup. You are manually creating checkpoint instances. In this case, though, you haven't disabled the previous checkpoint's bitmap (if any?) atomically with the creation of this one... </pre> </blockquote> <pre wrap=""> In the example this is first snapshot so there is no previous checkpoint and thus nothing to disable. </pre> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <pre wrap="">Here, the transaction makes sense; you have to create the persistent dirty bitmap to track from the same point in time. The dirty bitmap is tied to the active image, not the backup, so that when you create the NEXT incremental backup, you have an accurate record of which sectors were touched in snapshot-scsi0-0-0-0 between this transaction and the next. </pre> <blockquote type="cite"> <pre wrap=""> ] }, } - delete snapshot d068765e-8b50-4d74-9b72-1e55c663cbf8 - same as without checkpoints - create snapshot 0044757e-1a2d-4c2c-b92f-bb403309bb17 with checkpoint - same actions as for the first snapshot, but additionally disable the first bitmap </pre> </blockquote> <pre wrap=""> Again, you're showing the QMP commands that libvirt is issuing; which libvirt API calls are driving these actions? </pre> <blockquote type="cite"> <pre wrap=""> ... { "execute": "transaction" "arguments": { "actions": [ { "type": "blockdev-backup" "data": { "device": "drive-scsi0-0-0-0", "sync": "none", "target": "snapshot-scsi0-0-0-0" }, }, { "type": "x-vz-block-dirty-bitmap-disable" "data": { </pre> </blockquote> <pre wrap=""> Do you have measurements on whether having multiple active bitmaps hurts performance? I'm not yet sure that managing a chain of disabled bitmaps (and merging them as needed for restores) is more or less efficient than managing multiple bitmaps all the time. On the other hand, you do have a point that restore is a less frequent operation than backup, so making backup as lean as possible and putting more work on restore is a reasonable tradeoff, even if it adds complexity to the management for doing restores. </pre> </blockquote> <pre wrap=""> Depending on the number of checkpoints intended to be kept... we certainly make no real promises on the efficiency of marking so many. It's at *least* a linear increase with each checkpoint... </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> "name": "libvirt-d068765e-8b50-4d74-9b72-1e55c663cbf8", "node": "drive-scsi0-0-0-0" }, }, { "type": "block-dirty-bitmap-add" "data": { "name": "libvirt-0044757e-1a2d-4c2c-b92f-bb403309bb17^d068765e-8b50-4d74-9b72-1e55c663cbf8", "node": "drive-scsi0-0-0-0", "persistent": true }, } ] }, } </pre> </blockquote> </blockquote> <pre wrap=""> Oh, I see, you handle the "disable old" case here. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">- delete snapshot 0044757e-1a2d-4c2c-b92f-bb403309bb17 - create snapshot 8fc02db3-166f-4de7-b7aa-1f7303e6162b with checkpoint - add disk snapshot 8fc02db3-166f-4de7-b7aa-1f7303e6162b to export and bitmap with changes from checkpoint d068765e-8b50-4d74-9b72-1e55c663cbf8 - same as add export without checkpoint, but aditionally - form result bitmap - add bitmap to NBD export ... { "execute": "transaction" "arguments": { "actions": [ { "type": "block-dirty-bitmap-add" "data": { "node": "drive-scsi0-0-0-0", "name": "libvirt-__export_temporary__", "persistent": false }, }, { "type": "x-vz-block-dirty-bitmap-disable" "data": { "node": "drive-scsi0-0-0-0" "name": "libvirt-__export_temporary__", }, }, { "type": "x-vz-block-dirty-bitmap-merge" "data": { "node": "drive-scsi0-0-0-0", "src_name": "libvirt-d068765e-8b50-4d74-9b72-1e55c663cbf8" "dst_name": "libvirt-__export_temporary__", }, }, { "type": "x-vz-block-dirty-bitmap-merge" "data": { "node": "drive-scsi0-0-0-0", "src_name": "libvirt-0044757e-1a2d-4c2c-b92f-bb403309bb17^d068765e-8b50-4d74-9b72-1e55c663cbf# "dst_name": "libvirt-__export_temporary__", }, } ] }, } </pre> </blockquote> </blockquote> <pre wrap=""> OK, so in this transaction you add a new temporary bitmap for export, and merge the contents of two bitmaps into it. However, it doesn't look like you created a new checkpoint and managed that handoff here, did you? </pre> </blockquote> <pre wrap=""> We don't need create checkpoints for the purpuse of exporting. Only temporary bitmap to merge appropriate bitmap chain.</pre> </blockquote> But in this case we will export something strange. Actually, we can only export data changed from some checkpoint to newly created, if we start fleecing in same transaction with creating a new checkpoint. And we can't create backup from one checkpoint in past to any other checkpoint in past (because corresponding data may be already changed after the second checkpoint). <blockquote type="cite" cite="mid:820d3ac8-9c22-4b00-b3b1-f6046ac30454@virtuozzo.com"> <pre wrap=""> </pre> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> { "execute": "x-vz-nbd-server-add-bitmap" "arguments": { "name": "sda-8fc02db3-166f-4de7-b7aa-1f7303e6162b" "bitmap": "libvirt-__export_temporary__", "bitmap-export-name": "d068765e-8b50-4d74-9b72-1e55c663cbf8", }, </pre> </blockquote> </blockquote> <pre wrap=""> And then here, once the bitmap and the data is already frozen, it's actually alright if we add the export at a later point in time. </pre> <blockquote type="cite"> <pre wrap=""> Adding a bitmap to a server is would would advertise to the NBD client that it can query the "qemu-dirty-bitmap:d068765e-8b50-4d74-9b72-1e55c663cbf8" namespace during NBD_CMD_BLOCK_STATUS, rather than just "base:allocation"? </pre> </blockquote> <pre wrap=""> Don't know much about this, I stopped paying attention to the BLOCK STATUS patches. Is the NBD spec the best way to find out the current state right now?</pre> </blockquote> </blockquote> I'm afraid, yes, branch extension-blockstatus <a class="moz-txt-link-freetext" href="https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md">https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md</a>. You can search for "Metadata querying" paragraph. <blockquote type="cite" cite="mid:820d3ac8-9c22-4b00-b3b1-f6046ac30454@virtuozzo.com"> <blockquote type="cite"> <pre wrap=""> (Is there a less technical, briefer overview somewhere, perhaps from a commit message or a cover letter?) </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> } - remove snapshot 8fc02db3-166f-4de7-b7aa-1f7303e6162b from export - same as without checkpoint but additionally remove temporary bitmap ... { "arguments": { "name": "libvirt-__export_temporary__", "node": "drive-scsi0-0-0-0" }, "execute": "block-dirty-bitmap-remove" } </pre> </blockquote> </blockquote> <pre wrap=""> OK, this just deletes the checkpoint. I guess we delete the node and </pre> </blockquote> <pre wrap="">I would not call it checkpoint. Checkpoint is something visible to client. An ability to get CBT from that point in time. Here we create a temporary bitmap to calculate desired CBT. </pre> <blockquote type="cite"> <pre wrap="">stop the NBD server too, right? </pre> </blockquote> <pre wrap=""> yeah, just like in case without checkpoint (mentioned in this case description) </pre> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">- delete checkpoint 0044757e-1a2d-4c2c-b92f-bb403309bb17 (similar operation is described in the section about naming scheme for bitmaps, with difference that K+1 is N here and thus new bitmap should not be disabled) </pre> </blockquote> <pre wrap=""> A suggestion on the examples - while UUIDs are nice and handy for management tools, they are a pain to type and for humans to quickly read. Is there any way we can document a sample transaction stream with all the actors involved (someone issues a libvirt API call XYZ, libvirt in turn issues QMP command ABC), and using shorter names that are easier to read as humans? </pre> </blockquote> <pre wrap=""> Yeah, A-B-C-D terminology would be nice for the examples. It's fine if the actual implementation uses UUIDs. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> { "arguments": { "actions": [ { "type": "block-dirty-bitmap-add" "data": { "node": "drive-scsi0-0-0-0", "name": "libvirt-8fc02db3-166f-4de7-b7aa-1f7303e6162b^d068765e-8b50-4d74-9b72-1e55c663cbf8", "persistent": true }, }, { "type": "x-vz-block-dirty-bitmap-merge" "data": { "node": "drive-scsi0-0-0-0", "src_name": "libvirt-0044757e-1a2d-4c2c-b92f-bb403309bb17^d068765e-8b50-4d74-9b72-1e55c663cbf# "dst_name": "libvirt-d068765e-8b50-4d74-9b72-1e55c663cbf8", }, }, { "type": "x-vz-block-dirty-bitmap-merge" "data": { "node": "drive-scsi0-0-0-0", "src_name": "libvirt-8fc02db3-166f-4de7-b7aa-1f7303e6162b^0044757e-1a2d-4c2c-b92f-bb403309bb1# "dst_name": "libvirt-8fc02db3-166f-4de7-b7aa-1f7303e6162b^d068765e-8b50-4d74-9b72-1e55c663cbf# }, }, ] }, "execute": "transaction" } { "execute": "x-vz-block-dirty-bitmap-remove" "arguments": { "node": "drive-scsi0-0-0-0" "name": "libvirt-8fc02db3-166f-4de7-b7aa-1f7303e6162b^0044757e-1a2d-4c2c-b92f-bb403309bb17", }, }, { "execute": "x-vz-block-dirty-bitmap-remove" "arguments": { "node": "drive-scsi0-0-0-0" "name": "libvirt-0044757e-1a2d-4c2c-b92f-bb403309bb17^d068765e-8b50-4d74-9b72-1e55c663cbf8", }, } Here is a list of bitmap commands used in implementation but not yet in upstream (AFAIK). x-vz-block-dirty-bitmap-remove </pre> </blockquote> </blockquote> <pre wrap=""> We already have this, right? It doesn't even need to be transactionable. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">x-vz-block-dirty-bitmap-merge </pre> </blockquote> </blockquote> <pre wrap=""> You need this... </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">x-vz-block-dirty-bitmap-disable </pre> </blockquote> </blockquote> <pre wrap=""> And this we had originally but since removed, but can be re-added trivially. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">x-vz-block-dirty-bitmap-enable (not in the examples; used when removing most recent checkpoint) x-vz-nbd-server-add-bitmap </pre> </blockquote> </blockquote> <pre wrap=""> Do my comments make sense? Am I understanding you right so far? I'll try to offer a competing writeup to make sure we're on the same page with your proposed design before I waste any time trying to critique it -- in case I'm misunderstanding you. </pre> </blockquote> <pre wrap=""> Yes, looks like we are in tune. </pre> <blockquote type="cite"> <pre wrap=""> Thank you for leading the charge and proposing new APIs for this feature. It will be very nice to expose the incremental backup functionality we've been working on in QEMU to users of libvirt. --js </pre> </blockquote> <pre wrap=""> There are also patches too ( if API design survive review phase at least partially :) ) </pre> <blockquote type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">*Restore operation nuances* As it was written above to restore a domain one needs to start it in paused state, export domain's disks and write them from backup. However qemu currently does not let export disks for write even for a domain that never starts guests CPU. We have an experimental qemu command option -x-vz-nbd-restore (passed together with -incoming option) to fix it. </pre> </blockquote> <pre wrap=""> Why can't restore be done while the guest is offline? (Oh right, we still haven't added decent qemu-img support for bitmap manipulation, so we need a qemu process around for any bitmap changes). </pre> </blockquote> <pre wrap=""> I'm working on this right now, actually! I'm working on JSON format output for bitmap querying, and simple clear/delete commands. I hope to send this out very soon. </pre> <blockquote type="cite"> <pre wrap="">As I understand it, the point of bitmaps and snapshots is to create an NBD server that a third-party can use to read just the dirty portions of a disk in relation to a known checkpoint, to save off data in whatever form it wants; so you are right that the third party then needs a way to rewrite data from whatever internal form it stored it in back to the view that qemu can consume when rolling back to a given backup, prior to starting the guest on the restored data. Do you need additional libvirt APIs exposed for this, or do the proposed APIs for adding snapshots cover everything already with just an additional flag parameter that says whether the <domainblocksnapshot> is readonly (the third-party is using it for collecting the incremental backup data) or writable (the third-party is actively writing its backup into the file, and when it is done, then perform a block-commit to merge that data back onto the main qcow2 file)? </pre> </blockquote> </blockquote> </blockquote> <pre class="moz-signature" cols="72">-- Best regards, Vladimir</pre> </body> </html>