[libvirt] [PATCH 17/16] snapshot: Store qemu snapshot state in bulk

Daniel P. Berrangé berrange at redhat.com
Mon Mar 25 17:19:19 UTC 2019


On Fri, Mar 22, 2019 at 01:07:04PM -0500, Eric Blake wrote:
> [keeping context, because of adding Dan in cc]
> > To some degree we could be 'stuck with' the model for snapshots, but
> > that doesn't mean checkpoints couldn't make use of some newer
> > functionality that stores everything in one file and doesn't allow
> > storage across multiple files.

I think it would be a really bad idea to use a different approach for
snapshots vs checkpoints given how similar they are in every other
respect.

> > 
> > This is one of those gray areas for me that, from an overall
> > architectural level, we struggle with when dealing with our
> > technical debt, because we've guaranteed some sort of backcompat for
> > various things "forever".
> 
> It's not the first time where upgrading libvirt performs a one-way
> conversion of the old-style internals to the new-style, where you then
> can't downgrade libvirt, but at least the bulk of the libvirt code no
> longer has to worry about generating both old- and new-style output.
> But yes, getting a second opinion won't hurt.

We've never considered libvirt downgrades to be supported, especially
not with running VMs.  Downgrades with offline VMs haven't been supported
either, but have been muuuuuuch less likely to break in general.  Since
this is in reference to critical VM state info, I'm a bit more concerned
about this downgrade issue.  If we decide it's important though, the only
solutions are to not change the format, or to always write both the old
and new formats, which makes the new format rather pointless to support.

> >> There is a slight chance for confusion if a user named two separate
> >> domains 'foo' and 'foo.xml'; the new scheme expects 'foo.xml' to be a
> >> regular file for the former domain, while the old scheme expected
> >> 'foo.xml' to be a directory for the latter domain; if we are worried
> >> about that, we could tweak the code to instead output new state in a
> >> file named 'foo' instead of the more typical 'foo.xml' and just key
> >> off of whether the file is regular or a directory when deciding if
> >> this is the first libvirt run after an upgrade. But I felt that the
> >> chance of a user abusing domain names like that is too small to be
> >> worth the effort.
> > 
> > I think this is just another thing to document. As in, if you've done
> > this and use snapshots, then "fix your stupidity" ;-).  Why anyone other
> > than QE testing all possibilities would name a domain with a ".xml"
> > suffix goes beyond common sense to me.

Relying on common sense isn't a good idea. This kind of naming clash is
exactly the sort of thing that can be used to exploit a service into
giving access to data you shouldn't be allowed to see.

I think we should change the naming here to avoid the risk of the clash
if we're going to do this.
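
Eric's alternative of dropping the ".xml" suffix for the new state file
and keying off whether the path is a regular file or a directory would
also avoid the clash. A rough sketch of that idea (illustrative only,
not code from the patch; the enum and helper names are made up):

  /* Illustrative sketch: distinguish the old per-snapshot directory
   * layout from a hypothetical new bulk-file layout by looking at
   * what is actually on disk, not at the file name. */
  #include <sys/stat.h>

  typedef enum {
      SNAP_LAYOUT_NONE,      /* no snapshot state on disk yet */
      SNAP_LAYOUT_PER_FILE,  /* old scheme: <statedir>/<domain>/ holds one XML per snapshot */
      SNAP_LAYOUT_BULK,      /* new scheme: <statedir>/<domain> is a single regular file */
  } snapLayout;

  static snapLayout
  snapDetectLayout(const char *path)
  {
      struct stat sb;

      if (stat(path, &sb) < 0)
          return SNAP_LAYOUT_NONE;

      if (S_ISDIR(sb.st_mode))
          return SNAP_LAYOUT_PER_FILE;

      return SNAP_LAYOUT_BULK;
  }

Either way, the point is not to rely on users avoiding ".xml" in
domain names.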

> >> The bulk snapshot file can be huge (over RPC, we allow <domain> to be
> >> up to 4M, and we allow up to 16k snapshots; since each snapshot
> >> includes a <domain>, a worst-case domain would result in gigabytes of
> >> bulk data); it is no worse on overall filesystem usage than before,
> >> but now in a single file vs. a series of files it requires more memory
> >> to read in at once.  I don't know if we have to worry about that in
> >> practice, but my patch does cap things to read in no more than an
> >> arbitrarily-picked 128M, which we may have to raise in the future.
> > 
> > and even more.  I think when I first reviewed this I thought of the RPC
> > limit stuff - at least w/r/t how the stats code is the other place where
> > we've hit this limit. But for some reason I thought that perhaps some
> > under-the-covers code there had solved something that allowed things
> > to work, but now I'm not so sure.
> > 
> 
> I don't know if virXML has some way of visiting a seekable stream, and
> parsing in only chunks at a time. You still have to PARSE the entire
> file to learn where the chunks are, but if the parser is stream based
> and keeps track of offsets where interesting chunks start, it is perhaps
> feasible that you could cap your memory usage by revisiting chunks as
> needed. (But doing that to save memory ALSO means reparsing portions as
> you reload them into memory, so you're slower than if you had kept the
> full file in memory anyway).

As soon as we talk about making the stream seekable & parsing chunks
for efficiency, this is a red flag to me saying that we should not
go down this route. We already have a filesystem layout that lets us
read just the individual snapshots that we care about one at a time.
Creating this problem by putting all snapshots into a single file
and then trying to solve it by creating indexes for chunks inside the
file is basically creating a filesystem within a file. 
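
For contrast, with the per-file layout we already have, loading one
snapshot is a small bounded read of a single XML document. A minimal
sketch (not the actual libvirt code; the path layout and the size cap
are illustrative):

  /* Minimal sketch: read one snapshot's XML from the existing
   * one-file-per-snapshot layout.  Nothing else needs to be parsed
   * or held in memory. */
  #include <stdio.h>
  #include <stdlib.h>

  #define SNAP_XML_MAX (4 * 1024 * 1024)  /* arbitrary cap for the sketch */

  static char *
  snapReadOne(const char *statedir, const char *domain, const char *snapname)
  {
      char path[4096];
      char *buf;
      size_t got;
      FILE *fp;

      snprintf(path, sizeof(path), "%s/%s/%s.xml", statedir, domain, snapname);

      if (!(fp = fopen(path, "r")))
          return NULL;

      if (!(buf = malloc(SNAP_XML_MAX + 1))) {
          fclose(fp);
          return NULL;
      }

      got = fread(buf, 1, SNAP_XML_MAX, fp);
      buf[got] = '\0';
      fclose(fp);
      return buf;  /* caller frees and parses this one <domainsnapshot> */
  }

That per-snapshot bound is exactly the property a single bulk file
gives up, and which the chunk-index idea would be trying to win back.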

> Or maybe we need to think about ways to compress the information (most
> of the <domain>s in the overall <snapshotlist> will share a lot of
> commonality, even if the user hot-plugged devices between snapshots).
> But that's a big project, which I don't want to tackle.

On disk storage space is practically free, so I don't see compression
being necessary, especially if you compare the XML size to the actual
disk image snapshot data.



Overall, I'm struggling to be convinced that putting all snapshots into
a single file is a compelling enough thing to do. I can see there are
some benefits, but there are also benefits to the current per-file
approach. I worry that the bulk file will create efficiency problems
that will later require us to go back to a per-file implementation, or
to reinvent filesystems-within-a-file.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



