
Re: [Libguestfs] Fwd: Inspection of disk snapshots

On Tue, Mar 24, 2015 at 10:54:05AM +0200, NoxDaFox wrote:
> I was sure I was doing something wrong as I'm not yet fully aware of QCOW2
> snapshot feature and how it interacts with libguestfs.
> I'll try to explain better the scenario:
> I have several hosts running lots of VMs which are generated from a few base
> images: say A, B, C are the base images (backing files) and A1, A2, A*, B1, B2,
> B* are the clones on top of which the newly spawned VMs are running.
> I need to collect the disk states of the A*, B*, C* machines and see what has
> been written there. I don't care about the whole content, as the contents of
> the base images A, B, C are well known to me; the only thing that matters is
> the deltas of the new clones.
> One more piece in the puzzle is that the inspection does not happen on the
> hosts running the VMs but on a dedicated server.
> My idea was to collect those "snapshots" (generic term not the QEMU one)
> from the hosts and send them to my inspection server. As A, B and C are
> accessible from that server, the only thing I need is to rebase those
> snapshots to correctly inspect them through libguestfs, and it proved to work
> (I'm using readonly mode as I only care about reading the disks). I'm not really
> interested in having consistent point-in-time state of the disks as the
> operation is done several times a day so I can cope with semi-consistent
> data as it can be easily re-constructed.
> My real problem comes when I try to inspect the disk snapshot: libguestfs
> will, of course, let me see the whole content of the disks, which means A +
> A*. Apart from the waste of CPU time spent looking at files whose state I
> already know (the ones contained in A), it generates a lot of noise. A
> Linux base image with some libraries installed consists of 20K+ files;
> installing something extra (Apache server for example) adds just a few
> hundred new files, and I'm interested only in those.
> So my real question is: is there a way to distinguish the files contained
> in the two different disk images (A and A1) or shall I think about a
> totally different approach?

Well we have a tool called virt-diff
(http://libguestfs.org/virt-diff.1.html) which prints the differences
between two disks.  It's quite commonly used to show the differences
between an original base image and a snapshot taken some time later,
so you can tell which files have been modified by the guest.

Now virt-diff works by opening both disks, reading all of the metadata
(or even the file content if you use the --checksum option), and then
internally diffing it and presenting the result.
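
As a rough illustration of that open-both-and-compare approach, here is
a minimal Python sketch.  It is not virt-diff's actual code (that is
written in C); the names collect_metadata and diff_metadata are made up
for this example, and it assumes the libguestfs Python bindings are
installed and that each image contains a single OS that inspection can
find.  Only mode, size and mtime are compared, which is an arbitrary
choice for the sketch.

```python
def collect_metadata(disk_path):
    """Return {path: (mode, size, mtime)} for every file in a disk image.

    Assumes the libguestfs Python bindings and an image containing one
    operating system that inspection can detect.
    """
    import guestfs  # imported here so diff_metadata() works without it

    g = guestfs.GuestFS(python_return_dict=True)
    g.add_drive_opts(disk_path, readonly=1)
    g.launch()
    try:
        root = g.inspect_os()[0]
        # Mount shorter mountpoints first ("/" before "/boot", etc.).
        mounts = g.inspect_get_mountpoints(root)
        for mountpoint in sorted(mounts, key=len):
            g.mount_ro(mounts[mountpoint], mountpoint)
        metadata = {}
        # find() returns names relative to the starting directory.
        for name in g.find("/"):
            path = "/" + name
            st = g.lstatns(path)
            metadata[path] = (st["st_mode"], st["st_size"], st["st_mtime_sec"])
        return metadata
    finally:
        g.shutdown()
        g.close()


def diff_metadata(base, snapshot):
    """Compare two {path: metadata} dicts.

    Returns three sorted lists: paths added, removed and changed.
    """
    added = sorted(set(snapshot) - set(base))
    removed = sorted(set(base) - set(snapshot))
    changed = sorted(p for p in set(base) & set(snapshot)
                     if base[p] != snapshot[p])
    return added, removed, changed
```

Since the base images are fixed, the result of collect_metadata for A
could be computed once and reused for every clone A1, A2, ...  Calling
lstatns once per file costs a round trip each time, so batching with
lstatnslist would likely be faster on large trees.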

Of course this means it's not especially fast, but it's the way that
it has to work: the snapshot doesn't contain "files which changed", it
contains the underlying device blocks which changed.  qcow2 operates a
whole layer or two below the filesystem.
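
To see this block-level view for yourself, something like the following
Python sketch can list the byte ranges stored in the overlay itself.  It
assumes the qemu-img binary is on the PATH and relies on the "start",
"length", "depth" and "data" fields of "qemu-img map --output=json";
read_map and overlay_extents are hypothetical helper names.  Treating
"depth 0 with data" as "changed" is a simplification: clusters written
as zeroes in the overlay are not counted.

```python
import json
import subprocess


def read_map(image_path):
    """Run 'qemu-img map --output=json' and return the parsed entries."""
    result = subprocess.run(
        ["qemu-img", "map", "--output=json", image_path],
        check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def overlay_extents(map_entries):
    """Return (start, length) byte ranges whose data lives in the top image.

    depth 0 is the image itself; higher depths are its backing files.
    """
    return [(e["start"], e["length"]) for e in map_entries
            if e["depth"] == 0 and e["data"]]
```

The output is a list of guest disk offsets, with no notion of which file
(if any) each range belongs to.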

To do this from Python is not particularly hard, but you'll have to
read the C source of virt-diff and translate it.  The guts of the
algorithm are in the recursive "visitor" mini-library:
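
For orientation, the general shape of such a recursive visitor might
look like this in Python.  This is a simplified sketch and not a
translation of the C code: the visit function and its callback
signature are assumptions for illustration, and it only uses the ls and
lstatnslist calls from the libguestfs API (extended attributes and error
handling are left out).

```python
import posixpath
import stat


def visit(g, directory, callback):
    """Call callback(directory, name, st) for each entry below directory.

    g is any object offering ls(dir) and lstatnslist(dir, names), for
    example a guestfs.GuestFS handle with the guest filesystems mounted.
    Traversal is depth-first, in the order that ls() returns names.
    """
    names = g.ls(directory)
    # One call fetches the stat data for every entry in the directory.
    stats = g.lstatnslist(directory, names)
    for name, st in zip(names, stats):
        callback(directory, name, st)
        # lstat does not follow symlinks, so links to directories
        # are reported but never descended into.
        if stat.S_ISDIR(st["st_mode"]):
            visit(g, posixpath.join(directory, name), callback)
```

A callback that stores each entry in a dict keyed by full path gives you
one tree per disk, and the two trees can then be compared.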


There are alternatives -- perhaps parsing the qcow2 snapshot, and
mapping disk blocks back to files -- but they won't be very easy to
implement.  I wrote a highly experimental* tool called 'virt-bmap' that
may be of interest:



* = if it breaks, you get to keep all the pieces

Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
