[Crash-utility] crash and libvirt, and more

Mon Aug 18 20:01:41 UTC 2008

Richard W.M. Jones wrote:
> Hi:
> 
> I don't know if you're aware of this, but libvirt[1] recently added a
> call which allows you to snoop on the live memory of guests,
> virDomainMemoryPeek[2]:
> 
> int virDomainMemoryPeek (virDomainPtr dom,
>                          unsigned long long start,  /* start address */
>                          size_t size,               /* size (bytes) */
>                          void *buffer,              /* return buffer */
>                          unsigned int flags);
> 
> This would allow, in theory, for crash to debug running guests.  I had
> a look at the crash code and it doesn't seem like it would be too hard
> to add this.
> 
> We [the libvirt team] only support this for QEMU & KVM guests at the
> moment, but we plan to support this call for Xen in the near future.
> Also, the call only works on virtual memory addresses (in other words,
> the address is translated through the guest's page tables), but in
> practice that isn't too bad because the common configuration for Linux
> is to map all of physical memory at some address, e.g. 0xc0000000 on
> i386.  Also the peek operation is read-only.
> 
> So if you are interested, let me know, and I will attempt a patch.

That is very interesting, as an alternative to just logging into the guest
and running crash live there.  However, there are a number of places where
the crash readmem() function is passed a physical address, and I wonder
whether this approach will be hampered by that, for example when translating
and reading vmalloc/module addresses, user task virtual addresses, etc.  Can
the "start" address above be a vmalloc address?  In those cases, things might
get a bit more involved, as opposed to the simple "readmem(KVADDR, ...)" case,
where a unity-mapped address could be passed straight to the libvirt function
above without first being turned into a physical address.  In other words,
not every physical-address readmem() request is the result of a kernel
virtual address reference that has been pre-translated.  And if the
pseudo-physical address is beyond the 32-bit unity-mapping limit, you couldn't
turn it back into a unity-mapped kernel virtual address, so I don't know how
you could access it.

I may be missing something, so by all means don't let me stop you from trying
though...  ;-)

> 
> 	-	-	-
> 
> Now, the bigger picture ...
> 
> For some months now we've been attempting to write system
> administrator tools to mimic common sysadmin commands, except that
> they work on guests.  For example 'virt-ps <guest>' lists out the
> process table in <guest>.  It runs from the host and works by snooping
> guest memory using virDomainMemoryPeek.
> 
> We have had some success, although it's been quite a lot harder than
> we imagined it would be.  At the moment we have 'virt-dmesg',
> 'virt-uname', 'virt-ifconfig' and 'virt-ps', plus a handful of custom
> commands, working to a greater or lesser extent.
> 
> However I wasn't aware before that crash could already do this
> (particularly 'log', 'ps', 'mount' and 'net' commands), and in fact
> crash has a lot more complete support for these commands than we do.
> So it makes sense to use crash to do this, instead of continuing with
> our separate implementation, if we can make it work.
> 
> I think there are two things that we'd need to add to crash in order
> to get this working:
> 
> (i) Scripting.  I'm aware that there are two scripting projects for
> crash out there already, but they looked fairly immature and/or
> unsupported.  However, it should not be too hard to pull these
> projects up to standard and/or add some scripting support, or to use
> expect.
> 
> (ii) Getting the debug symbols.
> 
> Item (ii) is the big deal for us.  Our current virt-* tools can work
> with a wide range of kernels.
> 
> What we do is to download the kernel-debuginfo packages beforehand,
> extract only the tiny amount of debug info we actually need from
> vmlinux, and build a 'kernel database'.  (We're using dwarves to get
> the layout of the dozen or so structures that we care about).  It
> turns out that it's quite easy to heuristically determine the version
> of a running kernel, and from that we can look up the structures in
> the kernel database at runtime.
> 
> The upshot is that we currently support ~350 kernels with a database
> which is a modest 1 MB in size, and which could probably be made
> smaller with very little effort.
> 
> The problem I haven't yet resolved with using crash is that we need a
> matching, identical vmlinux image (i.e. 50-100 MB) per guest kernel
> version.  In the case where we see a kernel version we've not seen
> before, we may have to download this and store it somewhere.
> 
> The alternative seems to involve some really deep hacking inside gdb,
> perhaps so it can be persuaded to use only partial debug info?
> 
> I don't know if you have any thoughts about (ii).

Other than "good luck", I don't have any thoughts about that one...

Dave