[Crash-utility] crash and libvirt, and more

Dave Anderson anderson at redhat.com
Wed Aug 20 14:04:06 UTC 2008


Richard W.M. Jones wrote:
> On Tue, Aug 19, 2008 at 04:32:35PM -0400, Dave Anderson wrote:
>> Out of curiosity, any reason why a libvirt interface couldn't be
>> created that accesses guest pseudo-physical addresses?  And does
>> the existing interface accept vmalloc addresses, or only unity-mapped
>> kernel virtual addresses?
> 
> I'm afraid I didn't fully understand your previous comment about
> vmalloc & unity-mapped addresses.  I'm not sure what a "unity-mapped"
> address is, and I thought vmalloc just used ordinary kernel addresses
> above PAGE_OFFSET.

It depends upon the architecture, but all of them have at least one
kernel virtual address range where the PAGE_OFFSET value can simply be
stripped off, yielding the physical address.  So "unity-mapping" just means
a one-to-one address translation, since the physical address is right "there",
encoded in the virtual address value.
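
Roughly speaking, the whole "translation" boils down to something like
this (a simplified sketch -- PAGE_OFFSET here is just the 32-bit x86
example value, and the real macros are per-architecture):

  #define PAGE_OFFSET 0xc0000000UL    /* x86 example value */

  /* unity-mapped translation: strip PAGE_OFFSET to get the physical
     address, or add it back to go the other direction */
  static unsigned long virt_to_phys_unity(unsigned long vaddr)
  {
          return vaddr - PAGE_OFFSET;
  }

  static unsigned long phys_to_virt_unity(unsigned long paddr)
  {
          return paddr + PAGE_OFFSET;
  }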

For bare-metal kernels, the physical address is then read directly from the
dumpfile or /dev/mem -- or, since Red Hat restricts /dev/mem, from the
/dev/crash driver, which is loaded as a replacement for /dev/mem.
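
For a live bare-metal system that essentially amounts to a seek-and-read
at the physical offset, something along these lines (sketch only, not
crash's actual code):

  #include <fcntl.h>
  #include <unistd.h>

  /* Sketch: read "size" bytes at physical address "paddr" on a live
   * bare-metal system via /dev/mem (or the /dev/crash replacement). */
  static long read_phys_live(unsigned long paddr, void *buf, long size)
  {
          int fd;
          long cnt;

          if ((fd = open("/dev/mem", O_RDONLY)) < 0)
                  return -1;
          cnt = pread(fd, buf, size, paddr);
          close(fd);
          return cnt;
  }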

For xen kernels specifically, the stripped address is a pseudo-physical
address, but the xendump dumpfile formats (the "old" and the newer ELF-style
formats) require those to be translated to machine addresses.
So the p2m table needs to be consulted to turn the pseudo-physical address
into a machine address, and that machine address is what is then located
and read from the dumpfile.
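
Schematically, that extra step amounts to something like this
(hand-waving sketch; the table and function names are illustrative,
not crash's actual ones):

  #define PAGE_SHIFT 12
  #define PAGE_SIZE  (1UL << PAGE_SHIFT)

  extern unsigned long *p2m;      /* guest's pfn -> mfn table */

  /* Sketch of the pseudo-physical -> machine translation; crash's
   * real code reads the guest's p2m array out of the dumpfile. */
  static unsigned long pseudo_to_machine(unsigned long paddr)
  {
          unsigned long pfn = paddr >> PAGE_SHIFT;
          unsigned long mfn = p2m[pfn];

          return (mfn << PAGE_SHIFT) | (paddr & (PAGE_SIZE - 1));
  }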

On the other hand, vmalloc() addresses are a range of kernel virtual addresses,
typically above PAGE_OFFSET (but not necessarily on all architectures), that
are mapped to some unknown physical address.  Therefore they *do* require a
full page-table walk, because, like user virtual addresses, there's no
indication in the vmalloc address value as to what the underlying physical
address is.  And, FWIW, the page-table walk is far more involved in our xen
writeable-page-table kernels than on bare-metal kernels, because the physical
address encoded into each page-table entry is a *machine* address that must
be translated back into a pseudo-physical address in order to find the next
page table.
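
To make that xen wrinkle concrete, each step of the walk has to bounce
back through a machine-to-pseudo-physical translation before the next
level can even be read (sketch only; the helper names are illustrative):

  #define PAGE_SHIFT 12

  extern unsigned long machine_to_pseudo(unsigned long maddr);  /* m2p */
  extern int read_pseudo_phys(unsigned long paddr, void *buf, int len);

  /* Sketch of fetching one page-table entry during a walk on a xen
   * writeable-page-table guest: the table is identified by a machine
   * frame number, so it must be converted back to a pseudo-physical
   * address before the entry can be read from the dumpfile. */
  static int fetch_pte(unsigned long table_mfn, int index,
                       unsigned long *entry)
  {
          unsigned long maddr, paddr;

          maddr = (table_mfn << PAGE_SHIFT) + index * sizeof(unsigned long);
          paddr = machine_to_pseudo(maddr);
          return read_pseudo_phys(paddr, entry, sizeof(*entry));
  }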

BTW, for xen unity-mapped address translations, a full page-table walk could
be done as well, but it's unnecessary.  But I digress...

So in any case, the underlying libvirt guest kernel virtual memory access
function sounds like it translates them regardless of whether they're
unity-mapped or vmalloc-mapped addresses, so I think you've answered my
question.

The issue at hand is that the crash utility is "founded" upon the
access of physical addresses.  So all readmem() requests are translated
to physical addresses before the resultant value is passed on to
/dev/mem or /dev/crash (live systems), or to the myriad of dumpfile
formats that crash supports (about a dozen of them).
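
In rough outline -- this is not crash's actual readmem() signature,
just the shape of the flow:

  enum memtype { KVADDR, UVADDR, PHYSADDR };

  extern unsigned long kvtop(unsigned long vaddr);  /* kernel virt -> phys */
  extern unsigned long uvtop(unsigned long vaddr);  /* user virt -> phys */
  extern int read_physical(unsigned long paddr, void *buf, long size);

  /* Grossly simplified outline: whatever the caller hands in, it
   * becomes a physical address before any backend sees it. */
  static int readmem_sketch(unsigned long addr, enum memtype type,
                            void *buf, long size)
  {
          unsigned long paddr;

          switch (type) {
          case KVADDR:   paddr = kvtop(addr); break;
          case UVADDR:   paddr = uvtop(addr); break;
          case PHYSADDR: paddr = addr;        break;
          default:       return 0;
          }
          return read_physical(paddr, buf, size);  /* /dev/crash, /dev/mem,
                                                      or a dumpfile reader */
  }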

Getting back to the libvirt adaptation into crash: yes, the requests for
unity-mapped -- and, if I understand it correctly, vmalloc -- addresses
could be passed to your interface directly, without a translation to
physical.  And direct crash readmem(PHYSADDR, ...) requests could
presumably be turned into unity-mapped addresses by applying PAGE_OFFSET
and then passing that value to libvirt -- with one significant caveat on
32-bit systems.  On 32-bit systems, only the first 896MB of physical
memory can be accessed via unity-mapped kernel virtual addresses.  The
remaining 128MB of kernel virtual address space is given to vmalloc and
a handful of other hardwired-type virtual addresses at the top end.  So,
for example, you wouldn't be able to read user virtual addresses on
larger 32-bit systems, because their page tables and resultant physical
pages tend to be located in highmem, i.e., above the 896MB limit.  That
would also be a problem for vmalloc addresses *if* the libvirt interface
only handled unity-mapped addresses, which was the genesis of my
question.
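
In other words, the PHYSADDR rerouting would only be safe below that
limit -- something like this (32-bit sketch only):

  #define PAGE_OFFSET   0xc0000000UL
  #define LOWMEM_LIMIT  (896UL * 1024 * 1024)

  /* Sketch: rerouting a crash readmem(PHYSADDR, ...) request through
   * a unity-mapped kernel virtual address.  On 32-bit x86 only the
   * low 896MB can be reached this way; anything above would need a
   * pseudo-physical interface instead. */
  static int physaddr_to_unity(unsigned long paddr, unsigned long *vaddr)
  {
          if (paddr >= LOWMEM_LIMIT)
                  return 0;               /* highmem -- no unity mapping */
          *vaddr = paddr + PAGE_OFFSET;
          return 1;
  }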

And that's also why I asked about the issue of creating a libvirt
interface that accepted pseudo-physical addresses.  If that were
in place, it would simply mimic /dev/mem (/dev/crash), and it
should "just work".  There are a couple of other minor gotchas that
would also have to be handled for live guest access, like for example,
the access of /proc/version.

> 
> Anyway, in libvirt we rely on what the underlying hypervisor can do.
> In the case of QEMU/KVM, the QEMU monitor supports a simple "memsave"
> command.  This command takes three parameters: start, size and a
> filename, and it saves the memory from start to start+size-1 into the
> file.  Along the way it translates these virtual addresses through CR3
> / the page tables (or the equivalent on non-x86 architectures).
> 
> We could offer a way to get at physical addresses, but it would
> require getting a patch accepted into QEMU & KVM (separate but loosely
> synchronized codebases), and then a corresponding change in libvirt.
> Then there's a long wait while everyone updates to the newest versions
> of everything and finally physical memory peeking would be possible
> through libvirt.

Yep, understood...
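
(For reference, the monitor invocation looks roughly like

  (qemu) memsave 0xc0100000 4096 guest-mem.bin

i.e., a guest *virtual* start address, a byte count, and an output file,
with the translation done through the guest's page tables as you
describe.)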

> 
> For the Xen driver's virDomainMemoryPeek call -- which isn't
> implemented in libvirt yet -- it's actually a lot easier to use
> physical addresses, because you request from the hypervisor that pages
> from another domain be mapped into your process using an ioctl which
> takes physical addresses.  In order to provide compatibility with the
> existing software using virDomainMemoryPeek we were planning on
> implementing the page table lookups ourselves within libvirt.

Right -- and what I'm suggesting is letting the crash utility do
all the dirty work for you, by just giving crash access to the
pseudo-physical addresses of the target guest.  By doing that, for all
practical purposes, crash wouldn't even know that it was dealing with a
"remote" system.

Dave



