[Crash-utility] [PATCH 00/10] teach crash to work with "live" ramdump

Tue Apr 26 16:11:46 UTC 2016

----- Original Message -----
> On 04/26, Dave Anderson wrote:
> >
> > > OK. Suppose we add ACTIVE_QEMU() helper. IMO this is a bad idea in any case, the core
> > > code should not even know that this kernel runs under qemu. Nevermind, suppose we have
> > > say
> > >
> > > 	#define ACTIVE_QEMU() ((pc->flags & LIVE_SYSTEM) && (pc->flags2 & QEMU))
> > >
> > > Now what? We need the same 1-7 patches, just LOCAL_ACTIVE() should be replaced
> > > with "ACTIVE() && !ACTIVE_QEMU()".
> >
> > Correct.  ACTIVE() is used ~100 times, and in the vast majority of cases, its use
> > applies to a live QEMU/KVM session. In the few circumstances that it doesn't, then
> > ACTIVE_QEMU() should be applied so that it's obvious to the maintainer (me), what
> > the issue is.
> 
> to a live QEMU/KVM session, and/or to any other live-and-remote session, please see
> below.
> 
> > Who knows what "live" mechanism may come about in the future that may also have its own
> > quirks?  I don't want to hide it, but rather make it strikingly obvious.
> 
> Ah, but this is another story.
> 
> I mean... OK, as 00/10 says, my vague/distant goal is to teach /usr/bin/crash to use
> gdb-remote protocol to debug the live guests. And in this case ACTIVE_QEMU() makes
> a lot of sense. Say, cmd_bt() can use it to get the registers/trace even if the
> process is running, pause/resume the guest, etc.
> 
> But all the LOCAL_ACTIVE changes in 1-7 do not care about the details of "live"
> mechanism at all. So I still think we need a generic helper which should be true
> if local-and-active. Or, vice versa, remote-and-active, this doesn't matter.
> 
> > > 	--- a/kernel.c
> > > 	+++ b/kernel.c
> > > 	@@ -2900,7 +2900,7 @@ back_trace(struct bt_info *bt)
> > > 			return;
> > > 		}
> > >
> > > 	-	if (ACTIVE() && !INSTACK(esp, bt)) {
> > > 	+	if (LOCAL_ACTIVE() && !INSTACK(esp, bt)) {
> > > 			sprintf(buf, "/proc/%ld", bt->tc->pid);
> > > 			if (!file_exists(buf, NULL))
> > > 				error(INFO, "task no longer exists\n");
> > >
> > > The usage of ACTIVE() is obviously wrong if this is the live (so that ACTIVE()
> > > is true) but remote kernel. We should not even try to look at /proc files on
> > > the local system in this case.
> >
> > Correct.  So restrict it meaningfully (to me anyway).
> 
> So you suggest to change this patch to do
> 
> 		if (ACTIVE() && !ACTIVE_QEMU() && !INSTACK(...))
> 
> To me this simply looks worse, but I won't insist. But note that if we ever have
> another ACTIVE_SOMETHING() source, we will need to modify this code again.  While
> this code does not care about qemu/something at all. So I still think we need a new
> helper which doesn't depend on qemu or whatever else.

Right, but this is definitely the outlier with respect to "live" systems.

> > > Or perhaps you mean that ACTIVE_QEMU() should be defined as
> > >
> > > 	#define ACTIVE_QEMU()	(pc->flags2 & QEMU_LIVE)
> > >
> > > ? iow, it should not imply ACTIVE() ? This would be even worse, in this case we
> > > would need to replace almost every ACTIVE() with "ACTIVE() || ACTIVE_QEMU()".

QEMU_LIVE should be in pc->flags, and appear as part of MEMORY_SOURCES.  And LIVE_SYSTEM
should also be set so that the facility falls under both ACTIVE() and ACTIVE_QEMU().
And then in the subset of cases where ACTIVE() is too broad, ACTIVE_QEMU() can be added
as a restriction.
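
For illustration only, the arrangement described above might be sketched as
follows. The bit values are placeholders, DEVMEM stands in for the full list
of MEMORY_SOURCES members, and LOCAL_ACTIVE() and may_consult_local_proc()
are hypothetical names added here; none of this is taken from defs.h.

```c
/* Placeholder bit values; the real definitions live in crash's defs.h. */
#define LIVE_SYSTEM     (0x1UL)
#define DEVMEM          (0x2UL)
#define QEMU_LIVE       (0x4UL)   /* assumed new memory-source flag */

/* Assumption: QEMU_LIVE is simply OR'ed in next to the existing sources. */
#define MEMORY_SOURCES  (DEVMEM | QEMU_LIVE)

/* Cut-down stand-in for crash's program context. */
struct program_context {
	unsigned long flags;
};
static struct program_context program_context;
static struct program_context *pc = &program_context;

/* True for any live target, local or remote. */
#define ACTIVE()        (pc->flags & LIVE_SYSTEM)

/* True only for a live QEMU/KVM guest; implies ACTIVE(). */
#define ACTIVE_QEMU()   (ACTIVE() && (pc->flags & QEMU_LIVE))

/* Hypothetical generic helper: live, and running on this host. */
#define LOCAL_ACTIVE()  (ACTIVE() && !ACTIVE_QEMU())

/* Example restriction: local /proc is only meaningful for a local kernel. */
static int may_consult_local_proc(void)
{
	return LOCAL_ACTIVE() ? 1 : 0;
}
```

With both LIVE_SYSTEM and QEMU_LIVE set, the existing ACTIVE() callers keep
working unchanged, and only the handful of local-only paths need the extra
check.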

But the above is not relevant with respect to some new extension of the ramdump.

> >
> > I agree that there are a handful of circumstances that you have run into where
> > ACTIVE() may not apply, such as the case where /proc was accessed.  But I don't
> > understand why you say "almost every" instance?
> 
> Ah, sorry for the confusion. I meant, if we add ACTIVE_QEMU() it should imply
> ACTIVE(), otherwise we have even more problems.

Correct.

> 
> > Why?  If the target is live, then all of the above should be called as-is.  Each
> > of them returns if the target is a dumpfile.
> 
> Yes, sure, see above. If ACTIVE_QEMU() plugin sets LIVE_SYSTEM flag too, most users
> of ACTIVE() are fine.
> 
> > > OK, let's suppose we add this feature... How do you think the command line should
> > > look?
> > >
> > > I mean, after this series we can do, say,
> > >
> > > 	./crash vmlinux raw:DUMP_1@OFFSET_1,DUMP_2@OFFSET_2
> > >
> > > if we have 2 ramdumps which should be used together. How do you think the new
> > > syntax should look? I am fine either way.
> >
> > I guess I've got some basic misunderstandings here...
> >
> > If it's a live system, why is it necessary to specify RAM offsets?
> 
> I suspect we will need offsets in more complex situations, qemu can have multiple
> memory-backend-file/numa options.  And perhaps even a single file may need it,
> not sure.

But with any live system, crash reads the relevant kernel data structures and sets
up its picture of the system's physical memory accordingly.  There's no need to specify
where the memory lies -- it's all available in the live kernel itself.

On the other hand, typical dumpfile headers give the crash utility instructions
on how to randomly access physical memory in the dump, i.e., like the PT_LOAD
segments in an ELF vmcore.  Ramdumps don't have any header information, so the
physical memory blocks have to be specified on the crash command line -- and then
crash creates a temporary in-memory ELF header for subsequent memory reads.
(Or the user can specify "-o dumpfile" to transform the ramdump into a kdump clone.)
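
As a rough sketch of why the offsets are needed for a headerless ramdump: each
FILE@ADDRESS pair tells the reader which physical range a raw file covers, and
a physical address is then resolved to a file and an offset within it. The
struct and function names below are hypothetical and this is not crash's
ramdump code; the block size is assumed to be filled in by the caller (for
example from the file size).

```c
#include <stdlib.h>
#include <string.h>

/* One block of physical memory backed by a raw file (hypothetical type). */
struct ram_block {
	char path[256];
	unsigned long long start;  /* physical start address */
	unsigned long long size;   /* bytes; set by the caller, not by the parser */
};

/* Parse "FILE@ADDR" (ADDR in hex) into blk. Returns 0 on success, -1 if malformed. */
static int parse_ram_spec(const char *spec, struct ram_block *blk)
{
	const char *at = strrchr(spec, '@');
	char *end;
	size_t len;

	if (!at || at == spec || at[1] == '\0')
		return -1;
	len = (size_t)(at - spec);
	if (len >= sizeof(blk->path))
		return -1;
	memcpy(blk->path, spec, len);
	blk->path[len] = '\0';
	blk->start = strtoull(at + 1, &end, 16);  /* accepts an optional 0x prefix */
	if (*end != '\0')
		return -1;
	return 0;
}

/* Find the block holding paddr; report its index and the offset into its file. */
static int paddr_to_offset(const struct ram_block *blks, int n,
			   unsigned long long paddr,
			   int *idx, unsigned long long *off)
{
	int i;

	for (i = 0; i < n; i++) {
		if (paddr >= blks[i].start &&
		    paddr - blks[i].start < blks[i].size) {
			*idx = i;
			*off = paddr - blks[i].start;
			return 0;
		}
	}
	return -1;  /* address not covered by any supplied block */
}
```

A live target needs none of this, since the layout is read from the running
kernel rather than supplied by the user.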

> 
> > And if you're just emulating the ramdump facility by first dumping the guest's memory
> > into a dumpfile, why isn't it just a ramdump clone?
> 
> Sorry, can't understand... could you spell it out?

I'm not sure because that's what I don't understand.  You seem to be describing two completely
different facilities:

  (1) a live access facility like /dev/mem, et al, but to a live KVM guest
  (2) some kind of ramdump facility?

And if it's a ramdump facility, couldn't you just copy it from the guest to the host
and analyze it there?

Dave