[Crash-utility] PATCH 00/10] teach crash to work with "live" ramdump

Tue Apr 26 15:24:09 UTC 2016

On 04/26, Dave Anderson wrote:
>
> > OK. Suppose we add ACTIVE_QEMU() helper. IMO this is a bad idea in any case, the core
> > code should not even know that this kernel runs under qemu. Nevermind, suppose we have
> > say
> >
> > 	#define ACTIVE_QEMU() ((pc->flags & LIVE_SYSTEM) && (pc->flags2 && QEMU))
> >
> > Now what? We need the same 1-7 patches, just LOCAL_ACTIVE() should be replaced
> > with "ACTIVE() && !QEMU_ACTIVE()".
>
> Correct.  ACTIVE() is used ~100 times, and in the vast majority of cases, its use
> applies to a live QEMU/KVM session. In the few circumstances that it doesn't, then
> ACTIVE_QEMU() should be applied so that it's obvious to the maintainer (me), what
> the issue is.

to a live QEMU/KVM session, and/or to any other live-and-remote session, please see
below.

> Who know what "live" mechanism may come about in the future that
> may also have its own quirks?  I don't want to hide it, but rather make it strikingly
> obvious.

Ah, but this is another story.

I mean... OK, as 00/10 says, my vague/distant goal is teach /usr/bin/crash to use
gdb-remote protocol to debug the live guests. And in this case ACTIVE_QEMU() makes
a lot of sense. Say, cmd_bt() can use it to get the registers/trace even if the
process is running, pause/resume the guest, etc.

But all the LOCAL_ACTIVE changes in 1-7 do not care about the details of "live"
mechanism at all. So I still think we need a generic helper which should be true
if local-and-active. Or, vice versa, remote-and-active, this doesn't matter.

> > 	--- a/kernel.c
> > 	+++ b/kernel.c
> > 	@@ -2900,7 +2900,7 @@ back_trace(struct bt_info *bt)
> > 			return;
> > 		}
> >
> > 	-	if (ACTIVE() && !INSTACK(esp, bt)) {
> > 	+	if (LOCAL_ACTIVE() && !INSTACK(esp, bt)) {
> > 			sprintf(buf, "/proc/%ld", bt->tc->pid);
> > 			if (!file_exists(buf, NULL))
> > 				error(INFO, "task no longer exists\n");
> >
> > The usage of ACTIVE() is obviously wrong if this is the live (so that ACTIVE()
> > is true) but remote kernel. We should not even try to look at /proc files on
> > the local system in this case.
>
> Correct.  So restrict it meaningfully (to me anyway).

So you suggest to change this patch to do

		if (ACTIVE() && !ACTIVE_QEMU() && !INSTACK(...))

To me this simply looks worse, but I won't insist. But note that if we ever have
anothe ACTIVE_SOMETHING() source, we will need to modify this code again. While
this code do not care about qemu/something at all. So I still think we need a new
helper which doesn't depend on qemu or whatever else.

> > Or perhaps you mean that ACTIVE_QEMU() should be defined as
> >
> > 	#define ACTIVE_QEMU()	(pc->flags2 & QEMU_LIVE)
> >
> > ? iow, it should not imply ACTIVE() ? This would be even worse, in this case we
> > would neet to replace almost every ACTIVE() with "ACTIVE() || ACTIVE_QEMU()".
>
> I agree that there are a handful of circumstances that you have run into where
> ACTIVE() may not apply, such as the case where /proc was accessed.  But I don't
> understand why you say "almost every" instance?

Ah, sorry for confusion. I meant, If we add ACTIVE_QEMU() it should imply
ACTIVE(), otherwise we have even more problems.

> Why?  If the target is live, then all of the above should be called as-is.  Each
> of them returns if the target is a dumpfile.

Yes, sure, see above. If ACTIVE_QEMU() plugin sets LIVE_SYSTEM flag too, most users
of ACTIVE() are fine.

> > OK, lets suppose we add this feature... How do you think the command line should
> > look?
> >
> > I mean, after this series we can do, say,
> >
> > 	./crash vmlinux raw:DUMP_1 at OFFSET_1,DUMP_2 at OFFSET_2
> >
> > if we have 2 ramdump's which should be used together. How do you think the new
> > syntax should look? I am fine either way.
>
> I guess I've got some basic misunderstandings here...
>
> If it's a live system, why is necessary to specify RAM offsets?

I suspect we will need offsets in more complex situations, qemu can have multiple
memory-backend-file/numa options. And perhaps even a single file may need it, not
sure.

> And if you're just emulating the ramdump facility by first dumping the guest's memory
> into a dumpfile, why isn't it just a ramdump clone?

Sorry, can't understand... could you spell?

Oleg.