[Crash-utility] [PATCH 0/7] Upgrade the remote access to a xen domain.

Don Slutz dslutz at verizon.com
Tue Nov 19 19:03:09 UTC 2013


On 11/19/13 11:08, Dave Anderson wrote:
>
> ----- Original Message -----
>> From: Don Slutz <dslutz at verizon.com>
>>
>> Currently crash 4.0 and later using the code:
>>
>> http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html
>>
>> There is some very limited documention on crashes remote ptotocol in:
>>
>> http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html
>>
>> and some fixes in the attachment
>> 0001-xen-crashd-Connect-crash-with-domain.patch
>> from the 1st link above.
>>
>> How ever there are issues with the current code:
>>
>> 1) The remote protocol has a minor issue (which I have not been able to
>>     happen) based on the fact TCP/IP is stream based protocol.  This means
>>     a RECV or a SEND may not do the fully requested size of data.  In fact
>>     the current code assumes that the amount of data that a SEND is called
>>     with will all be read with a single RECV.
>>
>> 2) The most common mismatch between crash and older kernels is
>>     phys_base.  In the remote case, see if the remote server supports
>>     vitrual memory access, and if so, see if phys_base can be retreived.
>>
>> 3) crash assumes that the remote system is active and can not return
>>     currect IP and SP to do a better back trace.
>>
>> 4) enable a non-active mode of remote access.
>>
>> This patch set attempts to fix these:
>>
>> The fix for #1 I have called NIL mode (patch 1).
>> The fix for #2 uses get_remote_phys_base (patch 3).
>> The fix for #3 uses get_remote_regs (patch 5).
>> The fix for #4 uses special file /dev/xenmem (patch 7).
>>    It also REMOTE_NIL() to indicate "remote paused system".
>>
>> Don Slutz (7):
>>    Add NIL mode to remote.
>>    remote_proc_version: NULL terminate passed buffer on error.
>>    Add get_remote_phys_base.
>>    Add remote_vtop.
>>    bt: get remote live registers if possible.
>>    Add get_remote_cr3
>>    Add support for non-live remote.
>>
>>   defs.h   |   8 +-
>>   kernel.c |  25 +++-
>>   memory.c |  25 +++-
>>   remote.c | 462
>>   +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
>>   x86_64.c |  24 ++++
>>   5 files changed, 465 insertions(+), 79 deletions(-)
>>
>> --
>> 1.8.4
>   
> Hi Don,
>
> A few comments...
>
> I don't really understand what entity the code is communicating with,
> so for the most part, I'm primarily interested in making sure that your
> patches in no way can affect the normal usage of the crash utility.
>
> That being said, is the term "NIL mode" some kind of Xen-ism?
Nope, it is term I came up with to describe the protocol change.
>   It wasn't
> until I saw the display_sys_stats() function that I even understood that
> REMOTE_NIL mean "(remote paused system)".  Anyway, if you made it something
> like "REMOTE_PAUSED" or something like that, the code might make some sense
> to the untrained eye.
Will do.  Looking back at it, I ended up overloading the term. :(
>
> And it's not entirely clear to me what REMOTE_ACTIVE() means now?  It used
> to imply communicating with a remote crash daeamon that accessed the live
> system that the daemon was running on.  Has that meaning been changed to mean
> that crash is communicating with a daemon that's accessing a "live" Xen guest
> kernel?
The meaning is not as clear.  Since the existing versions of crash already have REMOTE_ACTIVE() and I wanted the Xen utility to be usable to them, in both cases the "live" Xen guest kernel is being accessed while it is paused.  I added REMOTE_PAUSED() to have a few changes to crash to correctly understand that the guest is paused. As part of this I add the fake file "/dev/xenmem".
>
> I'm confused about these two changes, where you re-patch your own patch:
>
> -        else
> +        else if (REMOTE_ACTIVE() && ((bt->task == tt->this_task) || is_task_active(bt->task))) {
> +               if (get_remote_regs(bt, &eip, &esp))
> +                       machdep->get_stack_frame(bt, &eip, &esp);
> +       } else
>    
> followed by:
>
> -        else if (REMOTE_ACTIVE() && ((bt->task == tt->this_task) || is_task_active(bt->task))) {
> +        else if (REMOTE_NIL() && ((bt->task == tt->this_task) || is_task_active(bt->task))) {
>                  if (get_remote_regs(bt, &eip, &esp))
>                          machdep->get_stack_frame(bt, &eip, &esp);
>          } else
Clearly I need to adjust the patch order.
> If I'm not mistaken, it would seem to be impossible for "(bt->task == tt->this_task)"
> to ever be true in the REMOTE_NIL() case?  As I recall tt->this_task used to be the
> PID of the remote daemon itself, which would be accessing the memory of its own kernel.
> So how could the pid of the remote daemon ever be the target of a back trace of a
> paused guest?  And so I don't understand why you first made an if clause for
> REMOTE_ACTIVE() to begin with -- whatever that means now -- and then changed it to
> REMOTE_NIL()?
This came about from some old changes from 2012 to a crash version 4.0 for other work.  I had not thought of /dev/xenmem at the time, and just copied the condition from above.  I see no reason not to drop "bt->task == tt->this_task".  Will check that the right actions are still done.
> In any case, since the ultimate change is only interested in REMOTE_NIL(),
> can you make it all less of an eyesore by encapulating its if case
> something like:
>
>          else if (SADUMP_DUMPFILE())
>                  get_sadump_regs(bt, &eip, &esp);
> +       else if (REMOTE_NIL()) {
> +               if (!((bt->task == tt->this_task) || is_task_active(bt->task)) ||
> +                   !get_remote_regs(bt, &eip, &esp))	
> +                       machdep->get_stack_frame(bt, &eip, &esp);
>          } else
>                  machdep->get_stack_frame(bt, &eip, &esp);
>
> Note above, for consistency's sake with the rest of the crash utility,
> can you make get_remote_regs() return non-zero when it successfully gets
> the registers, and zero if it fails?
Sure.
> And lastly, can you fix these?
>
> $ make warn
> ...
> remote.c: In function ‘validate_phys_base’:
> remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘physaddr_t’ [-Wformat]
> remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘physaddr_t’ [-Wformat]
> remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘physaddr_t’ [-Wformat]
> remote.c:2302:25: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 6 has type ‘physaddr_t’ [-Wformat]
> remote.c: In function ‘remote_vtop’:
> remote.c:2361:2: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘physaddr_t’ [-Wformat]
> remote.c:2367:4: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘physaddr_t’ [-Wformat]
> remote.c:2352:18: warning: variable ‘p3’ set but not used [-Wunused-but-set-variable]
> remote.c:2352:13: warning: variable ‘p2’ set but not used [-Wunused-but-set-variable]
> remote.c:2352:8: warning: variable ‘p1’ set but not used [-Wunused-but-set-variable]
> remote.c: In function ‘get_remote_regs’:
> remote.c:2392:33: warning: unused variable ‘p6’ [-Wunused-variable]
> remote.c:2392:28: warning: variable ‘p5’ set but not used [-Wunused-but-set-variable]
> remote.c:2392:18: warning: variable ‘p3’ set but not used [-Wunused-but-set-variable]
> remote.c:2392:8: warning: variable ‘p1’ set but not used [-Wunused-but-set-variable]
> remote.c: In function ‘get_remote_cr3’:
> remote.c:2449:13: warning: variable ‘p2’ set but not used [-Wunused-but-set-variable]
> remote.c:2449:8: warning: variable ‘p1’ set but not used [-Wunused-but-set-variable]
> ...
Yes.

> And I'm not interested in appending these to the 4 files that you changed:
>
> +/*
> + * Specify Emacs local variables so the formating
> + * of the code stays the same.
> + *
> + * Local variables:
> + * mode: C
> + * c-set-style: "BSD"
> + * c-basic-offset: 8
> + * indent-tabs-mode: t
> + * End:
> + */
>
> Thanks,
>    Dave
Ok, that is 2 votes to drop this, so I will.
     -Don Slutz




More information about the Crash-utility mailing list