[Crash-utility] [PATCH v2] Add read diagnostics to crash

Tue Jan 10 21:09:46 UTC 2012

Hi Dave,

On 01/10/2012 12:43 PM, Dave Anderson wrote:
>
>
> ----- Original Message -----
>> Hi Dave,
>>
>> On 01/10/2012 09:24 AM, Dave Anderson wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> Re-posting the patch as an attachment. I'm sorry for the mess on the
>>>> first pass.
>>>>
>>>> --
>>>> David.
>>>
>>> Hi Dave,
>>>
>>> Your patch has merit, but I'm going to pare it down a bit.  I use CRASHDEBUG(8)
>>> fairly frequently, but with your changes as-is, it's far too verbose.  It's also
>>> a bit redundant -- here's an example of a CRASHDEBUG(8) kdump read
>>> currently:
>>>
>>>     crash>   rd ffff88003c1a5cc0
>>>     <addr: ffff88003c1a5cc0 count: 1 flag: 490 (KVADDR)>
>>>     <readmem: ffff88003c1a5cc0, KVADDR, "64-bit KVADDR", 8, (FOE), 7fff0096b5a8>
>>>         addr: ffff88003c1a5cc0  paddr: 3c1a5cc0  cnt: 8
>>>     ffff88003c1a5cc0:  0000000000000000                    ........
>>>          text hit rate: 65% (3505 of 5378)
>>>     crash>
>>>
>>> And then with your patch:
>>>
>>>     crash>   rd ffff88003c1a5cc0
>>>     <addr: ffff88003c1a5cc0 count: 1 flag: 490 (KVADDR)>
>>>     <readmem: ffff88003c1a5cc0, KVADDR, "64-bit KVADDR", 8, (FOE), 7fff1c97b6e8>
>>>         addr: ffff88003c1a5cc0  paddr: 3c1a5cc0  cnt: 8
>>>     rd: read_kdump for: virt=ffff88003c1a5cc0 phys=3c1a5cc0
>>>     rd: read_kdump trying read_netdump for: virt=ffff88003c1a5cc0 phys=3c1a5cc0
>>>     rd: read_netdump(-1, 7fff1c97b6e8, 8, ffff88003c1a5cc0, 3c1a5cc0)
>>>     rd: read KDUMP or NETDUMP_ELF64 for multi-segment: 3c195fb8
>>>     rd: seek and read offset: 3c195fb8
>>>     ffff88003c1a5cc0:  0000000000000000                    ........
>>>          text hit rate: 65% (3505 of 5378)
>>>     crash>
>>>
>>> The "<readmem: ...>" and subsequent "addr: ..." lines come from readmem().
>>> I don't understand the need for the next 3 lines, as they are completely
>>> redundant:
>>
>> Except that one intention was to show program flow (failures that didn't
>> happen) as well as continuation of program state.
>>
>> *	read_kdump() entered
>> rd: read_kdump for: virt=ffff88003c1a5cc0 phys=3c1a5cc0
>>
>> *	None of the conditional fatal error exits in read_kdump()
>> rd: read_kdump trying read_netdump for: virt=ffff88003c1a5cc0
>> phys=3c1a5cc0
>>
>> *	read_netdump() entered
>> rd: read_netdump(-1, 7fff1c97b6e8, 8, ffff88003c1a5cc0, 3c1a5cc0)
>>
>> *	switch (DUMPFILE_FORMAT(nd->flags)) VIA case KDUMP_ELF64 show calculated offset
>> rd: read KDUMP or NETDUMP_ELF64 for multi-segment: 3c195fb8
>>
>> *	Not the if (FLAT_FORMAT()) read case rd: seek and read offset: 3c195fb8
>>
>>>     rd: read_kdump for: virt=ffff88003c1a5cc0 phys=3c1a5cc0
>>>     rd: read_kdump trying read_netdump for: virt=ffff88003c1a5cc0 phys=3c1a5cc0
>>>     rd: read_netdump(-1, 7fff1c97b6e8, 8, ffff88003c1a5cc0, 3c1a5cc0)
>>> I don't argue with the displays of the resultant file offsets, but I may move
>>> them to CRASHDEBUG(9):
>>
>> I wouldn't object and can re-post a CRASHDEBUG(9) version myself if
>> you'd prefer.
>
> But there is still no need to redundantly display the virtual and physical
> addresses.  And the displays of the calculated file offsets, zero-fill, etc.
> will still end up showing the complete function sequence by putting the
> function name in the output.  For example, the updated readmem() output would
> show a call to read_kdump(), but the file offset display would show that it
> has transitioned to read_netdump(), so there's no need for any debug output
> at all in read_kdump().

Well, read_kdump() in the case of a non-hyper-mode XEN dump has code 
that has the appearance of a route of change for paddr, i.e. the 
following doesn't result in no change or in P2M_FAILURE:

paddr = xen_kdump_p2m(paddr)

The code I posted can show two unique paths through read_kdump() but if 
as you say, you log calling it with the physical address known and log 
in read_netdump() with the physical address included then you get the 
same result.

Also, are there any cases of overlapped/threaded execution of reads? If 
not then removal of redundant output is wise but the virt/phys addr 
would identify which thread of execution each line refers to among 
overlapped output in most cases.

>>
>>>     rd: read KDUMP or NETDUMP_ELF64 for multi-segment: 3c195fb8
>>>     rd: seek and read offset: 3c195fb8
>>>
>>> And the same goes for compressed kdumps:
>>>
>>>     crash>   rd ffff81005286e0c0
>>>     <addr: ffff81005286e0c0 count: 1 flag: 490 (KVADDR)>
>>>     <readmem: ffff81005286e0c0, KVADDR, "64-bit KVADDR", 8, (FOE), 7fff11d16958>
>>>         addr: ffff81005286e0c0  paddr: 5286e0c0  cnt: 8
>>>     ffff81005286e0c0:  0000000000000000                    ........
>>>          text hit rate: 66% (239 of 357)
>>>     crash>
>>>
>>> With the patch, aside from the "pfn" displays, it's redundant
>>> information:
>>
>> Except for the implied flow and continuation of state. I've been asked
>> to look into alleged crash bugs that were no more than corrupt or
>> incomplete dumps that required either hexdump style deciphering of the
>> structure or a method of crash exposing its own deciphering of the
>> structure because the program starts and exits with no more than an
>> error code that can't always identify a unique point of failure and
>> you'd need to find the cause in source anyway. For example, a case where
>> an incomplete crash dump has page-present flags showing the page not
>> present for kernel data structures (every single page present flag is
>> zero in-fact). IOW, no crash bug because kdump created an incomplete
>> crash dump.
>>
>>>     crash>   rd ffff81005286e0c0
>>>     <addr: ffff81005286e0c0 count: 1 flag: 490 (KVADDR)>
>>>     <readmem: ffff81005286e0c0, KVADDR, "64-bit KVADDR", 8, (FOE),
>>>     7fffd3ca4478>
>>>         addr: ffff81005286e0c0  paddr: 5286e0c0  cnt: 8
>>>     rd: read_diskdump(-1, 7fffd3ca4478, 8, ffff81005286e0c0,
>>>     5286e0c0), pfn=5286e
>>>     rd: Buffering page with pfn=5286e and current physaddr=5286e000
>>>     ffff81005286e0c0:  0000000000000000                    ........
>>>          text hit rate: 66% (239 of 357)
>>>     crash>
>>>
>>> For starters, I think the patch can be simplified by modifying the "addr: ..."
>>> line that is displayed by readmem() just prior to issuing it to the configured
>>> pc->readmem function so that it shows the receiving function.  So that line,
>>> which may be displayed several times if the read crosses a page boundary, could
>>> be reworked to show something like this:
>>>
>>>     crash>   rd ffff81005286e0c0
>>>     <addr: ffff81005286e0c0 count: 1 flag: 490 (KVADDR)>
>>>     <readmem: ffff81005286e0c0, KVADDR, "64-bit KVADDR", 8, (FOE),
>>>     7fffd3ca4478>
>>>     <read_netdump: addr: ffff81005286e0c0 paddr: 5286e0c0 cnt: 8
>>>     ...
>>>
>>> and then a line or two from the relevant read functions would show
>>> the file offset that was resolved from the physical address.
>>>
>>> Also, the new function that translates the pc->readmem function address to a
>>> string for display could also be used in a single location in main.c instead of
>>> having to put debug statements in each location where where pc->readmem is
>>> initialized.
>>
>> Isn't there at least one case, e.g. x86_64_xen_kdump_p2m_create() that
>> replaces and restores the pc->readmem value at runtime?
>
> That is correct -- it's a one-time deal.  I'll leave a debug statement
> in place for those two cases (x86 and x86_64).
>
>>
>>> So let me play around with this a bit...
>>
>> You are welcome too but I would be happy to take responsibility and
>> re-work it too. If you do go ahead with a re-work then please consider
>> my goal of monitoring program flow.
>
> I will absolutely keep that in mind -- I'll post the patch for your review
> before checking it in.

I appreciate the effort and consideration.

Thanks,
David.