[Crash-utility] kmem: WARNING: cannot find mem_map page for address

Bruce Korb bruce.korb at gmail.com
Mon Dec 17 23:30:38 UTC 2012


Hi Dave,

On 12/17/12 11:23, Dave Anderson wrote:
>>> Right -- I would never expect error() to be called while inside
>>> an open_tmpfile() operation.  Normally the behind-the-scenes data
>>> is parsed, and if anything is to be displayed while open_tmpfile()
>>> is still in play, it would be fprintf()'ed using pc->saved_fp.
>>
>> I think the aesthetically pleasing solution is an "i_am_playing_with_tmpfile()"
>> call that says it isn't closed and crash functions shouldn't be using it.
>> Plus a parallel "i_am_done_with_tmpfile()" that gets implied by "close_tmpfile()".
>> I can supply a patch, if you like.  Probably with less verbose function names.
> 
> If pc->tmpfile is non-NULL, then open_tmpfile() is in use.  What would be
> the purpose of the extra functions?

It would be to allow the client code that is processing that temp file to emit
warning/info messages without disrupting the reading of that file pointer.
To me, that doesn't seem unreasonable.  You run some code that emits output
to a temp file and you reprocess those data.  You surely do not want such
messages showing up in the file you are re-processing.  And you cannot
call close_tmpfile() because it calls ftruncate().

So, what is your recommendation for how to reprocess diverted output
wherein you might occasionally want to say something during that reprocessing?

Three solutions come to mind:

1. Juggle file pointers before and after the __error() function call (please say, "No.")
2. Create my own temporary file and fiddle the global "fp" and "pc" state so it
    gets used while I am gathering data and crash code doesn't know about it later.
    (I insist the answer must be, "No." because there is too much fiddling with
    intricate crash state.)
3. These two functions that I am suggesting:

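void
resume_tmpfile(void)
{
	/* Diversion must not already be active. */
	if (pc->tmpfile)
		error(FATAL, "recursive temporary file usage\n");

	/* There must be a sequestered temporary file to resume. */
	if (!pc->tmp_fp)
		error(FATAL, "temporary file not ready\n");

	/* Point "fp" back at the temporary file, starting from the top. */
	rewind(pc->tmp_fp);
	pc->tmpfile = pc->tmp_fp;
	pc->saved_fp = fp;
	fp = pc->tmpfile;
}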

void
sequester_tmpfile(void)
{
	if (pc->tmpfile) {
		/* Make the gathered data readable from the beginning. */
		fflush(pc->tmpfile);
		rewind(pc->tmpfile);
		/* Mark the diversion inactive and restore normal output. */
		pc->tmpfile = NULL;
		fp = pc->saved_fp;
	} else
		error(FATAL, "trying to sequester an unopened temporary file\n");
}

I sequester the file after doing the data gathering and resume it
after I am done reprocessing it.  It might be worth putting in a little jig
to ensure that open/close_tmpfile work reasonably, too.  (I would guess
that either would cancel the sequestration.)
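
For what it's worth, here is a minimal stand-alone sketch of the call pattern I have in mind. It does not use crash itself: "struct prog_ctx", "sequester()", "resume()" and "demo()" are hypothetical stand-ins for the crash program context, the two functions above, and a client command, and failures return -1 instead of calling error(FATAL).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the parts of crash's program context used here. */
struct prog_ctx {
	FILE *tmp_fp;    /* the underlying temporary stream */
	FILE *tmpfile;   /* non-NULL only while output is diverted */
	FILE *saved_fp;  /* where output went before the diversion */
};

static struct prog_ctx ctx;
static struct prog_ctx *pc = &ctx;
static FILE *fp;         /* stand-in for the global output pointer */

/* Stop diverting output, but keep the temp file contents readable. */
static int sequester(void)
{
	if (!pc->tmpfile)
		return -1;
	fflush(pc->tmpfile);
	rewind(pc->tmpfile);
	pc->tmpfile = NULL;
	fp = pc->saved_fp;
	return 0;
}

/* Re-establish the diversion so a normal close can follow. */
static int resume(void)
{
	if (pc->tmpfile || !pc->tmp_fp)
		return -1;
	rewind(pc->tmp_fp);
	pc->tmpfile = pc->tmp_fp;
	pc->saved_fp = fp;
	fp = pc->tmpfile;
	return 0;
}

/* Gather, sequester, reprocess with messages, resume, close.
 * Returns the number of lines reprocessed, or -1 on failure. */
static int demo(FILE *console)
{
	char line[128];
	int lines = 0;

	assert(console != NULL);
	fp = console;

	/* Roughly what an open of the temp file does: divert "fp". */
	pc->tmp_fp = tmpfile();
	if (!pc->tmp_fp)
		return -1;
	pc->saved_fp = fp;
	pc->tmpfile = pc->tmp_fp;
	fp = pc->tmpfile;

	fprintf(fp, "alpha\nbeta\n");        /* the gathered data */

	if (sequester() < 0)
		return -1;

	/* Reprocess; messages written to "fp" now reach the console. */
	while (fgets(line, sizeof line, pc->tmp_fp)) {
		line[strcspn(line, "\n")] = '\0';
		fprintf(fp, "note: saw %s\n", line);
		lines++;
	}

	if (resume() < 0)
		return -1;

	/* Roughly what a close of the temp file does. */
	fclose(pc->tmp_fp);
	pc->tmp_fp = NULL;
	pc->tmpfile = NULL;
	fp = pc->saved_fp;
	return lines;
}
```

The point of the sketch is only the ordering: messages emitted between sequester and resume go to the saved output stream, so they never land in the data being re-read.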

>>> I'm not sure, other than it doesn't seem to be able to find
>>> ffffea001bb1d1e8
>>
>> I was able to figure that out.  I also printed out the "kmem -v" table and
>> sorted the result.  The result with "kmem -n"
>>
>> [...]
>> 66  ffff88087fffa420  ffffea0000000000  ffffea0007380000  2162688
>> 67  ffff88087fffa430  ffffea0000000000  ffffea0007540000  2195456
>> 132608  ffff88083c9bdb98  ffff88083c9bdd98  ffff8840e49bdd98  4345298944
>> 132609  ffff88083c9bdba8  ffff88083c9796c0  ffff8840e4b396c0  4345331712
>> [...]
>>
>> viz. it ain't there.  Which is quite interesting, because if the lustre
>> cluster file system structure "cfs_trace_data" actually pointed off into
>> unmapped memory, it would have fallen over long, long before the point
>> where it did fall over.
> 
> I don't see the vmemmap range in the "kmem -v" output.  It is mapped
> kernel memory, but AFAIK it's not kept in the kernel's "vmlist" list.
> Do you see that range in your "kmem -v" output? 

Also no.  "kmem -v" and "kmem -n" both show the same memory mappings
(as best as _my_ memory serves, that is.  For certain, neither has a mapping
for 0xffffea001bb1d1e8.)

> OK so you say you cannot get the mappings for it, but what 
> does "vtop 0xffffea001bb1d1e8" show?

This:

> crash> vtop 0xffffea001bb1d1e8
> VIRTUAL           PHYSICAL        
> ffffea001bb1d1e8  879b1d1e8       
> 
> PML4 DIRECTORY: ffffffff817e7000
> PAGE DIRECTORY: 87fdf7067
>    PUD: 87fdf7000 => 87fdf6067
>    PMD: 87fdf66e8 => 8000000879a001e3
>   PAGE: 879a00000  (2MB)
> 
>       PTE         PHYSICAL   FLAGS
> 8000000879a001e3  879a00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)

But given:

> Sorry -- that's irrelevant.  You want to access the physical
> memory that the odd vmemmap page address references (not the
> physical page behind the page structure itself). 

Exactly right.  I need to be able to see the binary bits for that page so I can
pull them in and write them back out to a file of just those bits.  From there,
we'll be formatting a text file showing the lustre trace log.

Thank you so much!  Regards, Bruce



