[Crash-utility] [PATCH 1/1 V2] crash: initial note of excluded page structures

Atsushi Kumagai kumagai-atsushi at mxc.nes.nec.co.jp
Fri Jan 10 06:10:56 UTC 2014


On 2014/01/10 7:23:32, crash-utility-bounces at redhat.com wrote:
> On Thu, Jan 09, 2014 at 04:11:40PM -0500, Dave Anderson wrote:
> > > > > > > Version 2
> > > > > > > - Moves the warning to this point:
> > > > > > >       ...
> > > > > > >       This GDB was configured as "x86_64-unknown-linux-gnu"...
> > > > > > > 
> > > > > > >       WARNING: All unused vmemmap page structures are excluded from
> > > > > > >       this
> > > > > > >       dump.
> > > > > > >                This will cause failures of the kmem command if it
> > > > > > >                attempts to
> > > > > > >                walk any list of free pages (with options -f -F -i -s
> > > > > > >                and
> > > > > > >                -S).
> > > > > > > 
> > > > > > >         SYSTEM MAP: /boot/System.map-2.6.32-cpw
> > > > > > >       ...
> > > > > > > 
> > > > > > > - Drop patch 2, which added warnings to individual kmem options.
> > > > > > > 
> > > > > > > - Feel free to change the wording of the warning.
> > > > > > > 
> > > > > > > And this patch is contingent upon the acceptance of a change to the
> > > > > > > makedumpfile command.
> > > > > > >    http://marc.info/?l=kexec&m=138853299130125&w=2
> > > > > > >   
> > > > > > > 
> > > > > > > If makedumpfile excludes unused page structures it will flag that
> > > > > > > fact in the dump header.
> > > > > > > (There are about 3.67 million pages full of page structures for
> > > > > > >  every tera byte of system memory.  The great bulk of those
> > > > > > >  page structures are not needed.)
> > > > > > > Their exclusion is a makedumpfile option.
> > > > > > > 
> > > > > > > Crash will display a note during initialization if such structures
> > > > > > > have been excluded.  Crash commands that walk page freelists, for
> > > > > > > example, will fail. So the note will help the user understand why.
> > > > > > > 
> > > > > > > Signed-off-by: Cliff Wickman <cpw at sgi.com>
> > > > > > > ---
> > > > > > >  defs.h     |    1 +
> > > > > > >  diskdump.c |    3 +++
> > > > > > >  diskdump.h |    1 +
> > > > > > >  main.c     |    6 ++++++
> > > > > > >  4 files changed, 11 insertions(+)
> > > > > > > 
> > > > > > > Index: crash-7.0.4/diskdump.c
> > > > > > > ===================================================================
> > > > > > > --- crash-7.0.4.orig/diskdump.c
> > > > > > > +++ crash-7.0.4/diskdump.c
> > > > > > > @@ -749,6 +749,9 @@ restart:
> > > > > > >  				dd->valid_pages[i]++;
> > > > > > >  	}
> > > > > > >  
> > > > > > > +	if (header->status & DUMP_DH_EXCLUDED_VMEMMAP)
> > > > > > > +		pc->flags2 |= VMEXCLUDED;
> > > > > > > +
> > > > > > >          return TRUE;
> > > > > > >  
> > > > > > >  err:
> > > > > > > Index: crash-7.0.4/diskdump.h
> > > > > > > ===================================================================
> > > > > > > --- crash-7.0.4.orig/diskdump.h
> > > > > > > +++ crash-7.0.4/diskdump.h
> > > > > > > @@ -84,6 +84,7 @@ struct kdump_sub_header {
> > > > > > >  #define DUMP_DH_COMPRESSED_ZLIB    0x1   /* page is compressed with
> > > > > > >  zlib
> > > > > > >  */
> > > > > > >  #define DUMP_DH_COMPRESSED_LZO     0x2   /* page is compressed with
> > > > > > >  lzo
> > > > > > >  */
> > > > > > >  #define DUMP_DH_COMPRESSED_SNAPPY  0x4   /* page is compressed with
> > > > > > >  snappy
> > > > > > >  */
> > > > > > > +#define DUMP_DH_EXCLUDED_VMEMMAP   0x8   /* unused vmemmap pages are
> > > > > > > excluded */
> > > > > > >  
> > > > > > >  /* descriptor of each page for vmcore */
> > > > > > >  typedef struct page_desc {
> > > > > > > Index: crash-7.0.4/defs.h
> > > > > > > ===================================================================
> > > > > > > --- crash-7.0.4.orig/defs.h
> > > > > > > +++ crash-7.0.4/defs.h
> > > > > > > @@ -505,6 +505,7 @@ struct program_context {
> > > > > > >  #define VMCOREINFO    (0x400ULL)
> > > > > > >  #define ALLOW_FP      (0x800ULL)
> > > > > > >  #define REM_PAUSED_F (0x1000ULL)
> > > > > > > +#define VMEXCLUDED   (0x2000ULL)
> > > > > > >  #define REMOTE_PAUSED() (pc->flags2 & REM_PAUSED_F)
> > > > > > >  	char *cleanup;
> > > > > > >  	char *namelist_orig;
> > > > > > > Index: crash-7.0.4/main.c
> > > > > > > ===================================================================
> > > > > > > --- crash-7.0.4.orig/main.c
> > > > > > > +++ crash-7.0.4/main.c
> > > > > > > @@ -662,6 +662,12 @@ main_loop(void)
> > > > > > >  	} else
> > > > > > >  		SIGACTION(SIGINT, restart, &pc->sigaction, NULL);
> > > > > > >  
> > > > > > > +	if (pc->flags2 & VMEXCLUDED)
> > > > > > > +                fprintf(fp,
> > > > > > > +        "WARNING: All unused vmemmap page structures are excluded
> > > > > > > from
> > > > > > > this dump.\n"
> > > > > > > +	"         This will cause failures of the kmem command if it
> > > > > > > attempts
> > > > > > > to\n"
> > > > > > > +	"         walk any list of free pages (with options -f -F -i -s and
> > > > > > > -S).\n\n");
> > > > > > > +
> > > > > > >          /*
> > > > > > >           *  Display system statistics and current context.
> > > > > > >           */
> > > > > > > 
> > > > > > 
> > > > > > This patch looks reasonable.
> > > > > > 
> > > > > > BTW, what happens if you enter "kmem <address>" alone with no option?
> > > > > > It should fail as well, no?
> > > > > 
> > > > > Yes, it does.
> > > > > 
> > > > > crash> kmem 133360
> > > > > kmem: page excluded: kernel virtual address: ffffea0000004350  type:
> > > > > "page.lru.next"
> > > > > 
> > > > > crash> kmem ffff8897baae7540
> > > > > CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS
> > > > > SSIZE
> > > > > ffff88c7bec70040 task_struct             2656       4578      5361   1787
> > > > > 8k
> > > > > SLAB              MEMORY            TOTAL  ALLOCATED  FREE
> > > > > ffff8897baae6040  ffff8897baae6080      3          2     1
> > > > > FREE / [ALLOCATED]
> > > > >   [ffff8897baae7540]
> > > > > 
> > > > > kmem: page excluded: kernel virtual address: ffffea0000007028  type: "first list entry"
> > > > > 
> > > > > Another way to search a freelist, that I didn't notice.
> > > > > Do you want to add that to the warning?
> > > >  
> > > > I'm thinking that there might be others as well.  Like how does "kmem -p" handle
> > > > the missing page structs?
> > > 
> > > kmem -p gives the appearance of working, but is unable to read the
> > > mem_map pages.  It is giving false results.
> > > This is an example of what you feared.
> > > 
> > > If we do this:
> > > 
> > > ---
> > >  memory.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > Index: crash-7.0.4/memory.c
> > > ===================================================================
> > > --- crash-7.0.4.orig/memory.c
> > > +++ crash-7.0.4/memory.c
> > > @@ -5657,7 +5657,7 @@ fill_mem_map_cache(ulong pp, ulong ppend
> > >  	 *  Try to read it in one fell swoop.
> > >   	 */
> > >  	if (readmem(pp, KVADDR, page_cache, SIZE(page) * PGMM_CACHED,
> > > -      	    "page struct cache", RETURN_ON_ERROR|QUIET))
> > > +      	    "page struct cache", FAULT_ON_ERROR))
> > >  		return;
> > >  
> > >  	/*
> > > 
> > > Then we see the immediate failure:
> > > 
> > > crash> kmem -p
> > >       PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
> > > kmem: page excluded: kernel virtual address: ffffea0000004000  type: "page
> > > struct cache"
> > > 
> > > What should be done about that?
> > 
> > I'd leave it alone -- at least it doesn't fail miserably like the other commands.
> > 
> > > I wonder why the reads of the mem_map were done with RETURN_ON_ERROR|QUIET
> > > to begin with?
> > 
> > It's done to guard against a vmemmap boundary (real page structs vs. unmapped
> > page structs) that happens to occur within the 512-page-size chunk of memory 
> > that's being cached.  So first it tries to read the full chunk in one readmem(),
> > but if that fails, it breaks up the read requests, zeroing out the unmapped pages. 
> > 
> > Dave
> 
> Okay.  I'll just make issue a more general warning like:
> 
> WARNING: All unused vmemmap page structures are excluded from this dump.
>          This will cause failures in any command that accesses page
>          structures of pages that are not included in the dump.  This
>          is particularly likely when using several options of the
>          kmem command.
> 
> If that looks reasonable to you.

Dave, the excluding vmemmap obviously affect a user's investigation,
is this really acceptable for you ?

There is no chance to re-capture the same dump image,
I think we should be more carefully about filtering out
since it's an irreversible change.
How many users want to get such a broken dump image even if
it could be gotten faster ?

Why we capture dump images, it's for analyzing, of course.
At least, we should supply alternatives to the affected commands.


Thanks
Atsushi Kumagai

> -Cliff
> > 
> > > 
> > > -Cliff
> > > 
> > > > 
> > > > It might be better to just generally warn that results may be unpredictable
> > > > because any command that accesses page structures, such as kmem, may fail.
> > > > By listing specific options, it makes it sound like that they are the only
> > > > ones, so it may be better to leave it open-ended.
> > > > 
> > > > Dave
> > > 
> > > --
> > > Cliff Wickman
> > > SGI
> > > cpw at sgi.com
> > > (651) 683-3824
> > > 
> 
> -- 
> Cliff Wickman
> SGI
> cpw at sgi.com
> (651) 683-3824
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility




More information about the Crash-utility mailing list