[Crash-utility] crash-7.3.2 very long list iteration progressively increasing memory usage

Dave Anderson anderson at redhat.com
Tue Jun 26 18:34:00 UTC 2018



----- Original Message -----
> On Tue, 2018-06-26 at 11:59 -0400, Dave Anderson wrote:
> > 
> > ----- Original Message -----
> > > On Tue, 2018-06-26 at 11:27 -0400, Dave Anderson wrote:
> > > > 
> > > > ----- Original Message -----
> > > > > On Tue, 2018-06-26 at 15:34 +0100, Jeremy Harris wrote:
> > > > > > On 06/26/2018 03:29 PM, David Wysochanski wrote:
> > > > > > > On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
> > > > > > > > Yes, by default all list entries encountered are put in the
> > > > > > > > built-in
> > > > > > > > hash queue, specifically for the purpose of determining whether
> > > > > > > > there
> > > > > > > > are duplicate entries.  So if it's still running, it hasn't
> > > > > > > > found
> > > > > > > > any.
> > > > > > > > 
> > > > > > > > To avoid the use of the hashing feature, try entering "set hash
> > > > > > > > off"
> > > > > > > > before kicking off the command.  But of course if it finds any,
> > > > > > > > it
> > > > > > > > will loop forever.
> > > > > > > > 
> > > > > > > 
> > > > > > > Ah ok yeah I forgot about the built-in list loop detection!
> > > > > > 
> > > > > > For a storage-less method of list loop-detection: run two walkers
> > > > > > down the list, advancing two versus one elements.  If you ever
> > > > > > match the same element location after starting, you have a loop.
> > > > > 
> > > > > I agree some algorithm [1] without a hash table may be better,
> > > > > especially for larger lists.
> > > > 
> > > > I'll await your patch...
> > > > 
> > > 
> > > Do you see any advantage to keeping the hash table for loop detection
> > > or would you accept a patch that removes it completely in favor of a
> > > another algorithm?
> > 
> > For maintenance's sake, it's probably worth keeping the hash queue option
> > in place, primarily since there are a few dozen other internal facilities
> > besides the user "list" command that use the do_list() function.
> > 
> 
> Ok.  I was actually thinking about replacing all the callers of
> hq_enter() with a new algorithm.  There are 32 of them, so it would be
> a non-trivial patch, though longer term it may mean simpler
> maintenance.  It probably matters more now that vmcores of hundreds of
> GB, with correspondingly larger lists, are common; for some of these
> callers, though, a replacement may not be a useful change.

Duplicate list detection is not the only reason for the hq_enter() facility.
It has to stay in place, because after collecting a list, several hq_enter()
users need to read all of the entries back out for whatever post-processing
they need to do.  

So it can't just be a "repeal-and-replace"...   ;-)

Dave



