[Crash-utility] crash-7.3.2 very long list iteration progressively increasing memory usage

Tue Jun 26 16:42:35 UTC 2018

On Tue, 2018-06-26 at 11:59 -0400, Dave Anderson wrote:
> 
> ----- Original Message -----
> > On Tue, 2018-06-26 at 11:27 -0400, Dave Anderson wrote:
> > > 
> > > ----- Original Message -----
> > > > On Tue, 2018-06-26 at 15:34 +0100, Jeremy Harris wrote:
> > > > > On 06/26/2018 03:29 PM, David Wysochanski wrote:
> > > > > > On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
> > > > > > > Yes, by default all list entries encountered are put in the
> > > > > > > built-in
> > > > > > > hash queue, specifically for the purpose of determining whether
> > > > > > > there
> > > > > > > are duplicate entries.  So if it's still running, it hasn't found
> > > > > > > any.
> > > > > > > 
> > > > > > > To avoid the use of the hashing feature, try entering "set hash
> > > > > > > off"
> > > > > > > before kicking off the command.  But of course if it finds any, it
> > > > > > > will loop forever.
> > > > > > > 
> > > > > > 
> > > > > > Ah ok yeah I forgot about the built-in list loop detection!
> > > > > 
> > > > > For a storage-less method of list loop-detection: run two walkers
> > > > > down the list, advancing two versus one elements.  If you ever
> > > > > match the same element location after starting, you have a loop.
> > > > 
> > > > I agree some algorithm [1] without a hash table may be better
> > > > especially for larger lists.
> > > 
> > > I'll await your patch...
> > > 
> > 
> > Do you see any advantage to keeping the hash table for loop detection
> > or would you accept a patch that removes it completely in favor of a
> > another algorithm?
> 
> For maintenance sake, it's probably worth keeping the hash queue option
> in place, primarily since there are a few dozen other internal facilities
> besides the user "list" command that use the do_list() function.
> 

Ok.  I was actually thinking about replacing all the callers of
hq_enter() with a new algorithm.  There are 32 of them so it would be a
non-trivial patch though longer term may be simpler maintenance. 
Probably only now it matters more since it's much more common to have
vmcores 100's of GB and much larger lists but maybe some of these it
would not be useful replacement.

I haven't read far enough whether there are instances where dumping out
the hash table would be more valuable or if it's solely for loop
detection.

Also it's unclear to me that there would be an even distribution into
'buckets' - HQ_INDEX() does not do much other than a simple mod so it's
possible the slowdown I saw was due to uneven distribution of the hash
function and we could improve that there.  Still it would be good to
have an option of no memory usage for larger lists.

FWIW, the memory usage does not seem to be freed or at least I could
not find it yet.

> Dave
> 
> 
> > 
> > 
> > --
> > Crash-utility mailing list
> > Crash-utility at redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
> > 
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility