[Crash-utility] crash-7.3.2 very long list iteration progressively increasing memory usage

David Mair dmair at suse.com
Wed Jun 27 14:02:27 UTC 2018


On 06/26/2018 12:01 PM, Dave Kleikamp wrote:
> On 06/26/2018 11:15 AM, Dave Anderson wrote:
>>
>>
>> ----- Original Message -----
>>> On 06/26/2018 10:40 AM, David Wysochanski wrote:
>>>> On Tue, 2018-06-26 at 11:27 -0400, Dave Anderson wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>> On Tue, 2018-06-26 at 15:34 +0100, Jeremy Harris wrote:
>>>>>>> On 06/26/2018 03:29 PM, David Wysochanski wrote:
>>>>>>>> On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
>>>>>>>>> Yes, by default all list entries encountered are put in the built-in
>>>>>>>>> hash queue, specifically for the purpose of determining whether there
>>>>>>>>> are duplicate entries.  So if it's still running, it hasn't found any.
>>>>>>>>>
>>>>>>>>> To avoid the use of the hashing feature, try entering "set hash off"
>>>>>>>>> before kicking off the command.  But of course if it finds any, it
>>>>>>>>> will loop forever.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ah ok yeah I forgot about the built-in list loop detection!
>>>>>>>
>>>>>>> For a storage-less method of list loop-detection: run two walkers
>>>>>>> down the list, advancing two versus one elements.  If you ever
>>>>>>> match the same element location after starting, you have a loop.
>>>>>>
>>>>>> I agree some algorithm [1] without a hash table may be better
>>>>>> especially for larger lists.
>>>>>
>>>>> I'll await your patch...
>>>>>
>>>>
>>>> Do you see any advantage to keeping the hash table for loop detection
>>>> or would you accept a patch that removes it completely in favor of
>>>> another algorithm?
>>>
>>> Could the same algorithm be modified so that it can slow down after a
>>> certain number of list members, say maybe saving only every 10th element
>>> to the hash (but checking every new one)?
>>
>> I've seen list corruption where a list containing hundreds of entries
>> contains an entry that points back to another entry in the list that
>> came hundreds of entries before it.
> 
> Right, but if we didn't detect the first duplicate, we should still see
> one in the next 10 entries. Once the list repeats, it should repeat
> identically.

Why fix the longer loop step size at code-creation time at all? Make it
user-configurable with a minimum of 2 (i.e. anything longer than 1) and
keep the short step size hard-coded at 1. Document how the long step
size is meant to be chosen, and include enough in that documentation to
understand its effect (and purpose). Then people can run their own
experiments based on the sizes of the lists they believe they are
working with.
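
To make that concrete, a rough sketch of the kind of two-walker check
being discussed, with the long step passed in rather than hard-coded,
might look like the following. This is illustrative only, not crash's
actual code: detect_loop(), next_fn and long_step are made-up names,
and reading each entry's next pointer from the dump (readmem() or
whatever the list walker already uses) is reduced to a callback here.

typedef unsigned long ulong;

/* Hypothetical callback: return the next entry's address, or 0 at list end. */
typedef ulong (*next_fn)(ulong entry);

/*
 * Two-walker loop check: "slow" advances one entry per round, "fast"
 * advances long_step entries per round.  Both start at the same entry,
 * so if the list loops they eventually land on the same address at a
 * round boundary; if either walker reaches the end, there is no loop.
 * Returns 1 if a loop was detected, 0 otherwise.
 */
static int detect_loop(ulong start, next_fn next, unsigned int long_step)
{
        ulong slow = start;
        ulong fast = start;
        unsigned int i;

        if (long_step < 2)
                long_step = 2;          /* enforce the suggested minimum */

        for (;;) {
                slow = next(slow);
                for (i = 0; i < long_step && fast; i++)
                        fast = next(fast);
                if (!slow || !fast)
                        return 0;       /* reached a list end: no loop */
                if (slow == fast)
                        return 1;       /* walkers met: the list loops */
        }
}

Because both walkers start from the same entry, they are guaranteed to
meet for any long step of 2 or more once both are inside the loop; the
step size mainly trades the number of rounds against the number of
entries read per round, which is exactly the sort of per-list trade-off
a user-settable value would let people experiment with.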

-- 
David.



