[Crash-utility] crash-7.3.2 very long list iteration progressively increasing memory usage

Dave Kleikamp dave.kleikamp at oracle.com
Wed Jun 27 14:31:39 UTC 2018


On 06/27/2018 09:02 AM, David Mair wrote:
> On 06/26/2018 12:01 PM, Dave Kleikamp wrote:
>> On 06/26/2018 11:15 AM, Dave Anderson wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> On 06/26/2018 10:40 AM, David Wysochanski wrote:
>>>>> On Tue, 2018-06-26 at 11:27 -0400, Dave Anderson wrote:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> On Tue, 2018-06-26 at 15:34 +0100, Jeremy Harris wrote:
>>>>>>>> On 06/26/2018 03:29 PM, David Wysochanski wrote:
>>>>>>>>> On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
>>>>>>>>>> Yes, by default all list entries encountered are put in the built-in
>>>>>>>>>> hash queue, specifically for the purpose of determining whether there
>>>>>>>>>> are duplicate entries.  So if it's still running, it hasn't found any.
>>>>>>>>>>
>>>>>>>>>> To avoid the use of the hashing feature, try entering "set hash off"
>>>>>>>>>> before kicking off the command.  But of course if it finds any, it
>>>>>>>>>> will loop forever.
>>>>>>>>>>
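
For reference, turning the hash off at the crash prompt looks like this
(the list address below is a made-up placeholder):

crash> set hash off
hash: off
crash> list -H ffff92003c8e45c0
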
>>>>>>>>>
>>>>>>>>> Ah ok yeah I forgot about the built-in list loop detection!
>>>>>>>>
>>>>>>>> For a storage-less method of list loop detection: run two walkers
>>>>>>>> down the list, one advancing two elements for every one element the
>>>>>>>> other advances.  If the two ever land on the same element after the
>>>>>>>> start, you have a loop.
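
A minimal sketch of that two-walker (Floyd) scheme in C, assuming an
ordinary singly linked node layout rather than crash's dumpfile reads:

/* Hypothetical node layout; in crash the next pointer would be
 * read out of the dumpfile rather than dereferenced directly. */
struct node {
    struct node *next;
};

/* Floyd's cycle detection: "fast" advances two elements for every
 * one that "slow" advances; if the two ever meet after the start,
 * the list contains a loop.  Needs no storage beyond two pointers. */
static int list_has_loop(struct node *head)
{
    struct node *slow = head, *fast = head;

    while (fast && fast->next) {
        slow = slow->next;          /* one step */
        fast = fast->next->next;    /* two steps */
        if (slow == fast)
            return 1;               /* walkers met: loop */
    }
    return 0;                       /* reached NULL: no loop */
}
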
>>>>>>>
>>>>>>> I agree some algorithm [1] without a hash table may be better,
>>>>>>> especially for larger lists.
>>>>>>
>>>>>> I'll await your patch...
>>>>>>
>>>>>
>>>>> Do you see any advantage to keeping the hash table for loop detection,
>>>>> or would you accept a patch that removes it completely in favor of
>>>>> another algorithm?
>>>>
>>>> Could the same algorithm be modified so that it slows down after a
>>>> certain number of list members, say by saving only every 10th element
>>>> to the hash (while still checking every new one)?
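
A rough sketch of that sampled-hash idea, with hypothetical
hash_lookup()/hash_store() helpers standing in for crash's built-in
hash queue:

/* Same hypothetical node layout as in the sketch further up. */
struct node {
    struct node *next;
};

/* Hypothetical stand-ins for crash's built-in hash queue. */
extern int hash_lookup(struct node *entry);  /* nonzero if already stored */
extern void hash_store(struct node *entry);

#define SAMPLE_INTERVAL 10

/* Check every entry against the hash, but store only every 10th:
 * once the list loops, a previously stored entry is revisited soon
 * after, so the loop is still caught while the hash holds roughly
 * a tenth of the entries. */
static int walk_list_sampled(struct node *head)
{
    struct node *p;
    unsigned long count = 0;

    for (p = head; p; p = p->next, count++) {
        if (hash_lookup(p))
            return 1;               /* revisited entry: loop */
        if ((count % SAMPLE_INTERVAL) == 0)
            hash_store(p);          /* store only every 10th */
    }
    return 0;                       /* reached list end: no loop */
}
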
>>>
>>> I've seen list corruption where a list containing hundreds of entries
>>> contains an entry that points back to another entry in the list that
>>> came hundreds of entries before it.
>>
>> Right, but if we didn't detect the first duplicate, we should still see
>> one in the next 10 entries. Once the list repeats, it should repeat
>> identically.
> 
> Why hard-code the longer step size at all? Make it user-configurable
> with a minimum of 2 (anything longer than 1), and fix the short step
> size at 1 in the code. Document how to choose the long step size, with
> enough detail to understand its effect and purpose, and then let people
> experiment based on the sizes of the lists they believe they are
> working with.
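
A minimal sketch of what that knob might look like, assuming the
sampling interval from the sketch above becomes a settable variable
(all names here are hypothetical):

/* Hypothetical settable knob replacing the fixed SAMPLE_INTERVAL
 * above; the checking (short) step stays hard-wired at every entry. */
static unsigned long hash_sample_interval = 10;  /* default */

/* Enforce the documented minimum of 2 when the user changes it. */
static int set_hash_sample_interval(unsigned long value)
{
    if (value < 2)
        return -1;                  /* reject values below the minimum */
    hash_sample_interval = value;
    return 0;
}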

I was just throwing the idea out there and wanted to keep it simple at
first. If this is a feasible solution, we can improve upon it.



