[Linux-cluster] OOM failures with GFS, NFS and Samba on a cluster with RHEL3-AS

Jonathan Woytek woytek+ at cmu.edu
Mon Jan 24 18:43:29 UTC 2005


/proc/meminfo:
         total:    used:    free:  shared: buffers:  cached:
Mem:  4189741056 925650944 3264090112        0 18685952 76009472
Swap: 2146787328        0 2146787328
MemTotal:      4091544 kB
MemFree:       3187588 kB
MemShared:           0 kB
Buffers:         18248 kB
Cached:          74228 kB
SwapCached:          0 kB
Active:         107232 kB
ActiveAnon:      50084 kB
ActiveCache:     57148 kB
Inact_dirty:      1892 kB
Inact_laundry:   16276 kB
Inact_clean:     16616 kB
Inact_target:    28400 kB
HighTotal:     3276544 kB
HighFree:      3164096 kB
LowTotal:       815000 kB
LowFree:         23492 kB
SwapTotal:     2096472 kB
SwapFree:      2096472 kB
Committed_AS:    72244 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
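
A rough way to pull the number that matters out of that dump -- lowmem
actually in use -- is the one-liner below.  It assumes nothing beyond the
standard LowTotal/LowFree fields shown above:

    awk '/^LowTotal:/ {t=$2} /^LowFree:/ {f=$2}
         END {printf "lowmem in use: %d of %d kB\n", t-f, t}' /proc/meminfo

With the values above that works out to roughly 791508 of 815000 kB in
use, even though the machine as a whole is barely touched.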

When a bunch of locks become free, lowmem seems to recover somewhat.
However, shutting down lock_gulmd entirely does NOT return lowmem to
what it probably should be (though I'm not sure whether the system is
just keeping all of that memory cached until something else needs it).
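
To line LowFree up against the gulm lock count over time, a minimal
watch loop along these lines should do (localhost here stands in for
whichever node is the gulm master, gulm_tool is assumed to be in the
PATH, and the getstats command itself is the one Michael suggested
below):

    #!/bin/sh
    # Sample LowFree and the gulm lock count once a minute so the two
    # can be lined up after the next lockup.
    # Assumes this node is the gulm master; otherwise substitute the
    # master's hostname for localhost.
    while true; do
        lowfree=`grep '^LowFree:' /proc/meminfo`
        locks=`gulm_tool getstats localhost:lt000 | grep '^locks '`
        echo "`date '+%b %d %H:%M:%S'`  $lowfree  $locks" >> /var/tmp/lowmem-vs-locks.log
        sleep 60
    done

The last few entries in /var/tmp/lowmem-vs-locks.log before a lockup
should show whether LowFree really does track the lock count.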

jonathan

Jonathan Woytek wrote:
> Michael Conrad Tadpol Tilstra wrote:
> 
>> On Sun, Jan 23, 2005 at 01:45:28PM -0500, Jonathan Woytek wrote:
>>
>>> Additional information:
>>>
>>> I enabled full output on lock_gulmd, since my dead top sessions would 
>>> often show that process near the top of the list around the time of 
>>> crashes.  The machine was rebooted around 10:50AM, and was down again at 
>>
>> Not surprising that lock_gulmd is working hard when gfs is under heavy
>> use.  It is busy processing all those lock requests.  What would be
>> more useful from gulm for this than the logging messages is to query
>> the locktable every so often for its stats:
>> `gulm_tool getstats <master>:lt000`
>> The 'locks = ###' line is how many lock structures are currently held.
>> gulm is very greedy about memory, and you are running the lock servers
>> on the same nodes you're mounting from.
> 
> 
> Here are the stats from the master lock_gulmd lt000:
> 
> I_am = Master
> run time = 9436
> pid = 2205
> verbosity = Default
> id = 0
> partitions = 1
> out_queue = 0
> drpb_queue = 0
> locks = 20356
> unlocked = 17651
> exclusive = 15
> shared = 2690
> deferred = 0
> lvbs = 17661
> expired = 0
> lock ops = 107354
> conflicts = 0
> incomming_queue = 0
> conflict_queue = 0
> reply_queue = 0
> free_locks = 69644
> free_lkrqs = 60
> used_lkrqs = 0
> free_holders = 109634
> used_holders = 20366
> highwater = 1048576
> 
> 
> Something keeps eating away at lowmem, though, and I still can't figure 
> out what exactly it is.
> 
> 
>> also, just to see if I read the first post right, you have
>> samba->nfs->gfs?
> 
> 
> If I understand your arrows correctly, I have a filesystem mounted with 
> GFS that I'm sharing via NFS to another machine that is sharing it via 
> Samba.  I've closed that link, though, to try to eliminate that as a 
> problem.  So now I'm serving the GFS filesystem directly through Samba.
> 
> jonathan
> 

-- 
Jonathan Woytek                 w: 412-681-3463         woytek+ at cmu.edu
NREC Computing Manager          c: 412-401-1627         KB3HOZ
PGP Key available upon request



