[Linux-cluster] optimising DLM speed?

Wed Feb 16 21:12:58 UTC 2011

 > Yes, ls -l will always take longer because it is not just accessing 
the directory, but also every inode in the directory. As a result the 
I/O pattern will generally be poor.

I know and accept that. It's common to most filesystems but the access 
time is particularly pronounced with GFS2 (presumably because of the 
added latencies).

The problem is that users don't see things from the same point of view, 
so there's a constant flow of complaints about "slow servers".

They think that holding down the number of files/directory is an 
unreasonable restriction - and in some cases (NASA/ESA archives) I can't 
even explain the reasons why as the people involved are unreachable.

This is despite quite documentable performance gains from breaking up 
large directories even on non-cluster filesystems - we saw an ls -lR 
speedup of around 700x when moving one directory structure from flat 
(130k files) to nested.
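
For anyone wanting to try the same flat-to-nested move, here is a rough sketch of one way to do it. Bucketing by a hash prefix of the filename is my assumption for illustration, not the scheme we actually used, and the names nested_path() and nest_directory() are made up. With two levels of two hex characters each, 130k files would spread over up to 65536 leaf directories.

```python
import hashlib
import os
import shutil


def nested_path(root, name, levels=2, width=2):
    """Return a path under root that buckets name by a hash prefix."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Each level uses the next `width` hex characters of the digest.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def nest_directory(flat_dir, nested_root):
    """Move every regular file in flat_dir into hashed subdirectories."""
    # Collect names first so we are not moving files while iterating.
    with os.scandir(flat_dir) as entries:
        names = [e.name for e in entries if e.is_file(follow_symlinks=False)]
    moved = 0
    for name in names:
        target = nested_path(nested_root, name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.move(os.path.join(flat_dir, name), target)
        moved += 1
    return moved
```

The downside is that filenames no longer tell users where a file lives, so anything that reads the tree needs the same nested_path() rule.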

The same poor I/O pattern has a direct bearing on incremental backup 
speeds - backup software has to stat() a file (at minimum - SHA hash 
comparisons are even more overhead) to see if anything's changed, which 
means in large directories a backup may drop down to scan rates of 10 
files/second or lower and seldom exceeds 100 files/second at best.
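
To put a number on the scan rate for a given directory, something like the sketch below can help. It only mimics the bare minimum a backup crawler does (one lstat() per entry, no hashing), and the function names are mine, not taken from any backup package.

```python
import os
import time


def stat_scan(directory):
    """lstat() every entry in directory; return (count, elapsed_seconds)."""
    start = time.monotonic()
    count = 0
    for name in os.listdir(directory):
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            continue  # entry vanished between listdir() and lstat()
        count += 1
    return count, time.monotonic() - start


def files_per_second(count, elapsed):
    """Convert a scan result into a files/second rate."""
    return count / elapsed if elapsed > 0 else float("inf")
```

Run it once against a cold directory and again straight afterwards to see the difference between uncached and cached scans.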

(Bacula is pretty good about caching and issues a posix_fadvise() with 
POSIX_FADV_DONTNEED after each file is checked. I just wish other 
filesystem-crawling processes did the same.)
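
For crawlers that could be patched to behave the same way, the idea looks roughly like this. It is a minimal sketch using Python's os.posix_fadvise(), not Bacula's actual code, and read_and_drop_cache() is a name I made up. The advice is only a hint, so failures are ignored.

```python
import os


def read_and_drop_cache(path, chunk_size=1 << 20):
    """Read a file fully, then hint that its pages need not stay cached."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        if hasattr(os, "posix_fadvise"):
            try:
                # offset 0 with length 0 covers the whole file
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # advisory only; some filesystems reject it
    finally:
        os.close(fd)
    return total
```

This only affects cached file data pages; it does nothing for the inode and directory lookups that make the stat() pass slow.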

 > I assume that once the directory has been read in once, that 
accesses will be much faster on subsequent occasions,

Correct - but after 5-10 idle minutes the cached information is lost and 
the pattern repeats.

 > It is a historical issue that we have inherited from GFS and I've 
spent some time trying to come up with a solution in kernel space, but 
in the end, a userland solution may be a better way to solve it.

In the case of NFS clients, I'm seriously looking at trying to move to 
RHEL6 and use fscache - this should help reduce load a little but won't 
help for uncached directories.

If you have any suggestions on the [nfs export|client mount] side that 
might help, I'm open to them.