[Linux-cluster] LOCK_DLM Performance under Fire

David Teigland teigland at redhat.com
Wed Apr 6 03:47:39 UTC 2005


On Tue, Apr 05, 2005 at 05:35:01PM -0700, Peter Shearer wrote:

> ext3 on local disk, the test app takes about 3 min 20 sec to complete.
> ext3 on GNBD exported disk (one node only, obviously); completes in
> about 3 min 35 sec.
> GFS on GNBD mounted with the localflocks option; completes in 5 min 30
> sec.
> GFS on GNBD mounted using LOCK_DLM with only one server mounting the fs;
> completes in 50 min 45 sec.
> GFS on GNBD mounted using LOCK_DLM with two servers mounting the fs;
> went over 80 min and wasn't even half done.

It sounds like the app is using fcntl (posix) locks, not flock(2)?
If so, that's a weak spot for lock_dlm, which translates posix-lock
requests into multiple dlm lock operations.
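
To be clear, this is the kind of call I mean (a minimal sketch, not
your app's code; the function name is made up):

    #include <fcntl.h>
    #include <unistd.h>

    /* illustrative only: take a whole-file posix write lock; under
       lock_dlm, one F_SETLKW like this turns into several dlm lock
       operations */

    static int posix_lock_whole_file(int fd)
    {
            struct flock fl = {
                    .l_type   = F_WRLCK,    /* exclusive */
                    .l_whence = SEEK_SET,
                    .l_start  = 0,
                    .l_len    = 0,          /* 0 = whole file */
            };

            return fcntl(fd, F_SETLKW, &fl);  /* blocks until granted */
    }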

That said, it's possible the code is doing some dumb things that
could be fixed to improve the speed.  If there are hundreds of files
being locked, one simple thing to try is to increase SHRINK_CACHE_COUNT
and SHRINK_CACHE_MAX in lock_dlm.h (sorry, I never made them tunable
through proc).  This relates to some basic caching lock_dlm does for
files that are repeatedly locked/unlocked.

If the app could get by with just using flock(), that would certainly be
much faster; a rough sketch of what I mean is below.  Also, if you could
provide the test you use, or a simplified equivalent, it would help.
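
For comparison, the flock() version of the same thing would look
roughly like this (again just a sketch, with a made-up function name):

    #include <sys/file.h>

    /* rough sketch, not from the app: whole-file exclusive lock via
       flock(2), which avoids the posix-lock translation in lock_dlm */

    static int flock_whole_file(int fd)
    {
            if (flock(fd, LOCK_EX) < 0)       /* blocks until granted */
                    return -1;

            /* ... work on the file ... */

            return flock(fd, LOCK_UN);
    }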

-- 
Dave Teigland  <teigland at redhat.com>



