[Linux-cluster] GFS load average and locking

Treece, Britt Britt.Treece at savvis.net
Fri Mar 10 14:25:48 UTC 2006


Wendy,

Did the sysrq-t's that I sent illustrate this problem further?  I'm
hoping that they corroborate the situation that you described below.

Britt


-----Original Message-----
From: Wendy Cheng [mailto:wcheng at redhat.com] 
Sent: Thursday, March 09, 2006 9:30 PM
To: Treece, Britt
Cc: linux clustering; Stanley, Jon
Subject: RE: [Linux-cluster] GFS load average and locking

On Thu, 2006-03-09 at 17:04 -0600, Treece, Britt wrote:

> Is Redhat aware of any issues with GFS and flock syscalls?  

I just checked the kernel source and have a rough idea of what could go
wrong. In the RHEL 3 (Linux 2.4 based) kernel, flock has the following
logic:

1. lock_kernel (Big Kernel Lock - BKL)
2. call filesystem-specific supplemental lock
3. handle Linux VFS flock
4. unlock_kernel
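
The four steps above can be sketched as a toy user-space model. This is
only an illustration, not the kernel code: `bkl`, `sys_flock`,
`ext3_hook` and `gfs_hook` are hypothetical names, and a Python
`threading.Lock` stands in for the Big Kernel Lock.

```python
import threading

# Stand-in for the Big Kernel Lock: one global lock shared by every caller.
bkl = threading.Lock()

def sys_flock(fs_lock_hook, trace):
    """Toy model of the four steps; not the real kernel code."""
    with bkl:                          # 1. lock_kernel
        trace.append("lock_kernel")
        fs_lock_hook(trace)            # 2. filesystem-specific supplemental lock
        trace.append("vfs_flock")      # 3. generic VFS flock handling
        trace.append("unlock_kernel")  # 4. recorded just before the lock is released

def ext3_hook(trace):
    # Local filesystem: step 2 has nothing to do.
    pass

def gfs_hook(trace):
    # Cluster filesystem: step 2 would talk to the lock manager here.
    trace.append("cluster_lock")

ext3_trace, gfs_trace = [], []
sys_flock(ext3_hook, ext3_trace)
sys_flock(gfs_hook, gfs_trace)
print(ext3_trace)
print(gfs_trace)
```

The only difference between the two traces is the extra work in step 2,
and in this model that work is done while the global lock is held.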

There are two issues here:

* performance

Step 2 is a no-op for most local filesystems (e.g. ext3), and the code
path of step 3 is relatively short, so you won't see much impact from
the BKL there. For GFS, if step 2 runs concurrently (as it does in other
cases such as read, write, etc.), it is reasonably "fast" unless you
need the lock for the very same file and/or the lock network traffic is
congested. However, adding the BKL on top of that would have a big
impact: it virtually serializes *every* flock attempt.
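
A rough way to picture the serialization is to count how many callers
can be inside step 2 at once, with and without a global lock around it.
The sketch below is a user-space simulation under assumed numbers (four
callers, a 50 ms step 2); `max_overlap` and its parameters are made up
for illustration and say nothing about real GFS timings.

```python
import threading
import time

def max_overlap(use_bkl, callers=4, step2_delay=0.05):
    """Return the largest number of callers inside step 2 at the same time."""
    bkl = threading.Lock()     # stand-in for the Big Kernel Lock
    guard = threading.Lock()   # protects the counters below
    state = {"active": 0, "peak": 0}

    def step2():
        with guard:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(step2_delay)  # pretend to wait on cluster lock traffic
        with guard:
            state["active"] -= 1

    def flock_call():
        if use_bkl:
            with bkl:            # every caller queues behind the global lock
                step2()
        else:
            step2()              # callers may overlap in step 2

    threads = [threading.Thread(target=flock_call) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]

peak_with_bkl = max_overlap(use_bkl=True)
peak_without_bkl = max_overlap(use_bkl=False)
print("peak callers in step 2 with BKL:", peak_with_bkl)
print("peak callers in step 2 without BKL:", peak_without_bkl)
```

With the global lock the peak can never exceed one, so the total time
grows with the number of callers even when they want different files.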

* deadlock

I'm a little bit fuzzy on how Linux's BKL is implemented. In theory, the
above sequence would deadlock (unless the BKL is dropped when a process
goes to sleep), regardless of whether step 2 is a no-op or not. I will
ask our base kernel folks about this.
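
One possible reading of the deadlock concern, sketched as a simulation:
a caller sleeps in step 2 waiting for a file lock, while the current
holder needs the same global lock in order to release it. The scenario,
the `unlock_blocked` helper and the timeouts are assumptions made for
illustration; which branch matches the real BKL is exactly the open
question above.

```python
import threading

def unlock_blocked(drop_bkl_on_sleep, timeout=0.3):
    """Return True if the unlocker could not get the BKL within the timeout."""
    bkl = threading.Lock()              # stand-in for the Big Kernel Lock
    file_released = threading.Event()   # set once the holder unlocks the file
    waiter_sleeping = threading.Event() # set once the waiter is about to sleep
    result = {}

    def waiter():
        # Caller that wants a file lock someone else currently holds.
        bkl.acquire()
        if drop_bkl_on_sleep:
            bkl.release()               # give up the BKL before sleeping
        waiter_sleeping.set()
        file_released.wait(timeout * 2) # "sleep" in step 2 (bounded for the demo)
        if drop_bkl_on_sleep:
            bkl.acquire()               # take it back after waking up
        bkl.release()

    def unlocker():
        # Current holder, releasing the file lock through the same code path.
        waiter_sleeping.wait()
        got = bkl.acquire(timeout=timeout)
        result["blocked"] = not got
        if got:
            file_released.set()
            bkl.release()

    threads = [threading.Thread(target=waiter), threading.Thread(target=unlocker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["blocked"]

blocked_if_held = unlock_blocked(drop_bkl_on_sleep=False)
blocked_if_dropped = unlock_blocked(drop_bkl_on_sleep=True)
print("unlock blocked when BKL is held across sleep:", blocked_if_held)
print("unlock blocked when BKL is dropped on sleep:", blocked_if_dropped)
```

The timeouts exist only so the demo terminates; without them the first
case would hang, which is the deadlock being described.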

In any case, I think we need to remove that BKL if we can. In the
meantime, to work around this issue, you have to either:

* use the previously mentioned PHP patch to turn off flock, if you can; or
* get the GFS U7 RPMs, where we have two tuning parameters that could
speed up the lock process. However, I don't have quantitative data at
this moment to know how effective they'll be in this kind of situation.


-- Wendy





