[Linux-cluster] GFS load average and locking

Wed Mar 8 19:06:10 UTC 2006

There is a known condition where locks are not released as
they should be. A forthcoming patch adds a tunable parameter
that purges a percentage of unused but still retained locks.
I've tested it under the conditions that affect my system, and
it was rock solid afterwards. At the time I tested it, you had
to make the change after the system was up and running (i.e.,
it was not a config setting). Hopefully this will make it into
Update 7.

Regards,

Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stanley, Jon
Sent: Wednesday, March 08, 2006 1:54 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] GFS load average and locking

I have a 7-node GFS cluster, plus 3 lock servers (RH AS3U5, GULM
locking) that do not mount the filesystem.  I have a problem whereby the load
average on the system is extremely high (occasionally astronomical),
eventually leading to a complete site outage because the shared filesystem
becomes inaccessible.  I have a couple of questions about the innards of GFS
that I would be most grateful for someone to answer:

The application is written in PHP, and the PHP sessioning is handled via the
GFS filesystem as well, if that's important.

1)  I notice that I have a lot of processes in uninterruptible sleep.
When I attached strace to one of these processes, I obviously found it doing
nothing for a period of ~30-60 seconds.  An excerpt of the strace (using -r)
follows:

     0.001224 stat64("/media/files/global/2/6/26c4f61c69117d55b352ce328babbff4.jpg", {st_mode=S_IFREG|0644, st_size=9072, ...}) = 0
     0.000251 open("/media/files/global/2/6/26c4f61c69117d55b352ce328babbff4.jpg", O_RDONLY) = 5
     0.000108 mmap2(NULL, 9072, PROT_READ, MAP_PRIVATE, 5, 0) = 0xaf381000
     0.000069 writev(4, [{"HTTP/1.1 200 OK\r\nDate: Wed, 08 M"..., 318}, {"\377\330\377\340\0\20JFIF\0\1\2\0\0d\0d\0\0\377\354\0\21"..., 9072}], 2) = 9390
     0.000630 close(5)                  = 0
     0.000049 munmap(0xaf381000, 9072)  = 0
     0.000052 rt_sigaction(SIGUSR1, {0x81ef474, [], SA_RESTORER|SA_INTERRUPT, 0x1b2eb8}, {SIG_IGN}, 8) = 0
     0.000068 read(4, 0xa239b3c, 4096)  = ? ERESTARTSYS (To be restarted)
     6.546891 --- SIGALRM (Alarm clock) @ 0 (0) ---
     0.000119 close(4)                  = 0

It looks like the process sits in read() for a period of time, which I take
to be the cause of the uninterruptible sleep.  This particular example was
about 6 seconds, but the time seems to vary.  The file in this instance is
not large, only 9k.
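
To pick out stalls like this from a longer trace, the gaps reported by
strace -r can be scanned mechanically.  The following is a minimal sketch,
not part of the original report: the function names and the 1-second
threshold are illustrative assumptions.  It relies on the fact that with -r
each timestamp is the time elapsed since the previous line, so a large value
is charged to the call on the line before it.  In the excerpt above, that
call is the read() on descriptor 4, the same descriptor the HTTP response
was written to with writev().

```python
import re

# A `strace -r` line starts with a relative timestamp such as "6.546891".
# Mail clients may wrap long lines, so the timestamp can also stand alone.
TIMESTAMP = re.compile(r"^\s*(\d+\.\d{6})\s*(.*)$")


def parse_events(text):
    """Return a list of (delta_seconds, event_text) from `strace -r` output.

    Lines that do not start with a timestamp are treated as the wrapped
    continuation of the previous event.
    """
    events = []
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            events.append([float(match.group(1)), match.group(2).strip()])
        elif events and line.strip():
            joined = events[-1][1] + " " + line.strip()
            events[-1][1] = joined.strip()
    return [(delta, event) for delta, event in events]


def slow_calls(text, threshold=1.0):
    """Return (seconds, event_text) for events followed by a long gap.

    The delta printed on a line is the time since the previous line, so
    it is attributed to the previous event. `threshold` is an arbitrary
    cutoff chosen for illustration.
    """
    events = parse_events(text)
    slow = []
    for previous, current in zip(events, events[1:]):
        if current[0] >= threshold:
            slow.append((current[0], previous[1]))
    return slow
```

Feeding the excerpt above to slow_calls() should report only the read() on
descriptor 4, with the 6.5-second gap that precedes the SIGALRM line.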

I've never seen ERESTARTSYS before, and some googling tells me that it
basically means the current syscall was interrupted so that a signal could be
handled (SIGALRM in this case, whose purpose I'm not sure of).
I could be *way* off base here - I'm not a programmer by any stretch of the
imagination.

2)  The locking statistics seem to be a huge mystery.  The lock total
doesn't seem to correspond to the number of open files that I have (I hope!).
Here's the output of 'cat /proc/gulm/lockspace'.  I can't imagine that I
have 300,000+ files open on this system at this point.  When are the locks
released, or is this even an indication of how many locks are active at
the current time?  What does the 'pending' number mean?

[svadmin at s259830hz1sl01 gulm]$ cat lockspace

lock counts:
  total: 369822
    unl: 176518
    exl: 1555
    shd: 191501
    dfr: 0
pending: 5
   lvbs: 2000
   lops: 21467433

[svadmin at s259830hz1sl01 gulm]$
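
To watch how these counters move over time, the output can be parsed and
broken down per lock state.  This is only a sketch: the function names are
hypothetical, and reading unl/exl/shd/dfr as unlocked, exclusive, shared and
deferred is an assumption based on the abbreviations, not something the
output itself states.

```python
def parse_lockspace(text):
    """Parse `cat /proc/gulm/lockspace` style output into {name: int}.

    Only "name: number" lines are kept; headings such as "lock counts:"
    and shell prompt lines are ignored.
    """
    counts = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        value = value.strip()
        if separator and value.isdigit():
            counts[name.strip()] = int(value)
    return counts


def state_shares(counts):
    """Return each lock state's fraction of the reported total.

    The state names are taken from the output as-is; their meaning
    (unlocked, exclusive, shared, deferred) is assumed, not confirmed.
    """
    total = counts["total"]
    states = ("unl", "exl", "shd", "dfr")
    return {s: counts[s] / total for s in states if s in counts}
```

Run against the figures above, this shows that nearly half of the reported
total is in the unl state, which would fit the picture of locks being
retained after they are no longer in use.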

Thanks for any help that anyone can provide on this!

Thanks!
-Jon

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster