[Linux-cluster] GFS load average and locking

Stanley, Jon Jon.Stanley at savvis.net
Wed Mar 8 18:54:22 UTC 2006


I have a 7-node GFS cluster, plus 3 lock servers (RH AS3U5, GULM
locking) that do not mount the filesystem.  I have a problem whereby the
load average on the system is extremely high (occasionally
astronomical), eventually leading to a complete site outage because the
shared filesystem becomes inaccessible.  I have a couple of questions
about the innards of GFS that I would be most grateful for someone to
answer:

The application is written in PHP, and the PHP sessions are stored on
the GFS filesystem as well, if that's important.

1)  I notice that I have a lot of processes in uninterruptible sleep.
When I attached strace to one of these processes, I could see it doing
nothing for periods of ~30-60 seconds.  An excerpt of the strace output
(using -r for relative timestamps) follows:

     0.001224 stat64("/media/files/global/2/6/26c4f61c69117d55b352ce328babbff4.jpg", {st_mode=S_IFREG|0644, st_size=9072, ...}) = 0
     0.000251 open("/media/files/global/2/6/26c4f61c69117d55b352ce328babbff4.jpg", O_RDONLY) = 5
     0.000108 mmap2(NULL, 9072, PROT_READ, MAP_PRIVATE, 5, 0) = 0xaf381000
     0.000069 writev(4, [{"HTTP/1.1 200 OK\r\nDate: Wed, 08 M"..., 318}, {"\377\330\377\340\0\20JFIF\0\1\2\0\0d\0d\0\0\377\354\0\21"..., 9072}], 2) = 9390
     0.000630 close(5)                  = 0
     0.000049 munmap(0xaf381000, 9072)  = 0
     0.000052 rt_sigaction(SIGUSR1, {0x81ef474, [], SA_RESTORER|SA_INTERRUPT, 0x1b2eb8}, {SIG_IGN}, 8) = 0
     0.000068 read(4, 0xa239b3c, 4096)  = ? ERESTARTSYS (To be restarted)
     6.546891 --- SIGALRM (Alarm clock) @ 0 (0) ---
     0.000119 close(4)                  = 0

What it looks like is that the process hangs in read() for a period of
time, which would explain the uninterruptible sleep.  In this particular
example it was about 6.5 seconds, but the time seems to be variable.
The file in this instance is not large, only 9k.

I've never seen ERESTARTSYS before, and some googling tells me that it
basically means the kernel is interrupting the current syscall in order
to handle a signal (SIGALRM in this case, though I'm not sure what that
signal is doing here).  I could be *way* off base here - I'm not a
programmer by any stretch of the imagination.
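
From what I can gather, ERESTARTSYS is internal to the kernel and shows
up in strace when a blocking syscall gets interrupted by a signal:
depending on whether the handler was installed with SA_RESTART, the call
is either quietly restarted or returns EINTR to the program.  Purely as
a toy illustration of that mechanism (something cobbled together from
the sigaction(2) man page - nothing to do with Apache, PHP, or GFS), a
blocking read() interrupted by SIGALRM looks something like this:

/*
 * Toy demo: a blocking read() interrupted by SIGALRM.  With the handler
 * installed without SA_RESTART, the kernel abandons the read (the
 * "To be restarted" state that strace shows) and the program gets
 * EINTR; with SA_RESTART the read would simply be restarted after the
 * handler returns.
 */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig)
{
    (void)sig;                  /* nothing to do - just interrupt the syscall */
}

int main(void)
{
    int fds[2];
    char buf[16];
    struct sigaction sa;

    if (pipe(fds) < 0)          /* read end never gets data, so read() blocks */
        return 1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;   /* sa_flags == 0, i.e. no SA_RESTART */
    sigaction(SIGALRM, &sa, NULL);

    alarm(2);                   /* deliver SIGALRM in 2 seconds */

    if (read(fds[0], buf, sizeof(buf)) < 0 && errno == EINTR)
        printf("read() was interrupted by SIGALRM (EINTR)\n");

    return 0;
}

A read() sitting there waiting like that is exactly what a long gap in
the -r output looks like, the same way the 6.5 second gap shows up in
the excerpt above.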

2)  The locking statistics seem to be a huge mystery.  The lock total
doesn't seem to correspond to the number of open files that I have (I
hope!).  Here's the output of 'cat /proc/gulm/lockspace' - I can't
imagine that I have 300,000+ files open on this system at this point.
When are the locks released, and is this even an indication of how many
locks are active at the current time?  What does the 'pending' number
mean?

[svadmin@s259830hz1sl01 gulm]$ cat lockspace

lock counts:
  total: 369822
    unl: 176518
    exl: 1555
    shd: 191501
    dfr: 0
pending: 5
   lvbs: 2000
   lops: 21467433

[svadmin@s259830hz1sl01 gulm]$
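
If it's useful, a quick way to watch those counters move would be
something along these lines - just a rough sketch, which assumes that
/proc/gulm/lockspace keeps the simple 'name: value' layout shown above.
It dumps the file twice, ten seconds apart, so it's easier to tell which
numbers just keep climbing (lops, at 21 million, looks like a running
total) and which ones fluctuate:

/*
 * Rough sketch: dump /proc/gulm/lockspace twice, ten seconds apart, so
 * the two snapshots can be compared by eye.  Assumes the plain
 * "name: value" text layout shown above.
 */
#include <stdio.h>
#include <unistd.h>

static void dump_lockspace(const char *label)
{
    FILE *f = fopen("/proc/gulm/lockspace", "r");
    char line[256];

    if (!f) {
        perror("fopen /proc/gulm/lockspace");
        return;
    }

    printf("--- %s ---\n", label);
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);        /* print the file verbatim */

    fclose(f);
}

int main(void)
{
    dump_lockspace("sample 1");
    sleep(10);                      /* wait between samples */
    dump_lockspace("sample 2");
    return 0;
}

(A plain 'watch cat /proc/gulm/lockspace' from the shell does much the
same thing, of course.)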

Thanks for any help that anyone can provide on this!

Thanks!
-Jon



