[Linux-cluster] RE: GFS2 subdirectory hang
Eric.Johnson at mtsallstream.com
Thu Aug 27 18:16:03 UTC 2009
On Thu, 2009-08-27 at 09:25 -0500, Johnson, Eric wrote:
>> I have a 32-bit RHEL 5.3 Cluster Suite setup of two nodes with GFS2 file
>> systems on FC attached SAN. I have run into this issue twice now:
>> attempts to access a certain directory within one of the GFS2 file
>> systems never return. Other directories and paths within that file
>> system work just fine.
>> The first time it happened, I had to crash the node to get it to
>> the FS, then unmount it on both nodes, fsck it, remount it, and it was
>> fine. It has happened again (different path, different file system). A
>> simple "ls" in the directory (which has maybe 20 files in it) leaves the
>> process in an uninterruptible sleep state. I left it all night and it
>> never returned.
>> I'm not sure what other info would be useful on this, but this is what I
>> see in the gfs2_tool lockdump output for the ls PID on that node:
>> G: s:UN n:2/bf1df f:l t:SH d:EX/0 l:0 a:0 r:4
>> H: s:SH f:aW e:0 p:9938 [ls] gfs2_lookup+0x44/0x90 [gfs2]
> ^ The W flag indicates that this is waiting for a glock.
> Currently the glock is in the UN (unlocked) state, and it's trying to get
> a SH (shared) lock. The next step in the investigation is to look for
> the same glock number 2/bf1df on the other nodes, and see what is
> holding that lock. This particular node will hang until the lock is
> released on whichever other node is holding it.
> If there is nothing on any other node apparently holding that lock in
> the glock dumps, then looking at dlm lock dumps would be the next step,
Thanks for the response, Steve. I found this reference to that lock on
the other node:
G: s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4
I: n:1155192/782815 t:8 f:0x00000010
I'm having trouble finding documentation that describes what each of
these fields means. There's no obvious process ID here, and all I can
determine is that it's an exclusive lock.
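
The cross-node comparison Steve describes can be sketched roughly as
below. This is only an illustration, assuming the lockdump output from
each node has been saved as text; the function names and the STATES
table are made up for the example and are not part of gfs2_tool. The
state names come from this thread (UN unlocked, SH shared, EX
exclusive), and reading s: as the current state and n: as the glock
number follows Steve's explanation above. No other field is interpreted.

```python
import re

# State names as used in this thread; other states are passed through as-is.
STATES = {"UN": "unlocked", "SH": "shared", "EX": "exclusive"}

def parse_glock_line(line):
    """Split a dump line into (kind, key/value fields, trailing text)."""
    kind, _, rest = line.strip().partition(":")
    fields, trailer = {}, []
    for token in rest.split():
        key, sep, value = token.partition(":")
        # key:value pairs come first; anything after them (process name,
        # call site) is kept as free text.
        if sep and not trailer and re.fullmatch(r"[A-Za-z]+", key):
            fields[key] = value
        else:
            trailer.append(token)
    return kind.strip(), fields, " ".join(trailer)

def glocks_by_number(dump_text):
    """Map glock number (the n: field) to the fields of its G: line."""
    result = {}
    for line in dump_text.splitlines():
        kind, fields, _ = parse_glock_line(line)
        if kind == "G" and "n" in fields:
            result[fields["n"]] = fields
    return result

# The two excerpts quoted in this thread.
waiting_node = """G: s:UN n:2/bf1df f:l t:SH d:EX/0 l:0 a:0 r:4
 H: s:SH f:aW e:0 p:9938 [ls] gfs2_lookup+0x44/0x90 [gfs2]"""
other_node = """G: s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4
 I: n:1155192/782815 t:8 f:0x00000010"""

local = glocks_by_number(waiting_node)
remote = glocks_by_number(other_node)

# Report every glock number that appears in both dumps.
for number in sorted(local.keys() & remote.keys()):
    here = local[number]["s"]
    there = remote[number]["s"]
    print(number, "here:", STATES.get(here, here),
          "there:", STATES.get(there, there))

# The part of the glock number after the slash is hexadecimal.
glock_hex = "2/bf1df".split("/")[1]
print(int(glock_hex, 16))
```

One thing that can be checked from the output itself: bf1df in
hexadecimal is 782815 in decimal, which is the second number on the I:
line from the other node. That suggests the I: line describes the inode
this glock protects, though that reading is an inference from the
numbers rather than something confirmed in this thread.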