[Linux-cluster] Lock contention on BerkleyDB lock file
Elias, Michael
EliasM at dnb.com
Tue Mar 17 05:21:56 UTC 2009
Hi all,
We are trying to investigate an incident where rogue multiple instances
of Jboss caused possible lock contention on a BerkleyDB lock file.
This is a 5 node cluster, 3 nodes using a BerkelyDB on a GFS partition.
Basic sequence of events;
Node1 appears to be the lock master for the BerkleyDB lock file
Node2 starts 4 addition JBoss servers (under investigation to how/why
they started) that are in an uninterruptible sleep state. Attempted
killing the process with kill -9, but not killable as is usual with the
dreaded "D" state.
Node 1 we stop the JBoss running on that server which seem to release
any contention on Node 2, we are able to kill all the JBoss process. We
then restart the JBoss server on Node 1.
Node 2 we removed the node from the cluster to release any dlm locks it
is holding. Stopping the following in this order rgmanger, gfs, clvmd,
fenced, cman.
Node 2, we restarted the cluster, cman, fenced, clvmd, gfs.. Once
starting GFS we start receiving dlm_lock errors and would not mount. At
that point dlm seemed to make gfs unavailable on nodes 1,2,and 3.
At that point we recycled nodes 1,2,and3 and they came up clean.
So questions;
1. In the log info below "ex plock -11"; I believe ex=exclusive
plock=posixlock but what does "-11" mean?
2. Why would stopping JBoss on node1 release the contention on
node2?
3. Why would node 2 complain about dlm_lock when restarting all the
cluster services and lockup the cluster
4. And not necessarily for this group, is it possible for JBoss to
start random servers due to some sort of lock contention.
Thanks for any help that is offered.
Michael
The log info below is from node2 and is before any corrective action was
taken. It is the state when the additional JBoss processes were running.
/proc/cluster/lock_dlm/debug
/proc/cluster/lock_dlm/debug:14043 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14043 ex plock -11
/proc/cluster/lock_dlm/debug:14037 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14037 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14037 ex plock -11
/proc/cluster/lock_dlm/debug:14036 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14036 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14036 ex plock -11
/proc/cluster/lock_dlm/debug:14039 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14039 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14039 ex plock -11
/proc/cluster/lock_dlm/debug:22414 en punlock 7,f1021d
/proc/cluster/lock_dlm/debug:14038 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14038 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14038 ex plock -11
/proc/cluster/lock_dlm/debug:14041 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14041 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14041 ex plock -11
/proc/cluster/lock_dlm/debug:14041 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14041 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14041 ex plock -11
/proc/cluster/lock_dlm/debug:14040 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14040 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14040 ex plock -11
/proc/cluster/lock_dlm/debug:14040 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14040 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14040 ex plock -11
/proc/cluster/lock_dlm/debug:14037 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14037 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14037 ex plock -11
/proc/cluster/lock_dlm/debug:14040 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14040 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14040 ex plock -11
/proc/cluster/lock_dlm/debug:14041 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14041 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14041 ex plock -11
/proc/cluster/lock_dlm/debug:14040 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14040 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14040 ex plock -11
/proc/cluster/lock_dlm/debug:14038 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14038 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14038 ex plock -11
/proc/cluster/lock_dlm/debug:14044 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14044 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14044 ex plock -11
/proc/cluster/lock_dlm/debug:14040 en plock 7,25101ee
/proc/cluster/lock_dlm/debug:14040 global conflict -11 0-0 ex 1 own 1,
pid 1u
/proc/cluster/lock_dlm/debug:14040 ex plock -11
That inode hade the following in DLM_locks
Resource 00000100b833c508 (parent 0000000000000000). Name (len=24) "
2 25101ee"
Local Copy, Master is node 1
Granted Queue
154f0156 PR 14040 Master: 8fe80123
Conversion Queue
Waiting Queue
Resource 00000101da9a9598 (parent 0000000000000000). Name (len=24) "
5 25101ee"
Master Copy
Granted Queue
1229013e PR 14040
098f0154 PR 19040 Remote: 5 d1fa036e
190c02f1 PR 6587 Remote: 1 12cb023f
8612028c PR 15187 Remote: 3 00160179
Conversion Queue
Waiting Queue
Resource 00000101c5d28908 (parent 0000000000000000). Name (len=24) "
11 25101ee"
Master Copy
Granted Queue
28630181 NL 24849 Remote: 1 8ca6012e
277703b2 NL 32360
86f603bc NL 15187 Remote: 3 0011006d
Conversion Queue
Waiting Queue
Resource 00000100eab6c9d8 (parent 0000000000000000). Name (len=24) "
7 25101ee"
Master Copy
LVB: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Granted Queue
a7dd00b2 EX 21994 0-0 Remote: 1 8858013a
95a603ef NL 16953
86fd00a7 NL 15187 Remote: 3 000e005e
748702cf NL 8385 Remote: 1 a25e0322
Conversion Queue
Waiting Queue
The inode "25101ee" points to the following file:
38863342 -rw-r--r-- 1 nobody nobody 0 Jan 18 13:19 je.lck
nobody 13892 34.1 64.1 8238260 5138044 ? Sl Mar01 6328:11
/usr/java/jdk1.5.0_16/bin/java -Dprogram.name=run_yyy1.sh -server
-Xms6144m -Xmx6144m -XX:MaxPermSize=256m
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
-Djava.endorsed.dirs=/xxxusr1/jboss/jboss-4.0.5.GA/lib/endorsed
-classpath
/xxxusr1/jboss/jboss-4.0.5.GA/bin/run.jar:/usr/java/jdk1.5.0_16/lib/tool
s.jar org.jboss.Main -c yyy1 -Dbinding-ports=ports-default
-Djboss.partition.name=PROD1-yyy1
nobody 21803 0.0 53.8 8223868 4311272 ? D 17:59 0:00
/usr/java/jdk1.5.0_16/bin/java -Dprogram.name=run_yyy1.sh -server
-Xms6144m -Xmx6144m -XX:MaxPermSize=256m
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
-Djava.endorsed.dirs=/xxxusr1/jboss/jboss-4.0.5.GA/lib/endorsed
-classpath
/xxxusr1/jboss/jboss-4.0.5.GA/bin/run.jar:/usr/java/jdk1.5.0_16/lib/tool
s.jar org.jboss.Main -c yyy1 -Dbinding-ports=ports-default
-Djboss.partition.name=PROD1-yyy1
nobody 21808 0.0 53.8 8223868 4311496 ? D 17:59 0:00
/usr/java/jdk1.5.0_16/bin/java -Dprogram.name=run_yyy1.sh -server
-Xms6144m -Xmx6144m -XX:MaxPermSize=256m
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
-Djava.endorsed.dirs=/xxxusr1/jboss/jboss-4.0.5.GA/lib/endorsed
-classpath
/xxxusr1/jboss/jboss-4.0.5.GA/bin/run.jar:/usr/java/jdk1.5.0_16/lib/tool
s.jar org.jboss.Main -c yyy1 -Dbinding-ports=ports-default
-Djboss.partition.name=PROD1-yyy1
nobody 21809 0.0 53.8 8223868 4311524 ? D 17:59 0:00
/usr/java/jdk1.5.0_16/bin/java -Dprogram.name=run_yyy1.sh -server
-Xms6144m -Xmx6144m -XX:MaxPermSize=256m
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
-Djava.endorsed.dirs=/xxxusr1/jboss/jboss-4.0.5.GA/lib/endorsed
-classpath
/xxxusr1/jboss/jboss-4.0.5.GA/bin/run.jar:/usr/java/jdk1.5.0_16/lib/tool
s.jar org.jboss.Main -c yyy1 -Dbinding-ports=ports-default
-Djboss.partition.name=PROD1-yyy1
nobody 22414 0.0 55.2 8232092 4419488 ? D 18:03 0:00
/usr/java/jdk1.5.0_16/bin/java -Dprogram.name=run_yyy1.sh -server
-Xms6144m -Xmx6144m -XX:MaxPermSize=256m
-Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000
-Djava.endorsed.dirs=/xxxusr1/jboss/jboss-4.0.5.GA/lib/endorsed
-classpath
/xxxusr1/jboss/jboss-4.0.5.GA/bin/run.jar:/usr/java/jdk1.5.0_16/lib/tool
s.jar org.jboss.Main -c yyy1 -Dbinding-ports=ports-default
-Djboss.partition.name=PROD1-yyy1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20090317/2139cae9/attachment.htm>
More information about the Linux-cluster
mailing list