[Linux-cluster] Strange behavior(s) of DLM
David Teigland
teigland at redhat.com
Fri Aug 6 12:54:29 UTC 2004
On Wed, Aug 04, 2004 at 11:41:45PM -0400, Jeff wrote:
> The attached routine demonstrates some strange
> behavior in the DLM and it was responsible for the
> dmesg text at the end of this note.
>
> This is on a FC2, SMP box running cvs/latest version of
> cman and the dlm. It's a 2-CPU box configured with 4 logical
> CPUs.
>
> I have a two node cluster and the two machines are identical
> as far as I can tell with the exception of which order they are
> listed in the cluster config file.
>
> On node #1 (in the config file) when I run the attached test from
> two terminals the output looks reasonable. The same as it does if
> I run it on Tru64 or VMS (more or less).
>
> 8923: over last 10.000 seconds, grant 8922, blkast 0, cancel 0
> 18730: over last 9.001 seconds, grant 9807, blkast 0, cancel 0
> 28403: over last 9.001 seconds, grant 9673, blkast 0, cancel 0
>
> If you shut this down and start it up on node #2 (lx4) you start
> to get messages that look like:
> 91280: over last 10.000 seconds, grant 91279, blkast 0, cancel 0
> 125138: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
> 125138: NL Blocking Notification on lockid 0x00010312 (mode 0)
> 125138: NL Blocking Notification Rountine End ^^^^^^^^^^^^^^^^^^^^
> 141370: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
> 141371: NL Blocking Notification on lockid 0x00010312 (mode 0)
> 141371: NL Blocking Notification Rountine End ^^^^^^^^^^^^^^^^^^^^
> 141373: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^

You're running the program on two nodes at once, right? The line marked
with "*" below is where I started the program on a second node, so it
appears I get the same thing. I don't get any assertion failure, though.
That may be the result of changes I've checked in for some other bugs over
the past couple of days.

57150: over last 10.000 seconds, grant 57149, blkast 0, cancel 0
116825: over last 9.001 seconds, grant 59675, blkast 0, cancel 0
* 123790: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
123790: NL Blocking Notification on lockid 0x00010373 (mode 0)
123790: NL Blocking Notification Rountine End ^^^^^^^^^^^^^^^^^^^^
123822: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
123822: NL Blocking Notification on lockid 0x00010373 (mode 0)
123822: NL Blocking Notification Rountine End ^^^^^^^^^^^^^^^^^^^^
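
To compare runs like the ones above, the periodic summary lines can be
turned into grants per second. This is only a rough sketch: the line format
is assumed from the output pasted in this thread, and the names STATS and
grant_rates are made up for illustration (they are not part of the test
program or of the DLM).

```python
import re

# Assumed format of the test program's summary lines, e.g.
# "8923: over last 10.000 seconds, grant 8922, blkast 0, cancel 0"
STATS = re.compile(
    r"(\d+): over last ([\d.]+) seconds, "
    r"grant (\d+), blkast (\d+), cancel (\d+)"
)


def grant_rates(lines):
    """Return (interval_seconds, grants, grants_per_second) per summary line."""
    rates = []
    for line in lines:
        m = STATS.search(line)
        if m is None:
            continue  # skip blocking-notification lines and anything else
        seconds = float(m.group(2))
        grants = int(m.group(3))
        rates.append((seconds, grants, grants / seconds))
    return rates


sample = [
    "57150: over last 10.000 seconds, grant 57149, blkast 0, cancel 0",
    "116825: over last 9.001 seconds, grant 59675, blkast 0, cancel 0",
    "* 123790: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^",
]

for seconds, grants, rate in grant_rates(sample):
    print(f"{grants} grants in {seconds:.3f} s = {rate:.0f} grants/s")
```

The pattern uses search rather than match, so lines carrying a "> " quote
prefix or a leading "*" marker are handled the same way.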
--
Dave Teigland <teigland at redhat.com>