[Linux-cluster] Strange behavior(s) of DLM

David Teigland teigland at redhat.com
Fri Aug 6 12:54:29 UTC 2004


On Wed, Aug 04, 2004 at 11:41:45PM -0400, Jeff wrote:
> The attached routine demonstrates some strange
> behavior in the DLM and it was responsible for the
> dmesg text at the end of this note.
> 
> This is on an FC2 SMP box running the latest CVS version of
> cman and the dlm. It's a 2-CPU box configured with 4 logical
> CPUs.
> 
> I have a two node cluster and the two machines are identical
> as far as I can tell with the exception of which order they are
> listed in the cluster config file.
> 
> On node #1 (in the config file), when I run the attached test from
> two terminals, the output looks reasonable: more or less the same as
> when I run it on Tru64 or VMS.
> 
>       8923: over last 10.000 seconds, grant 8922, blkast 0, cancel 0
>      18730: over last 9.001 seconds, grant 9807, blkast 0, cancel 0
>      28403: over last 9.001 seconds, grant 9673, blkast 0, cancel 0
> 
> If you shut this down and start it up on node #2 (lx4), you start
> to get messages that look like:
>      91280: over last 10.000 seconds, grant 91279, blkast 0, cancel 0
>     125138: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     125138: NL Blocking Notification on lockid 0x00010312 (mode 0)
>     125138: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^ 
>     141370: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     141371: NL Blocking Notification on lockid 0x00010312 (mode 0)
>     141371: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^ 
>     141373: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^


You're running the program on two nodes at once, right?  The line marked
with "*" is where I started the program on a second node, so it appears I
get the same thing.  I don't get any assertion failure, though.  That may
be the result of changes I've checked in for some other bugs over the past
couple of days.

     57150: over last 10.000 seconds, grant 57149, blkast 0, cancel 0
    116825: over last 9.001 seconds, grant 59675, blkast 0, cancel 0
*   123790: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
    123790: NL Blocking Notification on lockid 0x00010373 (mode 0)
    123790: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^
    123822: NL Blocking Routine Start ^^^^^^^^^^^^^^^^^^^^^^^^^^
    123822: NL Blocking Notification on lockid 0x00010373 (mode 0)
    123822: NL Blocking Notification Rountine End  ^^^^^^^^^^^^^^^^^^^^
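For reference, here is a minimal C sketch of the conventional VMS-style
lock-mode compatibility rules, which is why these messages look wrong.
The enum names and the helpers modes_compatible() and
blocking_ast_expected() are hypothetical, for illustration only; only the
numbering (NL = 0 through EX = 5) is meant to match the DLM's modes.  NL is
compatible with every mode and the matrix is symmetric, so under these
rules no blocking notification should involve an NL lock, whichever side
of the conflict the "(mode 0)" in the output refers to.

```c
#include <stdbool.h>

/* Illustrative lock modes, numbered like the DLM's (NL = 0 ... EX = 5).
 * These local names are hypothetical, not the DLM header's definitions. */
enum lock_mode { MODE_NL = 0, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX };

/* Classic VMS-style compatibility matrix: compat[held][requested] is true
 * when both modes can be granted on the same resource at the same time. */
static const bool compat[6][6] = {
    /*           NL     CR     CW     PR     PW     EX   */
    /* NL */ { true,  true,  true,  true,  true,  true  },
    /* CR */ { true,  true,  true,  true,  true,  false },
    /* CW */ { true,  true,  true,  false, false, false },
    /* PR */ { true,  true,  false, true,  false, false },
    /* PW */ { true,  true,  false, false, false, false },
    /* EX */ { true,  false, false, false, false, false },
};

/* Hypothetical helper: can a lock held in `held` coexist with a request
 * for `requested`? */
bool modes_compatible(enum lock_mode held, enum lock_mode requested)
{
    return compat[held][requested];
}

/* Hypothetical helper: a blocking notification to the holder is only
 * warranted when its granted mode conflicts with the waiting request.
 * Because the NL row and column are all true, this is never the case
 * when either mode is NL. */
bool blocking_ast_expected(enum lock_mode held, enum lock_mode requested)
{
    return !modes_compatible(held, requested);
}
```

With these rules, blocking_ast_expected(MODE_NL, MODE_EX) is false, while
blocking_ast_expected(MODE_PR, MODE_PW) is true.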

-- 
Dave Teigland  <teigland at redhat.com>
