[Linux-cluster] Lost token - every 5 minutes: [TOTEM] The token was lost. Samba process possible cause?

Bevan Broun Bevan.Broun at ardec.com.au
Fri Jun 27 06:03:35 UTC 2008


Hi All

I have a 2 node RHEL-5.1 cluster. A quorum disk is configured.
The hosts have 4 NICs. These are bonded:
(eth0+eth2) -> bond0
(eth1+eth3) -> bond1
Unfortunately I was not able to use a dedicated interface for cluster communications - bond1 is being used. This is where I think I'm in trouble.

The cluster has been configured using IP addresses. I did have to use http://archives.free.net.ph/message/20080130.074958.5c7a211c.en.html
as the hostname is related to the bond0 IP.

I have not defined the interface to be used by the cluster, just relying on the IP address configured.
The cluster's purpose is 2 GFS file systems.

The cluster was configured and working for 4 days before there were problems.

I now have almost constant lost-token messages in /var/log/messages. They are almost exactly 5 minutes apart. A typical excerpt from the messages file is shown below my sig.
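
The spacing between losses can be checked directly from the log. Below is a minimal sketch in Python, assuming standard syslog timestamps with English month abbreviations. The function name token_loss_gaps, the SAMPLE list and the default year are illustrative and not part of any cluster tool; syslog lines carry no year, so one has to be supplied. Only losses in the OPERATIONAL state are counted, so the retries in the COMMIT state during recovery do not skew the intervals.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
# "Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state."
TOKEN_LOST = re.compile(
    r"^\s*(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+openais\[\d+\]:\s+"
    r"\[TOTEM\] The token was lost in the OPERATIONAL state"
)


def token_loss_gaps(lines, year=2008):
    """Return the gaps, in seconds, between consecutive OPERATIONAL-state token losses."""
    stamps = []
    for line in lines:
        match = TOKEN_LOST.match(line)
        if match:
            # Syslog omits the year, so prepend the assumed one before parsing.
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Three token-loss lines taken from this post, plus one unrelated line.
SAMPLE = [
    "Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    " Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.",
    " Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
]

if __name__ == "__main__":
    print(token_loss_gaps(SAMPLE))  # prints [304.0, 307.0]
```

To run it against a real log, pass an open file instead of SAMPLE, for example token_loss_gaps(open("/var/log/messages")).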

Just before the problem started, a Samba message shows nmbd becoming local master browser for a workgroup on the interface used for cluster communications.

Jun 20 13:39:27 HOST1 nmbd[24506]: [2008/06/20 13:39:27, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
Jun 20 13:39:27 HOST1 nmbd[24506]:   *****
Jun 20 13:39:27 HOST1 nmbd[24506]:
Jun 20 13:39:27 HOST1 nmbd[24506]:   Samba name server NBM1 is now a local master browser for workgroup SMS_DOMAIN on subnet 162.16.96.229
Jun 20 13:39:27 HOST1 nmbd[24506]:
Jun 20 13:39:27 HOST1 nmbd[24506]:   *****
Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.

"cman_tool status" shows both nodes and looks normal. It looks like clvmd is not happy: df commands are hanging.

Could nmbd be causing this token loss? Any ideas on how to proceed?

(names and IPs have been changed).

Thanks

Bevan Broun
Solutions Architect
Ardec International
http://www.ardec.com.au
http://www.lisasoft.com
http://www.terrapages.com
Sydney
-----------------------
Suite 112,The Lower Deck
19-21 Jones Bay Wharf
Pirrama Road, Pyrmont 2009
Ph:  +61 2 8570 5000
Fax: +61 2 8570 5099



Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering GATHER state from 2.
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Saving state aru 16 high seq received 16
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce34
 Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce38
 Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce3c
 Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce40
 Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering RECOVERY state.
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [0] member 162.16.96.229:
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [1] member 162.16.96.230:
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Did not need to originate any messages in recovery.
Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Sending initial ORF token
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] CLM CONFIGURATION CHANGE
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] New Configuration:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.229)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.230)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Left:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Joined:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] CLM CONFIGURATION CHANGE
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] New Configuration:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.229)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ]    r(0) ip(162.16.96.230)
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Left:
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] Members Joined:
 Jun 20 13:49:06 HOST1 openais[15265]: [SYNC ] This node is within the primary component and will provide service.
 Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering OPERATIONAL state.
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] got nodejoin message 162.16.96.229
 Jun 20 13:49:06 HOST1 openais[15265]: [CLM  ] got nodejoin message 162.16.96.230
 Jun 20 13:49:06 HOST1 openais[15265]: [CPG  ] got joinlist message from node 2
 Jun 20 13:49:06 HOST1 openais[15265]: [CPG  ] got joinlist message from node 1
 Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.

