[Linux-cluster] Cluster node1 rebooted itself

Kalam, Imran Imran.Kalam at auspost.com.au
Sun Nov 11 22:48:30 UTC 2012


Hi Digimer.

Below is the information from the second node's log file; the configuration is on its way. Thanks

Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
Nov 11 00:12:47 kernel: CMAN: removing node node1hb from the cluster : Killed by another node
Nov 11 00:12:49 qdiskd[6704]: <notice> Node 1 evicted
Nov 11 00:12:55 fenced[6771]: node1hb not a cluster member after 8 sec post_fail_delay
Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
Nov 11 00:14:00 ccsd[6603]: Attempt to close an unopened CCS descriptor (5462880).
Nov 11 00:14:00 ccsd[6603]: Error while processing disconnect: Invalid request descriptor
Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Trying to acquire journal lock...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Looking at journal...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Acquiring the transaction lock...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replaying journal...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replayed 4 of 4 blocks
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: replays = 4, skips = 0, sames = 0
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Journal replayed in 1s
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Done
Nov 11 00:14:07 clurgmgrd[6833]: <info> Magma Event: Membership Change
Nov 11 00:14:07 clurgmgrd[6833]: <info> State change: node1hb DOWN
Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
Nov 11 00:17:08 clurgmgrd[6833]: <info> Magma Event: Membership Change
Nov 11 00:17:08 clurgmgrd[6833]: <info> State change: node1hb UP
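
To make the timing in the node 2 log easier to read, here is a rough Python sketch that turns a few of the lines above into offsets from the eviction notice. The function name parse_line is made up for illustration, and the year 2012 is an assumption taken from the date of this mail, since syslog lines carry no year.

```python
from datetime import datetime

# Four of the node 2 log lines quoted above, copied verbatim.
LOG = """\
Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
"""


def parse_line(line, year=2012):
    """Split one syslog line into (timestamp, source, message).

    The year is an assumption: syslog timestamps do not include it.
    """
    stamp, rest = line[:15], line[16:]          # "Nov 11 00:12:47" is 15 chars
    source, _, message = rest.partition(": ")   # e.g. "qdiskd[6704]"
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, source, message


events = [parse_line(line) for line in LOG.splitlines() if line.strip()]
start = events[0][0]

# Seconds elapsed since the eviction notice, per event.
offsets = [
    (int((when - start).total_seconds()), source, message)
    for when, source, message in events
]

for secs, source, message in offsets:
    print(f"+{secs:>4}s  {source:<14} {message}")
```

Read this way, fenced starts fencing 8 seconds after the eviction notice (matching the post_fail_delay line), and the fence is reported successful 65 seconds after that.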

-----Original Message-----
From: Digimer [mailto:lists at alteeve.ca] 
Sent: Monday, 12 November, 2012 9:36 AM
To: linux clustering
Cc: Kalam, Imran
Subject: Re: [Linux-cluster] Cluster node1 rebooted itself

It's hard to make much of a guess given that your cluster configuration
is unknown. That said, it would seem that something interrupted comms.
What is in the syslog of node 2 for the same time period? Can you share
your cluster.conf, please (obfuscating only passwords)?
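
One possible way to obfuscate only the passwords before posting is sketched below in Python. It assumes the secrets appear as passwd="..." or password="..." XML attributes; the sample element, the agent name fence_example and the function name redact are hypothetical, so check the result by eye before sending.

```python
import re

# Assumption: secrets are stored as passwd="..." or password="..." attributes.
# Other attribute names would need to be added to this pattern.
SECRET_ATTR = re.compile(
    r'''\b(passwd|password)\s*=\s*(["'])(.*?)\2''',
    re.IGNORECASE,
)


def redact(xml_text, placeholder="REDACTED"):
    """Replace the value of every matching attribute, leaving all else intact."""
    return SECRET_ATTR.sub(
        lambda m: f"{m.group(1)}={m.group(2)}{placeholder}{m.group(2)}",
        xml_text,
    )


# Hypothetical sample line, not taken from any real configuration.
sample = (
    '<fencedevice agent="fence_example" name="fence1" '
    'login="admin" passwd="s3cret"/>'
)
print(redact(sample))
```

Everything apart from the attribute values is left untouched, so node names, fence device names and timeouts remain readable for whoever reviews the file.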

On 11/11/2012 05:32 PM, Kalam, Imran wrote:
> Hi All.
>  
> I have a 2-node GFS cluster running RHAS4 update 5, kernel 2.6.9-55.ELsmp.
> On Sunday morning node1 (master) rebooted itself, and I could
> only see the following in the message log file. Has anyone experienced
> the same problem? Please let me know if you need more information. Thanks
>  
> Nov 11 00:12:47 kernel: CMAN: Being told to leave the cluster by node 2
> Nov 11 00:12:47 kernel: CMAN: we are leaving the cluster.
> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
> Nov 11 00:12:47 kernel: SM: 00000002 sm_stop: SG still joined
> Nov 11 00:12:47 kernel: SM: 01000003 sm_stop: SG still joined
> Nov 11 00:12:47 kernel: SM: 02000007 sm_stop: SG still joined
> Nov 11 00:12:47 kernel: SM: 03000004 sm_stop: SG still joined
> Nov 11 00:12:47 clurgmgrd[6872]: <warning> #67: Shutting down uncleanly
> Nov 11 00:12:47 ccsd[6613]: Cluster manager shutdown.  Attemping to
> reconnect...
> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate.  Refusing connection.
> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection
> refused
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request
> descriptor
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request
> descriptor
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-21).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing disconnect: Invalid
> request descriptor
> Nov 11 00:12:48 clurgmgrd: [6872]: <info> unmounting
> /dev/mapper/vg_shared-lv00 (/opt/xxshare)
> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate.  Refusing connection.
> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection
> refused
> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate.  Refusing connection.
> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection
> refused
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request
> descriptor
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>  
>  
> *Regards*
> Imran Kalam
> Technical Specialist
> Post IT
> Corporate Services
> Australia Post
> Level 2, 185 Rosslyn St. West Melbourne
> Phone: (03) 9322 0382
> Fax: 9204 7303
> Mob: 0439 559 461


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



