[Linux-cluster] Cluster node1 rebooted itself

Digimer lists at alteeve.ca
Sun Nov 11 22:54:32 UTC 2012


Ya, certainly looks like a network problem.

If you have a support contract with Red Hat, you may want to bring them
in to have a more detailed review though. I am only guessing based on
what you've listed here.
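
If it helps while you dig, here is a rough sketch for lining up the timing from the node 2 messages you posted. It is only an illustration: the helper names (parse, timeline) are made up for this example, the four lines are copied from your log, and the year is assumed to be 2012 because syslog does not record it.

```python
from datetime import datetime

# Lines copied from the node 2 log quoted in this thread.
LOG = """\
Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
"""

def parse(line, year=2012):
    """Split a syslog line into (timestamp, message). The year is an assumption."""
    stamp, message = line[:15], line[16:]  # "Mon DD HH:MM:SS" is 15 characters
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, message

def timeline(text):
    """Return (seconds since the first event, message) for each non-empty line."""
    events = [parse(line) for line in text.splitlines() if line.strip()]
    start = events[0][0]
    return [(int((when - start).total_seconds()), msg) for when, msg in events]

for offset, msg in timeline(LOG):
    print(f"+{offset:4d}s  {msg}")
```

Running the same thing over node 1's messages for the same window would show whether both nodes lost contact at the same moment, which is what you would expect from a network interruption.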

Cheers

On 11/11/2012 05:48 PM, Kalam, Imran wrote:
> Hi Digimer.
> 
> Below is the information from the second node's log file; the configuration is on its way. Thanks
> 
> Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
> Nov 11 00:12:47 kernel: CMAN: removing node node1hb from the cluster : Killed by another node
> Nov 11 00:12:49 qdiskd[6704]: <notice> Node 1 evicted
> Nov 11 00:12:55 fenced[6771]: node1hb not a cluster member after 8 sec post_fail_delay
> Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
> Nov 11 00:14:00 ccsd[6603]: Attempt to close an unopened CCS descriptor (5462880).
> Nov 11 00:14:00 ccsd[6603]: Error while processing disconnect: Invalid request descriptor
> Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Trying to acquire journal lock...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Looking at journal...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Acquiring the transaction lock...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replaying journal...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replayed 4 of 4 blocks
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: replays = 4, skips = 0, sames = 0
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Journal replayed in 1s
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Done
> Nov 11 00:14:07 clurgmgrd[6833]: <info> Magma Event: Membership Change
> Nov 11 00:14:07 clurgmgrd[6833]: <info> State change: node1hb DOWN
> Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
> Nov 11 00:17:08 clurgmgrd[6833]: <info> Magma Event: Membership Change
> Nov 11 00:17:08 clurgmgrd[6833]: <info> State change: node1hb UP
> 
> -----Original Message-----
> From: Digimer [mailto:lists at alteeve.ca] 
> Sent: Monday, 12 November, 2012 9:36 AM
> To: linux clustering
> Cc: Kalam, Imran
> Subject: Re: [Linux-cluster] Cluster node1 rebooted itself
> 
> It's hard to make much of a guess given that your cluster configuration
> is unknown. That said, it would seem that something interrupted comms.
> What is in the syslog of node 2 for the same time period? Can you share
> your cluster.conf please (obfuscating only passwords)?
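
One way to obfuscate only the passwords before posting is sketched below. This assumes the secrets sit in passwd="..." or password="..." attributes, which may not cover every fence agent, so check the output by hand. The redact helper, the REDACTED mask and the sample fencedevice line (including the agent name fence_example) are all hypothetical.

```python
import re

# Assumption: secrets appear as passwd="..." or password="..." attributes.
SECRET_ATTR = re.compile(r'''\b(passwd|password)\s*=\s*(["'])(.*?)\2''')

def redact(conf_text, mask="REDACTED"):
    """Replace the value of each password attribute, leaving everything else as-is."""
    return SECRET_ATTR.sub(
        lambda m: f"{m.group(1)}={m.group(2)}{mask}{m.group(2)}", conf_text
    )

# Hypothetical sample line, not taken from any real cluster.conf.
sample = '<fencedevice agent="fence_example" name="f1" login="admin" passwd="s3cret"/>'
print(redact(sample))
```

Node names, fence device names and timeouts are left untouched, since those are what a reviewer needs to see.
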
> 
> On 11/11/2012 05:32 PM, Kalam, Imran wrote:
>> Hi All.
>>  
>> I have a 2-node GFS cluster running RHAS4 update 5, kernel 2.6.9-55.ELsmp.
>> On Sunday morning node1 (master) rebooted itself, and I could only see
>> the following in the messages log file. Has anyone experienced the same
>> problem? Please let me know if you need more information. Thanks
>>  
>> Nov 11 00:12:47 kernel: CMAN: Being told to leave the cluster by node 2
>> Nov 11 00:12:47 kernel: CMAN: we are leaving the cluster.
>> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
>> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
>> Nov 11 00:12:47 kernel: SM: 00000002 sm_stop: SG still joined
>> Nov 11 00:12:47 kernel: SM: 01000003 sm_stop: SG still joined
>> Nov 11 00:12:47 kernel: SM: 02000007 sm_stop: SG still joined
>> Nov 11 00:12:47 kernel: SM: 03000004 sm_stop: SG still joined
>> Nov 11 00:12:47 clurgmgrd[6872]: <warning> #67: Shutting down uncleanly
>> Nov 11 00:12:47 ccsd[6613]: Cluster manager shutdown.  Attemping to reconnect...
>> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate.  Refusing connection.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-21).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing disconnect: Invalid request descriptor
>> Nov 11 00:12:48 clurgmgrd: [6872]: <info> unmounting /dev/mapper/vg_shared-lv00 (/opt/xxshare)
>> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate.  Refusing connection.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
>> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate.  Refusing connection.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>>  
>>  
>> *Regards*
>> Imran Kalam
>> Technical Specialist
>> Post IT
>> Corporate Services
>> Australia Post
>> Level 2, 185 Rosslyn St. West Melbourne
>> Phone: (03) 9322 0382
>> Fax: 9204 7303
>> Mob: 0439 559 461
>>  
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



