[Linux-cluster] info on "A processor failed" message and fencing when going to single user mode
Gianluca Cecchi
gianluca.cecchi at gmail.com
Mon Oct 5 10:08:43 UTC 2009
Hello,
A 2-node cluster (the nodes are named virtfed and virtfedbis) with F11 x86_64,
up to date as of today, and without qdisk:
cman-3.0.2-1.fc11.x86_64
openais-1.0.1-1.fc11.x86_64
corosync-1.0.0-1.fc11.x86_64
and kernel 2.6.30.8-64.fc11.x86_64
I was in a situation with both nodes up, just after virtfedbis had restarted
and was starting a service.
One of its resources contains a loop that tests the availability of a file,
so the service was still in its start phase, but the cluster infrastructure
was up, as shown by these messages:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] #011r(0)
ip(192.168.16.101)
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] #011r(0)
ip(192.168.16.102)
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:44:39 virtfed corosync[4684]: [QUORUM] This node is within the
primary component and will provide service.
Oct 5 11:44:39 virtfed corosync[4684]: [QUORUM] Members[1]:
Oct 5 11:44:39 virtfed corosync[4684]: [QUORUM] 1
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] #011r(0)
ip(192.168.16.101)
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:44:39 virtfed corosync[4684]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Oct 5 11:44:39 virtfed kernel: dlm: closing connection to node 2
Oct 5 11:44:39 virtfed corosync[4684]: [MAIN ] Completed service
synchronization, ready to provide service.
So now they are in this condition, as reported by virtfedbis:
[root@virtfedbis ~]# clustat
Cluster Status for kvm @ Mon Oct 5 11:49:27 2009
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 kvm1                            1 Online, rgmanager
 kvm2                            2 Online, Local, rgmanager

 Service Name              Owner (Last)        State
 ------- ----              ------------        -----
 service:DRBDNODE1         kvm1                started
 service:DRBDNODE2         kvm2                starting
I realized that I had forgotten something, so after 10 attempts the DRBDNODE2
service would not come up. I therefore decided to put virtfedbis into single
user mode, running on it:
shutdown 0
I would expect virtfedbis to leave the cluster cleanly; instead it was fenced
and rebooted (via the fence_ilo agent).
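For comparison, stopping the cluster services by hand before changing runlevel
should presumably let the node leave cleanly; a sketch, assuming the standard
RHCS init scripts, run as root on virtfedbis:

```shell
# Stop the resource manager first, so rgmanager services stop or
# relocate cleanly, then leave the cluster (the cman init script
# stops fenced, dlm and corosync), and only then change runlevel:
service rgmanager stop
service cman stop
shutdown 0
```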
On virtfed these are the messages:
Oct 5 11:49:49 virtfed corosync[4684]: [TOTEM ] A processor failed,
forming new configuration.
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] #011r(0)
ip(192.168.16.101)
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] #011r(0)
ip(192.168.16.102)
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:49:54 virtfed corosync[4684]: [QUORUM] This node is within the
primary component and will provide service.
Oct 5 11:49:54 virtfed corosync[4684]: [QUORUM] Members[1]:
Oct 5 11:49:54 virtfed corosync[4684]: [QUORUM] 1
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] #011r(0)
ip(192.168.16.101)
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:49:54 virtfed corosync[4684]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Oct 5 11:49:54 virtfed corosync[4684]: [MAIN ] Completed service
synchronization, ready to provide service.
Oct 5 11:49:54 virtfed kernel: dlm: closing connection to node 2
Oct 5 11:49:54 virtfed fenced[4742]: fencing node kvm2
Oct 5 11:49:54 virtfed rgmanager[5496]: State change: kvm2 DOWN
Oct 5 11:50:26 virtfed fenced[4742]: fence kvm2 success
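Incidentally, the gap between the "A processor failed" message (11:49:49) and
the new membership plus fence action (11:49:54) is the totem failure-detection
window; the delta from the timestamps above (GNU date assumed):

```shell
# Seconds between "A processor failed" and the fence / new membership,
# taken from the log timestamps above:
t1=$(date -u -d "11:49:49" +%s)
t2=$(date -u -d "11:49:54" +%s)
echo $((t2 - t1))   # → 5
```

So virtfedbis stopped answering on the totem ring roughly 5 seconds before it
was declared dead and fenced; whether that window matches the token/consensus
timeouts depends on the cman defaults for this corosync version.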
What I find on virtfedbis after the restart, in the /var/log/cluster
directory (corosync.log), is this:
Oct 05 11:49:49 corosync [TOTEM ] A processor failed, forming new
configuration.
Oct 05 11:49:49 corosync [TOTEM ] The network interface is down.
Oct 05 11:49:54 corosync [CLM ] CLM CONFIGURATION CHANGE
Oct 05 11:49:54 corosync [CLM ] New Configuration:
Oct 05 11:49:54 corosync [CLM ] r(0) ip(127.0.0.1)
Oct 05 11:49:54 corosync [CLM ] Members Left:
Oct 05 11:49:54 corosync [CLM ] r(0) ip(192.168.16.102)
Oct 05 11:49:54 corosync [CLM ] Members Joined:
Oct 05 11:49:54 corosync [QUORUM] This node is within the primary component
and will provide service.
Oct 05 11:49:54 corosync [QUORUM] Members[1]:
Oct 05 11:49:54 corosync [QUORUM] 1
Oct 05 11:49:54 corosync [CLM ] CLM CONFIGURATION CHANGE
Oct 05 11:49:54 corosync [CLM ] New Configuration:
Oct 05 11:49:54 corosync [CLM ] r(0) ip(127.0.0.1)
Oct 05 11:49:54 corosync [CLM ] Members Left:
Oct 05 11:49:54 corosync [CLM ] Members Joined:
Oct 05 11:49:54 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Oct 05 11:49:54 corosync [CMAN ] Killing node kvm2 because it has rejoined
the cluster with existing state
I think there is something wrong in this behaviour.
This is a test cluster, so I have no qdisk.
Could the cause be inherent in my config, which has:
<cman expected_votes="1" two_node="1"/>
<fence_daemon clean_start="1" post_fail_delay="0"
post_join_delay="20"/>
In general, if I do a shutdown -r now on one of the two nodes I do not have
this kind of problem.
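One thing that might be worth checking (hypothetical paths, assuming the
usual SysV/chkconfig layout on F11) is whether runlevel 1 actually has kill
scripts for the cluster daemons; if it does, shutdown 0 should stop
cman/rgmanager in order, just as shutdown -r now does:

```shell
# List the kill (K*) scripts for the cluster daemons in runlevel 1;
# if they are missing, the transition to single user mode would leave
# corosync running until the network interface is torn down, which
# would explain the "network interface is down" message and the fence:
ls /etc/rc.d/rc1.d/ | grep -E 'cman|rgmanager'
```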
Thanks for any insight,
Gianluca