[Linux-cluster] fenced segfault
Sebastian Reitenbach
sebastia at l00-bugdead-prods.de
Mon Jul 16 14:37:12 UTC 2007
Hi,
I am still on openSUSE 10.2, x86_64, using openais-0.80.1-6 (an RPM rebuilt from the source RPM) and cluster-2.0.0 (self-compiled). The kernel is:
Linux srv4 2.6.20.15-default #1 SMP Fri Jul 13 12:44:51 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux
Now, when I run /etc/init.d/cman for the first time, fenced segfaults and the cman init script hangs waiting for fenced. If I abort the init script with Ctrl-C, kill aisexec and /sbin/ccsd, and then run the init script again, fenced starts after a few minutes and the script ends with "success".
The following are the logs from starting /etc/init.d/cman:
Jul 16 16:19:49 srv4 ccsd[29691]: Starting ccsd 2.00.00:
Jul 16 16:19:49 srv4 ccsd[29691]: Built: Jul 13 2007 13:24:27
Jul 16 16:19:49 srv4 ccsd[29691]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jul 16 16:19:49 srv4 ccsd[29691]: cluster.conf (cluster name = correo, version = 1) found.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] AIS Executive Service RELEASE 'subrev 1204 version 0.80.1'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Using default multicast address of 239.192.25.250
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cpg loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cfg loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais configuration service'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_msg loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais message service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_lck loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_evt loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais event service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_ckpt loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_amf loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_clm loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_evs loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cman loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] downcheck (1000000 ms) fail to recv const (50 msgs)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] send threads (0 threads)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP token expired timeout (495 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP token problem counter (2000 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP threshold (10 problem count)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP mode set to none.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] heartbeat_failures_allowed (0)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] max_network_delay (50 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] The network interface [192.168.8.13] is now up.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Created or loaded sequence id 68.192.168.8.13 for this ring.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering GATHER state from 15.
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais event service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais message service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais configuration service'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Jul 16 16:19:51 srv4 openais[29697]: [CMAN ] CMAN 2.00.00 (built Jul 13 2007 13:24:30) started
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] Not using a virtual synchrony filter.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Creating commit token because I am the rep.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Saving state aru 0 high seq received 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering COMMIT state.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering RECOVERY state.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] position [0] member 192.168.8.13:
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] previous ring seq 68 rep 192.168.8.13
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] aru 0 high delivered 0 received flag 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Did not need to originate any messages in recovery.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Storing new sequence id for ring 48
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Sending initial ORF token
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] CLM CONFIGURATION CHANGE
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] New Configuration:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Left:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Joined:
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] This node is within the primary component and will provide service.
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] CLM CONFIGURATION CHANGE
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] New Configuration:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] r(0) ip(192.168.8.13)
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Left:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] Members Joined:
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] r(0) ip(192.168.8.13)
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] This node is within the primary component and will provide service.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering OPERATIONAL state.
Jul 16 16:19:51 srv4 openais[29697]: [CMAN ] quorum regained, resuming activity
Jul 16 16:19:51 srv4 openais[29697]: [CLM ] got nodejoin message 192.168.8.13
Jul 16 16:19:51 srv4 ccsd[29691]: Initial status:: Quorate
Jul 16 16:19:59 srv4 fenced[29709]: srv5 not a cluster member after 6 sec post_join_delay
Jul 16 16:19:59 srv4 kernel: fenced[29709]: segfault at 0000000000000000 rip 0000000000405e97 rsp 00007fff30ee0b80 error 4
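The kernel's segfault line can be read a little further. On x86, the number after "error" is the page-fault error code, whose bits encode the kind of access that faulted. The following is a minimal Python sketch that decodes those bits using the standard x86 definitions; the helper name decode_pf_error is purely illustrative. For "error 4" at address 0 it reports a user-mode read of a page that is not present, which usually points to a NULL pointer dereference inside fenced.

```python
# Standard x86 page-fault error code bits, as printed by the kernel in
# "segfault at ADDR rip ... rsp ... error N" lines.
# Each entry: bit -> (meaning if set, meaning if clear or None).
PF_BITS = {
    0x1: ("protection violation", "page not present"),
    0x2: ("write access", "read access"),
    0x4: ("user mode", "kernel mode"),
    0x8: ("reserved bit set", None),
    0x10: ("instruction fetch", None),
}


def decode_pf_error(code: int) -> list[str]:
    """Return a human-readable description of each bit in a page-fault error code.

    decode_pf_error is a hypothetical helper written for this example.
    """
    parts = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & bit:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


if __name__ == "__main__":
    # Values taken from the log line above.
    fault_addr = 0x0000000000000000
    print(decode_pf_error(4))
    # -> ['page not present', 'read access', 'user mode']
    if fault_addr < 0x1000:
        print("fault address is in the first page: likely a NULL pointer dereference")
```

The rip value (0x405e97) lies inside the fenced binary itself, so running addr2line or gdb against an unstripped fenced build would be the natural next step to find the faulting function.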
Below is my cluster.conf file:
<?xml version="1.0"?>
<cluster name="correo" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="srv4" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="ilo_srv4"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="srv5" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="ilo_srv5"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo_srv4" agent="fence_ilo" ipaddr="192.168.8.180" login="ilo" />
    <fencedevice name="ilo_srv5" agent="fence_ilo" ipaddr="192.168.8.181" login="ilo" />
  </fencedevices>
</cluster>
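Besides the unconfigured iLO boards, it may be worth ruling out plain inconsistencies in the file itself. The following is a minimal Python sketch, not a tool shipped with the cluster suite; the function name check_cluster_conf is hypothetical. It only checks that clusternode names and nodeid values are unique and that every fence <device> refers to a defined <fencedevice>. Run against a trimmed copy of the configuration above, it reports that both clusternode entries carry nodeid="1", which may or may not be related to the segfault.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_cluster_conf(xml_text: str) -> list[str]:
    """Return consistency problems found in a cluster.conf document.

    Hypothetical helper: checks only uniqueness of clusternode name/nodeid
    and that fence <device> entries reference a defined <fencedevice>.
    """
    root = ET.fromstring(xml_text)
    problems = []

    nodes = root.findall("./clusternodes/clusternode")
    # Each cluster node should have a unique name and a unique nodeid.
    for attr in ("name", "nodeid"):
        counts = Counter(n.get(attr) for n in nodes)
        for value, count in counts.items():
            if count > 1:
                problems.append(f"duplicate {attr}={value!r} on {count} clusternodes")

    # Every <device name="..."> inside a node's <fence> must match a <fencedevice>.
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    for node in nodes:
        for dev in node.findall("./fence/method/device"):
            if dev.get("name") not in defined:
                problems.append(
                    f"{node.get('name')}: undefined fence device {dev.get('name')!r}"
                )
    return problems


# Trimmed copy of the cluster.conf posted above (the <cman> element is omitted).
SAMPLE = """<cluster name="correo" config_version="1">
  <clusternodes>
    <clusternode name="srv4" nodeid="1" votes="1">
      <fence><method name="single"><device name="ilo_srv4"/></method></fence>
    </clusternode>
    <clusternode name="srv5" nodeid="1" votes="1">
      <fence><method name="single"><device name="ilo_srv5"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo_srv4" agent="fence_ilo" ipaddr="192.168.8.180" login="ilo"/>
    <fencedevice name="ilo_srv5" agent="fence_ilo" ipaddr="192.168.8.181" login="ilo"/>
  </fencedevices>
</cluster>"""

if __name__ == "__main__":
    for problem in check_cluster_conf(SAMPLE):
        print(problem)
    # -> duplicate nodeid='1' on 2 clusternodes
```

The check deliberately stays structural; it does not try to judge whether the fence_ilo parameters are complete, since that depends on the agent's own options.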
Any hint as to what could cause fenced to segfault? The iLO boards on the two servers are not configured yet; could this be causing the problem?
Kind regards,
Sebastian