[Linux-cluster] fenced segfault

Sebastian Reitenbach sebastia at l00-bugdead-prods.de
Mon Jul 16 14:37:12 UTC 2007


Hi,

I am still on openSUSE 10.2, x86_64, using openais-0.80.1-6 (RPM built from 
the source RPM) and cluster-2.0.0 (self-compiled). The kernel is:
Linux srv4 2.6.20.15-default #1 SMP Fri Jul 13 12:44:51 CEST 2007 x86_64 
x86_64 x86_64 GNU/Linux

Now, when I run /etc/init.d/cman for the first time, fenced segfaults and 
the cman init script hangs, waiting for fenced. When I Ctrl-C the init 
script, kill aisexec and /sbin/ccsd, and then restart the init script, 
fenced does start after a few minutes and the script ends with a "success".

The following are the logs from starting /etc/init.d/cman:


Jul 16 16:19:49 srv4 ccsd[29691]: Starting ccsd 2.00.00:
Jul 16 16:19:49 srv4 ccsd[29691]:  Built: Jul 13 2007 13:24:27
Jul 16 16:19:49 srv4 ccsd[29691]:  Copyright (C) Red Hat, Inc.  2004  All 
rights reserved.
Jul 16 16:19:49 srv4 ccsd[29691]: cluster.conf (cluster name = correo, 
version = 1) found.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] AIS Executive Service 
RELEASE 'subrev 1204 version 0.80.1'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Copyright (C) 2002-2006 
MontaVista Software, Inc and contributors.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Copyright (C) 2006 Red Hat, 
Inc.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Using default multicast address 
of 239.192.25.250
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cpg 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais cluster closed process group service v1.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cfg 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais configuration service'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_msg 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais message service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_lck 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais distributed locking service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_evt 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais event service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_ckpt 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais checkpoint service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_amf 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais availability management framework B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_clm 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais cluster membership service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_evs 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais extended virtual synchrony service'
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] openais component openais_cman 
loaded.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] Registering service 
handler 'openais CMAN membership service 2.01'
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Token Timeout (10000 ms) 
retransmit timeout (495 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] token hold (386 ms) retransmits 
before loss (20 retrans)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] join (60 ms) send_join (0 ms) 
consensus (4800 ms) merge (200 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] downcheck (1000000 ms) fail to 
recv const (50 msgs)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] seqno unchanged const (30 
rotations) Maximum network MTU 1500
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] window size per rotation (50 
messages) maximum messages per rotation (17 messages)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] send threads (0 threads)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP token expired timeout (495 
ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP token problem counter (2000 
ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP threshold (10 problem 
count)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] RRP mode set to none.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] heartbeat_failures_allowed (0)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] max_network_delay (50 ms)
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] HeartBeat is Disabled. To 
enable set heartbeat_failures_allowed > 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Receive multicast socket recv 
buffer size (262142 bytes).
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Transmit multicast socket send 
buffer size (262142 bytes).
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] The network interface 
[192.168.8.13] is now up.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Created or loaded sequence id 
68.192.168.8.13 for this ring.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering GATHER state from 15.
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais extended virtual synchrony service'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais cluster membership service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais availability management framework B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais checkpoint service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais event service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais distributed locking service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais message service B.01.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais configuration service'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais cluster closed process group service v1.01'
Jul 16 16:19:51 srv4 openais[29697]: [SERV ] Initialising service 
handler 'openais CMAN membership service 2.01'
Jul 16 16:19:51 srv4 openais[29697]: [CMAN ] CMAN 2.00.00 (built Jul 13 2007 
13:24:30) started
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] Not using a virtual synchrony 
filter.
Jul 16 16:19:51 srv4 openais[29697]: [MAIN ] AIS Executive Service: started 
and ready to provide service.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Creating commit token because I 
am the rep.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Saving state aru 0 high seq 
received 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering COMMIT state.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering RECOVERY state.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] position [0] member 
192.168.8.13:
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] previous ring seq 68 rep 
192.168.8.13
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] aru 0 high delivered 0 received 
flag 0
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Did not need to originate any 
messages in recovery.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Storing new sequence id for 
ring 48
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] Sending initial ORF token
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] New Configuration:
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] Members Left:
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] Members Joined:
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] This node is within the primary 
component and will provide service.
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] New Configuration:
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ]    r(0) ip(192.168.8.13)
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] Members Left:
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] Members Joined:
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ]    r(0) ip(192.168.8.13)
Jul 16 16:19:51 srv4 openais[29697]: [SYNC ] This node is within the primary 
component and will provide service.
Jul 16 16:19:51 srv4 openais[29697]: [TOTEM] entering OPERATIONAL state.
Jul 16 16:19:51 srv4 openais[29697]: [CMAN ] quorum regained, resuming 
activity
Jul 16 16:19:51 srv4 openais[29697]: [CLM  ] got nodejoin message 
192.168.8.13
Jul 16 16:19:51 srv4 ccsd[29691]: Initial status:: Quorate
Jul 16 16:19:59 srv4 fenced[29709]: srv5 not a cluster member after 6 sec 
post_join_delay
Jul 16 16:19:59 srv4 kernel: fenced[29709]: segfault at 0000000000000000 rip 
0000000000405e97 rsp 00007fff30ee0b80 error 4


Below is my cluster.conf file:

<?xml version="1.0"?>
<cluster name="correo" config_version="1">
  <cman two_node="1" expected_votes="1">
</cman>

<clusternodes>

<clusternode name="srv4" nodeid="1" votes="1">
        <fence>
                <method name="single">
                        <device name="ilo_srv4"/>
                </method>
        </fence>
</clusternode>

<clusternode name="srv5" nodeid="1" votes="1">
        <fence>
                <method name="single">
                        <device name="ilo_srv5"/>
                </method>
        </fence>
</clusternode>

</clusternodes>

<fencedevices>
        <fencedevice name="ilo_srv4" agent="fence_ilo" ipaddr="192.168.8.180" login="ilo" />
        <fencedevice name="ilo_srv5" agent="fence_ilo" ipaddr="192.168.8.181" login="ilo" />
</fencedevices>

</cluster>

Any hint as to what could cause fenced to segfault? The iLO boards on the 
two servers are not yet configured; I don't know whether that could cause 
the problem.
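
As a sanity check on the cluster.conf above, here is a minimal Python sketch 
that looks for two things that might be worth ruling out: clusternode entries 
that share a nodeid, and fence devices that are referenced by a node but never 
defined under fencedevices. The function names are made up for illustration, 
and whether either condition has anything to do with the segfault is only an 
assumption, not something the logs confirm.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The config from this post, embedded so the sketch runs on its own.
SAMPLE = """<?xml version="1.0"?>
<cluster name="correo" config_version="1">
  <cman two_node="1" expected_votes="1"></cman>
  <clusternodes>
    <clusternode name="srv4" nodeid="1" votes="1">
      <fence><method name="single"><device name="ilo_srv4"/></method></fence>
    </clusternode>
    <clusternode name="srv5" nodeid="1" votes="1">
      <fence><method name="single"><device name="ilo_srv5"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo_srv4" agent="fence_ilo" ipaddr="192.168.8.180" login="ilo"/>
    <fencedevice name="ilo_srv5" agent="fence_ilo" ipaddr="192.168.8.181" login="ilo"/>
  </fencedevices>
</cluster>
"""


def duplicate_nodeids(conf_xml):
    """Map each nodeid used by more than one clusternode to the node names."""
    root = ET.fromstring(conf_xml)
    seen = defaultdict(list)
    for node in root.iter("clusternode"):
        seen[node.get("nodeid")].append(node.get("name"))
    return {nid: names for nid, names in seen.items() if len(names) > 1}


def undefined_fence_devices(conf_xml):
    """List device names used in <fence> blocks that have no <fencedevice>."""
    root = ET.fromstring(conf_xml)
    defined = {d.get("name") for d in root.iter("fencedevice")}
    used = {d.get("name") for d in root.iter("device")}
    return sorted(used - defined)


if __name__ == "__main__":
    print("duplicate nodeids:", duplicate_nodeids(SAMPLE))
    print("undefined fence devices:", undefined_fence_devices(SAMPLE))
```

Run against the config as posted, the first check reports that srv4 and srv5 
both carry nodeid="1"; the second finds nothing missing.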

kind regards
Sebastian



