<html><body bgcolor="#FFFFFF"><div>I hope you're planning to expand to at least a three-node cluster before you go into production. You know two-node clusters are inherently unstable, right? I assume you've read the architectural overview of how the cluster suite achieves quorum.</div><div> </div><div>A cluster requires floor(n/2)+1 of its n votes (a strict majority) to continue to operate. If you restart or otherwise remove a machine from a two-node cluster, you've lost quorum, and by definition you've dissolved your cluster while you're in that state.</div><div> </div><div>I'm pretty sure the behavior you are describing is expected.<br>Time flies like an arrow.<div>Fruit flies like a banana.</div></div><div> On May 11, 2009, at 4:08, "Viral .D. Ahire" &lt;<a href="mailto:CISPLengineer.hz@ril.com">CISPLengineer.hz@ril.com</a>&gt; wrote: </div><div></div><blockquote type="cite"><div> Hi, I have configured a two-node cluster on Red Hat Enterprise Linux 5. The problem is that when I relocate, restart, or stop the running cluster service between the two nodes, the node gets fenced and the server restarts. On the other side, the server that takes over the cluster service leaves the cluster, and its cluster service (cman) stops automatically, so it is also fenced by the other server. I have observed that this problem occurs while stopping the cluster service (oracle). Please help me resolve this problem. The log messages and the cluster.conf file are given below. 
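</div></blockquote><div>To put numbers on the majority rule above, here is a minimal Python sketch, my own illustration rather than anything shipped with the cluster suite, that computes the quorum for n votes as n/2 rounded down plus one, then checks whether the survivors of a single node loss still hold it. It assumes one vote per node and does not model the two_node and expected_votes attributes of the &lt;cman&gt; element in the config quoted below.</div><pre>
```python
def quorum(total_votes: int) -> int:
    """Votes needed for a strict majority: floor(n/2) + 1 (one vote per node assumed)."""
    return total_votes // 2 + 1


def has_quorum(surviving_votes: int, total_votes: int) -> bool:
    """True if the remaining members still hold at least the quorum."""
    return surviving_votes >= quorum(total_votes)


if __name__ == "__main__":
    for nodes in (2, 3, 5):
        # Take one node (one vote) away and check whether the rest stay quorate.
        print(f"{nodes} nodes: quorum={quorum(nodes)}, survives one loss={has_quorum(nodes - 1, nodes)}")
```
</pre><div>With two nodes the quorum is 2, so losing either node drops the cluster below quorum; with three nodes the quorum is also 2, so the cluster survives losing one node.</div><blockquote type="cite"><div>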
<pre>------------------------- /etc/cluster/cluster.conf -------------------------
&lt;?xml version="1.0"?&gt;
&lt;cluster config_version="59" name="new_cluster"&gt;
  &lt;fence_daemon post_fail_delay="0" post_join_delay="3"/&gt;
  &lt;clusternodes&gt;
    &lt;clusternode name="psfhost1" nodeid="1" votes="1"&gt;
      &lt;fence&gt;
        &lt;method name="1"&gt;
          &lt;device name="cluster1"/&gt;
        &lt;/method&gt;
      &lt;/fence&gt;
    &lt;/clusternode&gt;
    &lt;clusternode name="psfhost2" nodeid="2" votes="1"&gt;
      &lt;fence&gt;
        &lt;method name="1"&gt;
          &lt;device name="cluster2"/&gt;
        &lt;/method&gt;
      &lt;/fence&gt;
    &lt;/clusternode&gt;
  &lt;/clusternodes&gt;
  &lt;cman expected_votes="1" two_node="1"/&gt;
  &lt;fencedevices&gt;
    &lt;fencedevice agent="fence_ilo" hostname="ilonode1" login="Administrator" name="cluster1" passwd="9M6X9CAU"/&gt;
    &lt;fencedevice agent="fence_ilo" hostname="ilonode2" login="Administrator" name="cluster2" passwd="ST69D87V"/&gt;
  &lt;/fencedevices&gt;
  &lt;rm&gt;
    &lt;failoverdomains&gt;
      &lt;failoverdomain name="poy-cluster" ordered="0" restricted="0"&gt;
        &lt;failoverdomainnode name="psfhost1" priority="1"/&gt;
        &lt;failoverdomainnode name="psfhost2" priority="1"/&gt;
      &lt;/failoverdomain&gt;
    &lt;/failoverdomains&gt;
    &lt;resources&gt;
      &lt;ip address="10.2.220.2" monitor_link="1"/&gt;
      &lt;script file="/etc/init.d/httpd" name="httpd"/&gt;
      &lt;fs device="/dev/cciss/c1d0p3" force_fsck="0" force_unmount="0" fsid="52427" fstype="ext3" mountpoint="/app" name="app" options="" self_fence="0"/&gt;
      &lt;fs device="/dev/cciss/c1d0p4" force_fsck="0" force_unmount="0" fsid="39388" fstype="ext3" mountpoint="/opt" name="opt" options="" self_fence="0"/&gt;
      &lt;fs device="/dev/cciss/c1d0p1" force_fsck="0" force_unmount="0" fsid="62307" fstype="ext3" mountpoint="/data" name="data" options="" self_fence="0"/&gt;
      &lt;fs device="/dev/cciss/c1d0p2" force_fsck="0" force_unmount="0" fsid="47234" fstype="ext3" mountpoint="/OPERATION" name="OPERATION" options="" self_fence="0"/&gt;
      &lt;script file="/etc/init.d/orcl" name="Oracle"/&gt;
    &lt;/resources&gt;
    &lt;service autostart="0" name="oracle" recovery="relocate"&gt;
      &lt;fs ref="app"/&gt;
      &lt;fs ref="opt"/&gt;
      &lt;fs ref="data"/&gt;
      &lt;fs ref="OPERATION"/&gt;
      &lt;ip ref="10.2.220.2"/&gt;
      &lt;script ref="Oracle"/&gt;
    &lt;/service&gt;
  &lt;/rm&gt;
&lt;/cluster&gt;</pre>
<pre>------------------------- /var/log/messages -------------------------</pre>The following logs were recorded while relocating the cluster service (oracle) between the nodes.<br><big>Node-1</big><pre>May 2 16:17:58 psfhost2 clurgmgrd[3793]: &lt;notice&gt; Starting stopped service service:oracle
May 2 16:17:58 psfhost2 kernel: kjournald starting. Commit interval 5 seconds
May 2 16:17:58 psfhost2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
May 2 16:17:58 psfhost2 kernel: EXT3 FS on cciss/c1d0p3, internal journal
May 2 16:17:58 psfhost2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
May 2 16:17:58 psfhost2 kernel: kjournald starting. Commit interval 5 seconds
May 2 16:17:58 psfhost2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
May 2 16:17:58 psfhost2 kernel: EXT3 FS on cciss/c1d0p4, internal journal
May 2 16:17:58 psfhost2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
May 2 16:17:58 psfhost2 kernel: kjournald starting. Commit interval 5 seconds
May 2 16:17:58 psfhost2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
May 2 16:17:58 psfhost2 kernel: EXT3 FS on cciss/c1d0p1, internal journal
May 2 16:17:58 psfhost2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
May 2 16:17:59 psfhost2 kernel: kjournald starting. Commit interval 5 seconds
May 2 16:17:59 psfhost2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
May 2 16:17:59 psfhost2 kernel: EXT3 FS on cciss/c1d0p2, internal journal
May 2 16:17:59 psfhost2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
May 2 16:17:59 psfhost2 avahi-daemon[3661]: Registering new address record for 10.2.220.2 on eth0.
May 2 16:18:00 psfhost2 in.rdiscd[5945]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
May 2 16:18:00 psfhost2 in.rdiscd[5945]: Failed joining addresses
May 2 16:18:11 psfhost2 clurgmgrd[3793]: &lt;notice&gt; Service service:oracle started
May 2 16:19:17 psfhost2 kernel: bnx2: eth1 NIC Link is Down
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] entering GATHER state from 11.
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] Saving state aru 1b high seq received 1b
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] Storing new sequence id for ring 90
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] entering COMMIT state.
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] entering RECOVERY state.
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] position [0] member 10.2.220.6:
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] previous ring seq 140 rep 10.2.220.6
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] aru 9 high delivered 9 received flag 1
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] position [1] member 10.2.220.7:
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] previous ring seq 136 rep 10.2.220.7
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] aru 1b high delivered 1b received flag 1
May 2 16:19:26 psfhost2 openais[3275]: [TOTEM] Did not need to originate any messages in recovery.
May 2 16:19:26 psfhost2 openais[3275]: [CLM ] CLM CONFIGURATION CHANGE
May 2 16:19:26 psfhost2 openais[3275]: [CLM ] New Configuration:
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] r(0) ip(10.2.220.7)
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] Members Left:
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] Members Joined:
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] CLM CONFIGURATION CHANGE
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] New Configuration:
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] r(0) ip(10.2.220.6)
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] r(0) ip(10.2.220.7)
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] Members Left:
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] Members Joined:
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] r(0) ip(10.2.220.6)
May 2 16:19:27 psfhost2 openais[3275]: [SYNC ] This node is within the primary component and will provide service.
May 2 16:19:27 psfhost2 openais[3275]: [TOTEM] entering OPERATIONAL state.
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] got nodejoin message 10.2.220.6
May 2 16:19:27 psfhost2 openais[3275]: [CLM ] got nodejoin message 10.2.220.7
May 2 16:19:27 psfhost2 openais[3275]: [CPG ] got joinlist message from node 2
May 2 16:19:29 psfhost2 kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex, receive &amp; transmit flow control ON
May 2 16:19:31 psfhost2 kernel: bnx2: eth1 NIC Link is Down
May 2 16:19:35 psfhost2 kernel: bnx2: eth1 NIC Link is Up, 100 Mbps full duplex, receive &amp; transmit flow control ON
May 2 16:19:42 psfhost2 kernel: dlm: connecting to 1
May 2 16:20:36 psfhost2 ccsd[3265]: Update of cluster.conf complete (version 57 -&gt; 59).
May 2 16:20:43 psfhost2 clurgmgrd[3793]: &lt;notice&gt; Reconfiguring
May 2 16:21:15 psfhost2 clurgmgrd[3793]: &lt;notice&gt; Stopping service service:oracle
May 2 16:21:25 psfhost2 avahi-daemon[3661]: Withdrawing address record for 10.2.220.7 on eth0.
May 2 16:21:25 psfhost2 avahi-daemon[3661]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.2.220.7.
May 2 16:21:25 psfhost2 avahi-daemon[3661]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.2.220.2.
May 2 16:21:25 psfhost2 clurgmgrd: [3793]: &lt;err&gt; Failed to remove 10.2.220.2
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] entering RECOVERY state.
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] position [0] member 127.0.0.1:
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] previous ring seq 144 rep 10.2.220.6
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] aru 31 high delivered 31 received flag 1
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] Did not need to originate any messages in recovery.
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] Sending initial ORF token
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] CLM CONFIGURATION CHANGE
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] New Configuration:
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] r(0) ip(127.0.0.1)
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] Members Left:
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] r(0) ip(10.2.220.7)
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] Members Joined:
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] CLM CONFIGURATION CHANGE
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] New Configuration:
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] r(0) ip(127.0.0.1)
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] Members Left:
May 2 16:21:40 psfhost2 openais[3275]: [CLM ] Members Joined:
May 2 16:21:40 psfhost2 openais[3275]: [SYNC ] This node is within the primary component and will provide service.
May 2 16:21:40 psfhost2 openais[3275]: [TOTEM] entering OPERATIONAL state.
May 2 16:21:40 psfhost2 openais[3275]: [MAIN ] Killing node psfhost2 because it has rejoined the cluster without cman_tool join
May 2 16:21:40 psfhost2 openais[3275]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart
May 2 16:21:40 psfhost2 fenced[3291]: cman_get_nodes error -1 104
May 2 16:21:40 psfhost2 kernel: clurgmgrd[3793]: segfault at 0000000000000000 rip 0000000000408c4a rsp 00007fff3c4a9e20 error 4
May 2 16:21:40 psfhost2 fenced[3291]: cluster is down, exiting
May 2 16:21:40 psfhost2 groupd[3283]: cman_get_nodes error -1 104
May 2 16:21:40 psfhost2 dlm_controld[3297]: cluster is down, exiting
May 2 16:21:40 psfhost2 gfs_controld[3303]: cluster is down, exiting
May 2 16:21:40 psfhost2 clurgmgrd[3792]: &lt;crit&gt; Watchdog: Daemon died, rebooting...
May 2 16:21:40 psfhost2 kernel: dlm: closing connection to node 1
May 2 16:21:40 psfhost2 kernel: dlm: closing connection to node 2
May 2 16:21:40 psfhost2 kernel: md: stopping all md devices.
May 2 16:21:41 psfhost2 kernel: uhci_hcd 0000:01:04.4: HCRESET not completed yet!
May 2 16:24:55 psfhost2 syslogd 1.4.1: restart.
May 2 16:24:55 psfhost2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
May 2 16:24:55 psfhost2 kernel: Linux version 2.6.18-53.el5 (brewbuilder@hs20-bc1-7.build.redhat.com) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Wed Oct</pre></div></blockquote><blockquote type="cite"><div>-- Linux-cluster mailing list <a href="mailto:Linux-cluster@redhat.com">Linux-cluster@redhat.com</a> <a href="https://www.redhat.com/mailman/listinfo/linux-cluster">https://www.redhat.com/mailman/listinfo/linux-cluster</a></div></blockquote></body></html>