[Linux-cluster] new cluster acting odd

Digimer lists at alteeve.ca
Mon Dec 1 15:57:19 UTC 2014


On 01/12/14 09:16 AM, Megan . wrote:
> Good Day,
>
> I'm fairly new to the cluster world, so I apologize in advance for
> silly questions.  Thank you for any help.

No pre-existing knowledge required, no need to apologize. :)

> We decided to use this cluster solution in order to share GFS2 mounts
> across servers.  We have a 7-node cluster that is newly set up, but
> acting oddly.  It has 3 VMware guests and 4 physical hosts (Dells
> with iDRACs).  They are all running CentOS 6.6.  I have fencing
> working (I'm able to do fence_node node and it will fence with
> success).  I do not have the GFS2 mounts in the cluster yet.

Very glad you have fencing; skipping it is a common early mistake.

A 7-node cluster is actually pretty large; it's around the upper end of 
what works well before tuning starts to become fairly important.
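
If congestion or timing does turn out to be the culprit, most of the 
relevant knobs live in the <totem> tag of cluster.conf. Purely as an 
illustration (these values are made up, not a recommendation for your 
setup):

   <totem token="20000" consensus="24000" window_size="50"/>

token and consensus are in milliseconds; window_size limits how many 
messages can be in flight per token rotation. I wouldn't touch any of 
this until fencing and membership are behaving, though.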

> When I don't touch the servers, my cluster looks perfect with all
> nodes online.  But when I start testing fencing, I have an odd problem
> where I end up with split-brain between some of the nodes.  They won't
> seem to automatically fence each other when it gets like this.

If you get a split-brain, something is seriously broken; most likely the 
fencing isn't actually working (the agent returning a false success, for 
example). Can you pastebin your cluster.conf (or use fpaste or something 
where tabs are preserved, to make it more readable)?

> in the  corosync.log for the node that gets split out i see the totem
> chatter, but it seems confused and just keeps doing the below over and
> over:
>
> Dec 01 12:39:15 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c
>
> Dec 01 12:39:17 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c
>
> Dec 01 12:39:19 corosync [TOTEM ] Retransmit List: 22 24 25 26 27 28 29 2a 2b 2c
>
> Dec 01 12:39:39 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
>
> Dec 01 12:39:39 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
> 21 23 24 25 26 27 28 29 2a 2b 32
> ..
> ..
> ..
> Dec 01 12:54:49 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
> 1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c
>
> Dec 01 12:54:50 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
> 1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c
>
> Dec 01 12:54:50 corosync [TOTEM ] Retransmit List: 1 3 4 5 6 7 8 9 a b
> 1d 1f 20 21 22 23 24 25 26 27 2e 30 31 32 37 38 39 3a 3b 3c

This is a sign of network congestion; it's the node saying "I lost some 
(corosync) messages, please retransmit them".
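
If you want to rule the network in or out, omping is a handy way to test 
multicast between the nodes (assuming the omping package is available to 
you). Run the same command on every node at roughly the same time, 
listing all the node names, something like:

   omping -c 60 archive1-uat admin1-uat mgmt1-uat map1-uat map2-uat cache1-uat data1-uat

Any multicast loss it reports would line up with the retransmit lists 
you're seeing.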

> I can manually fence it, and it still comes online with the same
> issue.  I end up having to take the whole cluster down, sometimes
> forcing reboot on some nodes, then bringing it back up.  It takes a
> good part of the day just to bring the whole cluster online again.

Something fence-related is not working.

> I used ccs -h node --sync --activate and double checked to make sure
> they are all using the same version of the cluster.conf file.

You can also use 'cman_tool version'.
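
For example, a quick loop like this (adjust the hostnames and ssh 
details to your environment) should report the same config version, 66 
in your case, from every node:

   for n in archive1-uat admin1-uat mgmt1-uat map1-uat map2-uat cache1-uat data1-uat; do
       echo -n "$n: "; ssh $n cman_tool version
   done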

> One issue I did notice is that when one of the VMware hosts is
> rebooted, the time comes up slightly skewed (6 seconds), but I thought I
> read somewhere that a skew that minor shouldn't impact the cluster.

IIRC, before RHEL 6.2 this was a problem; now it shouldn't be. I'm more 
curious about what might be underlying the skew than about the skew 
itself.
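
Assuming the guests run ntpd, it's worth checking whether it actually 
resyncs after a reboot, e.g.:

   service ntpd status
   ntpq -pn

VMware guests are notorious for clock drift, so I'd look at how the 
hypervisor time sync and the ntpd settings interact.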

> We have multicast enabled on the interfaces
>
>            UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
> and we have been told by our network team that IGMP snooping is disabled.
>
> With tcpdump I can see the multicast traffic chatter.
>
> Right now:
>
> [root@data1-uat ~]# clustat
> Cluster Status for projectuat @ Mon Dec  1 13:56:39 2014
> Member Status: Quorate
>
>   Member Name                                                     ID   Status
>   ------ ----                                                     ---- ------
>   archive1-uat.domain.com                                1 Online
>   admin1-uat.domain.com                                  2 Online
>   mgmt1-uat.domain.com                                   3 Online
>   map1-uat.domain.com                                    4 Online
>   map2-uat.domain.com                                    5 Online
>   cache1-uat.domain.com                                  6 Online
>   data1-uat.domain.com                                   8 Online, Local
>
>
>
> ** Has itself as offline **
> [root@map1-uat ~]# clustat
> Cluster Status for projectuat @ Mon Dec  1 13:57:07 2014
> Member Status: Quorate
>
>   Member Name                                                     ID   Status
>   ------ ----                                                     ---- ------
>   archive1-uat.domain.com                                1 Online
>   admin1-uat.domain.com                                  2 Online
>   mgmt1-uat.domain.com                                   3 Online
>   map1-uat.domain.com                                    4 Offline, Local
>   map2-uat.domain.com                                    5 Online
>   cache1-uat.domain.com                                  6 Online
>   data1-uat.domain.com                                   8 Online

That is really, really odd. I think we'll need one of the Red Hat folks 
to chime in.

> [root@cache1-uat ~]# clustat
> Cluster Status for projectuat @ Mon Dec  1 13:57:39 2014
> Member Status: Quorate
>
>   Member Name                                                     ID   Status
>   ------ ----                                                     ---- ------
>   archive1-uat.domain.com                                1 Online
>   admin1-uat.domain.com                                  2 Online
>   mgmt1-uat.domain.com                                   3 Online
>   map1-uat.domain.com                                    4 Online
>   map2-uat.domain.com                                    5 Online
>   cache1-uat.domain.com                                  6 Offline, Local
>   data1-uat.domain.com                                   8 Online
>
>
>
> [root@mgmt1-uat ~]# clustat
> Cluster Status for projectuat @ Mon Dec  1 13:58:04 2014
> Member Status: Inquorate
>
>   Member Name                                                     ID   Status
>   ------ ----                                                     ---- ------
>   archive1-uat.domain.com                                1 Offline
>   admin1-uat.domain.com                                  2 Offline
>   mgmt1-uat.domain.com                                   3 Online, Local
>   map1-uat.domain.com                                    4 Offline
>   map2-uat.domain.com                                    5 Offline
>   cache1-uat.domain.com                                  6 Offline
>   data1-uat.domain.com                                   8 Offline
>
>
> cman-3.0.12.1-68.el6.x86_64
>
>
> [root@data1-uat ~]# cat /etc/cluster/cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="66" name="projectuat">
> <clusternodes>
> <clusternode name="admin1-uat.domain.com" nodeid="2">
> <fence>
> <method name="fenceadmin1uat">
> <device name="vcappliancesoap" port="admin1-uat" ssl="on"
> uuid="421df3c4-a686-9222-366e-9a67b25f62b2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="mgmt1-uat.domain.com" nodeid="3">
> <fence>
> <method name="fenceadmin1uat">
> <device name="vcappliancesoap" port="mgmt1-uat" ssl="on"
> uuid="421d5ff5-66fa-5703-66d3-97f845cf8239"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="map1-uat.domain.com" nodeid="4">
> <fence>
> <method name="fencemap1uat">
> <device name="idracmap1uat"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="map2-uat.domain.com" nodeid="5">
> <fence>
> <method name="fencemap2uat">
> <device name="idracmap2uat"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="cache1-uat.domain.com" nodeid="6">
> <fence>
> <method name="fencecache1uat">
> <device name="idraccache1uat"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="data1-uat.domain.com" nodeid="8">
> <fence>
> <method name="fencedata1uat">
> <device name="idracdata1uat"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="archive1-uat.domain.com" nodeid="1">
> <fence>
> <method name="fenceadmin1uat">
> <device name="vcappliancesoap" port="archive1-uat" ssl="on"
> uuid="421d16b2-3ed0-0b9b-d530-0b151d81d24e"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <fencedevices>
> <fencedevice agent="fence_vmware_soap" ipaddr="x.x.x.130"
> login="fenceuat" login_timeout="10" name="vcappliancesoap"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="10"
> power_wait="30" retry_on="3" shell_timeout="10" ssl="1"/>
> <fencedevice agent="fence_drac5" cmd_prompt="admin1->"
> ipaddr="x.x.x.47" login="fenceuat" name="idracdata1uat"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
> <fencedevice agent="fence_drac5" cmd_prompt="admin1->"
> ipaddr="x.x.x.48" login="fenceuat" name="idracdata2uat"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
> <fencedevice agent="fence_drac5" cmd_prompt="admin1->"
> ipaddr="x.x.x.82" login="fenceuat" name="idracmap1uat"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
> <fencedevice agent="fence_drac5" cmd_prompt="admin1->"
> ipaddr="x.x.x.96" login="fenceuat" name="idracmap2uat"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
> <fencedevice agent="fence_drac5" cmd_prompt="admin1->"
> ipaddr="x.x.x.83" login="fenceuat" name="idraccache1uat"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
> <fencedevice agent="fence_drac5" cmd_prompt="admin1->"
> ipaddr="x.x.x.97" login="fenceuat" name="idraccache2uat"
> passwd_script="/etc/cluster/forfencing.sh" power_timeout="60"
> power_wait="60" retry_on="10" secure="on" shell_timeout="10"/>
> </fencedevices>
> </cluster>

-ENOPARSE

My recommendation would be to schedule a maintenance window and then 
stop everything except cman (no rgmanager, no gfs2, etc.). Then 
methodically crash each node in turn (I like 'echo c > 
/proc/sysrq-trigger') and verify that it is fenced and then recovers 
properly. It's worth disabling cman and rgmanager from starting at boot 
(in general, but particularly for this test). A rough outline is below.
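
A minimal sketch of that test cycle (run as root; the log paths are the 
usual EL6 defaults):

   # on every node, so a fenced node doesn't rejoin automatically:
   chkconfig cman off
   chkconfig rgmanager off

   # on the node being tested, crash it outright:
   echo c > /proc/sysrq-trigger

   # on a surviving node, watch the fence and the recovery:
   tail -f /var/log/cluster/corosync.log /var/log/messages
   fence_tool ls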

If you can reliably (and repeatedly) go crash -> fence -> rejoin, then 
I'd start loading the services back on and re-testing. If the problem 
only reappears under load, that in itself points to where the trouble 
is.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



