[Linux-cluster] new cluster acting odd

Digimer lists at alteeve.ca
Mon Dec 1 18:24:00 UTC 2014


On 01/12/14 01:03 PM, Megan . wrote:
> We have 11 GFS2 mounts of 10-20TB each that I need to share across all
> nodes.  It's the only reason we went with the cluster solution.  I don't
> know how we could split it up into several smaller clusters.

I would do this, personally:

A 2-node cluster: DRBD (on top of local disks, or a pair of SANs, one 
per node) in a simple single-primary (master/slave) configuration, 
exported over NFS with a floating IP.
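
To make that concrete, here is a minimal sketch of the DRBD resource 
(the hostnames, IPs and backing disks are placeholders; adjust to your 
hardware):

  # /etc/drbd.d/r0.res -- sketch only; names, IPs and disks are made up
  resource r0 {
      protocol C;                    # synchronous replication; safest for HA
      on node1.example.com {         # must match 'uname -n' on the node
          device    /dev/drbd0;      # the block device you format and export
          disk      /dev/sdb1;       # local backing disk (or SAN LUN)
          address   10.20.0.1:7788;  # replication link IP:port
          meta-disk internal;
      }
      on node2.example.com {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   10.20.0.2:7788;
          meta-disk internal;
      }
  }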

GFS2, like any clustered filesystem, requires cluster locking, and that 
locking comes with a non-trivial overhead. Exporting over NFS lets you 
avoid that bottleneck, and with a simple 2-node cluster behind the 
scenes, you still maintain full HA.
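
In cluster.conf terms, the export side can then be a plain failover 
service; something like this sketch (the device, paths, names and IP 
are all made up):

  <!-- sketch only; adapt names, device, mountpoint and IP -->
  <service name="nfs_svc" autostart="1" recovery="relocate">
      <fs name="data" device="/dev/drbd0" mountpoint="/export"
          fstype="ext4" force_unmount="1">
          <nfsexport name="exports">
              <nfsclient name="clients" target="*" options="rw,sync"/>
          </nfsexport>
      </fs>
      <ip address="10.10.0.100" monitor_link="1"/>
  </service>

The other nodes then just mount from the floating IP; no cluster 
membership (or DLM locking) is required on their side.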

In HA, nothing is more important than simplicity. Said another way:

"A cluster isn't beautiful when there is nothing left to add. It is 
beautiful when there is nothing left to take away."

> On Mon, Dec 1, 2014 at 12:14 PM, Digimer <lists at alteeve.ca> wrote:
>> On 01/12/14 11:56 AM, Megan . wrote:
>>>
>>> Thank you for your replies.
>>>
>>> The cluster is intended to be 9 nodes, but i haven't finished building
>>> the remaining 2.  Our production cluster is expected to be similar in
>>> size.  What tuning should I be looking at?
>>>
>>>
>>> Here is a link to our config.  http://pastebin.com/LUHM8GQR  I had to
>>> remove IP addresses.
>>
>>
>> Can you simplify those fencedevice definitions? I wonder whether the
>> timeouts you set could be part of the problem. Always start with the
>> simplest possible configuration, and only add options in response to
>> actual issues discovered in testing.
>
> I can try to simplify.  I had the longer timeouts because of what I
> saw happening on the physical boxes: the box would be on its way
> down or up and the fence command would fail, but the box actually did
> come back online.  The physicals take 10-15 minutes to reboot, and I
> wasn't sure how to handle timeout issues, so I made the timeouts a bit
> extreme for testing.  I'll try to make the config more vanilla for
> troubleshooting.

I'm not really sure why the state of the node should impact the fence 
action in any way. Fencing is supposed to work, regardless of the state 
of the target.

Fencing works like this (with a default config, on most fence agents):

1. Force the node off.
2. Verify it is off.
3. Try to boot it; don't care whether that succeeds.

So once the node is confirmed off by the agent, the fence is considered 
a success. How long it takes the node to reboot (if it reboots at all) 
does not factor in.
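
You can watch that sequence by driving the agent by hand. A sketch with 
fence_ipmilan (the IP and credentials are placeholders; most agents 
accept the same '-o' actions):

  # force the node off, confirm it, then try to start it again
  fence_ipmilan -a 10.10.1.10 -l fenceuser -p secret -o off
  fence_ipmilan -a 10.10.1.10 -l fenceuser -p secret -o status
  fence_ipmilan -a 10.10.1.10 -l fenceuser -p secret -o on

The fence is a success as soon as 'status' reports the node off; the 
'on' at the end is a courtesy, nothing more.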

>>> I tried crashing a node with 'echo c > /proc/sysrq-trigger'; the
>>> cluster kept seeing it as online and never fenced it, yet I could
>>> no longer ssh to the node.  I did this on a physical and a VM box
>>> with the same result.  I had to run fence_node against it to get it
>>> to reboot, but it came up split-brained (thinking it was the only
>>> one online).  Now that node has cman down and the rest of the
>>> cluster still sees it as online.
>>
>>
>> Then corosync failed to detect the fault. That is a sign, to me, of a
>> fundamental network or configuration issue. Corosync should have shown
>> messages about a node being lost and reconfiguring. If that didn't happen,
>> then you're not even up to the point where fencing factors in.
>>
>> Did you configure corosync.conf? When it came up, did it think it was
>> quorate or inquorate?
>
> corosync.conf didn't work, since it seems the Red Hat HA cluster stack
> doesn't use that file.  http://people.redhat.com/ccaulfie/docs/CmanYinYang.pdf
> I tried it because we wanted to put the multicast traffic on a
> different bond/VLAN, but we figured out the file isn't used.

Right. I wanted to make sure that, if you had tried it, you've since 
removed corosync.conf entirely. On a cman cluster, corosync is 
configured entirely through cluster.conf.
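
For what it's worth, the multicast side is set there, too. A sketch 
(the address is arbitrary; something in 239.192.0.0/16 is typical):

  <cman>
      <multicast addr="239.192.100.1"/>
  </cman>

Corosync binds to whichever interface holds the IP that each 
<clusternode name="..."/> resolves to, so the way to move the totem 
traffic onto a different bond/VLAN is to use node names that resolve to 
addresses on that network.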

>>> I thought fencing was working because I'm able to run fence_node
>>> and see the box reboot and come back online.  I did have to get the
>>> Fedora version of the fence agents because of an issue with the
>>> iDRAC agent not working properly.  We are running
>>> fence-agents-3.1.6-1.fc14.x86_64
>>
>>
>> That tells you that the configuration of the fence agents is working, but it
>> doesn't test failure detection. You can use the 'fence_check' tool to see if
>> the cluster can talk to everything, but in the end, the only useful test is
>> to simulate an actual crash.
>>
>> Wait, 'fc14'?! What OS are you using?
>
> We are on CentOS 6.6.  I went with the Fedora agents because of this
> exact issue: http://forum.proxmox.com/threads/12311-Proxmox-HA-fencing-and-Dell-iDrac7
> I read that it was fixed in the next version, which I could only find
> for Fedora.

It would be *much* better to file a bug report 
(https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%206) 
-> Version: 6.6 -> Component: fence-agents

Mixing RPMs from other OSes is not a good idea at all.
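
If you do file it, note what you're actually running versus what the 
distro ships; for example:

  # what is installed now vs. what CentOS provides
  rpm -q fence-agents
  yum list fence-agents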

>>> fence_tool dump worked on one of my nodes, but it is just hanging on the
>>> rest.
>>>
>>> [root@map1-uat ~]# fence_tool dump
>>> 1417448610 logging mode 3 syslog f 160 p 6 logfile p 6
>>> /var/log/cluster/fenced.log
>>> 1417448610 fenced 3.0.12.1 started
>>> 1417448610 connected to dbus :1.12
>>> 1417448610 cluster node 1 added seq 89048
>>> 1417448610 cluster node 2 added seq 89048
>>> 1417448610 cluster node 3 added seq 89048
>>> 1417448610 cluster node 4 added seq 89048
>>> 1417448610 cluster node 5 added seq 89048
>>> 1417448610 cluster node 6 added seq 89048
>>> 1417448610 cluster node 8 added seq 89048
>>> 1417448610 our_nodeid 4 our_name map1-uat.project.domain.com
>>> 1417448611 logging mode 3 syslog f 160 p 6 logfile p 6
>>> /var/log/cluster/fenced.log
>>> 1417448611 logfile cur mode 100644
>>> 1417448611 cpg_join fenced:daemon ...
>>> 1417448621 daemon cpg_join error retrying
>>> 1417448631 daemon cpg_join error retrying
>>> 1417448641 daemon cpg_join error retrying
>>> 1417448651 daemon cpg_join error retrying
>>> 1417448661 daemon cpg_join error retrying
>>> 1417448671 daemon cpg_join error retrying
>>> 1417448681 daemon cpg_join error retrying
>>> 1417448691 daemon cpg_join error retrying
>>> .
>>> .
>>> .
>>>
>>>
>>> [root@map1-uat ~]# clustat
>>> Cluster Status for gibsuat @ Mon Dec  1 16:51:49 2014
>>> Member Status: Quorate
>>>
>>>    Member Name                                      ID   Status
>>>    ------ ----                                      ---- ------
>>>    archive1-uat.project.domain.com                   1   Online
>>>    admin1-uat.project.domain.com                     2   Online
>>>    mgmt1-uat.project.domain.com                      3   Online
>>>    map1-uat.project.domain.com                       4   Online, Local
>>>    map2-uat.project.domain.com                       5   Online
>>>    cache1-uat.project.domain.com                     6   Online
>>>    data1-uat.project.domain.com                      8   Online
>>>
>>>
>>> The /var/log/cluster/fenced.log on the nodes is logging 'Dec 01
>>> 16:02:34 fenced cpg_join error retrying' every tenth of a second.
>>>
>>> Obviously we are having some major issues.  These are fresh boxes,
>>> with no services running right now other than the ones related to
>>> the cluster.
>>
>>
>> What OS/version?
>>
>>> I've also experimented with <cman transport="udpu"/> to disable
>>> multicast, to see if that helped, but it doesn't seem to make a
>>> difference in node stability.
>>
>>
>> A very bad idea with anything beyond a 2~3 node cluster. The overhead
>> will be far too great for a 7~9 node cluster.
>>
>>> Is there a document or some sort of reference that I can give the
>>> network folks on how the switches should be configured?  I read
>>> posts on forums about IGMP snooping, but I couldn't find anything
>>> from Red Hat to hand them.
>>
>>
>> I have this:
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#Six_Network_Interfaces.2C_Seriously.3F
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#Network_Switches
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#Network_Security_Considerations
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#Network
>>
>> There are comments in there about multicast, etc.
>>
>
> Thank you for the links.  I will review them with our network folks;
> hopefully they will help us sort out some of our issues.
>
> I will use the fence_check tool to see if I can troubleshoot the fencing.
>
> Thank you very much for all of your suggestions.

Happy to help. :)

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



