[Linux-cluster] Starter Cluster / GFS

Jankowski, Chris Chris.Jankowski at hp.com
Fri Nov 12 03:25:41 UTC 2010


I think you are not making a distinction between the network that maintains the heartbeat and the networks used to reach the fence devices. I'll explain this again.

These are two very different things, operated for different purposes.

The heartbeat network runs between the nodes for the purpose of maintaining cluster membership.
The connections from the nodes to your fence devices form the other two networks.

In fact, speaking of networks in this case is a little limiting. Each of the IP addresses involved may, in principle, be in a different IP subnet.

In the example that you gave, you have two (possibly different) networks for fence devices, as you have two fence devices.

However, your cluster membership is maintained through the single heartbeat network implicitly defined through the names of the cluster nodes.  I want to have two independently configurable networks like this, with heartbeats sent through both of them. I cannot do this at the moment, as the software will always maintain the heartbeat through the single IP address to which the node name resolves. In your case the heartbeat traffic will always go between an-node01.alteeve.com and an-node02.alteeve.com.

What I want is heartbeat traffic going between:
an-node01h1.alteeve.com and an-node02h1.alteeve.com
and between
an-node01h2.alteeve.com and an-node02h2.alteeve.com
Whereas my application would access the cluster through:
an-node01.alteeve.com and an-node02.alteeve.com

So I would need a minimum of 3 Ethernet interfaces per server, or a minimum of 6 if all links are bonded, but this is OK.
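Expressed as configuration, what I am asking for might look like this (purely illustrative; the <altname> element below is my way of writing down a second, independently configurable heartbeat interface, not syntax that I know the current software to accept):

<clusternodes>
    <clusternode name="an-node01h1.alteeve.com" nodeid="1">
        <altname name="an-node01h2.alteeve.com"/>
    </clusternode>
    <clusternode name="an-node02h1.alteeve.com" nodeid="2">
        <altname name="an-node02h2.alteeve.com"/>
    </clusternode>
</clusternodes>

The application-facing names (an-node01.alteeve.com and an-node02.alteeve.com) would then never carry heartbeat traffic at all.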


Chris Jankowski

-----Original Message-----
From: Digimer [mailto:linux at alteeve.com] 
Sent: Friday, 12 November 2010 13:42
To: Jankowski, Chris
Cc: linux clustering
Subject: Re: [Linux-cluster] Starter Cluster / GFS

On 10-11-11 09:22 PM, Jankowski, Chris wrote:
> Digimer,
>>>>> I can't speak to heartbeat, but under RHCS you can have multiple fence methods and devices, and they will be used in the order that they are found in the configuration file.
> Separate heartbeat networks (not a single network with a bonded interface) are what my customers require.  I believe this is not available in the standard Linux Cluster as distributed by Red Hat.  This is completely independent of which fencing device or method is used.

It is possible. For example:

<?xml version="1.0"?>
<cluster name="an-cluster" config_version="1">
    <cman two_node="1" expected_votes="1"/>
    <totem secauth="off" rrp_mode="active"/>
    <clusternodes>
        <clusternode name="an-node01.alteeve.com" nodeid="1">
            <fence>
                <method name="ipmi">
                    <device name="fence_an01" action="reboot"/>
                </method>
                <method name="node_assassin">
                    <device name="batou" port="01" action="reboot"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="an-node02.alteeve.com" nodeid="2">
            <fence>
                <method name="ipmi">
                    <device name="fence_an02" action="reboot"/>
                </method>
                <method name="node_assassin">
                    <device name="batou" port="02" action="reboot"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice name="fence_an01" agent="fence_ipmilan"
                     ipaddr="" login="admin" passwd="secret"/>
        <fencedevice name="fence_an02" agent="fence_ipmilan"
                     ipaddr="" login="admin" passwd="secret"/>
        <fencedevice name="batou" agent="fence_na"
                     ipaddr="batou.alteeve.com" login="username" passwd="secret" quiet="1"/>
    </fencedevices>
</cluster>

In the above case, should 'an-node02' need to be fenced, the first method 'ipmi' would be used. Should it fail, the next method 'node_assassin' would be tried.
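The ordering logic itself is simple; roughly this (an illustrative sketch only, not the actual fenced daemon code; the lambda "agents" are stand-ins for real calls to fence_ipmilan, fence_na, etc.):

```python
def fence_node(methods):
    """Try each fence method in cluster.conf order; stop at the first success."""
    for name, agent in methods:
        if agent():          # a real agent would contact the IPMI BMC, Node Assassin, etc.
            return name      # fencing succeeded with this method
    raise RuntimeError("all fence methods failed; recovery stays blocked")

# Hypothetical outcome: IPMI fails (say the BMC lost power along with the node),
# so the next method, node_assassin, is tried and succeeds.
used = fence_node([("ipmi", lambda: False),
                   ("node_assassin", lambda: True)])
print(used)  # node_assassin
```

Note that if every method fails, the cluster must keep retrying rather than proceed, since recovery without a confirmed fence is unsafe.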

>>>>> With the power-based devices I've used (again, just IPMI and NA), the poweroff call is more or less instant. I've not seen, personally, a lag exceeding a second with these devices. I would consider a fence device that does not disable a node in <1 second to be flawed.
> 1.
> In the world where I work, separate power-based devices are not an option. Blade servers do not even have their own power supplies; they draw common power from the blade enclosure.  The only access to the power state is through the service processor.

Out of curiosity, do the blades have header pins for the power and reset switches? I don't see why they would, but I've not played with traditional blades before.

> 2.
> We are not talking about long delays here. The whole cycle of taking the power off a blade, including the login to the service processor, is less than 1 s. Delay, or the lack of it, is not the problem.  The transactional nature of the processing is the issue.
> Regards,
> Chris Jankowski

Let me talk to the Red Hat folks and see what they think about configurable per-node user-defined fence delays.

E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org
