[Linux-cluster] cman startup issue

Wed Nov 7 13:25:53 UTC 2007

On Wed, 7 Nov 2007, Patrick Caulfield wrote:

>>>>>> I'm having a weird problem. I am using a shared GFS root file system,
>>>>>> and the same initrd image on all the machines. The cluster has 3
>>>>>> machines on it at the moment, and 1 refuses to join the cluster,
>>>>>> regardless of which order I bring them up in.
>>>>>>
>>>>>> When cman service is being started, it fails when starting cman:
>>>>>>
>>>>>> cman not started: Can't find local node name in cluster.conf
>>>>>> /usr/local/sbin/cman_tool: aisexec daemon didn't start
>>>>>>
>>>>>> If I try to run aisexec, I get:
>>>>>> aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.
>>>>>>
>>>>>> Where should I be looking for causes of this? I double checked my
>>>>>> cluster.conf and the MAC addresses, IP addresses and interface names
>>>>>> are correct in each node's config.
>>>>>
>>>>> Check that the new node can write into /tmp - where it is trying to
>>>>> store the current ring-id.  It could be SELinux or perhaps the
>>>>> permissions on the file it is trying to create.
>>>>
>>>> That fixed the aisexec problem, but the "Can't find local node name in
>>>> cluster.conf" problem remains, and cman still won't start. :-(
>>>
>>> Well, it won't start if it can't find the local node name in
>>> cluster.conf ...
>>> Have you double-checked that the name(s) in cluster.conf match those
>>> on the ethernet interfaces?
>>
>> You mean as in:
>> <eth name="eth1" mac="my:ma:ca:dd:re:ss" ip="10.1.2.3"
>> mask="255.255.255.0"/>
>> ?
>>
>> If so, then yes, I checked it about 10 times. That was the first thing I
>> thought was wrong. :-(
>
> As I don't have your cluster.conf or access to your DNS server it's hard to say
> from here, but that message does mean what it says. If you have older software
> it might not detect anything other than the node's main hostname, but later
> versions will check all the interfaces on the system for something that matches
> anything in cluster.conf.

Well, the thing that really puzzles me is that the same cluster used to
work before. All I effectively did was move it to a different IP range and
change cluster.conf. I can't figure out what could have changed in the
meantime to break it, other than cluster.conf. The only other thing that's
different is that some of the machines have eth1 and eth0 reversed. Before,
they all used eth1 for cluster configuration, and now one of them uses
eth0 (slightly different model, and the manufacturer mislabeled the ports
on them). But I have two identical machines, and one connects, the other
doesn't. It really has me stumped.

> I see you're using eth1 so make sure you do have an up-to-date cman.

I'm running the latest that is available for RHEL5.
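
For reference, a rough sketch of the kind of check that error message implies. This is not cman's actual lookup code, only an approximation of the matching described above. It assumes the node names live in the name attribute of the <clusternode> elements in cluster.conf; the function names and the injectable resolve parameter are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def node_names(conf_xml):
    """Return the name attribute of every <clusternode> element."""
    root = ET.fromstring(conf_xml)
    return [n.get("name") for n in root.iter("clusternode") if n.get("name")]


def matching_nodes(names, hostname, local_addrs, resolve=socket.gethostbyname):
    """Return the node names that look like they refer to this machine.

    A name matches if it equals the hostname (full or short form), or if
    it resolves to one of the addresses in local_addrs. The resolve
    argument is a hypothetical hook so the lookup can be replaced.
    """
    short = hostname.split(".")[0]
    matches = []
    for name in names:
        if name in (hostname, short):
            matches.append(name)
            continue
        try:
            addr = resolve(name)
        except OSError:
            # Name does not resolve on this machine, so it cannot match.
            continue
        if addr in local_addrs:
            matches.append(name)
    return matches
```

Feeding it the contents of cluster.conf, the output of socket.gethostname() and the set of IP addresses configured on the box should show whether anything lines up on the node that refuses to join. An empty list on that node only would point at name resolution or the interface addresses rather than at cluster.conf itself.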

Gordan