[Linux-cluster] cman startup issue

Patrick Caulfield pcaulfie at redhat.com
Wed Nov 7 13:57:16 UTC 2007


gordan at bobich.net wrote:
> On Wed, 7 Nov 2007, Patrick Caulfield wrote:
> 
>>>>>>> I'm having a weird problem. I am using a shared GFS root file
>>>>>>> system, and the same initrd image on all the machines. The cluster
>>>>>>> has 3 machines on it at the moment, and 1 refuses to join the
>>>>>>> cluster, regardless of which order I bring them up in.
>>>>>>>
>>>>>>> When the cman service is started, it fails with:
>>>>>>>
>>>>>>> cman not started: Can't find local node name in cluster.conf
>>>>>>> /usr/local/sbin/cman_tool: aisexec daemon didn't start
>>>>>>>
>>>>>>> If I try to run aisexec, I get:
>>>>>>> aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.
>>>>>>>
>>>>>>> Where should I be looking for causes of this? I double-checked my
>>>>>>> cluster.conf, and the MAC addresses, IP addresses and interface
>>>>>>> names are correct in each node's config.
>>>>>>
>>>>>> Check that the new node can write into /tmp - where it is trying
>>>>>> to store the current ring-id. It could be SELinux or perhaps the
>>>>>> permissions on the file it is trying to create.
>>>>>
>>>>> That fixed the aisexec problem, but the "Can't find local node name in
>>>>> cluster.conf" problem remains, and cman still won't start. :-(
>>>>
>>>> Well, it won't start if it can't find the local node name in
>>>> cluster.conf ...
>>>> Have you double-checked that the name(s) in cluster.conf match those
>>>> on the ethernet interfaces?
>>>
>>> You mean as in:
>>> <eth name="eth1" mac="my:ma:ca:dd:re:ss" ip="10.1.2.3"
>>> mask="255.255.255.0"/>
>>> ?
>>>
>>> If so, then yes, I checked it about 10 times. That was the first thing I
>>> thought was wrong. :-(
>>
>> As I don't have your cluster.conf or access to your DNS server, it's
>> hard to say from here, but that message does mean what it says. If you
>> have older software it might not detect anything other than the node's
>> main hostname, but later versions will check all the interfaces on the
>> system for something that matches anything in cluster.conf.
> 
> Well, the thing that really puzzles me is that the same cluster used to
> work before. All I effectively did was move it to a different IP range
> and change cluster.conf. I can't figure out what could have changed in
> the meantime to break it, other than cluster.conf. The only other thing
> that's different is that some of the machines have eth1 and eth0
> reversed. Before, they all used eth1 in the cluster configuration, and
> now one of them uses eth0 (slightly different model, and the
> manufacturer mislabeled the ports on them). But I have two identical
> machines, and one connects while the other doesn't. It really has me
> stumped.
> 
>> I see you're using eth1 so make sure you do have an up-to-date cman.
> 
> I'm running the latest that is available for RHEL5.
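
Before I answer that: going back to the aisexec assertion for a moment,
since others may trip over it in the archives. The quickest way I know to
check the /tmp theory is something like this (the test filename below is
just an example; the file aisexec actually creates is named differently):

  touch /tmp/ringid-test && rm /tmp/ringid-test   # basic write test
  ls -ld /tmp                                     # should be drwxrwxrwt
  getenforce                                      # SELinux mode
  grep avc /var/log/audit/audit.log | tail        # recent denials

If getenforce says Enforcing and the audit log shows avc denials
mentioning aisexec, that's the culprit.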

If that's the cman that came with 5.0 then there's a bug in the name
matching. Unfortunately I can't tell from the CVS tags which package the
fix went into.

"revision 1.26
 date: 2007/03/15 11:12:33;  author: pcaulfield;  state: Exp;  lines: +16 -13
 If the machine is multi-homed, then using a truncated name in uname but not in
 cluster.conf would fail to match them up."
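
To make that concrete (the hostnames and addresses below are made up): if
uname reports a truncated name but cluster.conf carries the fully-qualified
one, the affected versions won't match them up on a multi-homed box:

  uname -n
    node3
  grep clusternode /etc/cluster/cluster.conf
    <clusternode name="node3.example.com" nodeid="3" votes="1">

A rough way to check by hand is to resolve each clusternode name and see
whether one of the addresses actually sits on a local interface:

  getent hosts node3.example.com   # e.g. 10.1.2.3 (example output)
  ip addr show | grep 'inet '      # is 10.1.2.3 configured here?

If nothing lines up, making uname and cluster.conf agree - both short or
both fully-qualified - should work around it until you can update cman.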

--
Patrick



