[Linux-cluster] gfs2, kvm setup
J. Bruce Fields
bfields at fieldses.org
Fri Jul 11 22:35:39 UTC 2008
On Thu, Jul 10, 2008 at 10:26:54AM +0100, Christine Caulfield wrote:
> J. Bruce Fields wrote:
>> On Wed, Jul 09, 2008 at 04:50:14PM +0100, Christine Caulfield wrote:
>>> J. Bruce Fields wrote:
>>>> On Wed, Jul 09, 2008 at 09:51:02AM +0100, Christine Caulfield wrote:
>>>>> Steven Whitehouse wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
>>>>>>> On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
>>>>>>>> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
>>>>>>>>> On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
>>>>>>>>>> - write(control_fd, in, sizeof(struct gdlm_plock_info));
>>>>>>>>>> + write(control_fd, in, sizeof(struct dlm_plock_info));
>>>>>>>>> Gah, sorry, I keep fixing that and it keeps reappearing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Jul 1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
>>>>>>>>>> It looks like dlm_new_lockspace() is waiting on dlm_recoverd, which is
>>>>>>>>>> in "D" state in dlm_rcom_status(), so I guess the second node isn't
>>>>>>>>>> getting some dlm reply it expects?
>>>>>>>>> dlm inter-node communication is not working here for some reason. There
>>>>>>>>> must be something unusual with the way the network is configured on the
>>>>>>>>> nodes, and/or a problem with the way the cluster code is applying the
>>>>>>>>> network config to the dlm.
>>>>>>>>>
>>>>>>>>> Ah, I just remembered what this sounds like; we see this kind of thing
>>>>>>>>> when a network interface has multiple IP addresses, and/or routing is
>>>>>>>>> configured strangely. Others cc'ed could offer better details on exactly
>>>>>>>>> what to look for.
>>>>>>>> OK, thanks! I'm trying to run gfs2 on 4 kvm machines; I'm an expert in
>>>>>>>> neither, and it's entirely likely there's some obvious misconfiguration.
>>>>>>>> On the kvm host there are 4 virtual interfaces bridged together:
>>>>>>> I ran wireshark on vnet0 while doing the second mount; what I saw was
>>>>>>> the second machine opened a tcp connection to port 21064 on the first
>>>>>>> (which had already completed the mount), and sent it a single message
>>>>>>> identified by wireshark as "DLM3" protocol, type recovery command:
>>>>>>> status command. It got back an ACK then a RST.
>>>>>>>
>>>>>>> Then the same happened in the other direction, with the first machine
>>>>>>> sending a similar message to port 21064 on the second, which then reset
>>>>>>> the connection.
>>>>>>>
>>>>> That's a symptom of the "connect from non-cluster node" error in
>>>>> the DLM.
>>>> I think I am getting a message to that effect in my logs.
>>>>
>>>>> It's got a connection from an IP address that is not known to
>>>>> cman, so it closes the connection as a spoofer.
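[The address check Christine describes can be sketched outside the kernel. A minimal Python illustration, not the actual DLM code: `accept_peer` and `cluster_nodes` are hypothetical names, and the addresses are taken from the cman_tool output later in this thread.]

```python
# Hedged sketch of the check described above: the DLM accepts a TCP
# connection (port 21064) only if the peer's address is one that the
# cluster manager (cman) knows about; otherwise it drops it with
# "connect from non cluster node".
cluster_nodes = {
    "192.168.122.129",  # piglet1
    "192.168.122.130",  # piglet2
    "192.168.122.131",  # piglet3
    "192.168.122.132",  # piglet4
}

def accept_peer(peer_addr):
    """Return True if the connecting peer is a known cluster node;
    otherwise the connection is reset (ACK then RST, as seen in the
    packet trace earlier in the thread)."""
    return peer_addr in cluster_nodes
```

[This is why a node with multiple IP addresses can fail here: if the kernel routes the outgoing DLM connection from an address cman does not know, the peer sees an unknown source and resets the connection.]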
>>>> OK. Is there an easy way to see the list of ip addresses known to cman?
>>> yes,
>>>
>>> cman_tool nodes -a
>>>
>>> will show you all the nodes and their known IP addresses
>>
>> piglet2:~# cman_tool nodes -a
>> Node  Sts   Inc   Joined               Name
>>    1   M    376   2008-07-09 12:30:32  piglet1
>>        Addresses: 192.168.122.129
>>    2   M    368   2008-07-09 12:30:31  piglet2
>>        Addresses: 192.168.122.130
>>    3   M    380   2008-07-09 12:30:33  piglet3
>>        Addresses: 192.168.122.131
>>    4   M    372   2008-07-09 12:30:31  piglet4
>>        Addresses: 192.168.122.132
>>
>> These addresses are correct (and are the same addresses that show up in the
>> packet trace).
>>
>> I must be overlooking something very obvious....
>
> Hmm, very odd.
>
> Are those IP addresses consistent across all nodes in the cluster?
Yes, "cman_tool nodes -a" gives the same IP addresses no matter which of
the four cluster nodes it's run on.
--b.
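[The by-eye comparison above can be mechanized. A hedged Python sketch: `parse_cman_nodes` is a hypothetical helper, assuming cman_tool's usual layout in which each node line is followed by an indented "Addresses:" line.]

```python
# Parse "cman_tool nodes -a"-style output into a name -> addresses map,
# so the output from each node can be compared for consistency.
def parse_cman_nodes(output):
    nodes = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Addresses:"):
            # indented address line belongs to the preceding node line
            if current is not None:
                nodes[current] = line.split(":", 1)[1].split()
        elif line and not line.startswith("Node"):
            # node line: "<id> <status> <inc> <date> <time> <name>"
            current = line.split()[-1]
    return nodes

sample = """Node  Sts   Inc   Joined               Name
   1   M    376   2008-07-09 12:30:32  piglet1
       Addresses: 192.168.122.129
   2   M    368   2008-07-09 12:30:31  piglet2
       Addresses: 192.168.122.130
"""
```

[Running this over the output captured on each of the four nodes and comparing the resulting dicts would confirm, mechanically, what Bruce verified by hand: every node sees the same node-to-address mapping.]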