[Linux-cluster] gfs2, kvm setup

Christine Caulfield ccaulfie at redhat.com
Thu Jul 10 09:26:54 UTC 2008


J. Bruce Fields wrote:
> On Wed, Jul 09, 2008 at 04:50:14PM +0100, Christine Caulfield wrote:
>> J. Bruce Fields wrote:
>>> On Wed, Jul 09, 2008 at 09:51:02AM +0100, Christine Caulfield wrote:
>>>> Steven Whitehouse wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
>>>>>> On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
>>>>>>> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
>>>>>>>> On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
>>>>>>>>> -	write(control_fd, in, sizeof(struct gdlm_plock_info));
>>>>>>>>> +	write(control_fd, in, sizeof(struct dlm_plock_info));
>>>>>>>> Gah, sorry, I keep fixing that and it keeps reappearing.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
>>>>>>>>> It looks like dlm_new_lockspace() is waiting on dlm_recoverd, which is
>>>>>>>>> in "D" state in dlm_rcom_status(), so I guess the second node isn't
>>>>>>>>> getting some dlm reply it expects?
>>>>>>>> dlm inter-node communication is not working here for some reason.  There
>>>>>>>> must be something unusual with the way the network is configured on the
>>>>>>>> nodes, and/or a problem with the way the cluster code is applying the
>>>>>>>> network config to the dlm.
>>>>>>>>
>>>>>>>> Ah, I just remembered what this sounds like; we see this kind of thing
>>>>>>>> when a network interface has multiple IP addresses, and/or routing is
>>>>>>>> configured strangely.  Others cc'ed could offer better details on exactly
>>>>>>>> what to look for.
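One way to look for the multi-homing problem David describes is to list every address configured on a node and see whether more than one of them falls on the cluster subnet; if so, outgoing DLM traffic may not use the address cman advertised. A minimal sketch (illustrative only, not cman's actual logic; the subnet and addresses below are assumptions based on the setup later in this thread):

```python
# Illustrative check for the multi-homed case: more than one local
# address on the cluster subnet means the kernel may pick a source
# address that cman does not know about.
import ipaddress

def addrs_on_cluster_net(local_addrs, cluster_net):
    """Return the local addresses that fall inside cluster_net."""
    net = ipaddress.ip_network(cluster_net)
    return [a for a in local_addrs if ipaddress.ip_address(a) in net]

# Hypothetical node with an extra alias on the cluster network.
local = ["127.0.0.1", "192.168.122.130", "192.168.122.200"]
matches = addrs_on_cluster_net(local, "192.168.122.0/24")
print(matches)           # two candidates -> ambiguous source address
print(len(matches) > 1)  # True: this is the configuration to look for
```

A single match per node is the healthy case; two or more is the configuration that tends to produce the symptoms below.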
>>>>>>> OK, thanks!  I'm trying to run gfs2 on 4 kvm machines, I'm an expert on
>>>>>>> neither, and it's entirely likely there's some obvious misconfiguration.
>>>>>>> On the kvm host there are 4 virtual interfaces bridged together:
>>>>>> I ran wireshark on vnet0 while doing the second mount; what I saw was
>>>>>> the second machine opened a tcp connection to port 21064 on the first
>>>>>> (which had already completed the mount), and sent it a single message
>>>>>> identified by wireshark as "DLM3" protocol, type recovery command:
>>>>>> status command.  It got back an ACK then a RST.
>>>>>>
>>>>>> Then the same happened in the other direction, with the first machine
>>>>>> sending a similar message to port 21064 on the second, which then reset
>>>>>> the connection.
>>>>>>
>>>> That's a symptom of the "connect from non-cluster node" error in the  
>>>> DLM.
>>> I think I am getting a message to that effect in my logs.
>>>
>>>> It's got a connection from an IP address that is not known to cman,
>>>> so it closes it as a spoofer.
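The check behind "connect from non cluster node" amounts to comparing the peer address of an incoming connection on the DLM port against the member list cman supplies, and dropping the connection on a mismatch. A hypothetical sketch of that comparison (function and variable names are illustrative, not the kernel's):

```python
# Sketch of the membership check: an incoming DLM connection (TCP port
# 21064) is only accepted if its peer address exactly matches one of
# the node addresses known to cman.
import ipaddress

def is_known_cluster_node(peer_ip, member_ips):
    """Return True if peer_ip exactly matches a cman member address."""
    peer = ipaddress.ip_address(peer_ip)
    return any(peer == ipaddress.ip_address(m) for m in member_ips)

members = ["192.168.122.129", "192.168.122.130",
           "192.168.122.131", "192.168.122.132"]

# A connection arriving from a second address on the same interface
# fails this check, which is why multi-homed nodes trigger the error.
print(is_known_cluster_node("192.168.122.130", members))  # True
print(is_known_cluster_node("10.0.0.5", members))         # False
```

Note the match is exact: being on the right subnet is not enough if the source address differs from the one cman recorded for the node.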
>>> OK.  Is there an easy way to see the list of ip addresses known to cman?
>> yes,
>>
>>   cman_tool nodes -a
>>
>> will show you all the nodes and their known IP addresses
> 
> piglet2:~# cman_tool nodes -a
> Node  Sts   Inc   Joined               Name
>    1   M    376   2008-07-09 12:30:32  piglet1
>        Addresses: 192.168.122.129 
>    2   M    368   2008-07-09 12:30:31  piglet2
>        Addresses: 192.168.122.130 
>    3   M    380   2008-07-09 12:30:33  piglet3
>        Addresses: 192.168.122.131 
>    4   M    372   2008-07-09 12:30:31  piglet4
>        Addresses: 192.168.122.132 
> 
> These addresses are correct (and are the same addresses that show up in the
> packet trace).
> 
> I must be overlooking something very obvious....

Hmm, very odd.

Are those IP addresses consistent across all nodes in the cluster?
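One way to answer that mechanically is to capture `cman_tool nodes -a` on every node and diff the parsed results. A hypothetical helper (the parsing assumes the two-line layout shown above: a node line, then an indented "Addresses:" line):

```python
# Hypothetical helper: parse "cman_tool nodes -a" output into a
# {node_name: [addresses]} map so per-node outputs can be compared.
import re

def parse_cman_nodes(text):
    nodes, current = {}, None
    for line in text.splitlines():
        # Node line: id, status, incarnation, join date+time, name.
        m = re.match(r"\s*\d+\s+\S+\s+\d+\s+\S+ \S+\s+(\S+)", line)
        if m:
            current = m.group(1)
            nodes[current] = []
        elif current and "Addresses:" in line:
            nodes[current] = line.split("Addresses:")[1].split()
    return nodes

out = """Node  Sts   Inc   Joined               Name
   1   M    376   2008-07-09 12:30:32  piglet1
       Addresses: 192.168.122.129
   2   M    368   2008-07-09 12:30:31  piglet2
       Addresses: 192.168.122.130
"""
print(parse_cman_nodes(out))
```

Running this over the output collected from each node and checking that every parsed map is identical answers the consistency question directly; any node whose map differs is the one to investigate.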

-- 

Chrissie



