[Linux-cluster] gfs2, kvm setup

Christine Caulfield ccaulfie at redhat.com
Wed Jul 9 08:51:02 UTC 2008


Steven Whitehouse wrote:
> Hi,
> 
> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
>> On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
>>> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
>>>> On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
>>>>> -	write(control_fd, in, sizeof(struct gdlm_plock_info));
>>>>> +	write(control_fd, in, sizeof(struct dlm_plock_info));
>>>> Gah, sorry, I keep fixing that and it keeps reappearing.
>>>>
>>>>
>>>>> Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
>>>>> It looks like dlm_new_workspace() is waiting on dlm_recoverd, which is
>>>>> in "D" state in dlm_rcom_status(), so I guess the second node isn't
>>>>> getting some dlm reply it expects?
>>>> dlm inter-node communication is not working here for some reason.  There
>>>> must be something unusual with the way the network is configured on the
>>>> nodes, and/or a problem with the way the cluster code is applying the
>>>> network config to the dlm.
>>>>
>>>> Ah, I just remembered what this sounds like; we see this kind of thing
>>>> when a network interface has multiple IP addresses, and/or routing is
>>>> configured strangely.  Others cc'ed could offer better details on exactly
>>>> what to look for.
>>> OK, thanks!  I'm trying to run gfs2 on 4 kvm machines, I'm an expert on
>>> neither, and it's entirely likely there's some obvious misconfiguration.
>>> On the kvm host there are 4 virtual interfaces bridged together:
>> I ran wireshark on vnet0 while doing the second mount; what I saw was
>> the second machine opened a tcp connection to port 21064 on the first
>> (which had already completed the mount), and sent it a single message
>> identified by wireshark as "DLM3" protocol, type recovery command:
>> status command.  It got back an ACK then a RST.
>>
>> Then the same happened in the other direction, with the first machine
>> sending a similar message to port 21064 on the second, which then reset
>> the connection.
>>

That's a symptom of the "connect from non cluster node" error in the 
DLM. The DLM has received a connection from an IP address that is not 
known to cman, so it treats the peer as a spoofer and closes the 
connection (hence the RST you saw).
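
As a rough illustration of that check (this is not the DLM code, just a 
sketch of the idea in Python; the addresses and the function name are 
made up for the example):

```python
import ipaddress

# Hypothetical node addresses, standing in for what cman knows about.
CLUSTER_NODE_ADDRS = {
    ipaddress.ip_address("192.168.122.11"),
    ipaddress.ip_address("192.168.122.12"),
}


def accept_connection(source_addr, known_addrs=CLUSTER_NODE_ADDRS):
    """Return True only if the peer's source address belongs to a known node."""
    return ipaddress.ip_address(source_addr) in known_addrs


# The second peer is a made-up address outside the known set, as it would
# be if the connection left the node through an unexpected interface.
for peer in ("192.168.122.12", "192.168.123.12"):
    if accept_connection(peer):
        print(peer, "-> accepted")
    else:
        print(peer, "-> connect from non cluster node, closing")
```

The point is only that the decision is made on the source address of the 
incoming connection, so a node that is a perfectly good cluster member 
can still be rejected if its traffic arrives from an address cman was 
never told about.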

You'll need to check the routing of the interfaces. The most common 
cause of this sort of error is having two interfaces on the same 
physical (or internal) network.
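
One thing to look for, sketched in Python below, is two interfaces on a 
node that have been given addresses in the same IP subnet. This is only 
a proxy for "same physical network", and the interface names and 
addresses are invented for the example; substitute whatever "ip addr" 
reports on your nodes.

```python
import ipaddress
from collections import defaultdict


def interfaces_sharing_a_network(iface_addrs):
    """Map each IP network to its interfaces, keeping only networks
    that have more than one interface attached."""
    by_network = defaultdict(list)
    for name, cidr in iface_addrs.items():
        network = ipaddress.ip_interface(cidr).network
        by_network[network].append(name)
    return {
        str(network): sorted(names)
        for network, names in by_network.items()
        if len(names) > 1
    }


# Hypothetical node configuration: eth0 and eth1 share one subnet.
example = {
    "lo": "127.0.0.1/8",
    "eth0": "192.168.122.11/24",
    "eth1": "192.168.122.21/24",
}
print(interfaces_sharing_a_network(example))
```

Any network that shows up in the result is worth a closer look at the 
routing table on that node.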

-- 

Chrissie



