[Linux-cluster] gfs2, kvm setup

Christine Caulfield ccaulfie at redhat.com
Wed Jul 9 08:51:02 UTC 2008

Steven Whitehouse wrote:
> Hi,
> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
>> On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
>>> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
>>>> On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
>>>>> -	write(control_fd, in, sizeof(struct gdlm_plock_info));
>>>>> +	write(control_fd, in, sizeof(struct dlm_plock_info));
>>>> Gah, sorry, I keep fixing that and it keeps reappearing.
>>>>> Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
>>>>> It looks like dlm_new_lockspace() is waiting on dlm_recoverd, which is
>>>>> in "D" state in dlm_rcom_status(), so I guess the second node isn't
>>>>> getting some dlm reply it expects?
>>>> dlm inter-node communication is not working here for some reason.  There
>>>> must be something unusual with the way the network is configured on the
>>>> nodes, and/or a problem with the way the cluster code is applying the
>>>> network config to the dlm.
>>>> Ah, I just remembered what this sounds like; we see this kind of thing
>>>> when a network interface has multiple IP addresses, and/or routing is
>>>> configured strangely.  Others cc'ed could offer better details on exactly
>>>> what to look for.
>>> OK, thanks!  I'm trying to run gfs2 on 4 kvm machines, I'm an expert on
>>> neither, and it's entirely likely there's some obvious misconfiguration.
>>> On the kvm host there are 4 virtual interfaces bridged together:
>> I ran wireshark on vnet0 while doing the second mount; what I saw was
>> the second machine opened a tcp connection to port 21064 on the first
>> (which had already completed the mount), and sent it a single message
>> identified by wireshark as "DLM3" protocol, type recovery command:
>> status command.  It got back an ACK then a RST.
>> Then the same happened in the other direction, with the first machine
>> sending a similar message to port 21064 on the second, which then reset
>> the connection.

That's a symptom of the "connect from non cluster node" error in the 
DLM: it has received a connection from an IP address that is not known 
to cman, so it closes the connection as a possible spoof.
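A minimal C sketch of that membership check, assuming a hypothetical static table of node addresses standing in for cman's configuration. `lookup_nodeid()` and `accept_if_member()` are illustrative names, not the kernel DLM's functions (the real check lives in the in-kernel lowcomms code); the point is only that the decision is made purely on the peer's source IP.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for what cman knows: one nodeid and one
 * IPv4 address per cluster member. */
struct cluster_node {
    int nodeid;
    struct in_addr addr;
};

/* Return the nodeid whose configured address equals the peer's
 * source address, or -1 if the address is not a known member. */
static int lookup_nodeid(const struct cluster_node *nodes, size_t n,
                         const struct sockaddr_in *peer)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].addr.s_addr == peer->sin_addr.s_addr)
            return nodes[i].nodeid;
    }
    return -1;
}

/* Accept-side check on an already accepted socket: keep it only if
 * the source address belongs to a member, otherwise log and close. */
static int accept_if_member(int fd, const struct cluster_node *nodes,
                            size_t n)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);
    int nodeid = -1;

    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0 &&
        peer.sin_family == AF_INET)
        nodeid = lookup_nodeid(nodes, n, &peer);

    if (nodeid < 0) {
        fprintf(stderr, "connect from non cluster node\n");
        close(fd);
    }
    return nodeid;
}
```

On Linux, closing a TCP socket that still has unread data in its receive queue sends a RST rather than a FIN, which fits the ACK-then-RST sequence seen in wireshark: the kernel acknowledged the status command, then the connection was dropped without the message being read.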

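You'll need to check the routing of the interfaces. The most common 
cause of this sort of error is having two interfaces on the same 
physical (or internal) network: outgoing connections can then leave 
with the source address of the interface cman doesn't know about, and 
the receiving node rejects them.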
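To look for that misconfiguration on a node, a quick diagnostic along these lines lists pairs of IPv4 addresses on the host that share a subnet. It is a hypothetical helper, not part of the cluster tools; it assumes IPv4 only and judges each pair by the netmask of the first address. Pairs on the same interface are reported too, since multiple addresses on one interface were mentioned earlier in the thread as a similar cause.

```c
#define _DEFAULT_SOURCE /* expose getifaddrs() under strict -std modes */
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

/* True if a and b fall in the same IPv4 subnet under mask
 * (all three values in network byte order). */
static bool same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) == (b & mask);
}

/* Print every pair of local IPv4 addresses that share a subnet.
 * Returns the number of pairs found, or -1 if getifaddrs() fails. */
static int report_shared_subnets(void)
{
    struct ifaddrs *list, *a, *b;
    int pairs = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (a = list; a != NULL; a = a->ifa_next) {
        if (!a->ifa_addr || a->ifa_addr->sa_family != AF_INET ||
            !a->ifa_netmask)
            continue;
        for (b = a->ifa_next; b != NULL; b = b->ifa_next) {
            if (!b->ifa_addr || b->ifa_addr->sa_family != AF_INET)
                continue;
            uint32_t ia = ((struct sockaddr_in *)a->ifa_addr)->sin_addr.s_addr;
            uint32_t ib = ((struct sockaddr_in *)b->ifa_addr)->sin_addr.s_addr;
            uint32_t m = ((struct sockaddr_in *)a->ifa_netmask)->sin_addr.s_addr;
            if (same_subnet(ia, ib, m)) {
                char sa[INET_ADDRSTRLEN], sb[INET_ADDRSTRLEN];
                inet_ntop(AF_INET, &ia, sa, sizeof sa);
                inet_ntop(AF_INET, &ib, sb, sizeof sb);
                printf("%s (%s) and %s (%s) share a subnet\n",
                       a->ifa_name, sa, b->ifa_name, sb);
                pairs++;
            }
        }
    }
    freeifaddrs(list);
    return pairs;
}
```

Any pair it prints is a candidate for the source-address mismatch; the next step is to check which of the two addresses the routing table actually uses toward the other nodes, and whether that is the address cman has for this node.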


