[Linux-cluster] gfs2, kvm setup

J. Bruce Fields bfields at fieldses.org
Wed Jul 9 15:29:46 UTC 2008


On Wed, Jul 09, 2008 at 09:44:24AM +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Tue, 2008-07-08 at 18:15 -0400, J. Bruce Fields wrote:
> > On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
> > > On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
> > > > On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
> > > > > -	write(control_fd, in, sizeof(struct gdlm_plock_info));
> > > > > +	write(control_fd, in, sizeof(struct dlm_plock_info));
> > > > 
> > > > Gah, sorry, I keep fixing that and it keeps reappearing.
> > > > 
> > > > 
> > > > > Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
> > > > 
> > > > > It looks like dlm_new_workspace() is waiting on dlm_recoverd, which is
> > > > > in "D" state in dlm_rcom_status(), so I guess the second node isn't
> > > > > getting some dlm reply it expects?
> > > > 
> > > > dlm inter-node communication is not working here for some reason.  There
> > > > must be something unusual with the way the network is configured on the
> > > > nodes, and/or a problem with the way the cluster code is applying the
> > > > network config to the dlm.
> > > > 
> > > > Ah, I just remembered what this sounds like; we see this kind of thing
> > > > when a network interface has multiple IP addresses, and/or routing is
> > > > configured strangely.  Others cc'ed could offer better details on exactly
> > > > what to look for.
> > > 
> > > OK, thanks!  I'm trying to run gfs2 on 4 kvm machines, I'm an expert on
> > > neither, and it's entirely likely there's some obvious misconfiguration.
> > > On the kvm host there are 4 virtual interfaces bridged together:
> > 
> > I ran wireshark on vnet0 while doing the second mount; what I saw was
> > the second machine opened a tcp connection to port 21064 on the first
> > (which had already completed the mount), and sent it a single message
> > identified by wireshark as "DLM3" protocol, type recovery command:
> > status command.  It got back an ACK then a RST.
> > 
> > Then the same happened in the other direction, with the first machine
> > sending a similar message to port 21064 on the second, which then reset
> > the connection.
> > 
> > --b.
> > 
> An ACK & RST for the same packet? Or was that an ACK SYN for the SYN and
> then an RST for the following data packet? Could you post the trace or
> put it somewhere we can see it?

Sure, thanks.  It's at

	http://www.fieldses.org/~bfields/failed-dlm.pcap
	http://www.fieldses.org/~bfields/failed-dlm-filtered.pcap

(The second is just the dlm traffic, with all the ais, ssh, dns, etc.
filtered out.)
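
For anyone who wants to check the ACK/RST ordering without opening
wireshark, here is a minimal Python sketch that lists the TCP flags of
every packet to or from port 21064 in a capture. It is only an
illustration under several assumptions: classic pcap format (not
pcapng), Ethernet link type, IPv4, and no VLAN tags. The names
dlm_tcp_packets and tcp_flag_names are made up for this example.

```python
import struct
import sys

DLM_PORT = 21064  # the dlm port seen in the trace described above

# (bit, name) pairs for the TCP flags of interest, in display order
FLAG_NAMES = [(0x02, "SYN"), (0x10, "ACK"), (0x08, "PSH"),
              (0x01, "FIN"), (0x04, "RST")]


def tcp_flag_names(flags):
    """Return the names of the flag bits set in a TCP flags byte."""
    return [name for bit, name in FLAG_NAMES if flags & bit]


def dlm_tcp_packets(pcap_bytes, port=DLM_PORT):
    """Yield (src_port, dst_port, flag_names, payload_len) per TCP packet.

    Assumes a classic pcap file with Ethernet framing and IPv4; anything
    else is skipped or rejected.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian host
    else:
        raise ValueError("not a classic (microsecond) pcap file")

    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # per-packet record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII",
                                               pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len

        # need Ethernet (14) + minimal IPv4 (20); EtherType must be IPv4
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ihl = (frame[14] & 0x0F) * 4          # IPv4 header length in bytes
        if frame[14 + 9] != 6:                # IP protocol 6 is TCP
            continue
        total_len = struct.unpack_from("!H", frame, 14 + 2)[0]
        tcp = 14 + ihl                        # start of the TCP header
        if len(frame) < tcp + 20:
            continue
        src, dst = struct.unpack_from("!HH", frame, tcp)
        if port not in (src, dst):
            continue
        data_off = (frame[tcp + 12] >> 4) * 4  # TCP header length in bytes
        flags = frame[tcp + 13]
        yield src, dst, tcp_flag_names(flags), total_len - ihl - data_off


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for src, dst, names, plen in dlm_tcp_packets(f.read()):
            print(f"{src} -> {dst}  {'+'.join(names) or '-'}  payload={plen}")
```

Run against a local copy of one of the captures (passing the file name
as the only argument), this prints one line per packet, which should
make it easy to see whether the RST follows the handshake or the first
data segment.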

--b.



