[Linux-cluster] gfs2, kvm setup

J. Bruce Fields bfields at fieldses.org
Tue Jul 8 22:15:33 UTC 2008


On Mon, Jul 07, 2008 at 02:49:28PM -0400, bfields wrote:
> On Mon, Jul 07, 2008 at 10:48:28AM -0500, David Teigland wrote:
> > On Sun, Jul 06, 2008 at 05:51:05PM -0400, J. Bruce Fields wrote:
> > > -	write(control_fd, in, sizeof(struct gdlm_plock_info));
> > > +	write(control_fd, in, sizeof(struct dlm_plock_info));
> > 
> > Gah, sorry, I keep fixing that and it keeps reappearing.
> > 
> > 
> > > Jul  1 14:06:42 piglet2 kernel: dlm: connect from non cluster node
> > 
> > > It looks like dlm_new_lockspace() is waiting on dlm_recoverd, which is
> > > in "D" state in dlm_rcom_status(), so I guess the second node isn't
> > > getting some dlm reply it expects?
> > 
> > dlm inter-node communication is not working here for some reason.  There
> > must be something unusual with the way the network is configured on the
> > nodes, and/or a problem with the way the cluster code is applying the
> > network config to the dlm.
> > 
> > Ah, I just remembered what this sounds like; we see this kind of thing
> > when a network interface has multiple IP addresses, and/or routing is
> > configured strangely.  Others cc'ed could offer better details on exactly
> > what to look for.
> 
> OK, thanks!  I'm trying to run gfs2 on 4 kvm machines; I'm an expert on
> neither, so it's entirely likely there's some obvious misconfiguration.
> On the kvm host there are 4 virtual interfaces bridged together:

I ran wireshark on vnet0 while doing the second mount.  What I saw: the
second machine opened a TCP connection to port 21064 on the first
(which had already completed the mount) and sent it a single message,
identified by wireshark as "DLM3" protocol, type recovery command:
status command.  It got back an ACK and then a RST.

Then the same happened in the other direction, with the first machine
sending a similar message to port 21064 on the second, which then reset
the connection.
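
That ACK-then-RST looks consistent with the "dlm: connect from non
cluster node" message above: presumably the dlm accepts the TCP
connection, fails to match the peer's source address against any
configured cluster node, and closes the socket with the status message
still unread, which would produce the reset.  If David's
multiple-address theory applies here, one (hypothetical) way to check
which source address the kernel actually picks would be, on piglet2:

	# which local address is used for traffic to piglet1?
	# (I'd expect the "src" field to come back as 192.168.122.130)
	ip route get 192.168.122.129

If the src shown there weren't the address the cluster config has
registered for the node, that would explain why the other side treats
the connection as coming from a non-cluster node.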

--b.

> 
> bfields at pig:~$ brctl show
> bridge name	bridge id		STP enabled	interfaces
> vnet0		8000.00ff0823c0f3	yes		vnet1
> 							vnet2
> 							vnet3
> 							vnet4
> 
> vnet0 has address 192.168.122.1 on the host, and the 4 kvm guests are
> statically assigned addresses 129, 130, 131, and 132 on the 192.168.122.*
> network, so a kvm guest looks like:
> 
> piglet1:~# ifconfig
> eth1      Link encap:Ethernet  HWaddr 00:16:3e:16:4d:61  
>           inet addr:192.168.122.129  Bcast:192.168.122.255  Mask:255.255.255.0
>           inet6 addr: fe80::216:3eff:fe16:4d61/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2464 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1806 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:197099 (192.4 KiB)  TX bytes:165606 (161.7 KiB)
>           Interrupt:11 Base address:0xc100 
> 
> lo        Link encap:Local Loopback  
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:285 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:285 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:13394 (13.0 KiB)  TX bytes:13394 (13.0 KiB)
> 
> piglet1:~# cat /etc/hosts
> 127.0.0.1       localhost
> 192.168.122.129 piglet1
> 192.168.122.130 piglet2
> 192.168.122.131 piglet3
> 192.168.122.132 piglet4
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 
> The network setup looks otherwise fine--they can all ping each other and
> the outside world.
> 
> --b.
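
P.S.: ping working doesn't say much about which source address TCP
picks for the dlm port, so another (hypothetical) data point worth
grabbing on each node while the second mount hangs:

	# show TCP sockets on the dlm port, and which local/remote
	# address pairs they are actually using
	netstat -tan | grep 21064

A mismatch between the local address shown there and the address the
node is known by in the cluster config should stand out immediately.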



