[Linux-cluster] usedev directive not working correctly?

Kovacs, Corey J. cjk at techma.com
Thu Aug 25 19:53:59 UTC 2005


It's been a while since I last dealt with this issue. Shortly after I posted
this problem I was put on another issue... Alas, here I am again....

I've rebuilt the cluster from the ground up in a clean state. RHEL2 update 4,
with GFS 6.0.2-25 re-compiled for these machines.


Here are my configs. The 10.0.0.x addresses are the public network; the
192.168.0.x "-ic" entries are the dedicated interconnect that usedev is
supposed to point gulm at.

#/etc/hosts
10.0.0.1	clua
10.0.0.2	club
10.0.0.3	cluc
192.168.0.1 clua-ic
192.168.0.2 club-ic
192.168.0.3 cluc-ic


# cluster.ccs
cluster {
	name = "cluster"
	lock_gulm {
			servers = ["clua", "club", "cluc"]
	}
}


# nodes.ccs
nodes {
	clua {
		ip_interfaces {
			eth1 = "192.168.0.1"
		}
		usedev="eth1"
		fence {
			iLO {
				clua-ilo {
					action="reboot"
				}
			}
		}
	}

	club {
		ip_interfaces {
			eth1 = "192.168.0.2"
		}
		usedev="eth1"
		fence {
			iLO {
				club-ilo {
					action="reboot"
				}
			}
		}
	}

	cluc {
		ip_interfaces {
			eth1 = "192.168.0.3"
		}
		usedev="eth1"
		fence {
			iLO {
				cluc-ilo {
					action="reboot"
				}
			}
		}
	}

}

# fence.ccs
fence_devices {
	clua-ilo {
		agent="fence_ilo"
		hostname = "10.0.0.10"
		login = "xxxxx"
		passwd = "yyyyyy"
	}
	club-ilo {
		agent="fence_ilo"
		hostname = "10.0.0.11"
		login = "xxxxx"
		passwd = "yyyyyy"
	}
	cluc-ilo {
		agent="fence_ilo"
		hostname = "10.0.0.12"
		login = "xxxxx"
		passwd = "yyyyyy"
	}
}


When I run lock_gulmd -C against this config (after doing a ccs_tool create),
I get the output below.
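
For reference, the commands I'm running are along these lines (the config
directory and CCA device path here are illustrative, not my exact ones):

	# build the CCS archive from the directory of .ccs files,
	# then parse it without starting anything
	ccs_tool create /root/cluster /dev/pool/cca
	lock_gulmd -C

And the resulting dump: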


cluster {
	name="cluster"
	lock_gulm {
		heartbeat_rate = 15.000
		allowed_misses = 2
		coreport = 40040
		new_connection_timeout = 15.000
		# server cnt: 3
		# servers = ["clua", "club", "cluc"]
		servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
		lt_partitions = 1
		lt_base_port = 41040
		lt_high_locks = 1048576
		lt_drop_req_rate = 10
		prealloc_locks = 90000
		prealloc_holders = 130000
		prealloc_lkrqs = 60
		ltpx_port = 40042
	}
}
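
Note the servers line: it resolves to the 10.0.0.x public addresses from
/etc/hosts rather than the 192.168.0.x interconnect addresses I pointed
usedev at. What I expected to see was something like:

	servers = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]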


So, the question is: what am I doing blatantly wrong? The docs seem fairly
simple, but this is just not working for me...

Any suggestions would be appreciated and acted upon much more quickly this
time.

Thanks


Corey



-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Conrad Tadpol
Tilstra
Sent: Tuesday, June 07, 2005 11:41 AM
To: linux clustering
Subject: Re: [Linux-cluster] usedev directive not working correctly?

On Tue, Jun 07, 2005 at 11:03:36AM -0400, Kovacs, Corey J. wrote:
> Sorry about the previous message subject, too lazy to type the address 
> and didn't change the subject.
> 
> On one hand I agree with that, however I've gone as far as to set up
> static routes for the addresses and lock_gulmd won't start at all
> since it can't talk to the other lock servers at all. As I said in the
> original message, 'gulm_tool nodelist node1' reports that the lock
> manager on node3 is NOT using the directed interface but node2 and
> node1 are.

Maybe, but it really looks like the gulm on node3 is misconfiguring
itself. Look in syslog when you start lock_gulmd on that node; it prints
what it thinks the hostname and IP are. If it's picking the wrong one there,
gulm is reading the config wrong. You can also run lock_gulmd with the -C
option, and it will just parse the config data and dump it out in
/tmp/Gulm_config.?????? (the ? will be random chars). Look at that to see
whether it matches what you've configured.
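
Something along these lines on node3 (the syslog location may vary by setup):

	# parse the config only, then inspect the dump
	lock_gulmd -C
	cat /tmp/Gulm_config.*

	# start gulm and check what hostname/IP it logged
	service lock_gulmd start
	grep lock_gulmd /var/log/messages | tail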

And, I'd like to see the complete nodes.ccs, if you don't mind.

--
Michael Conrad Tadpol Tilstra
Push to test.  <click>  Release to detonate.



