[Linux-cluster] some questions about setting up GFS
Michael Conrad Tadpol Tilstra
mtilstra at redhat.com
Wed Jan 12 14:49:05 UTC 2005
On Wed, Jan 12, 2005 at 05:10:40PM +0300, Sergey wrote:
> Hello!
>
> > It looks like you are not using pool.
>
> Thanks — guided by your examples, I can now mount the raid.
>
> Now I have some questions about Cluster Configuration System Files.
>
> I have 2 nodes - hp1 and hp2. Each node has Integrated Lights-Out
> with ROM Version: 1.55 - 04/16/2004.
>
> Since I have only 2 nodes, one of them has to be the master, but if
> the first one (the master) is cleanly shut down, the slave experiences
> serious problems that can only be solved by resetting it. Is that
> expected? How can I make it work correctly?
>
> I tried servers = ["hp1","hp2","hp3"] (hp3 does not actually exist);
> then, when the master is shut down, the second node becomes master. So, if
The nodes in the servers config line for gulm form a mini-cluster of
sorts. A quorum (51%) of the nodes in this mini-cluster must be present
for things to continue.
You must have two of the three servers up and running so that the
mini-cluster has quorum, which then will allow the other nodes to
connect.
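For reference, the three-server form of cluster.ccs would look like the
one you quote below (hp3 is a placeholder here, exactly as in your test;
any two of the three listed servers must be up for quorum):

```
cluster {
        name = "cluster"
        lock_gulm {
                # 3 servers => quorum is 2, so gulm tolerates the
                # loss of any one server in this list.
                servers = ["hp1","hp2","hp3"]
        }
}
```

Note that with only two real machines, losing either of them plus the
absent hp3 still drops you below quorum.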
> nodes are alternately shut down cleanly and booted back up, the master
> switches from one to the other and everything seems OK. But if one of
> the nodes goes down uncleanly (e.g. the power cord is pulled out of
> the socket), the following appears in syslog:
>
> Jan 12 14:44:33 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530273952756 mb:1)
> Jan 12 14:44:48 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530288972780 mb:2)
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530303992751 mb:3)
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Client (hp2) expired
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
> Jan 12 14:45:03 hp1 lock_gulmd_core[6614]: Gonna exec fence_node hp2
> Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Forked [6614] fence_node hp2 with a 0 pause.
> Jan 12 14:45:03 hp1 fence_node[6614]: Performing fence method, riloe, on hp2.
> Jan 12 14:45:04 hp1 fence_node[6614]: The agent (fence_rib) reports:
> Jan 12 14:45:04 hp1 fence_node[6614]: WARNING! fence_rib is deprecated. use fence_ilo instead parse error: unknown option "ipaddr=10.10.0.112"
>
> If I then start the lock_gulm service again on the second node, the
> first node writes the following to syslog:
>
> Jan 12 14:50:14 hp1 lock_gulmd_core[7148]: Gonna exec fence_node hp2
> Jan 12 14:50:14 hp1 fence_node[7148]: Performing fence method, riloe, on hp2.
> Jan 12 14:50:14 hp1 fence_node[7148]: The agent (fence_rib) reports:
> Jan 12 14:50:14 hp1 fence_node[7148]: WARNING! fence_rib is deprecated. use fence_ilo instead parse error: unknown option "ipaddr=10.10.0.112"
> Jan 12 14:50:14 hp1 fence_node[7148]:
> Jan 12 14:50:14 hp1 fence_node[7148]: All fencing methods FAILED!
> Jan 12 14:50:14 hp1 fence_node[7148]: Fence of "hp2" was unsuccessful.
> Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Fence failed. [7148] Exit code:1 Running it again.
> Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Forked [7157] fence_node hp2 with a 5 pause.
> Jan 12 14:50:15 hp1 lock_gulmd_core[6500]: (10.10.0.201:hp2) Cannot login if you are expired.
The node hp2 has to be successfully fenced before it is allowed to
re-join the cluster. If your fencing is misconfigured or not working, a
fenced node will never get to rejoin.
You really should test that fencing works by running
fence_node <node name> for each node in your cluster before running
lock_gulmd. This makes sure that fencing is set up and working
correctly.
Do that, and once you've verified that fencing is correct (without
lock_gulmd running), try things again with lock_gulmd.
> And I can't unmount the GFS file system or reboot the systems while
> GFS is mounted; the only way out is to reset both nodes.
>
> I think I have a mistake in my configuration; maybe it is the
> incorrect agent = "fence_rib", or something else.
>
> Please help :-)
>
>
> Cluster Configuration:
>
> cluster.ccs:
> cluster {
> name = "cluster"
> lock_gulm {
> servers = ["hp1"] (or servers = ["hp1","hp2","hp3"])
> }
> }
>
> fence.ccs:
> fence_devices {
> ILO-HP1 {
> agent = "fence_rib"
> ipaddr = "10.10.0.111"
> login = "xx"
> passwd = "xx"
> }
> ILO-HP2 {
> agent = "fence_rib"
> ipaddr = "10.10.0.112"
> login = "xx"
> passwd = "xx"
> }
> }
>
> nodes.ccs:
> nodes {
> hp1 {
> ip_interfaces { eth0 = "10.10.0.200" }
> fence { riloe { ILO-HP1 { localport = 17988 } } }
> }
> hp2 {
> ip_interfaces { eth0 = "10.10.0.201" }
> fence { riloe { ILO-HP2 { localport = 17988 } } }
> }
> # if 3 nodes in cluster.ccs
> # hp3 {
> # ip_interfaces { eth0 = "10.10.0.201" }
> # fence { riloe { ILO-HP2 { localport = 17988 } } }
> # }
> }
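On the agent: the log itself warns that fence_rib is deprecated, and the
parse error suggests the option names it forwards are not what the
replacement agent expects. A sketch of fence.ccs switched over to
fence_ilo — note that the "hostname" key here is an assumption on my
part (the error shows "ipaddr" being rejected); verify the exact option
names against the fence_ilo man page for your version before relying on
this:

```
fence_devices {
        ILO-HP1 {
                agent = "fence_ilo"
                # "hostname" is assumed; fence_ilo rejected "ipaddr"
                # in your logs -- check fence_ilo(8) to confirm.
                hostname = "10.10.0.111"
                login = "xx"
                passwd = "xx"
        }
        ILO-HP2 {
                agent = "fence_ilo"
                hostname = "10.10.0.112"
                login = "xx"
                passwd = "xx"
        }
}
```

Then test it directly with fence_node hp1 and fence_node hp2 before
starting lock_gulmd, as described above.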
--
Michael Conrad Tadpol Tilstra
Hi, I'm an evil mutated signature virus, put me in your .sig or I will
bite your kneecaps!