[Linux-cluster] some questions about setting up GFS

Wed Jan 12 14:10:40 UTC 2005

Hello!

> It looks like you are not using pool.

Thanks, I followed your examples, and the RAID can now be mounted.

Now I have some questions about Cluster Configuration System Files.

I have 2 nodes, hp1 and hp2. Each node has Integrated Lights-Out
with ROM Version: 1.55 - 04/16/2004.

Since I have only 2 nodes, one of them has to be the master. But if the
master is shut down cleanly, the slave runs into serious problems that
can only be solved by resetting it. Is this expected?
How do I set it up correctly?

I tried setting servers = ["hp1","hp2","hp3"] (hp3 does not really exist).
Then, if the master is shut down, the second node becomes master. So if the
nodes are shut down cleanly and booted up in turn, the master role
switches from one to the other and everything seems OK. But if one of
the nodes goes down uncleanly (e.g. the power cord is pulled out of the
socket), this is written to the system log:

Jan 12 14:44:33 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530273952756 mb:1)
Jan 12 14:44:48 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530288972780 mb:2)
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: hp2 missed a heartbeat (time:1105530303992751 mb:3)
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Client (hp2) expired
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
Jan 12 14:45:03 hp1 lock_gulmd_core[6614]: Gonna exec fence_node hp2
Jan 12 14:45:03 hp1 lock_gulmd_core[6500]: Forked [6614] fence_node hp2 with a 0 pause.
Jan 12 14:45:03 hp1 fence_node[6614]: Performing fence method, riloe, on hp2.
Jan 12 14:45:04 hp1 fence_node[6614]: The agent (fence_rib) reports:
Jan 12 14:45:04 hp1 fence_node[6614]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown option "ipaddr=10.10.0.112"
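
As far as I can tell from the "Have 1, need 2" line, the quorum looks like a simple majority of the servers list. Here is a minimal sketch of that arithmetic. The majority rule is my assumption, based only on this log line and not checked against lock_gulmd itself, and the function names are made up for illustration:

```python
def majority_quorum(server_count: int) -> int:
    """Smallest number of servers that is more than half of server_count.

    Assumption: quorum is a plain majority of the configured servers list.
    """
    if server_count < 1:
        raise ValueError("need at least one server")
    return server_count // 2 + 1


def has_quorum(alive_count: int, server_count: int) -> bool:
    """True if the servers still alive reach the assumed majority quorum."""
    return alive_count >= majority_quorum(server_count)


# servers = ["hp1","hp2","hp3"], where hp3 never exists and hp2 lost power
servers = ["hp1", "hp2", "hp3"]
alive = ["hp1"]

print("need:", majority_quorum(len(servers)))
print("have quorum:", has_quorum(len(alive), len(servers)))
```

If that reading is right, then with hp3 absent the loss of either real node always leaves only 1 of 3 servers, which is below the majority.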

If I then start the lock_gulm service again on the second node, this is
written to the system log on the first node:

Jan 12 14:50:14 hp1 lock_gulmd_core[7148]: Gonna exec fence_node hp2
Jan 12 14:50:14 hp1 fence_node[7148]: Performing fence method, riloe, on hp2.
Jan 12 14:50:14 hp1 fence_node[7148]: The agent (fence_rib) reports:
Jan 12 14:50:14 hp1 fence_node[7148]: WARNING!  fence_rib is deprecated.  use fence_ilo instead parse error: unknown option "ipaddr=10.10.0.112"
Jan 12 14:50:14 hp1 fence_node[7148]:
Jan 12 14:50:14 hp1 fence_node[7148]: All fencing methods FAILED!
Jan 12 14:50:14 hp1 fence_node[7148]: Fence of "hp2" was unsuccessful.
Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Fence failed. [7148] Exit code:1 Running it again.
Jan 12 14:50:14 hp1 lock_gulmd_core[6500]: Forked [7157] fence_node hp2 with a 5 pause.
Jan 12 14:50:15 hp1 lock_gulmd_core[6500]:  (10.10.0.201:hp2) Cannot login if you are expired.

And I can't unmount the GFS file system, and I can't reboot the systems
because GFS is mounted; I can only reset both nodes.

I think there are mistakes in my configuration. Maybe it is because of an
incorrect agent = "fence_rib", or something else.

Please help :-)

Cluster Configuration:

cluster.ccs:
cluster {
         name = "cluster"
         lock_gulm {
             servers = ["hp1"]    (or servers = ["hp1","hp2","hp3"])
         }
}

fence.ccs:
fence_devices {
                ILO-HP1 {
                        agent = "fence_rib"
                        ipaddr = "10.10.0.111"
                        login = "xx"
                        passwd = "xx"
                        }
                ILO-HP2 {
                        agent = "fence_rib"
                        ipaddr = "10.10.0.112"
                        login = "xx"
                        passwd = "xx"
                        }
            }

nodes.ccs:
nodes {
      hp1 {
          ip_interfaces { eth0 = "10.10.0.200" }
          fence { riloe { ILO-HP1 { localport = 17988 } } }
          }
      hp2 {
          ip_interfaces { eth0 = "10.10.0.201" }
          fence { riloe { ILO-HP2 { localport = 17988 } } }
          }
# if 3 nodes in cluster.ccs
#      hp3 {
#          ip_interfaces { eth0 = "10.10.0.201" }
#          fence { riloe { ILO-HP2 { localport = 17988 } } }
#          }
}

Thanks a lot anyway!
-- 
 Sergey