[Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1

Harri Päiväniemi harri.paivaniemi at tietoenator.com
Thu Apr 17 09:58:10 UTC 2008


Yes, something is wrong.

I did a little more research:

- stop cluster daemons on both nodes (node a & node b)

- start cluster on node a

It hangs for 5 minutes at cman's fencing step, like this:


Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing...

... and in the process list there is this:

/sbin/fence_tool -w -t 300 join


... so that's the 5 minutes.

The question is: why does it wait there for 5 minutes?
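
For reference, a minimal Python sketch of how that number can be read
out of the process list entry. It assumes -t is fence_tool's maximum
wait time in seconds, as I understand its man page; the helper name
fence_join_timeout is made up for illustration.

```python
import shlex


def fence_join_timeout(cmdline):
    """Return the value passed to -t (assumed to be seconds), or None."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        # -t takes the next argument as its value
        if arg == "-t" and i + 1 < len(args):
            return int(args[i + 1])
    return None


timeout = fence_join_timeout("/sbin/fence_tool -w -t 300 join")
print(timeout, "seconds =", timeout / 60, "minutes")
```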

- after 5 minutes waiting, node a says:

Starting fencing... failed

                                                           [FAILED]
Starting the Quorum Disk Daemon:                           [  OK  ]
Starting Cluster Service Manager:                          [  OK  ]

... and then it loads qdiskd and after a while it has 2 votes and it
starts services normally and voila, I have a running cluster with one
node:

Node  Sts   Inc   Joined               Name
   0   M      0   2008-04-17 12:51:01  /dev/sda
   1   M   1356   2008-04-17 12:45:44  areenasql1
   2   X      0                        areenasql2
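
To check this from a script, the node table can be parsed roughly as
below. This is only a sketch: it assumes the column layout shown above
and that status M means "member" while X means "not a member";
parse_nodes is a hypothetical helper, not part of cman.

```python
def parse_nodes(text):
    """Parse a node table (Node, Sts, Inc, Joined, Name) into dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to be a node row
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        nodes.append({
            "id": int(fields[0]),
            "status": fields[1],
            "incarnation": int(fields[2]),
            # The Joined column is empty for nodes that never joined
            "joined": " ".join(fields[3:-1]) or None,
            "name": fields[-1],
        })
    return nodes


table = """\
Node  Sts   Inc   Joined               Name
   0   M      0   2008-04-17 12:51:01  /dev/sda
   1   M   1356   2008-04-17 12:45:44  areenasql1
   2   X      0                        areenasql2
"""
for node in parse_nodes(table):
    print(node["name"], node["status"], node["joined"])
```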



[root@areenasql1 ~]# cman_tool status
Version: 6.0.1
Config Version: 4
Cluster Name: areena_sql
Cluster Id: 39330
Cluster Member: Yes
Cluster Generation: 1356
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 2
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 177
Node name: areenasql1
Node ID: 1
Multicast addresses: 239.192.153.60
Node addresses: 10.1.1.178
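
The vote numbers in that output can be checked with a short Python
sketch. It assumes the "Key: value" layout shown above and the usual
majority rule (quorum = expected votes / 2 + 1, integer division),
which matches the figures here; parse_cman_status, is_quorate and
majority are names I made up, not anything cman provides.

```python
def parse_cman_status(text):
    """Turn 'Key: value' lines into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_quorate(status):
    """True if the node currently holds at least the quorum of votes."""
    return int(status["Total votes"]) >= int(status["Quorum"])


def majority(expected_votes):
    """Assumed majority rule: more than half of the expected votes."""
    return expected_votes // 2 + 1


sample = """\
Nodes: 1
Expected votes: 3
Total votes: 2
Quorum: 2
"""
status = parse_cman_status(sample)
print("quorate:", is_quorate(status))
print("quorum matches majority rule:",
      majority(int(status["Expected votes"])) == int(status["Quorum"]))
```

With 3 expected votes (two nodes plus the qdisk), one node plus the
qdisk gives 2 votes, which is exactly the quorum.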



But the log says nothing about that failed fencing. Fencing is configured
correctly; I use HP iLO and everything is OK. Fencing works fine in a
running cluster; both nodes can fence each other.

Node a should fence node b in this situation, and maybe it's trying to do
so somehow, but it logs nothing. It should log at least "fence failed
etc." if it's unable to fence node b...

And what's more important: if we assume node a can't fence node b in this
startup situation, it should NOT start services, but it does....

-hjp


On Thu, 2008-04-17 at 11:32 +0200, jr wrote:
> On Thursday, 17.04.2008, at 12:28 +0300, Harri Päiväniemi wrote:
> > Well,
> > 
> > I don't have any mistakes with firewalls, hosts, names, IPs etc. This
> > is a fact. Communication itself works. Maybe it sounds strange when I say
> > I don't have mistakes, but this time it's true ;)
> > 
> > In this case the cluster should gain quorum and start running services on
> > node a (it has 2 votes: node-vote + qdisk-vote).
> > 
> > It should fence node b first, because it doesn't know where it is.
> > 
> > So this behaviour is wrong.
> > 
> > -hjp
> 
> i think something is wrong here, like the expected votes or similar. if
> the one node had 2 votes and those were the expected votes, it would
> maintain quorum and thus fence the other node. that connection refused
> error seems to say that that node doesn't have the quorum nonetheless.
> can you confirm that? (clustat should show you if that node is quorate
> or not)
> regards,
> johannes
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



