[Linux-cluster] share experience migrating cluster suite from centos 5.3 to centos 5.4

Wed Nov 4 05:33:19 UTC 2009

One problem with the workflow below.

7. You're going to need to copy this over manually, otherwise it will fail;
I've fallen victim to this before. All cluster nodes need to start on
the current revision of the file before you update it. I think this is a
chicken-and-egg problem.
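
As a rough illustration of the step 7 point, here is a minimal Python sketch
that reads the config_version attribute from a copy of cluster.conf, bumps
it by one, and checks that every node's copy carries the same version before
the cluster is started. The function names, node names and sample XML are
made up for this example; how the file actually gets copied between nodes is
left out.

```python
import xml.etree.ElementTree as ET


def config_version(conf_xml):
    """Return the integer config_version of a cluster.conf document."""
    root = ET.fromstring(conf_xml)
    return int(root.attrib["config_version"])


def bump_config_version(conf_xml):
    """Return the same document with config_version incremented by one."""
    root = ET.fromstring(conf_xml)
    root.set("config_version", str(int(root.attrib["config_version"]) + 1))
    return ET.tostring(root, encoding="unicode")


def copies_in_sync(copies):
    """True if every node's copy (node name -> XML text) has the same version."""
    versions = {config_version(xml) for xml in copies.values()}
    return len(versions) == 1


# Hypothetical sample; the cluster name and version are illustrative only.
SAMPLE = '<cluster name="demo" config_version="7"><clusternodes/></cluster>'
```

The idea is to run copies_in_sync() over the files collected from each node
after the manual copy, and only then bring the second node back in.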

One of the things I have configured on my clusters is that all
clustered services start on their own runlevel. In my case I have cluster
services running on runlevel 3 but boot to runlevel 2 by default. This
allows a node to boot up and get network before racing into the cluster
(ideal for finding out why it got fenced and solving the
problem).
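
To double-check a runlevel split like this, something along these lines
could parse the output of chkconfig --list. This is only a sketch: the
service names (cman, rgmanager) and the sample output are assumptions for
illustration, so substitute whatever your cluster actually runs.

```python
def parse_chkconfig(output):
    """Map service name -> {runlevel: True/False} from `chkconfig --list` text."""
    services = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or ":" not in fields[1]:
            continue  # skip blank lines and anything that is not a SysV entry
        levels = {}
        for field in fields[1:]:
            level, _, state = field.partition(":")
            levels[int(level)] = (state == "on")
        services[fields[0]] = levels
    return services


def starts_only_from(services, names, cluster_level, boot_level):
    """True if each named service is on at cluster_level and off at boot_level."""
    return all(
        services[name].get(cluster_level, False)
        and not services[name].get(boot_level, False)
        for name in names
    )


# Illustrative output only; not captured from a real node.
SAMPLE = """\
cman            0:off   1:off   2:off   3:on    4:off   5:off   6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
rgmanager       0:off   1:off   2:off   3:on    4:off   5:off   6:off
"""
```

With the node sitting in runlevel 2 you can look around first, and switching
to runlevel 3 afterwards (for example with telinit 3) is what starts the
cluster services.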

Everything else will work, as I've just done this myself (except with 5
nodes). Your downtime should be quite minimal.

Regards,

Peter Tiggerdine
HPC & eResearch Specialist
High Performance Computing Group
Information Technology Services
University of Queensland
Phone: +61 7 3346 6634
  Fax: +61 7 3346 6630
Email: peter.tiggerdine at uq.edu.au

________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Gianluca Cecchi
Sent: Tuesday, 3 November 2009 7:29 AM
To: David Teigland
Cc: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] share experience migrating cluster suite
from centos 5.3 to centos 5.4

On Mon, Nov 2, 2009 at 6:25 PM, David Teigland <teigland at redhat.com>
wrote:

	The out-of-memory should be fixed in 5.4:

	https://bugzilla.redhat.com/show_bug.cgi?id=508829

	The fix for dlm_send spinning is not released yet:

	https://bugzilla.redhat.com/show_bug.cgi?id=521093

	Dave

Thank you so much for the feedback.
So I have to expect this freeze and possible downtime... also on my
real nodes. Could a safer method be the one below for my two nodes +
quorum disk cluster?
1) shutdown and restart in single user mode of the passive node
So now the cluster is composed of only one node in 5.3 without loss of
service, at the moment
2) start network and update the passive node (as in steps of the first
mail)
3) reboot in single user mode of the just updated node, and test correct
functionality (without cluster)
4) shutdown again of the just updated node
5) shutdown of the active node --- NOW we have downtime (planned)
6) startup of the updated node, now in 5.4 (and with 508829 bug
corrected)
This node should form the cluster with 2 votes, itself and the quorum,
correct?

7) IDEA: make a dummy update to the config on this new running node,
only incrementing version number by one, so that after, when the other
node comes up, it gets the config....
Does it make sense or no need/no problems for this when the second node
will join?

8) power on in single user mode of the node still in 5.3
9) start network on it and update system as in steps 2)
10) reboot the just updated node and let it start in single user mode to
test its functionality (without cluster enabled)
11) reboot again and let it normally join the cluster

Expected result: correct join of the cluster, correct?

12) Test a relocation of the service  ----- NOW another little downtime,
but to be sure that in case of need we get relocation without problems
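
The vote question in step 6 comes down to simple arithmetic. A minimal
Python sketch, assuming one vote per node, one vote for the quorum disk, and
a plain majority rule (quorum = expected_votes // 2 + 1); the actual vote
values live in cluster.conf and are not shown in this thread.

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum under a simple majority rule."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the votes currently present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Assumed layout: 2 nodes with 1 vote each plus a quorum disk with 1 vote.
NODE_VOTES = 1
QDISK_VOTES = 1
EXPECTED = 2 * NODE_VOTES + QDISK_VOTES
```

Under those assumptions a single node plus the quorum disk holds 2 of 3
votes and is quorate, while a node on its own without the quorum disk is
not.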

I'm going to test this tomorrow (here half past ten pm now) after
restore of initial situation with both in 5.3, so if there are any
comments, they are welcome.

Thanks
Gianluca
