[Linux-cluster] share experience migrating cluster suite from centos 5.3 to centos 5.4

Tue Nov 3 00:26:05 UTC 2009

On 02/11/2009 21:29, Gianluca Cecchi wrote:
>
> On Mon, Nov 2, 2009 at 6:25 PM, David Teigland <teigland at redhat.com
> <mailto:teigland at redhat.com>> wrote:
>
>
>     The out-of-memory problem should be fixed in 5.4:
>
>     https://bugzilla.redhat.com/show_bug.cgi?id=508829
>
>     The fix for dlm_send spinning is not released yet:
>
>     https://bugzilla.redhat.com/show_bug.cgi?id=521093
>
>     Dave
>
>
> Thank you so much for the feedback.
> So I should expect this freeze and possible downtime on my real nodes
> as well. Would the method below be safer for my two-node + quorum disk
> cluster?
> 1) Shut down the passive node and restart it in single-user mode.
> The cluster now consists of only one node, still on 5.3, with no loss
> of service so far.
> 2) Start networking and update the passive node (following the steps in
> my first mail).
> 3) Reboot the just-updated node into single-user mode and test that it
> works correctly (without the cluster).
> 4) Shut down the just-updated node again.
> 5) Shut down the active node: NOW we have (planned) downtime.
> 6) Start up the updated node, now on 5.4 (with bug 508829 fixed).
> This node should form the cluster with 2 votes, its own and the quorum
> disk's, correct?
>
> 7) IDEA: make a dummy update to the config on this newly running node,
> only incrementing the version number by one, so that when the other
> node comes up later, it picks up the config.
> Does this make sense, or is it unnecessary and the second node will
> join without problems anyway?
>
> 8) Power on the node still on 5.3 in single-user mode.
> 9) Start networking on it and update the system as in step 2).
> 10) Reboot the just-updated node into single-user mode to test its
> functionality (without the cluster enabled).
> 11) Reboot again and let it join the cluster normally.
>
> Expected result: the node joins the cluster correctly, right?
>
> 12) Test relocating the service: NOW another short downtime, but it
> confirms that relocation works without problems if we ever need it.
>
> I'm going to test this tomorrow (it's half past ten pm here now), after
> restoring the initial situation with both nodes on 5.3, so any comments
> are welcome.

FWIW, I just updated one of my DRBD+GFS clusters from 5.3 (and early 5.3 
at that) to 5.4 with a rolling restart, and it "just worked". It's a 
2-node cluster with a shared GFS root. I updated it, rebuilt the 
initrd, and rebooted one node, which came up and rejoined fine; then I 
rebooted the other. No service downtime.

Gordan