[Linux-cluster] Xen / Redhat cluster : 5.3 to 5.4 upgrade question and best practice

Sat Feb 13 05:50:23 UTC 2010

> I need to upgrade a Xen cluster using RH servers from 5.3 to 5.4.
>
> It is a 3-node cluster (+qdisk) plus 1 luci server for management.
>
> All VMs are on a GFS2 file system, in /home/domU/FQDN folders.
>
> Each FQDN folder contains:
> - the Xen config file
> - the VM's FQDN.img file

We run an apparently very similar setup, based on CentOS.
One difference may be that on our nodes /etc/xen contains symbolic links
to the VM config files; the files themselves are stored next to the
*.img files in the GFS2 file system.

> I read about some issues with rgmanager from 5.4 (unable to create VMs
> from config files that were not in /etc/xen). This bug has been fixed in
> rgmanager-2.0.52-1.el5_4.1.
> Can I safely apply the update without modifying any config made with
> 5.3? Do I need to tweak cluster.conf between 5.3 and 5.4?

As Paras mentioned earlier, we added use_virsh="0" to each VM service
in cluster.conf, but we also added max_restarts="0" and
restart_expire_time="0".
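For illustration only, a minimal Python 3 sketch (standard-library ElementTree) of where these attributes end up: it sets them on every <vm> element of a cluster.conf. The sample configuration, cluster name, VM names and paths are hypothetical placeholders, not our real file; we made the change by hand. Note that CentOS/RHEL 5 itself ships Python 2.4, so run this elsewhere or adapt it.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf; cluster name, VM names and paths are placeholders.
SAMPLE_CONF = """<cluster name="xencluster" config_version="42">
  <rm>
    <vm name="vm1.example.com" path="/home/domU/vm1.example.com" autostart="1"/>
    <vm name="vm2.example.com" path="/home/domU/vm2.example.com" autostart="1"/>
  </rm>
</cluster>"""

# Attributes we added to every VM service.
VM_ATTRIBUTES = {"use_virsh": "0", "max_restarts": "0", "restart_expire_time": "0"}


def add_vm_attributes(conf_text):
    """Parse cluster.conf text and set VM_ATTRIBUTES on every <vm> element."""
    root = ET.fromstring(conf_text)
    for vm in root.iter("vm"):
        for key, value in VM_ATTRIBUTES.items():
            vm.set(key, value)
    return root


if __name__ == "__main__":
    print(ET.tostring(add_vm_attributes(SAMPLE_CONF), encoding="unicode"))
```

The config_version still has to be raised separately before the changed file is put in place on the nodes.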

> Also,
> is it better to upgrade the luci server before the nodes?

We did not do that.
My understanding is that you can also do everything by editing
cluster.conf directly on the nodes. As a precaution, we always
increased the version number manually and saved cluster.conf on all
nodes within a short time interval.
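The version bump itself is simple; a minimal Python 3 sketch, assuming the number in question is the config_version attribute on the root <cluster> element. The function name is hypothetical. ElementTree drops XML comments and the XML declaration, so compare the result with the original before installing it.

```python
import xml.etree.ElementTree as ET


def bump_config_version(conf_text):
    """Return cluster.conf text with the root config_version raised by one.

    Caveat: ElementTree drops XML comments and the XML declaration,
    so compare the result with the original before installing it.
    """
    root = ET.fromstring(conf_text)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> as the root element")
    current = root.get("config_version")
    if current is None:
        raise ValueError("<cluster> has no config_version attribute")
    root.set("config_version", str(int(current) + 1))
    return ET.tostring(root, encoding="unicode")
```

The resulting file then has to be saved, identically, on every node.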

> I am also curious about the nodes: what is the best practice? Moving
> the VMs, removing the node from the cluster, upgrading and rebooting,
> checking that everything is fine and then going on to the next one? Can
> I stay with a mix of 5.3 and 5.4 for several days?

After a 5.3 update went badly wrong, we had set autostart="0" on every
vm entry and started the VMs directly from the nodes. Luci then showed
all VM services as disabled (although they were running).

Our upgrade path went like this:
The separate server running Luci had been upgraded long before
(independently).

We upgraded each VM on a node with a simple "yum update" and shut it
down cleanly from an SSH terminal ("poweroff").
Afterwards, the node itself was upgraded with "yum update" and then
rebooted.
The VMs on this node were then restarted manually (note that at this
point cluster.conf still contained autostart="0" for every vm entry).
We repeated the procedure with the next node.
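The order of operations above, written out as a small Python 3 sketch that turns a hypothetical node-to-VM mapping into a numbered checklist. It only produces text; it does not run anything on the nodes, and the node and VM names are placeholders.

```python
from typing import Dict, List


def rolling_upgrade_plan(vms_by_node: Dict[str, List[str]]) -> List[str]:
    """Return the upgrade steps in order, one node at a time."""
    steps = []
    for node, vms in vms_by_node.items():
        # 1. Update every guest on this node, then shut it down from inside.
        for vm in vms:
            steps.append(f"{vm}: yum update, then poweroff via SSH")
        # 2. Update and reboot the node itself.
        steps.append(f"{node}: yum update, then reboot")
        # 3. Start the guests again by hand (autostart is still "0").
        for vm in vms:
            steps.append(f"{node}: start {vm} manually")
    return steps


if __name__ == "__main__":
    # Hypothetical node and VM names.
    plan = rolling_upgrade_plan({"node1": ["vm1", "vm2"], "node2": ["vm3"], "node3": ["vm4"]})
    for number, step in enumerate(plan, start=1):
        print(number, step)
```

Handling one node at a time means only that node's guests are down at any moment while the other nodes keep running.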

At this point, all VM services had been started from the nodes (not
through Luci).

Up to here, Luci still considered the VM services disabled. To
re-integrate them, we did the following for each VM:
Shut it down cleanly from an SSH terminal.
Modified cluster.conf by adding use_virsh="0", max_restarts="0" and
restart_expire_time="0" to the vm entry, and changed autostart="0" to
autostart="1". (Do not forget to increase the config_version number at
the top of cluster.conf.)
Back in Luci, the corresponding VM service was then listed as
available again, though not running. We then used the Luci command
"Enable" for each VM service.
Unfortunately, I don't recall whether the VM was started automatically
at this point or whether we had to start it separately from Luci. In
any case, communication between Luci and the cluster worked again. Even
live migration (as ordered through Luci) worked flawlessly.
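Before pressing "Enable" in Luci, it can help to check that every vm entry really was changed. A minimal Python 3 sketch of such a check, assuming the attribute values listed above; the function name and the sample in the test are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Values each vm entry should carry after re-integration.
EXPECTED = {
    "use_virsh": "0",
    "max_restarts": "0",
    "restart_expire_time": "0",
    "autostart": "1",
}


def vms_not_reintegrated(conf_text: str) -> List[str]:
    """Names of <vm> entries whose attributes differ from EXPECTED."""
    root = ET.fromstring(conf_text)
    pending = []
    for vm in root.iter("vm"):
        if any(vm.get(key) != value for key, value in EXPECTED.items()):
            pending.append(vm.get("name", "<unnamed>"))
    return pending
```

An empty list means every vm entry carries the new values; any names returned still need editing.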

Our 5.3 -> 5.4 update was done about a week ago; so far, CentOS 5.4
seems stable and is performing well.

Final remarks:
I understood from various sources that it is advisable to run nodes
and VMs at the same OS level, or at least at the same kernel version.
The Luci "Storage" page still fails; it seems to be a timeout problem.
We have come to regard Luci as a nice tool, but found it very
reassuring to know how to perform the corresponding actions directly
from a terminal on the nodes (i.e. with "xm ..." commands).
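As an example of working from the terminal: a Python 3 sketch that reads the text printed by "xm list" on a node and reports which guests Xen itself lists, independent of what Luci shows. It assumes the usual six-column layout (Name, ID, Mem, VCPUs, State, Time); the sample output is invented for illustration and the function name is hypothetical.

```python
from typing import Dict

# Made-up example of `xm list` output (six columns).
SAMPLE_XM_LIST = """Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     2048     4 r-----   1234.5
vm1.example.com                            3     1024     1 -b----    567.8
vm2.example.com                            5      512     1 -b----     89.0"""


def guest_states(xm_list_output: str) -> Dict[str, str]:
    """Map each guest name to its State column, skipping the header and Domain-0."""
    states = {}
    for line in xm_list_output.splitlines():
        fields = line.split()
        # Skip the header, blank or short lines, and the control domain.
        if len(fields) < 6 or fields[0] in ("Name", "Domain-0"):
            continue
        states[fields[0]] = fields[4]
    return states
```

On a node with Python 3.7 or later, the input could come from subprocess.run(["xm", "list"], capture_output=True, text=True).stdout.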

Regards,
Wolf