[libvirt-users] VM's in a HA-configuration - synchronising vm config files

Wed Mar 2 23:51:29 UTC 2016

Good evening,

Shared-nothing sync of local files/directories within a cluster is one 
of my favorite topics when it comes to failures and data loss. :)

This kind of thing is usually best achieved by an external configuration 
management tool (puppet, chef, etc.) or by an external storage system.

If you are really determined to do this, then please consider the 
following items:

1) When host1 goes down, host2 will take over and restart your VMs just 
fine (I hope).
After fixing host1, you bring it up and the host1->host2 cron job kicks 
in, overwriting the live files on host2 with those from the just-salvaged 
host1 (which might be corrupt, outdated or gone).

That's why you need to make the 'scheduler' follow the service group of 
your VMs (it must only run where the VG is active and imported).

2) What if you need to bring host1 up with networking after its crash?
Perhaps you don't want to have host2 sync -back- the VMs to host1 when 
you're not ready.

3) Even if you solve [1] and [2], a manual failover from host2 to host1 
during the day might leave some corrupted files if an rsync is still in 
progress.
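For item [1], here is a minimal Python sketch of a sync job that refuses 
to run unless the local node currently owns the service group. It is only 
an illustration: the function names are made up, and how you detect 
"active here" is up to your cluster (the usage note below assumes a 
filesystem on the clustered VG is mounted only on the active node).

```python
import subprocess
from typing import Callable, List, Sequence


def build_rsync_command(src: str, peer: str, dest: str) -> List[str]:
    """Build a plain 'rsync -a' push of src to peer:dest (no deletion)."""
    return ["rsync", "-a", src, f"{peer}:{dest}"]


def sync_if_active(
    is_active: Callable[[], bool],
    command: Sequence[str],
    run=subprocess.run,
) -> bool:
    """Run the sync command only when this node owns the service group.

    is_active is a caller-supplied check (an assumption of this sketch),
    for example "is the clustered filesystem mounted here?".
    Returns True if the command was run, False if it was skipped.
    """
    if not is_active():
        # Passive or freshly recovered node: never push local files.
        return False
    run(list(command), check=True)
    return True
```

From cron this could be called as 
sync_if_active(lambda: os.path.ismount("/srv/vmcluster"), cmd), where 
/srv/vmcluster is a hypothetical mount point that only exists on the 
active node and cmd comes from build_rsync_command(). A recovered host1 
then skips the sync instead of overwriting host2.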

If the quantity of data you are syncing is low or if you are very careful, 
you might avoid hitting [3], but once the duration of the 'rsync' processes 
becomes noticeable you're increasing your risk (I once had a few hundred 
GB suddenly re-appear because a failover had occurred).
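One way to narrow the window for [3] during a planned failover is to have 
the sync job hold a lock file while rsync runs, so a manual failover script 
can see that a sync is still in flight and wait. This is only a sketch 
under that assumption (the lock path and function name are hypothetical), 
and it does nothing for a node that crashes mid-sync.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def sync_lock(path: str):
    """Try to take an exclusive, non-blocking flock on path.

    Yields True if the lock was acquired, False if another holder
    (for example a running sync) already has it.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    acquired = False
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            # Someone else holds the lock; report it instead of waiting.
            pass
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

The sync job would run rsync inside "with sync_lock(...) as ok:" only when 
ok is True, and a manual failover procedure would take the same lock 
before moving the service group.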

So when I had to do something like this (and it wasn't even safe in all 
cases), it was by using 'lsyncd' with inotify and rsync in an HA 
configuration (the 'lsyncd' process would follow the main process service 
group) where the rsync processes spawned by lsyncd would only attempt to 
reach the peer's service IP. Those IPs were brought up by the clustering 
software on each of the nodes, so they could only be up if the recovered 
node was fully up.
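The service-IP idea can be approximated in a script as well: before 
pushing anything, check that the peer's service IP actually answers. The 
sketch below is an assumption-laden illustration, not what lsyncd does 
internally; it assumes rsync runs over ssh (hence port 22) and the 
function name is made up.

```python
import socket


def peer_service_ip_up(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    host is meant to be the peer's cluster-managed service IP, which
    the clustering software only brings up once the node is fully up.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the peer as not ready.
        return False
```

If the check fails, the sync is simply skipped until the next run, so a 
half-recovered peer is never written to.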

The HA config prevents [1] and the service IPs brought up by the 
clustering software prevent [2]. Avoiding [3] is a matter of luck.

That being said, having a shared-nothing system for keeping local files in 
sync within an HA cluster has some good uses.. :)

Kind regards and good luck,

Vincent

On Wed, 2 Mar 2016, Lentes, Bernd wrote:

> Hi,
> 
> I'd like to establish an HA cluster with two nodes. My services will run inside VMs, the VMs are stored on an FC SAN, so every host has access to the VMs. But how can I keep
> the config files (XML files under /etc/libvirt/qemu) synchronised? Is there a possibility to store the config files somewhere else? E.g. a partition with ocfs2 on the SAN?
> If not, what would you do? Otherwise I'm thinking of a cron job that synchronises the files each minute with rsync.
> 
> 
> Bernd
> 
> --
> Bernd Lentes
> 
> Systemadministration
> institute of developmental genetics
> Gebäude 35.34 - Raum 208
> HelmholtzZentrum München
> bernd.lentes at helmholtz-muenchen.de
> phone: +49 (0)89 3187 1241
> fax: +49 (0)89 3187 2294
> 
> Whoever has visions should go see the doctor
> Helmut Schmidt