[linux-lvm] Snapshots on clustered LVM
lists at alteeve.ca
Wed Aug 26 16:35:39 UTC 2015
On 25/08/15 06:09 AM, Bram Klein Gunnewiek wrote:
> Currently we are using LVM as backing storage for our DRBD disks in HA
> set-ups. We use QEMU instances on our nodes using (local) DRBD targets
> for storage. This enables us to do live migrations between the DRBD
> primary/secondary nodes.
> We want to support iSCSI targets in our HA environment. We are trying
> to see if we can use (c)lvm for that by creating a volume group of our
> iSCSI block devices and use that volume group on all nodes to create
> logical volumes. This seems to work fine if we handle locking etc.
> properly and make sure we only activate the logical volumes on one node
> at a time. As long as we only have a volume active on one node,
> snapshots also seem to work fine.
DRBD, like an iSCSI LUN, is just another block device to LVM, so I see
no reason why clvmd won't work just fine. The main advantage is that you
can scale iSCSI to 3+ nodes, but you lose the replication DRBD gave you
unless you have a SAN that provides that redundancy itself.
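For example, here is roughly how I'd build a clustered VG on top of an
iSCSI-backed PV (a sketch only; I'm assuming the LUN shows up as
/dev/sdb on every node, clvmd is running cluster-wide, and the
vg_iscsi/vm1_disk names are made up):

  # Run once, on any one node; clvmd makes the metadata
  # visible to all nodes
  pvcreate /dev/sdb
  vgcreate --clustered y vg_iscsi /dev/sdb

  # A per-VM LV, as described below
  lvcreate -L 50G -n vm1_disk vg_iscsi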
Once the LV is visible on all nodes though, it's up to you to make sure
it's used by apps/fses that understand clustering. I use clustered LVs
to back gfs2 and to back VMs (an LV dedicated to each VM, with the
cluster resource manager ensuring that a VM runs on only one node at a
time; see the sketch below).
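As a sketch of that last part: with pacemaker, the resource agent is
what enforces "one node at a time". Something like this (the VM name,
config path and options are made up for illustration; the agent itself,
ocf:heartbeat:VirtualDomain, is the usual one for libvirt guests):

  pcs resource create vm1 ocf:heartbeat:VirtualDomain \
      config="/shared/definitions/vm1.xml" \
      hypervisor="qemu:///system" \
      migration_transport=ssh \
      meta allow-migrate=true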
> However, we run into problems when we want to perform a live migration
> of a running QEMU instance. In order to do a live migration we have to
> start a second similar QEMU on the node we want to migrate to and start
> a QEMU live migration. In order for us to do that we have to make the
> logical volume active on the target node otherwise we can't start the
> QEMU instance. During the live migration QEMU ensures that data is only
> written on one node (e.g. during the live migration data will be written
> on the source node, QEMU will then pause the instance for a short while
> when copying the last data and will then continue the instance on the
> target node).
If you're using clustered LVM, live migration will work just fine. This
is exactly what I do. The LV will need to be ACTIVE on both nodes though.
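The order of operations looks roughly like this (a sketch, assuming
libvirt-managed guests; vm1, node2 and the vg_iscsi/vm1_disk names are
mine, not yours):

  # On the target node, activate the LV first
  lvchange -ay vg_iscsi/vm1_disk

  # Then, from the source node, migrate the guest
  virsh migrate --live vm1 qemu+ssh://node2/system

  # Once the guest is confirmed running on node2,
  # deactivate the LV on the old host
  lvchange -an vg_iscsi/vm1_disk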
> This use case works fine with a clustered LVM set-up except for
> snapshots. Changes are not saved in the snapshot when the logical volume
> is active on both nodes (as expected if the manual is correct:
Note that your link is very old, for RHEL 5.
Snapshotting is a problem. As Zdenek said, you have to set the other
nodes to inactive and then set the current host node's LV to
'exclusive'. The trick I found, though, is that you can't mark it as
exclusive while it's ACTIVE, and you can't make the LV inactive while
it's hosting a VM... So in practical terms, snapshotting clustered LVs
is not feasible.
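For reference, the sequence would have to look like this (a sketch,
reusing the hypothetical vg_iscsi/vm1_disk names from above), and it's
exactly where it falls apart with a running VM:

  # On every other node: deactivate the LV
  lvchange -an vg_iscsi/vm1_disk

  # On the host node: activate exclusively... but this fails
  # if the LV is already active here, and you can't deactivate
  # it while the VM is using it
  lvchange -aey vg_iscsi/vm1_disk

  # Only then could you take the snapshot
  lvcreate -s -L 10G -n vm1_snap vg_iscsi/vm1_disk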
> If we are correct it means we can use lvm as a clustered "file system"
> but can't trust our snapshots to be 100% reliable if a volume group has
> been made active on more than one node, e.g. when doing a live
> migration of a QEMU instance between two nodes our snapshots become
> unreliable.
You can never trust a snapshot 100%; it doesn't capture information in
the VM's memory. So at best, using a snapshot to recover is like
recovering from a sudden power loss. It's then up to your apps and OS to
recover, and that's not always possible with many DBs unless they're
quiesced or flushed to disk first.
This is the core reason why our company won't support snapshots at all.
It gives people a false sense of having good backups.
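If you do snapshot anyway, the closest you can get to consistency is
freezing the guest's filesystems first (again just a sketch; it needs
the qemu guest agent running inside the VM, and it still won't capture
memory state):

  virsh domfsfreeze vm1
  lvcreate -s -L 10G -n vm1_snap vg_iscsi/vm1_disk
  virsh domfsthaw vm1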
> Are these conclusions correct? Is there a solution for this problem or
> is this simply a known limitation of clustered lvm without a work-around?
Clustered LVs over a SAN-backed PV will work perfectly fine for live
migrations. Snapshots are not feasible though, and not recommended in
any case.
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?