[linux-lvm] Snapshots on clustered LVM

Wed Aug 26 16:23:27 UTC 2015

On 26/08/15 08:22 AM, Bram Klein Gunnewiek wrote:
> On 08/26/2015 12:59 PM, Zdenek Kabelac wrote:
>> Dne 25.8.2015 v 12:09 Bram Klein Gunnewiek napsal(a):
>>> Currently we are using LVM as backing storage for our DRBD disks in HA
>>> set-ups. We use QEMU instances on our nodes using (local) DRBD
>>> targets for
>>> storage. This enables us to do live migrations between the DRBD
>>> primary/secondary nodes.
>>>
>>> We want to support iSCSI targets in our HA environment. We are
>>> trying to see
>>> if we can use (c)lvm for that by creating a volume group of our iSCSI
>>> block
>>> devices and use that volume group on all nodes to create logical
>>> volumes. This
>>> seems to work fine if we handle locking etc properly and make sure we
>>> only
>>> activate the logical volumes on one node at a time. As long as we
>>> only have a
>>> volume active on one node snapshots seem to work fine also.
>>>
>>> However, we run into problems when we want to perform a live
>>> migration of a
>>> running QEMU instance. In order to do a live migration we have to
>>> start a
>>> second similar QEMU on the node we want to migrate to and start a
>>> QEMU live
>>> migration. In order for us to do that we have to make the logical volume
>>> active on the target node otherwise we can't start the QEMU instance.
>>> During
>>> the live migration QEMU ensures that data is only written on one node
>>> (e.g.
>>> during the live migration data will be written on the source node,
>>> QEMU will
>>> then pause the instance for a short while when copying the last data
>>> and will
>>> then continue the instance on the target node).
>>>
>>> This use case works fine with a clustered LVM set-up except for
>>> snapshots.
>>> Changes are not saved in the snapshot when the logical volume is
>>> active on
>>> both nodes (as expected if the manual is correct:
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html-single/Logical_Volume_Manager_Administration/#snapshot_volumes).
>>>
>>> If we are correct it means we can use LVM as a clustered "file
>>> system" but
>>> can't trust our snapshots to be 100% reliable if a volume group has
>>> been made
>>> active on more than one node. E.g. when doing a live migration
>>> between two
>>> nodes of a QEMU instance our snapshots become unreliable.
>>>
>>> Are these conclusions correct? Is there a solution for this problem
>>> or is this
>>> simply a known limitation of clustered lvm without a work-around?
>>
>> Yes - snapshots are supported ONLY for exclusively activated volumes
>> (meaning the LV with the snapshot is active on only a single node in
>> the cluster).
>>
>> There is no dm target which would support clustered usage of snapshots.
>>
>> Zdenek
>>
> 
> Thanks for the confirmation. It's a pity we can't get this done with
> LVM ... we will try to find an alternative.
> 
> Out of curiosity, how does a node know the volume is opened at another
> node? In our test set-up we don't use CLVM or anything (we are just
> testing), so there is no communication between the nodes. Is this done
> through meta data in the volume group / logical volume?

Clustered LVM uses DLM. You can see which nodes are using a given lock
space with 'dlm_tool ls'. When a node joins or leaves the cluster, it
joins or leaves whatever lock spaces it has resources in.
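
As a rough sketch of reading that membership from a script, the Python
below pulls lock space names and member node IDs out of text shaped
like 'dlm_tool ls' output. The SAMPLE text and the parse_lockspaces
helper are illustrative assumptions, not captured from a real cluster,
and the exact layout may differ between dlm versions.

```python
# Illustrative sample only; real 'dlm_tool ls' output may be laid out differently.
SAMPLE = """\
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 2 joined 1 remove 0 failed 0 seq 2,2
members       1 2
"""


def parse_lockspaces(text):
    """Return {lock space name: [member node ids]} (hypothetical helper)."""
    spaces = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "name":
            # A 'name' line starts the description of a new lock space.
            current = parts[1]
            spaces[current] = []
        elif parts and parts[0] == "members" and current is not None:
            # The 'members' line lists the node IDs joined to that lock space.
            spaces[current] = [int(p) for p in parts[1:]]
    return spaces


if __name__ == "__main__":
    print(parse_lockspaces(SAMPLE))
```

On a real node the text would come from running 'dlm_tool ls' (for
example via subprocess.run) instead of from SAMPLE.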

A node doesn't have to be actively using a resource, but if it's in a
cluster, it needs to coordinate with the other nodes, even if just to
say "I ACK the changes" or "I'm not using the resources" when
coordinating locks.
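
To make the coordination idea concrete, here is a toy Python model of
an exclusive lock on a resource. It is only a sketch: ToyLockSpace and
its methods are made-up names, everything runs in one process, and it
does not reflect how DLM is actually implemented. It just shows the
rule that a second node cannot get an exclusive lock while another
member still holds it.

```python
class ToyLockSpace:
    """Toy, single-process stand-in for a cluster lock space (not real DLM)."""

    def __init__(self, members):
        self.members = set(members)  # nodes that have joined the lock space
        self.holders = {}            # resource name -> node holding it exclusively

    def request_exclusive(self, node, resource):
        """Grant the lock only if no other member currently holds it."""
        if node not in self.members:
            raise ValueError(f"{node} has not joined the lock space")
        holder = self.holders.get(resource)
        if holder is not None and holder != node:
            return False  # another node is using the resource
        self.holders[resource] = node
        return True

    def release(self, node, resource):
        """Drop the lock if this node is the current holder."""
        if self.holders.get(resource) == node:
            del self.holders[resource]


if __name__ == "__main__":
    space = ToyLockSpace(["node1", "node2"])
    print(space.request_exclusive("node1", "vg0/lv_vm"))  # granted
    print(space.request_exclusive("node2", "vg0/lv_vm"))  # refused while node1 holds it
    space.release("node1", "vg0/lv_vm")
    print(space.request_exclusive("node2", "vg0/lv_vm"))  # granted after release
```

Without something playing this role between the nodes, each node only
knows about its own activations.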

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?