[linux-lvm] Snapshots on clustered LVM

Zdenek Kabelac zkabelac at redhat.com
Wed Aug 26 12:44:13 UTC 2015

Dne 26.8.2015 v 14:22 Bram Klein Gunnewiek napsal(a):
> On 08/26/2015 12:59 PM, Zdenek Kabelac wrote:
>> Dne 25.8.2015 v 12:09 Bram Klein Gunnewiek napsal(a):
>>> Currently we are using LVM as backing storage for our DRBD disks in HA
>>> set-ups. We use QEMU instances on our nodes using (local) DRBD targets for
>>> storage. This enables us to do live migrations between the DRBD
>>> primary/secondary nodes.
>>> We want to support iSCSI targets in our HA environment. We are trying to see
>>> if we can use (c)lvm for that by creating a volume group of our iSCSI block
>>> devices and use that volume group on all nodes to create logical volumes. This
>>> seems to work fine if we handle locking etc properly and make sure we only
>>> activate the logical volumes on one node at a time. As long as we only have a
>>> volume active on one node snapshots seem to work fine also.
>>> However, we run into problems when we want to perform a live migration of a
>>> running QEMU instance. In order to do a live migration we have to start a
>>> second similar QEMU on the node we want to migrate to and start a QEMU live
>>> migration. In order for us to do that we have to make the logical volume
>>> active on the target node otherwise we can't start the QEMU instance. During
>>> the live migration QEMU ensures that data is only written on one node (e.g.
>>> during the live migration data will be written on the source node, QEMU will
>>> then pause the instance for a short while when copying the last data and will
>>> then continue the instance on the target node).
>>> This use case works fine with a clustered LVM set-up except for snapshots.
>>> Changes are not saved in the snapshot when the logical volume is active on
>>> both nodes (as expected if the manual is correct:
>>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html-single/Logical_Volume_Manager_Administration/#snapshot_volumes).
>>> If we are correct, it means we can use LVM as a clustered "file system" but
>>> can't trust our snapshots to be 100% reliable if a volume group has been made
>>> active on more than one node, e.g. when doing a live migration of a QEMU
>>> instance between two nodes our snapshots become unreliable.
>>> Are these conclusions correct? Is there a solution for this problem or is this
>>> simply a known limitation of clustered lvm without a work-around?
>> Yes - snapshots are supported ONLY for exclusively activated volumes (meaning
>> the LV with the snapshot is active only on a single node in the cluster).
>> There is no dm target which would support clustered usage of snapshots.
>> Zdenek
> Thanks for the confirmation. It's a pity we can't get this done with LVM ...
> we will try to find an alternative.
> Out of curiosity, how does a node know the volume is opened at another node?
> In our test set-up we don't use CLVM or anything (we are just testing), so
> there is no communication between the nodes. Is this done through metadata in
> the volume group / logical volume?

I've no idea what you are using then - I'm clearly talking only about the lvm2
solution, which is ATM based on clvmd usage (there is now also integrated
support for another lock manager - sanlock).
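For reference, the clvmd-based locking mentioned above is selected in lvm.conf;
a minimal sketch (option names may differ slightly between lvm2 versions, and
"vg0" is just an illustrative VG name):

```shell
# Excerpt from /etc/lvm/lvm.conf - switch lvm2 from local file-based
# locking (type 1, the default) to clustered locking through clvmd:
#
#   global {
#       locking_type = 3    # built-in clustered locking via clvmd
#   }
#
# With clvmd running on every node, mark the VG as clustered so that
# all metadata updates and activations go through the cluster locks:
vgchange -cy vg0
```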

If you are using some other locking mechanism, then it's purely up to you to
maintain the integrity of the whole system - i.e. to ensure there are no
concurrent metadata writes from different nodes and to control where and how
the LVs are activated.
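Concretely, with clvmd in place the exclusive activation required for snapshots
is requested at activation time; a hedged sketch, assuming a clustered VG "vg0"
with an LV "lv_vm" (names are illustrative, and these commands need a running
cluster to succeed):

```shell
# Activate the LV exclusively on this node; while the exclusive
# cluster lock is held, other nodes are refused activation.
lvchange -aey vg0/lv_vm

# Snapshots are only supported under this exclusive activation:
lvcreate -s -n lv_vm_snap -L 1G vg0/lv_vm

# Deactivate locally before another node may activate the LV:
lvchange -an vg0/lv_vm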

Also, there are already existing solutions for what you describe, but I assume
you prefer your own home-brewed solution - it's a long journey ahead of you...


More information about the linux-lvm mailing list