[linux-lvm] lvconvert --merge invalidating snapshot
luke.bigum at lmax.com
Fri May 5 13:09:34 UTC 2017
We're running into an issue with merging snapshots and need some direction on how to debug further. We have some jobs in Jenkins that are repeatedly creating and merging snapshots. After several consecutive runs they fail inside the lvconvert command. Sometimes this is 3 runs, sometimes it's hundreds, but eventually it will fail. This is on CentOS 6.8 with lvm2-2.02.143-7.el6.x86_64, but I can reproduce the same problem when upgrading to lvm2-2.02.143-12.el6.x86_64 from 6.9. The error is:
journal: Merging snapshot invalidated. Aborting merge.
This is thrown by lvconvert in its progress polling code when it gets back -1 (DM_PERCENT_INVALID) as the merge percentage. The snapshot is healthy before merging, there are very few (if any) block changes in the LV when our Jenkins jobs are running, and the snapshot and source volume are quite small (a few hundred MiB). I've managed to reproduce the issue outside of the Jenkins tests with this Bash loop. It generally takes 15-30 minutes of running this to fail:
while [[ 1 ]]; do
    /sbin/lvremove -f vg_os/journal_reserved_snap
    /sbin/lvcreate -s -n journal_snap -L 160.00m vg_os/journal
    /sbin/lvconvert --merge -i 5 vg_os/journal_snap
    /sbin/lvcreate -L 160.00m -n vg_os/journal_reserved_snap
done
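When debugging, it may help to stop at the first failed iteration instead of looping forever, so the broken state can be inspected before the next lvremove touches it. The following is only a minimal sketch: `repeat_until_fail` and `snapshot_cycle` are hypothetical helper names, and it assumes lvconvert exits non-zero when it aborts the merge, which has not been verified here.

```shell
# Hypothetical helper: run a command repeatedly, stopping at the first failure.
# Usage: repeat_until_fail MAX_RUNS COMMAND [ARGS...]
# Sets the global variable failed_run to the number of the failing run
# (0 if every run succeeded) and returns 1 on failure, 0 otherwise.
repeat_until_fail() {
    local max_runs=$1
    shift
    local run=1
    failed_run=0
    while [ "$run" -le "$max_runs" ]; do
        if ! "$@"; then
            failed_run=$run
            return 1
        fi
        run=$((run + 1))
    done
    return 0
}

# One snapshot/merge cycle using the volume names from the loop above.
# Only defined here, not called; it needs root and the real volumes.
snapshot_cycle() {
    /sbin/lvcreate -s -n journal_snap -L 160.00m vg_os/journal &&
    /sbin/lvconvert --merge -i 5 vg_os/journal_snap
}

# Example (assumption: run as root on the affected machine):
#   if ! repeat_until_fail 10000 snapshot_cycle; then
#       echo "failed on run $failed_run"
#       lvs --all
#       dmsetup status
#   fi
```

Stopping on the first failure keeps the invalid snapshot and its device-mapper tables in place for inspection.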
When the merge fails, the snapshot is left in the merging state but invalid, and oddly 100% full:
# lvs --all
[journal_snap] vg_os Swi-I-s--- 160.00m journal 100.00
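The attribute string in that output encodes the state: per lvs(8), the first character of lv_attr is the volume type (`S` for a merging snapshot) and the fifth is the state (`I` for an invalid snapshot). A small sketch of checking for that combination from a script follows; the function name `is_invalid_merging_snapshot` is made up for illustration and is not part of LVM.

```shell
# Hypothetical helper: succeed if an lv_attr string describes a merging
# snapshot (1st character 'S') whose state is invalid (5th character 'I').
is_invalid_merging_snapshot() {
    case "$1" in
        S???I*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example, assuming the volume names used above (needs root):
#   attr=$(lvs --all --noheadings -o lv_attr vg_os/journal_snap | tr -d ' ')
#   is_invalid_merging_snapshot "$attr" && echo "merge left an invalid snapshot"
```

This only inspects the two attribute positions mentioned; other failure states reported by lvs are not covered.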
Device-Mapper says this about the merge:
May 5 10:45:55 lddev-build-scotty04 kernel: device-mapper: snapshots: Cancelling snapshot handover.
May 5 10:45:55 lddev-build-scotty04 kernel: device-mapper: snapshots: Snapshot is invalid: can't merge
And the *-cow and *-real DMs still exist:
# dmsetup ls | grep journal
I can clean up the snapshot with 'lvremove' and start the process all over again. I can also reproduce the problem on bare metal hardware and in a KVM instance (not that it should make a difference).
I'm at a bit of a loss on how to debug this any further. I've done a little bit of experimenting with rolling back metadata changes to before the merge, but I don't really know what I'm looking for, and I almost always end up locking up Device-Mapper in some way and having to reboot :-)
Can anyone suggest a way forward here?
As an aside, while ruling out possible causes I tried the same Bash loop on a CentOS 6.8 machine with a non-standard kernel by accident. The result is different; it never fails to merge, but the LVM operations start really fast and get slower and slower overnight. Where a loop initially took less than a second to complete, by the end it was taking 30+ seconds and giving an interesting message about reserved memory. A resource leak? I mention it just in case it sheds light on the first problem; I don't really expect you to help me when using our custom kernel :-)
Internal error: Reserved memory (15560704) not enough: used 25010176. Increase activation/reserved_memory?
Logical volume "journal_reserved_snap" successfully removed