[linux-lvm] cache on SSD makes system unresponsive
o1e9 at member.fsf.org
Fri Oct 20 09:59:01 UTC 2017
On 20. okt. 2017 08:46, Xen wrote:
> matthew patton schreef op 20-10-2017 2:12:
>>> It is just a backup server,
>> Then caching is pointless.
> That's irrelevant and not up to another person to decide.
>> Furthermore any half-wit caching solution
>> can detect streaming read/write and will deliberately bypass the
> The problem was not performance, it was stability.
>> Furthermore DD has never been a useful benchmark for anything.
>> And if you're not using 'odirect' it's even more pointless.
> Performance was not the issue, stability was.
>>> Server has 2x SSD drives by 256Gb each
>> and for purposes of 'cache' should be individual VD and not waste
>> capacity on RAID1.
> Is probably also going to be quite irrelevant to the problem at hand.
>>> 10x 3Tb drives. In addition there are two
>>> MD1200 disk arrays attached with 12x 4Tb disks each. All
>> Raid5 for this size footprint is NUTs. Raid6 is the bare minimum.
> That's also irrelevant to the problem at hand.
I mostly agree with Xen about stability vs. usability. I have a
stable system and a spare SSD partition with 240Gb unused, so I decided
to run tests with LVM caching using different cache modes. The _test_
results are in my posts: LVM caching does indeed have stability issues,
regardless of how I set it up.
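For reference, switching between cache modes on an already-cached LV can
be done without rebuilding the cache. A minimal sketch, assuming a
hypothetical VG/LV named backup_vg/backup_lv (adjust to the real layout);
these commands need root and an existing lvmcache setup:

```shell
# Show the current cache mode of the cached LV
lvs -o +cache_mode backup_vg/backup_lv

# Switch to writethrough (safer: writes go to both SSD and origin)
lvchange --cachemode writethrough backup_vg/backup_lv

# Switch to writeback (faster, but dirty blocks live only on the SSD)
lvchange --cachemode writeback backup_vg/backup_lv
```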
I do agree that I should make a separate hardware virtual disk for
the cache, and most likely not mirror it. However, the performance of
a system is defined by its weakest point, so that point may well be the
slow SSD. I can accept performance degradation because of that, but not
a complete system lockup, denial of all services, followed by a reboot.
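Setting the cache up on a dedicated, unmirrored SSD would look roughly
like this. A sketch only, with hypothetical device and VG/LV names
(/dev/sdc1, backup_vg, backup_lv); requires root:

```shell
# Put the dedicated SSD partition (no RAID1) under LVM control
pvcreate /dev/sdc1
vgextend backup_vg /dev/sdc1

# Create a cache pool on the SSD partition only
lvcreate --type cache-pool -L 200G -n ssd_cache backup_vg /dev/sdc1

# Attach the cache pool to the existing backup LV
lvconvert --type cache --cachepool backup_vg/ssd_cache backup_vg/backup_lv
```

If the cache misbehaves, `lvconvert --uncache backup_vg/backup_lv`
detaches and removes it again.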
Your assumptions about streaming operations on _just a backup server_
are not quite right. The Bareos Director, configured on a separate
server, pushes that Storage daemon to run multiple backups in parallel,
and eventually restores at the same time. So even with only a few
streams going in and out, the RAID is really doing random reads. I do
agree that dd is definitely not a good way to test any caching system,
but it is the first thing to try to get a quick good/bad/ugly result
before running other tests like bonnie++. In my case, the very next
command after 'lvconvert' to enable the cache and 'pvs' to check the
status was 'dd if=some_250G_file of=/dev/null bs=8M status=progress',
and that was the moment everything went completely wrong, ending in an
unplanned reboot.
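The sequence that triggered the hang can be reproduced roughly as
follows. Names are hypothetical (backup_vg/ssd_cache, backup_vg/backup_lv,
and the mount path); the 250G file stands in for any large file on the
cached LV:

```shell
# Attach the prepared cache pool to the backup LV
lvconvert --type cache --cachepool backup_vg/ssd_cache backup_vg/backup_lv

# Confirm the PVs and cache status look sane
pvs
lvs -a backup_vg

# Large sequential read through the freshly attached cache --
# this is the point where the system became unresponsive
dd if=/mnt/backup/some_250G_file of=/dev/null bs=8M status=progress
```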
About RAID5 vs RAID6: as I mentioned in a separate message, the logical
volume is built from 3 hardware RAID5 virtual disks, so it is not
30+ disks in one RAID5 or anything like that. Besides, that server is a
front-end to an LTO-6 library, so even if the unexpected happens it
would take only 3-4 days to repopulate it from the client hosts. And I
keep a few disks in stock, so replacing a disk and rebuilding a RAID5
takes no more than 12 hours. RAID5 vs RAID6 is a matter of operational
efficiency: watchdog the system logs with Graylog2 and Dell
OpenManage/MegaRAID, keep spare disks on hand, and do everything on time.