[linux-lvm] Performance penalty for 4k requests on thin provisioned volume
zkabelac at redhat.com
Thu Sep 14 09:00:46 UTC 2017
On 14.9.2017 at 00:39, Dale Stephenson wrote:
>> On Sep 13, 2017, at 4:19 PM, Zdenek Kabelac <zkabelac at redhat.com> wrote:
>> On 13.9.2017 at 17:33, Dale Stephenson wrote:
>>> Distribution: centos-release-7-3.1611.el7.centos.x86_64
>>> Kernel: Linux 3.10.0-514.26.2.el7.x86_64
>>> LVM: 2.02.166(2)-RHEL7 (2016-11-16)
>>> Volume group consisted of an 8-drive SSD (500G drives) array, plus an additional SSD of the same size. The array had 64k stripes.
>>> Thin pool had -Zn option and 512k chunksize (full stripe), size 3T with metadata volume 16G. Data was entirely on the 8-drive raid, metadata was entirely on the 9th drive.
>>> Virtual volume “thin” was 300 GB. I also filled it with dd so that it would be fully provisioned before the test.
>>> Volume “thick” was also 300GB, just an ordinary volume also entirely on the 8-drive array.
>>> Four tests were run directly against each volume using fio-2.2.8: random read, random write, sequential read, sequential write. Single thread, 4k blocksize, 90s run time.
>> Can you please provide output of:
>> lvs -a -o+stripes,stripesize,seg_pe_ranges
>> so we can see how your stripes are placed on the devices?
> Sure, thank you for your help:
> # lvs -a -o+stripes,stripesize,seg_pe_ranges
> LV               VG     Attr       LSize   Pool     Origin Data%  Meta% Move Log Cpy%Sync Convert #Str Stripe PE Ranges
> [lvol0_pmspare]  volgr0 ewi-------  16.00g                                                           1      0 /dev/md127:867328-871423
> thick            volgr0 -wi-a----- 300.00g                                                           1      0 /dev/md127:790528-867327
> thin             volgr0 Vwi-a-t--- 300.00g thinpool        100.00                                    0      0
> thinpool         volgr0 twi-aot---   3.00t                   9.77  0.13                              1      0 thinpool_tdata:0-786431
> [thinpool_tdata] volgr0 Twi-ao----   3.00t                                                           1      0 /dev/md127:0-786431
> [thinpool_tmeta] volgr0 ewi-ao----  16.00g                                                           1      0 /dev/sdb4:0-4095
> md127 is an 8-drive RAID 0
> As you can see, there’s no lvm striping; I rely on the software RAID underneath for that. Both thick and thin lvols are on the same PV.
>> SSDs typically need, ideally, to be written in 512K chunks.
> I could create the md to use 512k chunks for RAID 0, but I wouldn’t expect that to have any impact on a single threaded test using 4k request size. Is there a hidden relationship that I’m unaware of?
Yep - it seems the setup in this case is the best fit.
If you can reevaluate different setups you may possibly get much higher
My guess would be that the best layout is probably to stripe across no
more than 2-3 disks and use a bigger stripe block,
and then just 'join' the 'smaller' arrays together in lvm2 into 1 big LV.
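
A minimal dry-run sketch of the layout described above (narrow stripes joined into one LV). It only prints commands and executes none of them. The SSD names /dev/sdc../dev/sdj, the md names /dev/md/pairN, the VG name vgfast and the LV name big are hypothetical placeholders, and the 512K md chunk is an assumption to be benchmarked rather than a value taken from this thread.

```shell
#!/bin/sh
# Dry run: print (do not execute) commands that build four 2-disk RAID 0
# arrays and concatenate them into one linear LV.
# All device, VG and LV names below are hypothetical placeholders.
plan_layout() {
    set -- /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
    n=0
    pv_list=""
    while [ "$#" -ge 2 ]; do
        md="/dev/md/pair$n"
        # mdadm --chunk is given in KiB, so 512 means 512K per disk.
        echo "mdadm --create $md --level=0 --raid-devices=2 --chunk=512 $1 $2"
        pv_list="$pv_list $md"
        shift 2
        n=$((n + 1))
    done
    echo "pvcreate$pv_list"
    echo "vgcreate vgfast$pv_list"
    # Without -i, lvcreate makes a linear LV; one that is larger than a
    # single PV continues on the next PV, which joins the small arrays.
    echo "lvcreate -l 100%FREE -n big vgfast"
}

plan_layout
```

Printing the plan first makes it easy to review the device pairing before anything destructive is run.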
>> (something like 'lvcreate -LXXX -i8 -I512k vgname')
> Would making lvm stripe on top of an md that already stripes confer any performance benefit in general, or for small (4k) requests in particular?
Rule #1 - try to avoid 'over-combining' things together.
- measure performance from the 'bottom' upward in your device stack.
If the underlying devices give poor speed, you can't make it better with any
super-smart disk layout on top of them.
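
Measuring from the bottom upward could look roughly like the sketch below, which prints one read-only fio command per layer instead of running it. /dev/sdc stands in for a single member SSD and is hypothetical; the other paths follow from the lvs output quoted earlier. The thread only states single thread, 4k blocksize and 90s run time, so --direct=1, --ioengine=libaio and --iodepth=1 are assumptions.

```shell
#!/bin/sh
# Dry run: print one read-only fio command per layer, bottom of the stack first.
# /dev/sdc is a hypothetical member SSD; direct I/O, libaio and iodepth=1
# are assumptions not stated in the thread.
fio_cmd() {
    # $1 = job name, $2 = block device to read from
    echo "fio --name=$1 --filename=$2 --readonly --rw=randread --bs=4k" \
         "--numjobs=1 --iodepth=1 --direct=1 --ioengine=libaio" \
         "--runtime=90 --time_based"
}

fio_cmd raw-ssd /dev/sdc
fio_cmd md-raid0 /dev/md127
fio_cmd thick-lv /dev/volgr0/thick
fio_cmd thin-lv /dev/volgr0/thin
```

Comparing the four results layer by layer shows where the 4k latency is actually added.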
>> Wouldn't it be 'faster' to just concatenate 8 disks together instead of striping - or stripe only across 2 disks - and then concatenate 4 such striped areas…
> For sustained throughput I would expect striping of 8 disks to blow away concatenation; however, for small requests I wouldn’t expect any advantage. On a non-redundant array, I would expect a single threaded test using 4k requests to end up reading/writing data from exactly one disk regardless of whether the underlying drives are concatenated or striped.
It always depends which kind of load you expect the most.
I suspect spreading 4K blocks across 8 SSDs is likely very far from ideal.
Any SSD is typically very bad with 4K blocks - if you want to 'spread' the
load across more SSDs, do not use less than 64K stripe chunks per SSD - this gives
you an (8*64) 512K stripe size.
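
The stripe arithmetic above can be sketched as follows, with all sizes in KiB. The function names are made up for illustration, and the model is plain RAID 0 with no offset between the array start and the first chunk.

```shell
#!/bin/sh
# RAID 0 arithmetic, all sizes in KiB. Function names are illustrative only.

# Index of the disk that holds the data at a given array offset.
stripe_disk() {      # $1 = offset, $2 = chunk per disk, $3 = number of disks
    echo $(( ($1 / $2) % $3 ))
}

# Number of disks a single request touches.
disks_touched() {    # $1 = offset, $2 = length, $3 = chunk, $4 = number of disks
    first=$(( $1 / $3 ))
    last=$(( ($1 + $2 - 1) / $3 ))
    n=$(( last - first + 1 ))
    if [ "$n" -gt "$4" ]; then
        n=$4
    fi
    echo "$n"
}

# Full stripe width = chunk per disk * number of disks.
full_stripe() {      # $1 = chunk, $2 = number of disks
    echo $(( $1 * $2 ))
}

full_stripe 64 8           # 512
disks_touched 128 4 64 8   # 1: an aligned 4K request stays on one disk
disks_touched 0 512 64 8   # 8: a full-stripe request hits every disk
stripe_disk 128 64 8       # 2
```

In this model an aligned 4K request lands on exactly one disk however many disks the array has, so extra stripe width only helps requests larger than one chunk.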
As for thin-pool chunksize - if you plan to use lots of snapshots, keep the
value as low as possible - a 64K or 128K thin-pool chunksize.
But I'd still suggest reevaluating/benchmarking a setup where you use a much
lower number of SSDs for load spreading and bigger stripe chunks per
device. This should nicely improve performance for 'bigger' writes
and not slow things down that much for 4K loads.
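
One way to see why a small thin-pool chunksize suits snapshot-heavy use: a small write to a chunk that is still shared with a snapshot has to copy the whole chunk first. The sketch below uses that simplified model (it ignores the case where a write covers an entire chunk, which needs no copy), and the function name is hypothetical.

```shell
#!/bin/sh
# Simplified copy-on-write model for a thin pool with snapshots:
# the first small write to a shared chunk copies the whole chunk.
cow_amplification() {   # $1 = thin-pool chunk size (KiB), $2 = write size (KiB)
    echo $(( $1 / $2 ))
}

for chunk in 64 128 512; do
    echo "chunk ${chunk}K: 4K write copies ${chunk}K ($(cow_amplification "$chunk" 4)x)"
done
```

Under this model the 512k chunksize from the original setup copies 128 times the data of the 4K write itself, against 16 times for a 64K chunk.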
> What is the best choice for handling 4k request sizes?
Possibly NVMe can do a better job here.