[linux-lvm] LVM2 robustness w/ large (>100TB) name spaces?

Steve Costaras stevecs at chaven.com
Mon Dec 29 03:06:16 UTC 2008

Thanks. Yes, the workload on this system will consist of many small I/O
requests. On the system I have logs from (I am trying to gather logs from
all systems to fine-tune the sizes), the requests reaching the drive
subsystem are in the ~64-128KiB range and very random in nature. So a
4MiB or even 8MiB PE shouldn't be much of a problem (assuming the
workloads on the other boxes are similar).
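
As a rough check of that claim, the sketch below estimates how often a
random request would straddle a PE boundary. It assumes request start
offsets are uniformly distributed and aligned to 512-byte sectors, and that
crossing a PE boundary only matters when adjacent extents sit on different
PVs; the function name and the alignment value are illustrative, not taken
from the logs.

```python
KiB = 1024
MiB = 1024 * KiB


def straddle_fraction(request_bytes, pe_bytes, align_bytes=512):
    """Fraction of uniformly placed, aligned requests that cross a PE boundary."""
    if request_bytes > pe_bytes:
        # A request larger than one extent always crosses a boundary.
        return 1.0
    # Start offsets are multiples of align_bytes inside one PE. A request
    # crosses the boundary when it starts within the last
    # (request_bytes - align_bytes) bytes of the extent.
    return (request_bytes - align_bytes) / pe_bytes


if __name__ == "__main__":
    for pe in (4 * MiB, 8 * MiB):
        for req in (64 * KiB, 128 * KiB):
            frac = straddle_fraction(req, pe)
            print(f"PE {pe // MiB} MiB, request {req // KiB} KiB: "
                  f"{frac:.2%} of requests cross a PE boundary")
```

Under these assumptions the worst case here (128KiB requests on a 4MiB PE)
comes out at about 3% of requests touching two extents.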

This is why I am planning to add as many spindles as I can: I want to grow
random I/O capability more than streaming I/O. I understand the limits on
outstanding commands, IOPS (read/write) and SCSI command queue depth. The
latter (command queue depth) is the biggest item that I see that would push
me to create more, smaller PVs: instead of a raid-6 of (13+2+1) for each
PV, something like a raid-6 of (5+2+1). That would lower storage efficiency
but increase the number of PVs and the aggregate command queue depth (as
well as increasing write IOPS, but this is mainly a read-request array
without many writes).
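
To put numbers on that trade-off, here is a small sketch comparing the two
layouts. It assumes the "+1" in each layout is a hot spare (so the groups
are 16 and 8 drives), and it uses a made-up per-PV queue depth of 32 purely
for illustration; the real value depends on the HBA and the array.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid6Layout:
    data: int    # data drives per group
    parity: int  # parity drives per group
    spare: int   # assumed hot spares per group

    @property
    def drives(self):
        return self.data + self.parity + self.spare

    @property
    def efficiency(self):
        # Usable fraction of the raw drives in one group.
        return self.data / self.drives


def summarize(layout, total_drives, queue_depth_per_pv):
    """One RAID group is exported as one PV; count PVs and aggregate queue depth."""
    pvs = total_drives // layout.drives
    return {
        "pvs": pvs,
        "data_drives": pvs * layout.data,
        "efficiency": layout.efficiency,
        "aggregate_queue_depth": pvs * queue_depth_per_pv,
    }


WIDE = Raid6Layout(data=13, parity=2, spare=1)
NARROW = Raid6Layout(data=5, parity=2, spare=1)

if __name__ == "__main__":
    # 16 drives and a queue depth of 32 per PV are hypothetical inputs.
    for name, layout in (("13+2+1", WIDE), ("5+2+1", NARROW)):
        print(name, summarize(layout, total_drives=16, queue_depth_per_pv=32))
```

With those inputs, the narrow layout doubles the number of PVs (and so the
aggregate queue depth) per 16 drives, at the cost of dropping usable
capacity from 13/16 to 5/8 of the raw drives.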

I am mostly looking for examples of real-world builds that have reached the
100+TB range under Linux, and what kinds of gotchas I'm in for. My current
arrays that are going to be merged into this average ~20-30TiB each; since
all serve similar (and somewhat overlapping) functions, merging them is in
order.


-----Original Message-----
From: linux-lvm-bounces at redhat.com [mailto:linux-lvm-bounces at redhat.com] On
Behalf Of Marek Podmaka
Sent: Tuesday, December 23, 2008 04:28
To: LVM general discussion and development
Subject: Re: [linux-lvm] LVM2 robustness w/ large (>100TB) name spaces?


Tuesday, December 23, 2008, 1:15:28, Steve Costaras wrote:

> - What are the limits on PE/LE's per logical volume (>200,000,000? A
> problem?)  (I will be attaching multiple external chassis like above to
> several HBA's and will be using LVM striping to increase performance.   So
> small PE size (4MB-8MB) would be best to aid in the distribution of 
> requests across the physical subsystems.)
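
For scale, the extent count follows directly from volume size divided by PE
size. The sketch below only does that arithmetic; it treats "100TB" as
100 TiB for convenience, which is an assumption, and it says nothing about
what LVM2 itself can handle.

```python
MiB = 1024 ** 2
TiB = 1024 ** 4


def extent_count(volume_bytes, pe_bytes):
    """Number of extents needed to cover volume_bytes (rounded up)."""
    return -(-volume_bytes // pe_bytes)


if __name__ == "__main__":
    # 100 TiB is an assumed reading of the "100TB" in the subject line.
    for pe_mib in (4, 8, 32, 64):
        n = extent_count(100 * TiB, pe_mib * MiB)
        print(f"PE {pe_mib:>2} MiB -> {n:,} extents for 100 TiB")
```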

I think 4-8 MB for PE size is too small when you will be using such big (and
probably advanced) arrays.
LVM striping (stripe size in hundreds of kB) would kill any array, because
when you request, for example, 512 kB from one array and the next
512 kB from another array, they can't handle it efficiently. You won't see
the benefit of reading from all 16 spindles - every time it will just load
512 kB from one physical disk. Also, detection of sequential reads might not
work well in the array in this case.
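
To illustrate the splitting, here is a minimal sketch of how a striped
volume would break one sequential read into per-PV chunks. It assumes a
plain round-robin (RAID-0 style) layout with 512 KiB stripes over 2 PVs;
the function name and tuple layout are only for illustration.

```python
KiB = 1024


def split_read(offset, length, stripe_size, n_pvs):
    """Split a logical read into (pv_index, pv_offset, chunk_len) pieces."""
    chunks = []
    end = offset + length
    while offset < end:
        stripe_no, within = divmod(offset, stripe_size)
        # Never read past the end of the current stripe or the request.
        chunk_len = min(stripe_size - within, end - offset)
        pv = stripe_no % n_pvs  # stripes rotate over the PVs
        pv_offset = (stripe_no // n_pvs) * stripe_size + within
        chunks.append((pv, pv_offset, chunk_len))
        offset += chunk_len
    return chunks


if __name__ == "__main__":
    # A 2 MiB sequential read becomes four 512 KiB requests that
    # alternate between the two arrays.
    for pv, pv_off, size in split_read(0, 2048 * KiB, 512 * KiB, 2):
        print(f"PV{pv}: offset {pv_off // KiB} KiB, length {size // KiB} KiB")
```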

In HP-UX LVM with enterprise arrays like HP EVA or HP XP we use 32-64 MB PE
and enable distribution - that means "stripe" size = PE size.
LE1 = PV1_1
LE2 = PV2_1
LE3 = PV1_2
LE4 = PV2_2 and so on.
Using this you request, for example, 32 MB from one array. Given the cache
sizes of arrays and readahead, you should get much better performance,
because those 32 MB will be fetched partially from all 16 drives.
Also, we don't use distribution among 2 arrays, just different paths to
one array (different HBA, different SAN switch and different array FC
controller). We use 2 arrays only for mirroring data to other datacentre for
The main reason for us to use that PE distribution is that HP-UX does not
have load-balancing multipath built in. But even when you have it, using
more PVs is better because of the architectural limits of arrays (number of
outstanding requests for a single virtual drive, SCSI queue depth on the
server and on the array, cache memory limits per virtual drive, etc.)
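
The LE-to-PV distribution listed above (LE1 = PV1_1, LE2 = PV2_1, ...) can
be sketched as a simple round-robin over the PVs. This only models the
numbering pattern from the example; the function name is made up and it is
not how HP-UX LVM is implemented.

```python
def distributed_map(n_les, n_pvs):
    """Map LE numbers to 'PV<pv>_<extent>' labels, rotating over the PVs."""
    mapping = {}
    for le in range(1, n_les + 1):
        pv = (le - 1) % n_pvs + 1    # which PV this LE lands on
        pe = (le - 1) // n_pvs + 1   # which extent on that PV
        mapping[f"LE{le}"] = f"PV{pv}_{pe}"
    return mapping


if __name__ == "__main__":
    for le, target in distributed_map(4, 2).items():
        print(f"{le} = {target}")
```

Because each LE is a whole PE, a PE-aligned request of one PE size (for
example 32 MB) goes to a single PV in this scheme.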

  bYE, Marki
