Fwd: problems with large directories?

Ric Wheeler rwheeler at redhat.com
Wed Mar 10 01:51:20 UTC 2010


On 03/09/2010 09:36 AM, Charles Riley wrote:
> Sorry, I meant to send this to the list, not just Ric.
>
>
> ----- Forwarded Message -----
> From: "Charles Riley"<criley at erad.com>
> To: "Ric Wheeler"<rwheeler at redhat.com>
> Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern
> Subject: Re: problems with large directories?
>
>
>
>
> ----- "Ric Wheeler"<rwheeler at redhat.com>  wrote:
>
>> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote:
>>> Hi,
>>>
>>> I have an application that deals with 100,000 to 1,000,000 image files.
>>>
>>> I initially structured it to use multiple directories, so that file
>>> 123456 would be stored in /12/34/123456.  I'm now wondering if that's
>>> pointless, as it would simplify things to simply store the file in
>>> /123456.
>>>
>>> Can anyone indicate whether I'm gaining anything by using smaller
>>> directories in ext3/ext4?  Thanks.
>>>
>>> Mitch
>>>
>>
>> I think that breaking up your files into subdirectories makes it easier
>> to navigate the tree and find files from a human point of view. Even
>> better if the bytes reflect something like year/month/day/hour/min
>> (assuming your pathname has a date-based GUID or similar encoding).
>>
>> You can have a million files in one large directory, but be careful to
>> iterate and copy them in a sorted order (sorted by inode) to avoid nasty
>> performance issues that are side effects of the way we hash file names
>> in ext3/4.
>>
>> Good luck!
>>
>> Ric
>>
>
> Hi Ric,
>
> Can you elaborate on the performance issues you mention above?
>
> We use RHEL4/ext3 on our PACS (medical imaging) servers.
> We ran into ext3's 32k subdirectory limit a couple of years back, when our first customer hit the 31,999th study; at that point we implemented a directory hashing algorithm.  Now we store images for a given patient's study in a path something like:
> aa/ab/ac/1.2.3/
>
> where 1.2.3 is the DICOM study instance UID (a worldwide-unique ID for a medical study)
> and aa/ab/ac/ is the directory hash we derived from that study instance UID.
>
> The above is a simplified example for illustration purposes only; 1.2.3 does not really hash to aa/ab/ac/.
> Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files.
> Images are initially created in a non-hashed temporary directory and then copied to their permanent home in, e.g., aa/ab/ac/1.2.3/.
>
> In this context, would we gain filesystem performance by sorting by inode before copying?
> Do the performance issues you refer to apply only to the copy process itself, or do they contribute to long-term filesystem performance?
>
> Thanks for any insight you can provide,
>
> Charles
>


Hi Charles,

The big issue with touching a lot of files (reading, stat()ing, unlinking them) in 
ext3/4 is that readdir() gives the names back in effectively random order. This 
makes the accesses very seeky.

That is not an issue with a handful of files (say a couple of hundred), but when 
you get to thousands (or millions) of files, performance really tanks.

To avoid that, you can sort the list returned by readdir() into ascending inode 
order, in reasonably large batches, and get your performance back up.
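
A minimal sketch of what I mean, in C (a hypothetical example, not code from 
any shipping tool; error handling kept short):

/* Minimal sketch: list a directory in ascending inode order.
 * Hypothetical illustration of the technique described above. */
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

struct entry {
    ino_t ino;          /* inode number from readdir() */
    char  name[256];    /* file name */
};

/* qsort() comparator: ascending inode order. */
static int by_inode(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->ino > y->ino) - (x->ino < y->ino);
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }

    size_t n = 0, cap = 1024;
    struct entry *list = malloc(cap * sizeof *list);
    struct dirent *de;

    /* Slurp the whole directory first; on ext3/4 htree directories,
     * readdir() returns names in hash order, i.e. effectively
     * random inode order. */
    while (list && (de = readdir(dir)) != NULL) {
        if (n == cap)
            list = realloc(list, (cap *= 2) * sizeof *list);
        if (!list)
            break;
        list[n].ino = de->d_ino;
        snprintf(list[n].name, sizeof list[n].name, "%s", de->d_name);
        n++;
    }
    closedir(dir);
    if (!list) { perror("alloc"); return 1; }

    /* The key step: sort by inode so a subsequent stat()/open()
     * pass sweeps the inode table mostly sequentially. */
    qsort(list, n, sizeof *list, by_inode);

    for (size_t i = 0; i < n; i++)
        printf("%10lu  %s\n", (unsigned long)list[i].ino, list[i].name);

    free(list);
    return 0;
}

From the shell, something like "ls -i | sort -n" gives a quick approximation 
of the same ordering for ad hoc experiments (it breaks on exotic file names, 
but is fine for a sanity check).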

Several of the core tools have been looking at doing this automatically, but it 
is worth doing in any home-grown application as well.

In your scenario with the directory hierarchy, I suspect that you won't hit 
this. If you had one very large directory, you certainly would.
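
For illustration, here is a hypothetical sketch of a fan-out scheme along the 
lines of your aa/ab/ac/ layout. The hash (FNV-1a) and the two-letter, 
three-level layout are placeholders of mine, not your actual algorithm (as you 
note, 1.2.3 does not really hash to aa/ab/ac/):

/* Hypothetical sketch of a UID -> aa/ab/ac/ style directory fan-out.
 * The FNV-1a hash and the three two-letter levels are illustrative
 * assumptions, not the scheme actually in use anywhere. */
#include <stdio.h>
#include <stdint.h>

/* FNV-1a: a simple, well-known string hash (an arbitrary choice). */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++)
        h = (h ^ (uint8_t)*s) * 16777619u;
    return h;
}

int main(void)
{
    const char *uid = "1.2.3";   /* study instance UID (placeholder) */
    uint32_t h = fnv1a(uid);

    /* Spread studies over 26*26 = 676 names per level, three levels
     * deep, keeping each directory well under the 32k link limit. */
    char path[128];
    snprintf(path, sizeof path, "%c%c/%c%c/%c%c/%s/",
             'a' + (int)( h        % 26), 'a' + (int)((h >>  5) % 26),
             'a' + (int)((h >> 10) % 26), 'a' + (int)((h >> 15) % 26),
             'a' + (int)((h >> 20) % 26), 'a' + (int)((h >> 25) % 26),
             uid);
    printf("%s\n", path);   /* prints the derived three-level path */
    return 0;
}

Any reasonably uniform hash works here; the point is simply to bound the 
number of entries in any one directory.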

Best regards,

Ric



