Disk defragmenter in Linux

Ed Hill ed at eh3.com
Fri Dec 30 23:46:51 UTC 2005


On Fri, 2005-12-30 at 16:40 -0600, Mike McCarty wrote:
> Guy Fraser wrote:
> > 
> > Finally we're back to the original post.
> > 
> > I am not a guru either, but have been administering Unix systems 
> > since the 1980s. I have not found fragmentation to be a significant 
> > cause of performance problems on any Unix or Linux machines. Although 
> > fragmentation does occur, most Unix and Linux file systems are 
> > designed to minimize fragmentation and maximize utilization. Many 
> > Unix and Linux file systems try to write files using multiple contiguous
> > blocks. Each block is made up of a number of fragments; the number of 
> > fragments per block depends on the drive size and other parameters.
> > The terminology around "fragment" confuses this discussion, and may also 
> > be what prompted the initial posting. This forum is not well suited to 
> > discussing how files are allocated, because there are too many 
> > different file systems that use different algorithms to determine 
> > when to allocate space for a file in a fragment. In basic terms 
> 
> Untrue in this context, as the OP specifically asked where to find
> a defragmenter for ext3. That's what led to the claim that
> a defragmenter is not necessary for ext3, as it has some inherent
> immunities to fragmentation.

Hi Mike,

Even if there is fragmentation, it simply DOES NOT MATTER if it doesn't
result in a measurable performance hit.  So, what benchmarks can you
cite that show us how fragmentation degrades performance on a Linux
(specifically, ext3) filesystem? 

Or, can you create your own test?  I mean this very sincerely.  If you
want to argue that something matters then you need to back it up with
some actual measurements.  If fragmentation matters then you should be
able to devise a test case that demonstrates it.
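For instance, something along the following lines might be a starting
point.  This is only a rough, untested sketch: the file names and sizes
are arbitrary, it assumes a reasonably recent Python on Linux (for
os.posix_fadvise), and the ext3 allocator may well defeat the naive
attempt to force fragmentation by interleaving appends to two files:

#!/usr/bin/env python3
# Rough sketch of a fragmentation benchmark.  All names and sizes are
# arbitrary, and ext3 may still allocate the interleaved file contiguously.
import os
import time

MB = 1024 * 1024
CHUNK = 64 * 1024          # small interleaved appends, hoping to fragment
TOTAL = 256 * MB           # per-file size

def write_interleaved(path_a, path_b):
    # Append to two files in alternation so their blocks interleave on disk.
    with open(path_a, "wb") as a, open(path_b, "wb") as b:
        buf = b"x" * CHUNK
        for _ in range(TOTAL // CHUNK):
            a.write(buf)
            a.flush()
            os.fsync(a.fileno())   # push A's blocks out before B's next chunk
            b.write(buf)
            b.flush()
            os.fsync(b.fileno())

def write_contiguous(path):
    # Write the same amount of data in a single pass.
    with open(path, "wb") as f:
        f.write(b"x" * TOTAL)
        f.flush()
        os.fsync(f.fileno())

def timed_read(path):
    # Evict this file's pages from the page cache, then time a sequential read.
    fd = os.open(path, os.O_RDONLY)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    t0 = time.time()
    while os.read(fd, MB):
        pass
    os.close(fd)
    return time.time() - t0

write_interleaved("frag_a", "frag_b")
write_contiguous("contig")
print("interleaved file: %.2f s" % timed_read("frag_a"))
print("contiguous file:  %.2f s" % timed_read("contig"))

If the interleaved file consistently reads much slower than the
contiguous one, that would be exactly the sort of number worth bringing
to this discussion.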


> Another question, which AFAIK remains unanswered, though posed
> by Ed Hill, is just what performance degradation might be suffered.
> Unfortunately, that is completely dependent on the use to which
> the file is put, and how often it is read.

It's not another question.  It's the only good reason for getting into
this discussion.


> Most (all today?) disc drives have read-ahead caching built into
> the drive, so that reads to sequential sectors are quite a bit
> faster than random reads, even when no seek is necessary.

Yes, but read-ahead only matters on the initial read from the disk.  The
Linux VFS+VM (page cache) will in all likelihood obviate any need to
repeatedly read blocks from disk for frequently accessed files.  So for
commonly used blocks, the cost is amortized.

Can you demonstrate that the *initial* read really costs more?  And, if
so, how much?
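
As a crude illustration of that point, here is another untested sketch
(the "somefile" path is a placeholder for any large file already
sitting on disk) that compares a cold read against a warm, cached
re-read:

#!/usr/bin/env python3
# Cold vs. warm read timing.  "somefile" is a placeholder path.
import os
import time

PATH = "somefile"   # any large file that is already on disk

def read_all(path):
    # Sequentially read the whole file in 1 MB chunks.
    fd = os.open(path, os.O_RDONLY)
    while os.read(fd, 1024 * 1024):
        pass
    os.close(fd)

# Ask the kernel to evict this file's pages so the first pass hits the disk.
fd = os.open(PATH, os.O_RDONLY)
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)

t0 = time.time()
read_all(PATH)
cold = time.time() - t0

t0 = time.time()
read_all(PATH)
warm = time.time() - t0

print("cold read: %.2f s    warm (cached) read: %.2f s" % (cold, warm))

On most systems the warm pass is dramatically faster, which is why any
interesting cost is confined to that first, cold read.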

Ed

-- 
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464



