Disk defragmenter in Linux

Mike McCarty mike.mccarty at sbcglobal.net
Sat Dec 31 00:27:38 UTC 2005


Ed Hill wrote:
> On Fri, 2005-12-30 at 16:40 -0600, Mike McCarty wrote:
> 
>>Guy Fraser wrote:
>>
>>>Finally we're back to the original post.
>>>

[snip]

>>>the cause of the initial posting. This forum is not well suited to 
>>>discussing how files are allocated, because there are too many 
>>>different file systems that use different algorithms to determine 
>>>when to allocate space for a file in a fragment. In basic terms 
>>
>>Untrue in this context, as the OP specifically asked where to find
>>a defragmenter for ext3. That's what led to the claim that
>>a defragmenter is not necessary for ext3, as it has some inherent
>>resistance to fragmentation.
> 
> 
> Hi Mike,

Hi! I preface this by saying that nothing in here is intended to
be or sound rude. Ok?

> Even if there is fragmentation, it simply DOES NOT MATTER if it doesn't
> result in a measurable performance hit.  So, what benchmarks can you

I never said otherwise.

> cite that show us how fragmentation degrades performance on a Linux
> (specifically, ext3) filesystem? 
> 
> Or, can you create your own test?  I mean this very sincerely.  If you
> want to argue that something matters then you need to back it up with

I don't want to spend the time necessary to try to devise a test.
I possibly could, though it would take some study of the file system,
and might require mods to the file system. I don't know.

> some actual measurements.  If fragmentation matters then you should be
> able to devise a test case that demonstrates it.

What I *did* claim is that ext3 is subject to fragmentation.
I don't recall stating that it was something I was particularly
concerned about. I responded to a claim which was demonstrably
false, and was being used as an argument to tell someone
asking a polite question that he shouldn't be asking the question.
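
For what it's worth, the fragmentation itself is easy to observe:
Linux's FIBMAP ioctl reports where each block of a file actually
lives on disc, so a few lines of C can count the discontiguous
runs. A rough sketch (run it as root; error handling is minimal,
and sparse files' holes will inflate the count):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <linux/fs.h>           /* FIBMAP, FIGETBSZ */

    int main(int argc, char **argv)
    {
        struct stat st;
        int fd, bsz, nblocks, i, blk, prev = -2, frags = 0;

        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || fstat(fd, &st) < 0) { perror(argv[1]); return 1; }
        if (ioctl(fd, FIGETBSZ, &bsz) < 0) { perror("FIGETBSZ"); return 1; }
        nblocks = (st.st_size + bsz - 1) / bsz;

        for (i = 0; i < nblocks; i++) {
            blk = i;                /* logical block in, physical block out */
            if (ioctl(fd, FIBMAP, &blk) < 0) { perror("FIBMAP"); return 1; }
            if (blk != prev + 1)    /* start of a new discontiguous run */
                frags++;
            prev = blk;
        }
        printf("%s: %d fragment(s) in %d block(s)\n", argv[1], frags, nblocks);
        close(fd);
        return 0;
    }

(This is essentially the trick the filefrag utility uses; for a
quick look at whether a file is fragmented, it's enough.)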

>>Another question, which AFAIK remains unanswered, though posed
>>by Ed Hill, is just what performance degradation might be
>>suffered. Unfortunately, that is completely dependent
>>on the use to which the file is put, and how often it is read.
> 
> 
> It's not another question.  It's the only good reason for getting into
> this discussion.

It is not. When someone asks a question, politely, in a reasonable
forum, he deserves a reasonable answer, not an argument. YOU don't
know what all his reasons for asking were. Perhaps he wants to compress
all the used space down to one end of the drive for purposes of
splitting the partition. What difference does it make? If he posed
a reasonable question, he deserves a reasonable answer. He does
not deserve to be told that his question has no basis because
the circumstance doesn't occur, when in fact it *does* occur.

>>Most (all today?) disc drives have read-ahead caching built into
>>the drive, so that reads of sequential sectors are quite a bit
>>faster than random reads, even when no seek is necessary.
> 
> Yes, but such things only matter on the initial read from the disk.  The
> Linux VFS+VM will in all likelihood obviate any need to repeatedly read
> blocks from a disk for frequently accessed files.  So for commonly used
> blocks, the cost is in all likelihood amortized.

You seem to argue against points I don't make, and then don't respond to
the points I *do* make.

Perhaps I'm not being clear enough. I dunno. Truly, I'm getting
confused by this thread.
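
To put the read-ahead point above in concrete terms: the effect is
directly measurable. Here's a rough, untested sketch of the kind of
test I mean. The file name and sizes are made up, and it only means
anything when the file is far larger than RAM, so the page cache
can't hide the cost of the seeks:

    #define _XOPEN_SOURCE 500       /* for pread() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define BLK   4096
    #define NBLKS 4096              /* 16 MB touched per pass */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        static char buf[BLK];
        int fd = open("/tmp/bigfile", O_RDONLY);    /* hypothetical */
        long i, fileblks;
        double t;

        if (fd < 0) { perror("open"); return 1; }
        fileblks = lseek(fd, 0, SEEK_END) / BLK;
        lseek(fd, 0, SEEK_SET);
        if (fileblks < 1) return 1;

        /* Sequential: the drive's read-ahead gets to do its job. */
        t = now();
        for (i = 0; i < NBLKS; i++)
            if (read(fd, buf, BLK) != BLK) break;
        printf("sequential: %.3f s\n", now() - t);

        /* Scattered: same number of reads, but each may cost a
           seek, and read-ahead buys almost nothing. */
        srand(1);
        t = now();
        for (i = 0; i < NBLKS; i++)
            if (pread(fd, buf, BLK, (off_t)(rand() % fileblks) * BLK) != BLK)
                break;
        printf("random:     %.3f s\n", now() - t);

        close(fd);
        return 0;
    }

A badly fragmented file turns the first pattern into something
closer to the second, which is the whole point.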

> Can you demonstrate that the *initial* read really costs more?  And, if
> so, how much?

It matters, as I said, depending on what use the file
gets. (I didn't say how often it gets read sequentially; I said
what use.) It also depends on how large the file is. You apparently have not
actually written disc caching code. I have. In particular, I have
made some mistakes writing disc caching code. When the file is
somewhat larger than the cache can hold, and the whole file gets
read sequentially over and over, many caching algorithms thrash
badly. LRU, in particular, thrashes badly. It is a
theorem that *any* caching algorithm has a circumstance which causes it
to behave *worse* than no caching. (The particular circumstance
may not be sequential read, BTW.)

Which is why I said that it unfortunately depends on the use the
file gets. I just didn't get into gritty details, because I
didn't expect anyone to argue against a theorem of computer science.
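
For concreteness, here is the failure mode in miniature: a toy LRU
cache serving repeated sequential scans of a file exactly one block
bigger than the cache. LRU always evicts the very block that is
needed next, so the hit count stays at zero and all the cache's
bookkeeping is pure overhead. (The sizes are arbitrary; this is an
illustration, not a benchmark.)

    #include <stdio.h>

    #define CACHE  8                /* cache capacity, in blocks      */
    #define FBLKS  (CACHE + 1)      /* file just too big to fit       */
    #define PASSES 4                /* sequential scans over the file */

    int main(void)
    {
        int  slot[CACHE];           /* which block each slot holds    */
        long stamp[CACHE];          /* last-use time, drives eviction */
        long tick = 0, hits = 0, misses = 0;
        int  i, p, b;

        for (i = 0; i < CACHE; i++) { slot[i] = -1; stamp[i] = 0; }

        for (p = 0; p < PASSES; p++)
            for (b = 0; b < FBLKS; b++) {
                int found = -1, lru = 0;
                tick++;
                for (i = 0; i < CACHE; i++) {
                    if (slot[i] == b) found = i;
                    if (stamp[i] < stamp[lru]) lru = i;
                }
                if (found >= 0) {
                    hits++;
                    stamp[found] = tick;
                } else {            /* evict the least recently used  */
                    misses++;
                    slot[lru] = b;
                    stamp[lru] = tick;
                }
            }

        printf("hits %ld, misses %ld\n", hits, misses);   /* hits: 0 */
        return 0;
    }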

Mike
-- 
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
