Filesystem fragmentation and scatter-gather DMA

Ric Wheeler ric at emc.com
Mon Mar 17 17:11:24 UTC 2008


Jon Forrest wrote:
> David Schwartz wrote:
> 
>> That's not really the issue. The issue is whether a read of a chunk of a
>> file can take place without any extra seeks or whether it does require
>> extra seeks. Further, for the vast majority of cases, there is only one
>> I/O stream going on at a time. The disk will read ahead. If that can
>> satisfy even a small fraction of the subsequent I/Os the OS issues,
>> that's a big win.
> 
> Maybe on a single user PC, some of the time there is only one I/O
> stream going on at a time. But, once you start doing anything in parallel,
> or have multiple users, the number of sources (and destinations) of I/O
> goes way up. Thus, the arm is going to have to be moving around randomly
> even if the files involved aren't fragmented. Some (most?) OSs sort
> I/Os so that the movement is minimized but it still occurs.

You should keep in mind that big servers also have higher end storage 
systems (or at least multiple devices).  Heads don't tend to move about 
randomly - they will normally try to read (or write) in a specific 
order. Normally, that order is in increasing sector order.
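To illustrate the "increasing sector order" point, here is a minimal sketch of a one-way elevator: pending requests at or beyond the current head position are served in ascending order, then the sweep wraps around to the lowest outstanding sector. The function name and the bare-sector-number request format are assumptions made for the example; real I/O schedulers and drive firmware are considerably more elaborate.

```python
def elevator_order(pending, head):
    """Return pending sector numbers in one-way elevator order.

    pending: iterable of sector numbers waiting to be serviced
    head:    sector the head is currently positioned at
    (Both names are hypothetical, chosen for this sketch.)
    """
    # Requests at or ahead of the head are served first, lowest sector first.
    ahead = sorted(s for s in pending if s >= head)
    # Requests behind the head wait for the next sweep, again ascending.
    behind = sorted(s for s in pending if s < head)
    return ahead + behind
```

With the head at sector 60 and requests for sectors 90, 10, 50 and 70 pending, the sweep serves 70 and 90 first, then wraps around to 10 and 50.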

Every level of the system tries to guess how to combine and read 
ahead, all the way from the file system down to the internal firmware in 
the storage.

The best way to get read-ahead to work is to use really obvious patterns 
- sequential, increasing and large I/Os work best ;-)
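As a rough illustration of the "combine" step, the sketch below coalesces requests that touch or overlap into larger ones. It is a toy model, not how any particular file system, block layer or firmware does it; the function name and the (start_sector, length) tuple format are assumptions for the example.

```python
def merge_adjacent(requests):
    """Coalesce (start_sector, length) requests that touch or overlap.

    Hypothetical helper for illustration only.
    """
    merged = []
    for start, length in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This request begins at or before the end of the previous one,
            # so extend the previous request instead of issuing a new one.
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged
```

Three back-to-back 8-sector requests starting at sectors 0, 8 and 16 collapse into a single 24-sector request, while a request far away at sector 100 stays separate. That is why sequential patterns give the lower layers so much more to work with than scattered ones.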

> 
>>> 3) Modern disks do all kind of internal block remapping so there's
>>> no guarantee that what appears to be contiguous to the operating
>>> system is actually really and truly contiguous on the disk. I have
>>> no idea how often this possibility occurs, or how bad the skew is
>>> between "fake" blocks and "real" blocks. But, it could happen.
>>
>> Not bad enough to make a significant difference on any but a
>> nearly-failing drive.
> 
> It would be interesting to see what I'm calling the skew between
> the true sector layout and what an O/S sees on modern SATA drives.
> I'm not aware of any way to see this. Does anybody know?

I would not spend any time worrying about the sector remapping. SMART 
can tell you how many sectors have been remapped, but even with a really 
large disk the maximum number of remapped sectors is tiny (say 2000 or 
so for a 500GB disk).  Your chances of hitting them are tiny, especially 
since most drives end up with very, very few remapped sectors before 
they get tossed. Drives with more than 100 remapped sectors, for example, 
tend to complain a lot.
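To put the 2000-sector figure in proportion, here is the arithmetic as a short sketch. It assumes 512-byte sectors and a decimal 500GB (500 * 10^9 bytes); neither is stated above, and a drive with 4096-byte sectors would give a different (still tiny) ratio.

```python
DISK_BYTES = 500 * 10**9   # 500GB, decimal gigabytes (assumption)
SECTOR_BYTES = 512         # assumed sector size
MAX_REMAPPED = 2000        # rough upper bound quoted above

total_sectors = DISK_BYTES // SECTOR_BYTES
remapped_fraction = MAX_REMAPPED / total_sectors

print(f"{total_sectors} sectors, {remapped_fraction:.6%} remappable")
```

Under those assumptions the spare pool covers roughly two sectors in every million.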

The short answer is to look at the sector-level order of your file and 
assume (pretend) that it reflects the media layout as well.

Note that the whole deal changes when you have multi-drive RAID devices 
(software or hardware).

> I stand by my assertion that while disk fragmentation is in no way
> a good thing, it isn't something to fear, at least not in the way
> shown in the advertisements for defragmentation products.
> 

I think that fragmentation is a bad performance hit, but that we 
actually do relatively well in keeping our files contiguous in normal cases.

I have a simple bit of C code that uses the FIBMAP ioctl to dump the 
sectors/blocks for a specific file. If you like, I can send it over to you.
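For readers who want to try the idea, here is a minimal sketch of a FIBMAP-based dump. It is not the C program mentioned above, just an illustration of the same ioctl from Python. The request numbers are the values of FIBMAP and FIGETBSZ from <linux/fs.h>; the function names are made up for the example. FIBMAP is Linux-only, normally needs root (CAP_SYS_RAWIO), and reports block 0 for holes in sparse files.

```python
import fcntl
import os
import struct

FIBMAP = 1     # _IO(0x00, 1) in <linux/fs.h>: logical -> physical block
FIGETBSZ = 2   # _IO(0x00, 2) in <linux/fs.h>: file system block size


def file_blocks(path):
    """Return the physical block number of each logical block of a file.

    Hypothetical helper; needs Linux and usually root privileges.
    """
    with open(path, "rb") as f:
        fd = f.fileno()
        buf = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", buf)[0]
        size = os.fstat(fd).st_size
        nblocks = (size + block_size - 1) // block_size
        blocks = []
        for logical in range(nblocks):
            # FIBMAP takes the logical block number in an int and
            # overwrites it with the physical block number.
            buf = fcntl.ioctl(fd, FIBMAP, struct.pack("i", logical))
            blocks.append(struct.unpack("i", buf)[0])
        return blocks


def count_extents(blocks):
    """Count runs of consecutive physical blocks (holes, block 0, skipped)."""
    extents = 0
    prev = None
    for b in blocks:
        if b == 0:          # hole in a sparse file: breaks the current run
            prev = None
            continue
        if prev is None or b != prev + 1:
            extents += 1    # a jump in physical block number starts a new run
        prev = b
    return extents
```

Running count_extents(file_blocks("/path/to/file")) as root gives a rough fragment count: 1 means the file is laid out contiguously as far as the file system is concerned, which is the view suggested above for reasoning about media layout.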

Regards,

Ric




More information about the Ext3-users mailing list