Online resizing of ext3 filesystems {shrink}

Bill Rugolsky Jr. brugolsky at telemetry-investments.com
Mon Jan 9 15:27:01 UTC 2006


On Sun, Jan 08, 2006 at 08:23:50PM -0800, goemon at anime.net wrote:
> what about if you're shrinking the filesystem to a point where nothing 
> is/has ever been used/mapped and where no data needs to be compacted?
> 
> eg a 100gb filesystem where only 50gb has _ever_ been used.
 
That's not the way Ext2/Ext3 allocation works; directories are
scattered across block groups to balance usage, and files are placed
near their directories, so allocations end up spread across the disk.
The new Orlov allocator spreads less, but even so, there is no
guarantee that the higher block groups are not allocated.  Details are
in fs/ext3/ialloc.c.
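
If you're curious, you can see the spread on your own filesystem with
dumpe2fs from e2fsprogs -- roughly (the device name below is just an
example, substitute your own):

   dumpe2fs /dev/sda1 | grep -E '^Group|free blocks'

Even on a half-full filesystem you'll usually find blocks in use in
the high-numbered groups.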

Would it be nice to be able to shrink a filesystem online? Of course --
but nobody has been sufficiently motivated to do it, for any of the
commonly used filesystems.  The workaround is to not allocate all of
the space initially, and grow filesystems as needed. If you really
need to shrink the filesystem, boot from a rescue CD, or install a
rescue initramfs image in /boot, and boot from that.
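
Roughly, the offline shrink from a rescue environment looks like this,
assuming the filesystem lives on an LVM logical volume called
/dev/vg0/data and you want to take it down to 50G (names and sizes are
placeholders):

   umount /mnt/data
   e2fsck -f /dev/vg0/data          # resize2fs insists on a clean fsck first
   resize2fs /dev/vg0/data 50G      # shrink the filesystem
   lvreduce -L 50G /dev/vg0/data    # then shrink the LV underneath it

In practice it's safer to shrink the filesystem a bit below the target,
reduce the LV, and then run resize2fs once more with no size argument
to grow the filesystem back out to fill the LV exactly.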

> >But nobody ever bothered to write the userland code for an online 
> >defragmenter.
> 
> This is a major advantage microsoft has with NTFS over linux :-(

Doubtful.

Linux file systems generally don't need a defragmenter, except in a few
cases.  Reservations and/or delayed allocation help to alleviate problems
with files written incrementally, even on busy multi-user servers.  For
an overview of the state of Ext3, see:

   http://ext2.sourceforge.net/2005-ols/paper-html/cao.html
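
If you want numbers rather than anecdote, filefrag (also in e2fsprogs,
run as root) reports how many extents a file occupies; a well-laid-out
file reports only one or two.  Something like:

   filefrag /var/log/messages /var/spool/mail/$USER

(paths are just examples) will show you whether the slow growers on
your own system are actually fragmented.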

The principal problem is with slow-growing log files and mbox-style
mailboxes that may be written, closed, reopened, written, etc.
For mailboxes, a tar/untar of your mail directory works just fine.
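
A rough sketch of that repack, assuming ~/Mail holds the mailboxes and
/tmp has room for a temporary copy (paths are placeholders; stop mail
delivery and your MUA first):

   cd ~
   tar cf /tmp/mail.tar Mail       # copy the mail out...
   mv Mail Mail.old
   tar xf /tmp/mail.tar            # ...and write it back in one pass, so the
                                   #    allocator can lay the files out contiguously
   rm -rf Mail.old /tmp/mail.tar   # once you've verified the new copy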

There is a tool called Disk Allocation Viewer here:

   http://davtools.sourceforge.net/

It provides a nice graphical display of file allocation, just like the
Windows defragmenters.  It just won't do anything about it; that's
left as an exercise. :-)

More interesting than plain defragmenting would be clustering based upon
access pattern.  E.g., I'd like to know the performance difference, if
any, between a system installed one RPM at a time and updated via YUM,
where files are scattered across the filesystem directory structure as
each RPM is installed, vs. the same system after a full file-level
backup/restore.

Demand-paging of libraries and executables complicates the whole picture,
because the file is not read linearly.  The access pattern of huge C++
monoliths like OpenOffice.org is stomach-churning, though efforts have
been made to improve it.

Using one of the many tracing tools (Jens Axboe's blktrace, or LTT)
and a virtualization environment (Xen, QEMU, etc.), one could pretty
easily set up controlled experiments and record traces.
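
With blktrace, for instance, collecting and decoding a trace is just
(the device name is a placeholder):

   blktrace -d /dev/sda -o trace -w 60   # record 60 seconds of block-layer events
   blkparse -i trace                     # decode the per-CPU binary trace files

Run that against the two installation methods above and compare the
resulting seek patterns.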

Regards,

	Bill Rugolsky
