[Cluster-devel] [GFS2 Patch] [Try 4] GFS2: Reduce file fragmentation

Fri Jul 13 12:55:14 UTC 2012

----- Original Message -----
| > allows future block allocations to follow in line. This continuity
| > may allow it to be somewhat faster than the previous version.
| > Thanks,
| > Steve!
| > 
| Yes, that would be interesting to know. I'll have a look in more
| detail a bit later, but some comments follow....

Preliminary results are in: In the hour-long test I've been using,
this improvement shaves about 4 minutes off (just under 1 percent).

| > +	/* Tricky: The newly created inode needs a reservation so it can
| > +	   allocate xattrs. At the same time, we don't want the directory
| > +	   to retain its reservation, and here's why: With directories, items
| > +	   are often created and deleted in the directory in the same breath,
| > +	   which can create "holes" in the reservation. By holes I mean that
| > +	   your next "claim" may not be the next free block in the reservation.
| > +	   In other words, we could get into situations where two or more
| > +	   blocks are reserved, then used, then one or more of the earlier
| > +	   blocks is freed. When we delete the reservation, the rs_free
| > +	   will be off due to the hole, so the rgrp's rg_free count can get
| > +	   off. The solution is that we transfer ownership of the reservation
| > +	   from the directory to the new inode. */
| 
| This comment still doesn't make sense to me. What are these operations
| that are freeing up blocks in the directory? There should be no blocks
| freed in a directory unless we deallocate the entire directory at the
| moment.

Again, the "holes" I'm talking about are in the reservation, not in the
directory. Suppose you have an empty directory and "touch" files
a, b, c, d and e. Suppose the directory gets a multi-block reservation of
8 blocks for those allocations. After the 5 files are created, the
directory's reservation has rs_start=S, rs_len=8, and rs_free=3.
The bitmap representing those dinodes, which corresponds to the
reservation, looks something like this: 11 11 11 11 11 00 00 00.

Now suppose you delete file "b". The directory's blocks won't change,
nor will its hash table. However, the dinode for "b" will be deleted
and the corresponding bitmap for the dinodes will then look something like:
11 00 11 11 11 00 00 00. The corresponding reservation will have:
rs_start=S, rs_len=8 and rs_free=4.

The problem is that if you now create file "f" in that directory,
it essentially "claims" the block at rs_start + rs_len - rs_free,
but S + 8 - 4 = S + 4, and that block is already claimed by file "e".
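
To make the arithmetic above concrete, here is a minimal user-space
sketch in C. It is only a model under stated assumptions: struct
resv_model, resv_claim() and resv_release() are hypothetical names
invented for illustration, not the kernel's data structures or the
patch's code. Only the field names rs_start, rs_len and rs_free and the
claim formula come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RESV_LEN 8

/* Hypothetical, simplified model: not the kernel's reservation struct. */
struct resv_model {
	uint64_t rs_start;   /* first block of the reservation (S) */
	uint32_t rs_len;     /* total blocks in the reservation */
	uint32_t rs_free;    /* blocks counted as unclaimed */
	bool used[RESV_LEN]; /* stand-in for the bitmap, one flag per block */
};

/* Block the next allocation would claim: rs_start + rs_len - rs_free. */
static uint64_t resv_next_claim(const struct resv_model *rs)
{
	return rs->rs_start + rs->rs_len - rs->rs_free;
}

/* Claim the next block; returns true if that block was already in use. */
static bool resv_claim(struct resv_model *rs)
{
	assert(rs->rs_free > 0);
	uint64_t blk = resv_next_claim(rs);
	bool collided = rs->used[blk - rs->rs_start];

	rs->used[blk - rs->rs_start] = true;
	rs->rs_free--;
	return collided;
}

/* Free a block inside the reservation, e.g. a deleted dinode.
 * Assumption for this model: freeing simply bumps rs_free. */
static void resv_release(struct resv_model *rs, uint64_t blk)
{
	assert(blk >= rs->rs_start && blk < rs->rs_start + rs->rs_len);
	rs->used[blk - rs->rs_start] = false;
	rs->rs_free++;
}
```

With rs_start=S, five calls to resv_claim() stand in for creating files
a through e, and resv_release() on S + 1 stands in for deleting "b".
The next resv_claim() then lands on S + 4 and reports a collision,
which is the hole problem described above.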

The alternative, as I stated in an earlier email, is to make the
starting block, S, a moving target, adjusting it with each allocation.
In that case, block S + 1, which was freed when file b was deleted,
will be "left behind" and add to the fragmentation.
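
The moving-target alternative can be sketched the same way. Again this
is only an illustrative model: struct moving_resv, moving_claim() and
moving_covers() are hypothetical names, and the assumption that the
window shrinks by one block per claim is mine, not something taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a reservation whose start moves forward. */
struct moving_resv {
	uint64_t rs_start; /* next block to claim; advances on each claim */
	uint32_t rs_free;  /* unclaimed blocks remaining from rs_start on */
};

/* Claim the block at rs_start and advance the window past it. */
static uint64_t moving_claim(struct moving_resv *rs)
{
	assert(rs->rs_free > 0);
	rs->rs_free--;
	return rs->rs_start++;
}

/* True if blk still lies inside the remaining reservation window. */
static bool moving_covers(const struct moving_resv *rs, uint64_t blk)
{
	return blk >= rs->rs_start && blk < rs->rs_start + rs->rs_free;
}
```

After five claims starting at S, the window begins at S + 5, so
moving_covers() returns false for S + 1: the block freed by deleting
"b" is behind the window and this reservation never hands it out again.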

We can't just keep marching rs_free forward because then we get into
rgrp accounting problems if/when the reservation is freed and has
unclaimed blocks that we need to return to the pool of blocks in the rgrp.

I don't understand what you're saying about this being a bug,
nor what needs to be fixed. Can you elaborate?

Regards,

Bob Peterson
Red Hat File Systems