[Cluster-devel] [GFS2 Patch] [Try 4] GFS2: Reduce file fragmentation

Bob Peterson rpeterso at redhat.com
Fri Jul 13 12:55:14 UTC 2012


----- Original Message -----
| > allows future block allocations to follow in line. This continuity
| > may allow it to be somewhat faster than the previous version.
| > Thanks,
| > Steve!
| > 
| Yes, that would be interesting to know. I'll have a look in more detail
| a bit later, but some comments follow....

Preliminary results are in: In the hour-long test I've been using,
this improvement shaves about 4 minutes off (just under 1 percent).

| > +	/* Tricky: The newly created inode needs a reservation so it can
| > +	   allocate xattrs. At the same time, we don't want the directory
| > +	   to retain its reservation, and here's why: With directories, items
| > +	   are often created and deleted in the directory in the same breath,
| > +	   which can create "holes" in the reservation. By holes I mean that
| > +	   your next "claim" may not be the next free block in the reservation.
| > +	   In other words, we could get into situations where two or more
| > +	   blocks are reserved, then used, then one or more of the earlier
| > +	   blocks is freed. When we delete the reservation, the rs_free
| > +	   will be off due to the hole, so the rgrp's rg_free count can get
| > +	   off. The solution is that we transfer ownership of the reservation
| > +	   from the directory to the new inode. */
| 
| This comment still doesn't make sense to me. What are these operations
| that are freeing up blocks in the directory? There should be no blocks
| freed in a directory unless we deallocate the entire directory at the
| moment.

Again, the "holes" I'm talking about are in the reservation, not in the
directory. Suppose you have an empty directory and "touch" files
a,b,c,d and e. Suppose the directory gets a multi-block reservation of
8 blocks for those allocations. After the 5 files are created, the
directory's reservation has rs_start=S, rs_len=8, and rs_free=3.
The bitmap representing those dinodes, which corresponds to the
reservation, looks something like this: 11 11 11 11 11 00 00 00.

Now suppose you delete file "b". The directory's blocks won't change,
nor will its hash table. However, the dinode for "b" will be deleted
and the corresponding bitmap for the dinodes will then look something like:
11 00 11 11 11 00 00 00. The corresponding reservation will have:
rs_start=S, rs_len=8 and rs_free=4.

The problem is that if you now create file "f" in that directory,
it essentially "claims" the block at rs_start + rs_len - rs_free,
but S + 8 - 4 = S + 4, and that block is already in use by file "e".
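
To make the arithmetic above concrete, here is a minimal user-space sketch
of the a..e / delete b / create f sequence. It is a toy model only: the
struct and the toy_* helpers are made up for illustration and are not the
kernel's reservation code, and S is arbitrarily set to 1000.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: NOT the kernel's reservation structure. */
struct toy_rsrv {
	unsigned long rs_start; /* first block of the reservation (S) */
	unsigned int rs_len;    /* total blocks reserved */
	unsigned int rs_free;   /* blocks not yet claimed */
	bool in_use[8];         /* stand-in for the bitmap bits */
};

/* Next block to claim, as described: rs_start + rs_len - rs_free. */
static unsigned long toy_next_claim(const struct toy_rsrv *rs)
{
	return rs->rs_start + rs->rs_len - rs->rs_free;
}

/* Claim the next block; returns false if it is already in use. */
static bool toy_claim(struct toy_rsrv *rs)
{
	unsigned long off;

	if (rs->rs_free == 0)
		return false;
	off = toy_next_claim(rs) - rs->rs_start;
	if (rs->in_use[off])
		return false;
	rs->in_use[off] = true;
	rs->rs_free--;
	return true;
}

/* Free a block inside the reservation, which punches a "hole". */
static void toy_free(struct toy_rsrv *rs, unsigned int off)
{
	assert(off < 8);
	rs->in_use[off] = false;
	rs->rs_free++;
}

/* Replay the sequence; returns the offset from S that file f would
 * try to claim, and reports whether that claim succeeded. */
static unsigned long toy_replay(bool *claim_ok)
{
	struct toy_rsrv rs = { .rs_start = 1000, .rs_len = 8, .rs_free = 8 };
	int i;

	for (i = 0; i < 5; i++)      /* touch a b c d e */
		toy_claim(&rs);
	toy_free(&rs, 1);            /* rm b: block S + 1 */
	*claim_ok = toy_claim(&rs);  /* touch f */
	return toy_next_claim(&rs) - rs.rs_start;
}
```

In this model the claim for "f" lands on offset 4 (file "e") and is
refused, while the free block at offset 1 is never considered.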

The alternative, as I stated in an earlier email, is to make the
starting block, S, a moving target, adjusting it with each allocation.
In that case, block S + 1, which was freed when file b was deleted,
will be "left behind" and add to the fragmentation.
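
The "moving target" alternative can be sketched the same way. Again this
is only a toy model under my own assumptions (each claim hands out the
first block of the window and slides the window forward by one); the
toy_window names are hypothetical and nothing here is actual GFS2 code.

```c
#include <assert.h>

/* Toy model only; field names are illustrative. */
struct toy_window {
	unsigned long start; /* next block to hand out */
	unsigned int len;    /* blocks left in the window */
};

/* Claim the first block of the window and slide the window forward. */
static unsigned long toy_window_claim(struct toy_window *w)
{
	assert(w->len > 0);
	w->len--;
	return w->start++;
}

/* Does the window still cover this block? */
static int toy_window_covers(const struct toy_window *w, unsigned long blk)
{
	return blk >= w->start && blk < w->start + w->len;
}
```

After five claims starting at S, the window begins at S + 5, so a freed
block at S + 1 is no longer covered and the next claim skips past it.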

We can't just keep marching rs_free forward because then we get into
rgrp accounting problems if/when the reservation is freed and has
unclaimed blocks that we need to return to the pool of blocks in the rgrp.
 
| This is really just a bug. We need to create the new inodes earlier so
| that we can use their reservations when allocating them "on disk". Then
| the directory and new inode reservations can be entirely separate.
| 
| The current problem is that we are using the directory's allocation
| state to allocate new inodes, which is wrong and causes other problems
| too. It has been on my list to fix, but it is complicated - we will have
| to address it at some stage in order to make the Orlov allocator work
| though,

I don't understand what you're saying about this being a bug,
nor what needs to be fixed. Can you elaborate?

Regards,

Bob Peterson
Red Hat File Systems



