smarter sparse files?

Rasmus Munk Larsen rmunk at quake.Stanford.EDU
Wed Nov 9 02:09:47 UTC 2005


Question: Does ext2/3 (or any other filesystem you know of) support
a system call turning blocks within a file back into "sparse zeros", 
i.e. giving the blocks back to the filesystem? 


Background:

I am working on a slotted fileformat where internal fragmentation 
occurs. One such occurrence is growth of the data in a given slot, 
which currently requires me to handle the fragmentation explicitly.
For example:

...==|== slot 1 ==|=== slot 2 ===|==...

Now assume that contents of slot 1 is replaced with a larger chunk of 
data. I must either append additional data e.g. at the end of the file

...==|== slot 1a =|=== slot2 ===|==...==|=== slot1b ==|

(and add my own data structures and code infrastructure to read 
fragmented slots) or leave the old (defunct) slot 1 data in place 
and garbage collect it later:

...==|= deadbeef =|=== slot2 ===|==...==|====== slot1' =====|

It's my impression that mechanisms for handling similar types of 
fragmentation is already implemented quite well in most modern 
filesystems, and hence I was wondering:

Question: Does ext2/3 (or any other filesystem you know of) support
turning blocks within a file back into "sparse zeros", i.e. giving 
the blocks back to the filesystem? 

If that was the case I could simply free the disk blocks belonging 
entirely to slot1 (as in turning it into zeros in a sparse file) and
append the new data at the end:

...==|XXXXXXXXXXXX|=== slot2 ===|==...==|===== slot1' =====|

and in effect having the file system do my garbage collection for me.
I would (probably very naively) think that this should be possible and
cheap since it only involves by manipulating trees/freelists/whatever 
and perhaps "massaging" a small number of actual data blocks. 
(I should mention that my slots are typically much larger than a 
disk block 100kB-1MB, say) 

Simple example: On a file system supporting sparse files, the following

fd = open("sparse1",w");
write(fd, buf, 10);
lseek(fd,1000000,SEEK_CUR);
write(fd, buf, 10);

will create a file occupying a small number of 
disk blocks. Ideally I would like to be able to do 
the following

fd = open("sparse2",w");
write(fd, buf, 1000020);
lseek(fd,10,SEEK_SET);
giveback(fd, 1000000, SEEK_CUR);   
lseek(fd,1000010,SEEK_SET);
write(fd, buf, 10);

and end up with a sparse2 not much larger than 
sparse1. "giveback" is my imaginary system call
that tells the file system that n bytes starting 
at a given offset should no longer be considered part 
of the file and the associated blocks given back to the 
file system's freelist. I realize that this might not be
possible currently if the chunk you wish to free is 
not aligned with the start of a block etc. etc. 

A "cruder" interface, just giving whole disk blocks back 
would be acceptable, though.

Your comments would be much appreciated,

Rasmus Munk Larsen, Stanford University.




More information about the Ext3-users mailing list