How to generate a large file allocating space

Ted Ts'o tytso at mit.edu
Thu Nov 4 16:16:13 UTC 2010


On Tue, Nov 02, 2010 at 07:58:02AM +0000, Alex Bligh wrote:
> On 2 Nov 2010, at 01:49, "Ted Ts'o" <tytso at mit.edu> wrote:
> > But why not just use O_DIRECT?  Do you really need to access the
> > disk directly, as opposed to using O_DIRECT?
> > 
> Because more than one machine will be accessing the data on the ext4
> volume (over iSCSI), though access to the large files is mediated by
> locks higher up. To use O_DIRECT each accessing machine would need
> to have the volume mounted, rather than merely receiving a list of
> extents.

Well, I would personally not be against an extension to fallocate()
where if the caller of the syscall specifies a new flag, that might be
named FALLOC_FL_EXPOSE_OLD_DATA, and if the caller either has root
privs or (if capabilities are enabled) CAP_DAC_OVERRIDE &&
CAP_MAC_OVERRIDE, it would be able to allocate blocks whose extents
would be marked as initialized without actually initializing the
blocks first.
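
For comparison, here is a minimal sketch of what preallocation gives you
today, without any such flag. It uses posix_fallocate(), the portable
interface (on Linux with glibc it is normally backed by the fallocate
system call), and then confirms that the reserved range reads back as
zeros rather than as stale disk contents. The helper names preallocate()
and reads_back_zero() are made up for illustration. The proposed
FALLOC_FL_EXPOSE_OLD_DATA flag is hypothetical and is deliberately not
used here; it would be passed in the mode argument of Linux fallocate(),
which posix_fallocate() does not expose.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): reserve `len` bytes for `path`.
 * Returns 0 on success, -1 on failure with errno set.
 */
int preallocate(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number instead of setting errno. */
    int err = posix_fallocate(fd, 0, len);
    close(fd);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/*
 * Returns 1 if the first `n` bytes of `path` read back as zeros, 0 if any
 * byte is nonzero, -1 on I/O error or if the file is shorter than `n`.
 */
int reads_back_zero(const char *path, size_t n)
{
    unsigned char buf[4096];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int result = 1;
    while (n > 0 && result == 1) {
        size_t want = n < sizeof buf ? n : sizeof buf;
        ssize_t got = read(fd, buf, want);
        if (got <= 0) {
            result = -1;
            break;
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != 0) {
                result = 0;
                break;
            }
        }
        n -= (size_t)got;
    }
    close(fd);
    return result;
}
```

With the proposed flag the second check would no longer be guaranteed to
pass, which is why the proposal restricts it to privileged callers.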

I don't know whether it will get past the fs-devel bike shed painting
crew, but I do have some cluster file system users who would like
something similar.  In their case they will be writing the files using
Direct I/O, and the objects are all checksummed at the cluster file
system level, and if the object has the wrong checksum, then the
cluster file system will ask another server for the object.  Since the
cluster file system is considered trusted, and it verifies the
expected object checksum before releasing the data, there is no
security issue.
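
The verify-then-release idea can be sketched roughly as follows. This is
an illustration only: the struct replica type and the object_checksum()
and read_verified() helpers are hypothetical, each "server" is modeled as
an in-memory buffer, and FNV-1a stands in for whatever checksum a real
cluster file system would use.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a (32-bit), used here only as a stand-in checksum. */
uint32_t object_checksum(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical model of one server's copy of an object. */
struct replica {
    const unsigned char *data;
    size_t len;
};

/*
 * Try each replica in order and return the index of the first one whose
 * contents match the expected checksum, or -1 if none does. Data from a
 * replica that fails the check is never handed to the caller.
 */
int read_verified(const struct replica *replicas, size_t n, uint32_t expected)
{
    for (size_t i = 0; i < n; i++) {
        if (object_checksum(replicas[i].data, replicas[i].len) == expected)
            return (int)i;
    }
    return -1;
}
```

A replica whose blocks still hold old, never-overwritten data fails the
checksum and is simply skipped in favor of the next server.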

You do realize, though, that it sounds like with your design you are
replicating the servers, but not the disk devices --- so if your disk
device explodes, you're Sadly Out of Luck.  Sure, you can use
super-expensive storage arrays, but if you're writing your own cluster
file system, why not create a design which uses commodity disks and
worry about replicating data across servers at the cluster file system
level?

						- Ted




More information about the Ext3-users mailing list