[Cluster-devel] What's the issue with read-ahead ?

Mathieu Avila mathieu.avila at seanodes.com
Fri Mar 2 16:54:08 UTC 2007


Hello,

I'm having trouble with the read-ahead feature in GFS when doing
long sequential reads. I understand that there are 3 different code
paths when doing a read (sketched below):
- If the file is accessed with direct I/O, I get into:
do_read_direct (ops_file.c::406)
otherwise it goes through "do_read_buf", and from there:
- if the inode is stuffed or its data is journaled, I get into:
do_read_readi(file, buf, size, offset); (ops_file.c::376)
- otherwise I get into the common kernel function:
generic_file_read(file, buf, size, offset); (ops_file.c::378)
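
To make these paths concrete, here is how I read the dispatch (a
paraphrase, not the verbatim source; the accessor and predicate
names are my reading of ops_file.c, so treat them as approximate):

    /* Paraphrased from ops_file.c, not the verbatim code. */
    static ssize_t do_read_buf(struct file *file, char *buf,
                               size_t size, loff_t *offset)
    {
            struct gfs_inode *ip = get_v2ip(file->f_mapping->host);

            if (gfs_is_stuffed(ip) || gfs_is_jdata(ip))
                    /* GFS does its own read-ahead via gfs_start_ra() */
                    return do_read_readi(file, buf, size, offset);

            /* kernel read-ahead, driven by file->f_ra.ra_pages */
            return generic_file_read(file, buf, size, offset);
    }

    static ssize_t gfs_read(struct file *file, char *buf,
                            size_t size, loff_t *offset)
    {
            if (file->f_flags & O_DIRECT)
                    return do_read_direct(file, buf, size, offset);

            return do_read_buf(file, buf, size, offset);
    }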

- In the 2nd case, the tunable parameter "max_readahead" is used in
gfs_start_ra to read blocks ahead.
- In the 3rd case, everything is handled by the kernel code, and the
read-ahead value is the one associated with the "struct file *":
file->f_ra.ra_pages. The kernel sets this value by default from the
read-ahead value of the block device. The block device here is the
diapered one (diaper.c), and its structure is initialized to zero,
so I get a device with zero read-ahead.
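
For context, this is roughly how 2.6-era kernels seed the per-file
window from the backing device at open time (paraphrased from
mm/readahead.c; other fields omitted):

    void file_ra_state_init(struct file_ra_state *ra,
                            struct address_space *mapping)
    {
            /* inherit the window from the backing device; zero
             * here means the kernel does no read-ahead at all */
            ra->ra_pages = mapping->backing_dev_info->ra_pages;
            /* ... remaining ra state initialized here ... */
    }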

I've worked around this by initializing the read-ahead of the
"struct file *" when entering gfs_read, and it works: I get a
factor of 2 improvement when reading unstuffed, unjournaled files
in buffered mode.
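
Roughly, the change looks like this (a minimal sketch, not my exact
patch; the 32-page value, i.e. 128 KiB with 4 KiB pages, is the
usual kernel default and is an assumption here):

    static ssize_t gfs_read(struct file *file, char *buf,
                            size_t size, loff_t *offset)
    {
            /* seed the per-file window, since the diaper bdi
             * left it at zero */
            if (!file->f_ra.ra_pages)
                    file->f_ra.ra_pages = 32;  /* assumed default */

            if (file->f_flags & O_DIRECT)
                    return do_read_direct(file, buf, size, offset);

            return do_read_buf(file, buf, size, offset);
    }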
But I'm not sure this is correct with respect to lock states and
data coherency among nodes. So, was the diaper read-ahead
deliberately set to zero to avoid that kind of coherency problem?
If not, can we safely set it to the value of the underlying block
device, so that the kernel does the same thing as
do_read_readi/gfs_start_ra?
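
If it is safe, something like the following at diaper setup time
should be enough (a sketch only; the variable names are made up,
and I'm assuming both the diaper and the real device are reachable
as struct block_device there):

    /* in diaper.c, when the diaper is attached to the real
     * device: inherit the real device's read-ahead instead of
     * leaving the diaper's backing_dev_info at zero */
    diaper->bd_disk->queue->backing_dev_info.ra_pages =
            real->bd_disk->queue->backing_dev_info.ra_pages;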

Thanks in advance,

--
Mathieu



