parallel I/O on shared-memory multi-CPU machines

i.vlad at yahoo.com
Mon Apr 2 12:57:13 UTC 2007


Dear ext3-users,

I write scientific number-crunching codes that deal with large input and output files (gigabytes, tens of gigabytes, occasionally hundreds of gigabytes). Multi-core setups such as the Intel Core 2 Duo are now becoming available at low cost. Disk I/O is one of the biggest bottlenecks. I would very much like to put the multiple processors to use to read or write in parallel (probably using OpenMP directives in a C program).

My question is: can the filesystem actually read and write files truly in parallel (more processors, faster read/write), or, if I write such code, will the I/O requests from the CPUs simply queue up behind one another, so that the result is the same as using a single CPU? If parallel I/O is possible, can it be accomplished entirely transparently, or does it need special libraries, or work only in special circumstances, such as reading in parallel with N CPUs from N different physical disks? Or only on some types of hardware? Is there a maximum number of threads/processes that can write to disk in parallel? If ext3 does not do this, which (stable) Linux filesystem does?
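To make the question concrete, here is a minimal sketch of the kind of code I have in mind. It is an illustration under assumptions, not a tested design: the function name parallel_byte_sum and its chunk parameter are made up for this example, and the byte sum merely stands in for real processing. Each OpenMP thread reads its own chunks of one file with pread(), which takes an explicit offset, so the threads do not share a file position. Whether this runs any faster on ext3 than a plain single-threaded loop is exactly what I am asking.

```c
/* Sketch only: OpenMP threads read disjoint chunks of one file via pread(). */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (name and signature are assumptions for illustration).
 * Sums every byte of the file at `path`, reading it in `chunk`-byte pieces.
 * Returns 0 on success and stores the sum in *sum_out; returns -1 on error. */
int parallel_byte_sum(const char *path, size_t chunk, uint64_t *sum_out)
{
    assert(chunk > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    long long size = (long long)st.st_size;
    long long nchunks = (size + (long long)chunk - 1) / (long long)chunk;
    uint64_t sum = 0;
    int failed = 0;

    /* Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:sum) reduction(|:failed) schedule(static)
    for (long long i = 0; i < nchunks; i++) {
        unsigned char *buf = malloc(chunk);
        if (buf == NULL) {
            failed |= 1;
            continue;
        }

        long long start = i * (long long)chunk;
        long long remaining = size - start;
        size_t want = remaining < (long long)chunk ? (size_t)remaining : chunk;
        size_t got = 0;

        /* pread() may return fewer bytes than requested, so loop until done. */
        while (got < want) {
            ssize_t n = pread(fd, buf + got, want - got, (off_t)(start + (long long)got));
            if (n <= 0) {
                failed |= 1;
                break;
            }
            got += (size_t)n;
        }

        for (size_t k = 0; k < got; k++)
            sum += buf[k];

        free(buf);
    }

    close(fd);
    if (failed)
        return -1;

    *sum_out = sum;
    return 0;
}
```

With GCC this would be built with the -fopenmp flag; a call such as parallel_byte_sum("input.dat", 1 << 20, &total) would then read a (hypothetical) file input.dat in 1 MiB chunks. The sketch shows only how the requests would be issued, not how the kernel or the disk would schedule them.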

I know that the vast majority of clusters use something other than ext3 (NFS, Lustre, etc.), but the question still stands because: (1) individual nodes in commodity clusters very often have their own ext3 disks, used for temporary files (intermediate computational results); (2) grid computers made of standalone user machines are likely to have the most common filesystem, ext3; (3) there are scientific data processing steps that need to be done on a single shared-memory machine because of intensive data exchange between CPUs; (4) software development is easier to do on a single machine (e.g. a powerful multi-core laptop).

Thank you,
I. Vlad




