parallel I/O on shared-memory multi-CPU machines

Andreas Dilger adilger at clusterfs.com
Wed Apr 4 01:25:55 UTC 2007


On Apr 03, 2007  15:53 -0700, Larry McVoy wrote:
> On Tue, Apr 03, 2007 at 03:50:27PM -0700, David Schwartz wrote:
> > > I write scientific number-crunching codes which deal with large
> > > input and output files (Gb, tens of Gb, as much as hundreds of
> > > Gigabytes on occasions). Now there are these multi-core setups
> > > like Intel Core 2 Duo becoming available at a low cost. Disk I/O
> > > is one of the biggest bottlenecks. I would very much like to put
> > > to use the multiple processors to read or write in parallel
> > > (probably using OMP directives in a C program).
> > 
> > How do you think more processors is going to help? Disk I/O doesn't take
> > much processor. It's hard to imagine a realistic disk I/O application that
> > was somehow limited by available CPU.
> 
> Indeed.  S/he is mistaking CPUs for DMA engines.  The way you make this go
> fast is a lot of disk controllers running parallel.  A single CPU can
> handle a boatload of interrupts.

Rather, I think the issue is that an HPC application which needs to
do frequent checkpoints and data dumps can be slowed down dramatically
by a lack of parallel IO.  Adding more CPUs on a single node generally
speeds up the computation part, but can slow down the IO part.

That said, the most common IO model for such applications is to have
a single input and/or output file for each CPU in the computation.
That avoids a lot of IO serialization within the kernel and is efficient
as long as the amount of IO per CPU is relatively large (as it sounds
like in the initial posting).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


More information about the Ext3-users mailing list