
Re: how to counteract slowdown

Daniel Pittman wrote:
> ...
> >> This means that journal_stop waits in log_wait_commit for the
> >> transaction to be flushed out of the journal and onto its final
> >> location?
> >
> > yes.
> Now, this is the bit that puzzles me:  if I write the following:
> (assume the C is correct, please :)
> int file = open("blah", O_WRONLY);
> write(file, buffer, 40000);
> fsync(file);
> After the fsync call, where is the content of buffer? Is it:
> (a) in the journal only
> (b) in the journal and on disk
> This is the one bit that confuses me from your answer.

After the fsync that data is in the journal, and it is
in memory.  Once the in-memory copy has been written out
to the main filesystem, its journal space can be recycled.
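
For reference, here is a minimal, checked version of the write-then-fsync
sequence being asked about. It is only a sketch: the helper name
write_and_sync is made up for illustration, and the O_CREAT | O_TRUNC
flags and 0644 mode are assumptions added so that it runs standalone (the
snippet in the question opens an existing file with O_WRONLY only).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path, then fsync.
 * Returns 0 on success, -1 on error (errno is left set by the failing call).
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    /* O_CREAT | O_TRUNC are assumptions; the original used O_WRONLY alone. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short or interrupted, so loop until everything is out. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* When fsync() returns 0 the data is on stable storage; on ext3 in
     * journalled data mode that means it is in the journal, as described
     * above, and not necessarily at its final location yet. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A call such as write_and_sync("blah", buffer, 40000) corresponds to the
three lines in the question.
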
> > Possibly log_do_checkpoint() could be smarter, and not write
> > out all dirty buffers.  Or it could start IO on all of them,
> > but stop waiting on writeout once sufficient journal space has
> > become available.
> The second option actually sounds good to me, but that's because it lets
> real work proceed as soon as possible, but still works hard to empty
> something that's been very busy recently and, thus, is likely to be so
> again.

> > Writing them all out gets good clustering and hence throughput. But
> > introduces latency for these bursty loads.
> Yes, it would. Now, this latency would happen only if the journal got to
> be so full that we couldn't fit more data, right? The code looks that
> way (unless the journal is destroyed or the FS remounted, of course.)

Pretty much, yes.  I think the problem we're seeing is to do with the
fact that once the journal is 1/4 used, we force checkpointing of
the in-memory data into the main fs, and this effectively blocks
the fs.   For something like ext2, we start async writeout of dirty
data when it reaches 40% of all memory.  We start sync writeout (to
throttle writers) at 60%.  So on a 512 megabyte machine, the writer
can pump an additional 100 megs of data into the fs after IO has
started.   With ext3 in journalled data mode, or with metadata-intensive
loads we don't have that extra buffer.
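
The "additional 100 megs" is just the gap between the two thresholds
applied to total memory. A small sketch of that arithmetic follows; the
function name and parameters are made up for illustration and are not
kernel interfaces.

```c
/*
 * Hypothetical helper: how much data (in MB) a writer can dirty between
 * the point where async writeout starts and the point where sync
 * writeout begins to throttle it.
 */
unsigned long writeback_headroom_mb(unsigned long mem_mb,
                                    unsigned int async_pct,
                                    unsigned int sync_pct)
{
    if (sync_pct <= async_pct)
        return 0;   /* no gap between the thresholds, so no headroom */

    /* Integer division rounds down. */
    return mem_mb * (sync_pct - async_pct) / 100;
}
```

With the figures above, writeback_headroom_mb(512, 40, 60) is
512 * 20 / 100, or 102 MB after rounding down.
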

I suspect that for long-term workloads it doesn't make a lot of difference.
With ext2, writers will still end up getting blocked.  But later, and for
> I suspect that increasing the flush delay will help smooth the load /I/
> see, but that it's relevant only because I have a specific case that it
> helps: enough ram that it's better to buffer until the 30 second write
> burst is done before forcing some (lazy) writeback...

mm..  So you'd need a monstrous journal, and we need to start
async checkpointing at 25% journal occupancy (wakeup_bdflush()?)
and synchronous checkpointing at 75%....
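
Roughly, the two-threshold idea could look like the sketch below. This is
not the existing ext3/JBD code: the enum, the function name and the way
occupancy is passed in are all hypothetical, and only the 25% and 75%
figures come from the paragraph above.

```c
/* Hypothetical result of the policy decision. */
enum ckpt_action {
    CKPT_NONE,   /* plenty of journal space, do nothing */
    CKPT_ASYNC,  /* start writeout in the background, do not block writers */
    CKPT_SYNC    /* journal nearly full, writers must wait for checkpointing */
};

/*
 * Hypothetical policy: choose an action from journal occupancy.
 * used and total are in journal blocks.
 */
enum ckpt_action ckpt_policy(unsigned long used, unsigned long total)
{
    if (total == 0)
        return CKPT_NONE;

    /* Compare as used/total >= 3/4 and >= 1/4 without floating point;
     * widen first so the multiplications cannot overflow. */
    unsigned long long u4 = (unsigned long long)used * 4;

    if (u4 >= (unsigned long long)total * 3)
        return CKPT_SYNC;    /* 75% or more occupied */
    if (u4 >= (unsigned long long)total)
        return CKPT_ASYNC;   /* 25% or more occupied */
    return CKPT_NONE;
}
```

The band between the two thresholds plays the same role as the 40% to 60%
window for ext2 described earlier: writers keep going while IO is already
under way.
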
