Re: stalled 'sync' on ext3+quota over drbd

On Wed, 2004-03-31 at 16:46, Stephen C. Tweedie wrote:

> > Now, the setup mostly works fine.  But if you actively use the
> > filesystem for some time (an hour of copying a large tree over NFS) and
> > then try the 'sync' command, the latter runs very long (10 minutes or more),
> > eating 99% CPU according to top, and the system becomes very sluggish
> > (leading to stalled replication, heartbeat misbehavior) and in fact
> > unusable.
> You'd need to try capturing a profile of the 99% cpu loop for us to be
> able to investigate this any further.

That'd be tricky: the loop is somewhere in the kernel (top shows 99% CPU
used by "system", and strace attached to sync does not show anything).

Another thing, possibly related: when I try `quotaoff', the machine hangs
for 10+ minutes and does not respond to *anything* but ping.  Then it
comes back to life.

I'd be happy to provide more information but so far I cannot decide
where to look...  Should I learn to use "kernel profiling"?


