stalled 'sync' on ext3+quota over drbd

I don't know yet if this is an ext3, quota or drbd issue, but I'll ask
anyway.  I am building an HA NFS server using two Dell 1750s and drbd.
I have an ext3 filesystem with quota built on a drbd device running over
a 200GB disk partition (hardware RAID 0+1), drbd-mirrored across the
servers.
The kernel is 2.4.25, so hopefully the quota deadlock should not be a
problem (it was on 2.4.24).

Now, the setup mostly works fine.  But if you actively use the
filesystem for some time (say, an hour of copying a large tree over NFS)
and then run the 'sync' command, sync runs for a very long time (10
minutes or more),
eating 99% CPU according to top, and the system becomes very sluggish
(leading to stalled replication, heartbeat misbehavior) and in fact

Any ideas why this happens and/or suggestions for further investigation?


