[patch] Re: stalled 'sync' on ext3+quota over drbd

Eugene Crosser crosser at rol.ru
Sat Apr 17 10:40:52 UTC 2004


On Sat, 2004-04-17 at 01:19, Stephen C. Tweedie wrote:

> > after moving about 10,000 files and setting quota for a million
> > groupids, and then several hours of inactivity(!) I zeroed profile
> > counters (readprofile -r), ran `time sync' and then `readprofile'.  Here
> > are the results.  Yes, that's true, it took 3 (three) hours for `sync'
> > to complete!
> 
> Turns out there's a nasty O(N^2) behaviour in vfs_quota_sync().  That
> function walks the dquot list looking for things to sync, and it drops
> the lock when doing the actual syncing --- so each item synced causes it
> to start again at the beginning of the list.  If each item starts off
> dirty, then the list walk is N^2.
> 
> An obvious cure is to shift the start of the list to point just after
> the item just synced.  I've done only limited testing of this patch, but
> does it help for you?
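
The restart behaviour described in the quoted text can be illustrated
with a toy model.  This is a sketch in Python, not the kernel code: the
function names and the list-of-flags representation are hypothetical,
and the second function only approximates the effect of moving the list
start past the item just synced.  Each function returns how many list
entries the walk examines.

```python
# Toy model only: not the kernel implementation. All names are made up
# for illustration; a "dquot" is reduced to a single dirty flag.

def sync_restart_from_head(dirty):
    """Rescan from the head of the list after every item synced."""
    dirty = list(dirty)           # private copy of the per-entry dirty flags
    visits = 0                    # total list entries examined
    while True:
        for i, is_dirty in enumerate(dirty):
            visits += 1
            if is_dirty:
                dirty[i] = False  # "write out" the entry
                break             # lock was dropped: restart at the head
        else:
            return visits         # a full pass found nothing dirty


def sync_resume_after_item(dirty):
    """Continue the walk just after the item that was synced."""
    dirty = list(dirty)
    visits = 0
    for i, is_dirty in enumerate(dirty):
        visits += 1
        if is_dirty:
            dirty[i] = False      # "write out" the entry, then keep going
    return visits
```

With N entries that all start dirty, the first variant examines
N(N+1)/2 + N entries and the second examines N, so for a million dirty
dquots the difference is roughly 5 * 10^11 visits against 10^6.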

Cool!  I've already begun building a testing environment with oprofile
enabled ;-)  I am out of the office over the weekend, but I'll
certainly verify your fix on Monday.

> 2.4 and 2.6 seem to share this problem.

Apparently things are worse in 2.6.  I have an impression (I have not
checked it yet) that 2.6.5 still suffers from the same deadlock problem
that was fixed between 2.4.24 and 2.4.25.

Unrelated question: is quotacheck necessary after mounting an ext3
filesystem to ensure consistent quota status?  I am building a 200GB HA
NFS server hosting a million or two files that belong to 1/3 million
userids; failover without fsck and quotacheck takes about 30-40 seconds,
which is pretty good.  fsck on this filesystem takes about 7 minutes,
quotacheck about 4 minutes.  So having to run quotacheck has a
significant impact on availability...
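
To put the figures quoted above side by side, here is a small
arithmetic sketch.  It assumes the fsck and quotacheck times simply add
to the base failover time, which is an assumption rather than something
measured; the function and constant names are hypothetical.

```python
# Rough availability arithmetic using the approximate times from this post.
# Assumption: check times add linearly to the base failover time.

BASE_FAILOVER_S = (30, 40)   # failover without fsck/quotacheck, in seconds
FSCK_MIN = 7                 # approximate fsck time, in minutes
QUOTACHECK_MIN = 4           # approximate quotacheck time, in minutes


def failover_seconds(base_range, extra_minutes):
    """Return the (low, high) failover time in seconds with extra checks."""
    low, high = base_range
    extra = sum(extra_minutes) * 60
    return (low + extra, high + extra)


quotacheck_only = failover_seconds(BASE_FAILOVER_S, [QUOTACHECK_MIN])
fsck_and_quotacheck = failover_seconds(BASE_FAILOVER_S,
                                       [FSCK_MIN, QUOTACHECK_MIN])
```

Under that assumption, running quotacheck alone stretches a failover
from 30-40 seconds to about 270-280 seconds, and running both checks
brings it to about 690-700 seconds.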

Eugene




