[patch] Re: stalled 'sync' on ext3+quota over drbd

Eugene Crosser crosser at rol.ru
Mon Apr 19 14:37:24 UTC 2004

On Mon, 2004-04-19 at 17:38, Jan Kara wrote:

> > > > after moving about 10,000 files and setting quota for a million
> > > > groupids, and then several hours of inactivity(!) I zeroed profile
> > > > counters (readprofile -r), ran `time sync' and then `readprofile'.  Here
> > > > are the results.  Yes, that's true, it took 3 (three) hours for `sync'
> > > > to complete!
> > > 
> > > Turns out there's a nasty O(N^2) behaviour in vfs_quota_sync().  That
> > > function walks the dquot list looking for things to sync, and it drops
> > > the lock when doing the actual syncing --- so each item synced causes it
> > > to start again at the beginning of the list.  If each item starts off
> > > dirty, then the list walk is N^2.
> > > 
> > > An obvious cure is to shift the start of the list to point just after
> > > the item just synced.  I've done only limited testing of this patch, but
> > > does it help for you?
> > 
> > Cool!  I've already begun to build a testing environment with oprofile
> > enabled ;-)  During the weekend I am out of the office, but I'll
> > certainly verify your fix on Monday.
>   Do you already have results? I'd be interested in them...

My first impression is that it did not help.  But it takes a full day of
copying files around to reproduce that nasty 3-hour sync.  So far, after a
couple of hours of activity, sync takes 4+ minutes (at 99.9% CPU, of
course), which is approximately what it took before the patch.  I will
only know for sure tomorrow.
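
To make the O(N^2) claim quoted above concrete, here is a toy model of the
two list-walk strategies.  It is only a sketch and not the kernel code or
the actual patch: the function names are made up for illustration, a plain
Python list of booleans stands in for the dquot list, and it assumes that
nothing gets re-dirtied while the lock is dropped.

```python
def sync_restart_from_head(dirty):
    """Toy model: after each sync, restart the scan at the list head.

    `dirty` is a sequence of booleans standing in for the dquot list
    (hypothetical representation).  Returns the number of entries visited.
    """
    dirty = list(dirty)
    visits = 0
    while True:
        for i, is_dirty in enumerate(dirty):
            visits += 1
            if is_dirty:
                dirty[i] = False  # "sync" the entry; the lock is dropped here
                break             # so the walk starts over from the head
        else:
            return visits         # a full pass found nothing left to sync


def sync_resume_after_last(dirty):
    """Toy model: continue the scan just after the entry that was synced.

    Stops once a full lap of the (circular) list has seen only clean entries.
    """
    dirty = list(dirty)
    n = len(dirty)
    visits = 0
    clean_run = 0  # consecutive clean entries seen so far
    i = 0
    while clean_run < n:
        visits += 1
        if dirty[i]:
            dirty[i] = False      # "sync" the entry
            clean_run = 0
        else:
            clean_run += 1
        i = (i + 1) % n           # resume right after this entry
    return visits


if __name__ == "__main__":
    n = 1000
    print("restart from head:", sync_restart_from_head([True] * n))
    print("resume after last:", sync_resume_after_last([True] * n))
```

With every entry dirty, the first variant visits N(N+1)/2 + N entries and
the second visits 2N, which is the gap the proposed fix is meant to close.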

> > > 2.4 and 2.6 seem to share this problem.
> > 
> > Apparently things are worse in 2.6.  I have an impression (I have not
> > checked it yet) that 2.6.5 still suffers from the same deadlock problem
> > that was fixed in the 2.4.24 -> 2.4.25 diff.
>   2.6.5 should have the same fixes as 2.4.25 wrt ext3. What deadlock
> do you see? There are some more bugfixes on the way to Linus which fix
> some possible deadlocks but I think they should be hard to trigger.

I have only observed it once, on my workstation (2.6.5), where I was
setting up the oprofile environment.  I created 10,000 files belonging to
10,000 uids (with quota set for all of them) and ran 'sync'.  The system
kept working for another 10 or 20 minutes; 'sync' did not finish but
*was not* using any CPU, sitting in 'D' state.  Then the system hung, and
since it was in X11 I do not have a stack trace or anything.  I have not
tried to reproduce it yet, but I will.

> > Unrelated question: is quotacheck necessary after mounting an ext3
> > filesystem to ensure consistent status?  I am building a 200GB HA NFS
> > server hosting a million or two files that belong to 1/3 million
> > userids; failover without fsck and quotacheck takes about 30-40 seconds,
> > which is pretty good.  fsck on this filesystem takes about 7 minutes,
> > quotacheck about 4 minutes.  So having to run quotacheck has a
> > significant impact on availability...
>   You need to run quotacheck only if you didn't correctly unmount the
> filesystem. I've written a journalled quota patch which removes the need
> of running quotacheck after unclean shutdown.

Of course I am only interested in recovery after unclean shutdown (this
is an HA server).  I was hoping that maybe quota changes are logged along
with the rest of the filesystem changes...  I.e. that your "journalled
quota" is already in the mainstream kernel.

> It is currently included
> in Andrew Morton's kernels (-mm tree) and maybe it will be in vanilla
> kernels but that depends on Linus. The quota fix and journalled quota
> patch are attached if you are interested... The patches are against
> 2.6.4 but should apply to 2.6.5 well.

Hmm, as DRBD supports 2.6 nowadays, I might give it a try...
