Extended Attribute Write Performance

Fri Jan 13 01:52:14 UTC 2006

Andreas,

Thanks for your helpful reply.

On Thu, 2006-01-12 at 12:52 -0700, Andreas Dilger wrote: 
> On Jan 12, 2006  12:07 -0500, Charles P. Wright wrote:
> > I'm writing an application that makes pretty extensive use of extended
> > attributes to store file attributes on Ext2.  I used a profiling tool
> > developed by my colleague Nikolai Joukov at SUNY Stony Brook to dig a
> > bit deeper into the performance of my application.
> 
> Presumably you are using ext3 and not ext2, given posting to this list?
Actually this test case was on Ext2, not Ext3.  I did a quick search for
an ext2-users list and didn't immediately see results, so I figured that
as Ext2 and Ext3 have similar EA implementations, this list would be
appropriate.

> > In the course of my benchmark, there are 54247 setxattr operations
> > during a 54-second run.  They use about 10.56 seconds of the time, which
> > seemed to be a rather outsized performance toll to me (~40k writes took
> > only 10% as long).
> > 
> > After looking at the profile, 27 of those writes end up taking 7.74
> > seconds.  That works out to roughly 286 ms per call, which seems a bit
> > high.
> > 
> > The workload is not memory constrained (the working set is 50MB + 5000
> > files).  Each file has one extended attribute block that contains two
> > attributes totaling 32 bytes.  The attributes are unique (random
> > actually), so there isn't any sharing.
> > 
> > Can someone provide me with some intuition as to why there are so many
> > writes that reach the disk, and why they take so long?  I would expect
> > that the operations shouldn't take much longer than a seek (on the order
> > of 10ms, not 200+).
> 
> I suspect the reason is that the journal is getting full and jbd is
> doing a full journal checkpoint because it has run out of space for
> new transactions.  This is because external EA blocks consume
> a lot of space (4kB) regardless of how small the EA is, and this can
> eat up the journal quickly.  54247 * 4kB = 211MB, much larger than
> the default 32MB (or maybe 128MB with newer e2fsprogs) journal size.
> 
> Solutions to your specific problem are to use large inodes and the
> fast EA space ("mke2fs -j -I 256 ..." makes 256-byte inodes, 128 bytes
> left for EAs) and/or increasing the journal size ("mke2fs -J size=400",
> though even 400MB won't be enough for this test case).
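
A rough check of the journal arithmetic quoted above, as a Python sketch.
It assumes every setxattr dirties exactly one 4 kB external EA block and
ignores all other journaled metadata, so it is a lower bound rather than
a model of what jbd actually does.  The function name is made up for
illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def ea_block_traffic_mib(setxattr_calls, block_bytes=4 * KIB):
    """MiB of EA blocks written, assuming one full block per setxattr call."""
    return setxattr_calls * block_bytes / MIB

# 54247 setxattr calls, as in the benchmark described in this thread.
traffic = ea_block_traffic_mib(54247)

# Compare against the two default journal sizes mentioned above.
for journal_mib in (32, 128):
    ratio = traffic / journal_mib
    print(f"{journal_mib:>3} MB journal: EA traffic is {ratio:.1f}x the journal size")
```

Even under this optimistic assumption the EA blocks alone exceed both
default journal sizes.
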
Increasing the inode size to 256 bytes made a huge difference under
Ext3.  The spikes that I mentioned for Ext2 also existed in Ext3, and
were eliminated by this change.  My application's performance increased
by about 40%, and the standard deviations dropped from around 20% to 4%.

However, for Ext2 it made very little difference.  I still have a
handful of operations (0.05%) that account for 73% of the time.  I know
that Ext2 is optimized for shared attribute blocks (for the case of
ACLs).  Is there something about having lots of unique attributes that
results in poor performance?
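
For anyone who wants to reproduce the outlier measurement without the
profiling tool, here is a minimal Python sketch.  It is not the tool
mentioned earlier in the thread: timed_setxattr and slowest_share are
hypothetical helper names, and the only real call used is os.setxattr
(Linux only).  The last lines recompute the percentages from the figures
given in the first message.

```python
import os
import time

def timed_setxattr(path, name, value, write=None):
    """Time one setxattr call in seconds; `write` is injectable for testing."""
    if write is None:
        write = os.setxattr  # Linux-only; resolved lazily so import works elsewhere
    start = time.perf_counter()
    write(path, name, value)
    return time.perf_counter() - start

def slowest_share(latencies, n):
    """Return (fraction of calls, fraction of total time) for the n slowest calls."""
    total = sum(latencies)
    slow = sum(sorted(latencies, reverse=True)[:n])
    return n / len(latencies), slow / total

# Figures from the first message: 27 of 54247 calls took 7.74 s of 10.56 s.
calls_pct = 27 / 54247 * 100
time_pct = 7.74 / 10.56 * 100
print(f"{calls_pct:.2f}% of calls account for {time_pct:.0f}% of setxattr time")
```

Collecting one latency per call with timed_setxattr and passing the list
to slowest_share gives the same kind of summary for a new run.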

> We implemented the large inodes + fast EAs (included in 2.6.12+ kernels)
> to avoid the need to do any seeking when reading/writing EAs, in addition
> to the benefit of not writing so much data (mostly unused) to disk.
> This showed a huge performance increase for Lustre metadata servers
> (which use EAs on every file) and also with Samba4 testing.
I can see why, especially on a journalled file system.

Thanks,
Charles