Ext3 Performance Tuning - the journal

Thu Dec 13 09:20:01 UTC 2007

On Dec 11, 2007  13:29 +0100, Sven Rudolph wrote:
> I didn't manage to determine the size of the journal of an already
> existing filesystem. tune2fs tells me the inode:
> 
>   ~# tune2fs -l  /dev/vg0/lvol0 | grep -i journal
>   Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
>   Journal inode:            8
>   Journal backup:           inode blocks
> 
> Is there a way to get the size of the journal?

debugfs -c -R "stat <8>" /dev/vg0/lvol0
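The stat output reports the journal inode's size in bytes in its "Size:" field. If you want just that number, a small filter is enough. The sketch below is a hypothetical helper named `journal_size_mb`. It assumes the "Size: <bytes>" layout that debugfs prints for stat, and it relies on `grep -o`, which GNU and BSD grep support but POSIX does not.

```shell
#!/bin/sh
# journal_size_mb (hypothetical helper): read the output of
#   debugfs -c -R "stat <8>" DEVICE
# on stdin and print the journal size in MiB. Assumes the inode size
# appears as "Size: <bytes>", as debugfs prints it for stat.
journal_size_mb() {
    # Keep only the first "Size: NNN" match, then strip everything but digits.
    # "Size of extra inode fields:" does not match because it lacks "Size:".
    bytes=$(grep -o 'Size: *[0-9][0-9]*' | head -n 1 | tr -dc '0-9')
    [ -n "$bytes" ] || return 1
    echo $((bytes / 1024 / 1024))
}

# Usage, as root (debugfs writes its version banner to stderr):
#   debugfs -c -R "stat <8>" /dev/vg0/lvol0 2>/dev/null | journal_size_mb
```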

> And how do I find out how much of the journal is used? Or how often a
> journal flush actually happens? Or whether the journal flushes happen
> because the commit interval has finished or because the journal was
> full? This would give me hints for the sizing of the journal.

There is a patch for jbd2 (part of the ext4 patch queue, based on a jbd
patch from Lustre) that collects per-transaction and overall journal
statistics.

> And I tried to increase the journal flush interval.
> 
>   ~# umount /data/
>   ~# mount -o commit=30 /dev/vg0/lvol0 /data/
>   ~# grep /data /proc/mounts 
>   /dev/vg0/lvol0 /data ext3 rw,data=ordered 0 0
>   ~#
> 
> Watching the disk activity LEDs makes me believe that this works, but
> I expected the mount option "commit=30" to be listed in
> /proc/mounts. Did I do something wrong, or is there another way to
> explain it?

No, you didn't do anything wrong. /proc/mounts doesn't report all of the
ext3 mount options, and commit= is one of the options it leaves out.

> As you see above in /proc/mounts I use data=ordered. The fileserver
> offers both NFS and Samba. "data=journal" might be better for NFS, but
> I believe that NFS is the smaller part of the fileserver load. Is
> there a way to measure or estimate how large the impact of NFS on the
> journal size and transfer rate is?
> 
> If I used "data=journal" I would need a larger journal and the journal
> data transfer rate would increase. I fear this might induce a new
> bottleneck, but I have no idea how to measure this or how to estimate
> it in advance.

Increasing the journal size is a good idea for any metadata-heavy load.
We use a journal size of 400MB for Lustre metadata servers.
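If you want a rough estimate of the data=journal impact before switching, you can turn a measured client write rate into a minimum journal size and a total disk write rate. This is only a back-of-the-envelope model. It is not how ext3 sizes anything. It rests on three assumptions:

- Under data=journal, every data block is written twice: once to the journal and once in place.
- One commit interval's worth of writes should fit into a single transaction.
- A single transaction may use at most a quarter of the journal.

The function names are hypothetical. You would have to measure the input rate on the server yourself, for example with iostat.

```shell
#!/bin/sh
# Back-of-the-envelope sizing for data=journal (hypothetical helpers).
# Assumptions, not taken from the ext3 documentation:
#  - every data block is written twice: journal first, then in place;
#  - one commit interval's worth of writes must fit in one transaction;
#  - one transaction may use at most 1/4 of the journal.

# journal_min_mb RATE_MB_PER_S COMMIT_S: journal size below which commits
# would be forced by journal space rather than by the commit timer
journal_min_mb() {
    echo $(( $1 * $2 * 4 ))
}

# disk_write_mb_s RATE_MB_PER_S: total write traffic to the disks in MB/s
disk_write_mb_s() {
    echo $(( $1 * 2 ))
}

# Example: 20 MB/s of client writes with the default 5 s commit interval
echo "journal >= $(journal_min_mb 20 5) MB"     # journal >= 400 MB
echo "disk writes ~ $(disk_write_mb_s 20) MB/s" # disk writes ~ 40 MB/s
```

With an internal journal, both copies of each block land on the same array. An external journal moves one copy elsewhere.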

> Currently I have an internal journal, the filesystem resides on
> RAID6. I guess this is another potential performance problem.

For the journal this doesn't make much difference, since journal IO
consists of sequential writes. RAID6 is bad for metadata performance,
though, because small writes force it to do read-modify-write on the
RAID stripes.
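Here is a concrete picture of the read-modify-write point. A RAID6 array of N disks stores N-2 data chunks plus two parity chunks (P and Q) per stripe. Any write that does not cover whole stripes forces the array to read existing blocks before it can recompute parity. Large sequential writes, like the journal's, can cover whole stripes. Scattered 4 KB metadata updates cannot. The sketch below checks that condition. The helper names, chunk size and disk count are hypothetical, and the model ignores controller write caches.

```shell
#!/bin/sh
# full_stripe_kb CHUNK_KB NDISKS: data carried by one full RAID6 stripe.
# Two chunks of every stripe hold the P and Q parity.
full_stripe_kb() {
    echo $(( $1 * ($2 - 2) ))
}

# needs_rmw OFFSET_KB LEN_KB CHUNK_KB NDISKS: print "yes" if a write of
# LEN_KB at OFFSET_KB touches some stripe only partially (so old data or
# parity must be read first), "no" if it consists of whole stripes.
needs_rmw() {
    stripe=$(full_stripe_kb "$3" "$4")
    if [ $(( $1 % stripe )) -eq 0 ] && [ $(( $2 % stripe )) -eq 0 ]; then
        echo no
    else
        echo yes
    fi
}

# Example: 6 disks with 64 KB chunks -> 256 KB of data per stripe
needs_rmw 0 4 64 6      # a single 4 KB metadata block: yes
needs_rmw 512 1024 64 6 # 1 MB of aligned sequential writes: no
```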

> When discussions on external journals appeared some years ago it was
> mentioned that the external journal code was quite new (see
> <http://marc.info/?l=ext3-users&m=101466148203469&w=2>).
> 
> I think nowadays I have the option to use an external journal and
> place it on a dedicated RAID1. Did anyone experience performance
> advantages by doing this? Even while using "data=journal"?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.