EXT2 vs. EXT3: mount w/sync or fdatasync

brian stone skye0507 at yahoo.com
Sat Mar 24 15:19:58 UTC 2007


Final configuration and performance results.

Changed machines (for a RAID test):
- 3ware 9550SX with BBU
- Pentium D 940
- 2G DDR2 667
- (4) 750G Seagate SATAII drives (AS series)

RAID levels:
- machine was configured for RAID5 but that was horribly slow, 12 MB/Sec
- created a (2) drive RAID0, then sliced out a 100G partition
- journal was on a separate JBOD disk
- write caching was enabled for the RAID0 and journal disk
- 64K stripes were used on the RAID0 and the JBOD journal

File system configuration:
- 100G ext3 file system
- Used a 32M journal on a physically separate device
- used "ordered" mode for the journal
- mounted with "noatime,nodiratime,noauto,noacl,nouser_xattr,dirsync"
- used the mkfs.ext3 -E option to set the stride to 16 (stride=16)
   - RAID0 was using 64K stripes.
   - fs was using 4K blocks
- each file transaction did: open(),write(),fsync(),close()
- slammed 1024 1MB chunks at it
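
The value 16 in the mkfs.ext3 line above is the RAID chunk size divided by the file system block size (64K / 4K). A minimal sketch of that arithmetic follows; the helper names ext3_stride and format_stride_option are made up for illustration and are not part of e2fsprogs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Stride in file system blocks for a given RAID chunk size.
 * Returns 0 if the inputs are unusable (zero, or the chunk size is not
 * a whole multiple of the block size). Hypothetical helper. */
unsigned long ext3_stride(unsigned long chunk_bytes, unsigned long block_bytes)
{
    if (chunk_bytes == 0 || block_bytes == 0 || chunk_bytes % block_bytes != 0)
        return 0;
    return chunk_bytes / block_bytes;
}

/* Format the extended option string for mkfs.ext3 into buf.
 * Returns 0 on success, -1 on bad input or a too-small buffer. */
int format_stride_option(char *buf, size_t buflen,
                         unsigned long chunk_bytes, unsigned long block_bytes)
{
    unsigned long stride = ext3_stride(chunk_bytes, block_bytes);
    if (stride == 0)
        return -1;
    int n = snprintf(buf, buflen, "-E stride=%lu", stride);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Calling format_stride_option(buf, sizeof buf, 64 * 1024, 4096) fills buf with the option string for the 64K / 4K case described here.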

I got 36 MB/Sec consistently.  That is a good sign: with the proper hardware, this should perform really well.
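
For reference, the per-file transaction used in this test (open, write, fsync, close) might look roughly like the sketch below. The original test programs are not shown in this thread, so the function names (write_all, write_chunk_synced), the open flags and the 0644 mode are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One transaction: open(), write(), fsync(), close().
 * Returns 0 only if the data was written and fsync() succeeded. */
int write_chunk_synced(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, len) != 0 || fsync(fd) != 0) {
        int saved = errno;      /* keep the first error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

Note that fsync() on the file does not by itself guarantee that the new directory entry is on disk; in the setup above that part is left to the dirsync mount option.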

In production, I would probably use a RAID10 with at least 12 15K SAS/FC drives with dual controllers in Active-Active mode: failover+load balancing.  Either fiber or SAS connected.  That should scream!

Fortunately, this config needs very little space ... maybe 500G in total.  So the hardware cost is not terrible.  This config is for a queue directory that is crawled by a background process.  That process moves the data from this queue to mass "slow" storage, fiber attached SATAII 7200RPM RAID5.  The queue needs to be as fast as possible and must sync the data.  Tricky problem :)

thanks.

Andreas Dilger <adilger at clusterfs.com> wrote:
On Mar 22, 2007  20:44 -0700, brian stone wrote:
> Machine A connects to machine B on a gigabit LAN.  Machine A sends
> 1024 1MB chunks of data; 1 GB in total. Machine B, the server, reads
> in each 1MB chunk and writes it to a file.
> 
> NOTE: server and client are little test programs written in C.  
> 
> Machine B (Server) hardware:
> - Single (no raid) Seagate Cheetah 70G Ultra320 15K
> - Quad Opteron 870
> - 16G DDR400
> - Backplane: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 8)
> 
> Sync methods include:
> 1. mount with sync option
>   - tried sync,dirsync which added no additional overhead
> 2. use O_SYNC open() flag
> 3. use fdatasync() just before closing the file
>   - fsync() and fdatasync() produced the same results
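
Methods 2 and 3 in the list above differ only in where the synchronization is requested: at open() time through the O_SYNC flag, or once after the data is written through fdatasync(). A rough sketch of both is below (method 1 is a mount option and needs no code change). The enum and function names are invented for illustration; the original test programs are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical selector for the sync method under test. */
enum sync_method { SYNC_NONE, SYNC_O_SYNC, SYNC_FDATASYNC };

/* Write one buffer to path using the chosen sync method.
 * Returns 0 on success, -1 on any error. */
int write_with_method(const char *path, const void *data, size_t len,
                      enum sync_method method)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (method == SYNC_O_SYNC)
        flags |= O_SYNC;            /* every write() waits for the disk */

    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {               /* handle short writes and EINTR */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Method 3: flush the file data once, just before closing. */
    if (method == SYNC_FDATASYNC && fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```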
> 
> 
> EXT2 tests
> ==========================================
> No sync                     12.3 seconds  (83 MB/Sec)
> mount=sync                  44.3 seconds  (23 MB/Sec)
> O_SYNC                      31.7 seconds  (32 MB/Sec)
> fdatasync()                 31.3 seconds  (32 MB/Sec)
> 
> 
> EXT3 tests
> ===========================================
> No sync data=writeback      14.5 seconds  (70 MB/Sec)
> No sync data=ordered        17 seconds    (60 MB/Sec)
> No sync data=journal        65 seconds    (15 MB/Sec)
> data=ordered O_SYNC         49 seconds    (20 MB/Sec)
> data=ordered,sync           52 seconds    (19 MB/Sec)
> data=ordered fdatasync()    45.5 seconds  (22 MB/Sec)
> data=journal O_SYNC         72.5 seconds  (14 MB/Sec)
> data=journal,sync           81 seconds    (12 MB/Sec)
> data=journal fdatasync()    60.5 seconds  (17 MB/Sec)
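
The MB/Sec column in the tables above appears to be the 1024 MB total divided by the elapsed time, give or take rounding. A small sketch of that calculation, with helper names that are assumptions rather than anything from the original test programs:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds between two timestamps, e.g. taken with clock_gettime()
 * before and after the transfer. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in MB/s; returns -1.0 if the elapsed time is not positive. */
double throughput_mb_per_sec(double megabytes, double seconds)
{
    if (seconds <= 0.0)
        return -1.0;
    return megabytes / seconds;
}
```

For example, throughput_mb_per_sec(1024.0, 12.3) gives a little over 83, in line with the EXT2 "No sync" row.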

If you are doing a large number of 1MB writes then I agree that
data=journal is probably not the way to go because it means you
can get at most 1/2 of the bandwidth of the disk (unless you
create the journal on a separate disk).  data=journal is good
for small writes and lots of transactions, like mail servers
that need lots of sync operations.

For large writes, I'd suggest you put the journal on a separate
device, and make it 1 or 2 GB (your server has plenty of RAM,
so that isn't a problem).  Are you using EAs, like selinux or
similar?  If yes, then you should also format your filesystem
with large inodes (-I 256).

You may also want to try out ext4dev with the mballoc and delalloc
patches from Alex Tomas, as this code has been optimized for
doing large power-of-two allocations in the filesystem.  They've
been posted to the ext4-devel lists a couple of times.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



 