Large File Copy to Large ext3 RAID5 Array Often Stalls

calinb at calinb at
Fri Jul 30 01:46:41 UTC 2004

I'm experiencing strange behavior from my ext3 RAID5 array and my Fedora Core 2 system.  Before I go crazy varying all sorts of tuning parameters, I thought some list subscribers might provide me with useful advice. 

The problematic array is:

3x Promise Technology Ultra 100 TX2 PCI cards
6x Maxtor 250GB IDE drives (one drive per cable)
RAID level 5, 128 KB chunk size, ext3: "mkfs -t ext3 -b 4096 -m 0 -R stride=16 /dev/md2"
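
As a cross-check on the stride value in the mkfs command above: for ext2/ext3, the stride is normally the RAID chunk size divided by the filesystem block size. The sketch below only does that arithmetic; the function name ext3_stride is made up for illustration. With a 128 KB chunk and a 4096-byte block it gives 32, whereas stride=16 corresponds to a 64 KB chunk, so one of the two figures may be worth double-checking.

```python
def ext3_stride(chunk_size_kib: int, block_size_bytes: int) -> int:
    """Return the stride (in filesystem blocks) for a given RAID chunk size.

    Hypothetical helper: stride = chunk size / filesystem block size.
    """
    chunk_bytes = chunk_size_kib * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_bytes // block_size_bytes


if __name__ == "__main__":
    print(ext3_stride(128, 4096))  # 128 KB chunk, 4 KB blocks -> 32
    print(ext3_stride(64, 4096))   # 64 KB chunk, 4 KB blocks -> 16
```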

I'm running Samba 3, and I first noticed this problem when 3 of my 5 Windows clients (2 XP machines and 1 Server 2K3 machine) failed to copy any large files (~1GB) to a subdirectory on the server containing about 220 other such large files. The other two clients, both XP machines, have no problems whatsoever copying large files to the very same subdirectory on the server.

A failing file transfer begins at a reasonable data rate (~6 MB/sec) but grinds to a near standstill after about 30 seconds, and the copy continues to crawl (maybe 10 kB/sec, just a rough guess) until I cancel it. The two well-behaved clients transfer the 1GB files in about 2-3 minutes, as expected.
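
To put those two rates side by side, here is a small back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes) and takes the rough 10 kB/sec guess at face value; the helper name transfer_seconds is only for illustration.

```python
def transfer_seconds(size_bytes: float, rate_bytes_per_sec: float) -> float:
    """Time needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec


GB = 1000 ** 3
MB = 1000 ** 2
KB = 1000

if __name__ == "__main__":
    healthy = transfer_seconds(1 * GB, 6 * MB)
    stalled = transfer_seconds(1 * GB, 10 * KB)
    print(f"at 6 MB/sec:  {healthy / 60:.1f} minutes")   # 2.8 minutes
    print(f"at 10 kB/sec: {stalled / 3600:.1f} hours")   # 27.8 hours
```

So the initial rate is consistent with the 2-3 minute copies on the good clients, while a copy left at the stalled rate would take on the order of a day per file.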

Yup, Samba--that's what I thought at first, so I tried FTP and obtained the same results.

I can't correlate the problem with anything on the five Windows clients, the NICs, the switch, etc.  I can't find any configuration difference among the clients that correlates with the 3 failing or 2 fully functional clients.

However, I can successfully copy the large files across the network from all 5 clients to an empty or nearly empty subdirectory on the raid5 array.  Then I move the files down to the subdirectory as desired.  That's my workaround for the 3 "bad" machines.  (Yuck!)

Now, here's what happens from the server console:  if I copy a large file from a different drive (a mirror pair) on the server to the raid5 ext3 array, I have the same kind of problems that I have with the 3 networked clients.  If I copy the large files from the mirror pair to an empty or nearly empty subdirectory on the raid5 ext3 filesystem, then the performance varies widely (from about 10 MB/sec to 40 MB/sec). I'd probably never notice this problem with smaller files (under 100-200 MB or so) because the copy completes before the stall.
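
One way to pin down when the local copy stalls is to log throughput while copying. The following is a minimal sketch, not something I have run against this array: the function name, the 1 MB block size, and the one-second sampling interval are all arbitrary choices, and the source and destination paths are supplied on the command line.

```python
import os
import sys
import time


def copy_with_throughput(src, dst, block_size=1024 * 1024, interval=1.0):
    """Copy src to dst and return a list of (elapsed_sec, MB_per_sec) samples."""
    samples = []
    start = last = time.monotonic()
    bytes_since_last = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block)
            bytes_since_last += len(block)
            now = time.monotonic()
            # Record one sample per interval; skip if the clock has not advanced.
            if now - last >= interval and now > last:
                samples.append((now - start, bytes_since_last / (now - last) / 1e6))
                last, bytes_since_last = now, 0
        # Force the data to disk so the final sample includes the flush time.
        fout.flush()
        os.fsync(fout.fileno())
    now = time.monotonic()
    if bytes_since_last and now > last:
        samples.append((now - start, bytes_since_last / (now - last) / 1e6))
    return samples


if __name__ == "__main__" and len(sys.argv) == 3:
    for elapsed, rate in copy_with_throughput(sys.argv[1], sys.argv[2]):
        print(f"{elapsed:8.1f} s  {rate:8.2f} MB/sec")
```

Running it once against the populated subdirectory and once against an empty one should show whether the rate collapses at about the same point in the copy each time.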

The subdirectory containing the 220 1GB files was originally populated by copying the directory structure and files from one of the "good" XP Samba clients across the network.

Any ideas or suggestions are greatly appreciated.

I've tried data=journal, ordered, and writeback and, though there are performance differences, the problem remains in all three modes.

Cal Brabandt, Linux System Admin newbie
