
External Journal scenario - good idea?

Hello everyone,

I've just recently joined the ext3-users list. I spent much of the weekend browsing over list archives and other tidbits I could find on the net, regarding using an external journal, and running in data=journal mode. From what I have seen looking around at what other folks are doing, data=journal with an external journal may be able to help our problem here.

If I could pick the brains of the resident gurus for a moment, and solicit some advice, I thank everyone in advance who can take the time to offer their opinions.

We are running a file server, which currently has as its "hard drive" an ATA-to-SCSI external RAID subsystem. The file server is a dual Pentium-III Tualatin 1.4GHz (512K cache) server, built on a Serverworks HESL-T chipset, with 2GB ECC Registered SDRAM.

The RAID unit is a Promise UltraTrak100-TX8, with 8 Western Digital WD1200JB 120GB ATA100 7200rpm hard drives installed. 7 of the 8 drives are joined in a RAID5 array; the 8th is an unassigned hot spare. The UltraTrak's SCSI interface is an Ultra2-LVD (80MB/sec) interface, connected via its external 68-pin MicroD cable to a custom Granite Digital internal-to-external "Gold TPO" ribbon cable, which leads to the "B" channel of the onboard AIC7899W Ultra160 SCSI interface. The RAID unit is the only SCSI device attached to this channel at this time, and is terminated with a Granite Digital SCSI-Vue active diagnostic terminator. I have no indication or suspicion whatsoever of any SCSI bus problems. (I have also run the same UltraTrak unit with the same diagnostic terminator on an AHA2940U2W in the "old" file server, with the same write performance issues, described below.)

Currently, the array is partitioned with a /boot partition, and a / partition, each as ext3 with the default data=ordered journaling mode. I have begun to realize gradually why it is a decent idea to break up the filesystem into separate mount points and partitions, and may yet end up doing that. But that's a rabbit to hunt another day, unless taking care of this is also required to solve this problem.

This file server performs 5 key fileserver-related roles, due to its having the large RAID5 file storage for the network:

1. Serves the mailboxes for our domain to the two frontend mail/web servers via NFS mount

2. Runs the master SQL server - the two mail/web servers run local slave copies of the mail account databases

3. Stores the master copy of web documents served by the web servers (and will replicate them to web servers when documents change, still working on this though)

4. Samba file server for storage needs on the network

5. Limited/restricted-access FTP server for web clients

For the most part, the file server runs great and does its job quite well. However, there are two main circumstances in which things go downhill:

1. Daily maintenance-type cron events (like updatedb)

2. Other heavy file WRITE activity, such as when Samba clients are backing up their files to this server from the network. We regularly have some very large files being copied over to the file server via Samba (1 GB drive image files, for example)

In both cases, or in other cases of heavy file I/O (mainly writes), this server pretty much grinds to a halt. It starts grabbing all of the available RAM to use as page cache, presumably because the RAID unit cannot write the data to disk fast enough. The inevitable is put off as long as possible, but eventually the backlog uses up all available system RAM (we have 2GB in this puppy now), and the kernel is forced to write synchronously to free up some cache for fresh data coming in. While this is going on, we might as well forget about delivering/retrieving email to/from mailboxes, or getting much of anything else out of the server. We have seen "NFS Server Not Responding" errors, and MySQL errors too (from the vpopmail libs trying to look up the username/pw and mailbox location).

Once the "emergency/panic" sync writing to disk is complete, the server goes back to running great (although Linux never seems to release the RAM it has grabbed for cache until it absolutely HAS to give it up).
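One way to confirm that the cache really is what is eating the RAM would be to sample /proc/meminfo while a large copy is running. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the usual "Name: value kB" lines with MemTotal, Buffers and Cached fields are present.

```python
def parse_meminfo(text):
    """Return {field: kilobytes} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB" lines; skip headers and anything else.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_share(fields):
    """Fraction of total RAM currently held as Buffers + Cached."""
    cached_kb = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return cached_kb / fields["MemTotal"]
```

Feeding it the contents of /proc/meminfo every few seconds during a big Samba copy should show the cache share climbing toward 1.0 just before the stall, if the explanation above is right.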

From what I've been reading, this seems to be normal for 2.4-series kernels (I'm running a modded 2.4.18 on this server, patched with the various NFS patches, plus recent iptables): the kernel really likes to use RAM for cache. And I suppose that RAM works better doing SOMETHING than just sitting there looking pretty under the available column. ;)

I also understand that RAID5 is not known for great write performance. On top of that, we are running an ext3 filesystem, whose journaling adds some overhead of its own.

We really need to solve this problem. We're seeing "NFS Server not responding" errors in the logs every day during maintenance runs, and pretty much any other time heavy disk activity is going on, so mail performance is being affected. Mail users get username/pw errors (sometimes it even tells them it couldn't contact the MySQL update server).

It's definitely not a server horsepower problem. ;) But I can see where it could be a write speed issue with the RAID unit. Unless this is just the way the linux kernel does things (which I am afraid may be the case).

After reading many posts in the archives here and other things I could find, I have considered setting up a separate pair of quick drives in a RAID1 array as an external journal, and mounting the root filesystem in data=journal mode.

This strikes me as a possible write performance improver, if doing so will allow the larger writes to be "satisfied" faster because they only have to be written to the journal drive pair, without all the overhead of having to write to the RAID5 array. I realize that the data still has to be written to the main filesystem on the RAID5 array, and that this will actually cause more work. I'm just wondering if the journal updating to the actual filesystem is more of a background thing which does not affect the responsiveness of the file server. We would probably make the journal size close to the full size of the RAID1 array (40GB?)
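To put rough numbers on that reasoning, here is a back-of-envelope model of a single write burst under data=journal. It is a deliberate simplification: it ignores the page cache in front of the journal and the kernel's own commit and checkpoint scheduling, and every rate and size passed in is a made-up input rather than a measurement from this server.

```python
def burst_timing(burst_mb, journal_mb, journal_mb_s, array_mb_s):
    """Rough timing for one write burst under data=journal.

    Returns (seconds until the burst is accepted,
             seconds of checkpoint work for the array,
             total megabytes physically written).
    """
    # The part of the burst that fits in the journal goes at journal speed.
    absorbed = min(burst_mb, journal_mb)
    # Anything beyond that has to wait for the array to free journal space.
    overflow = burst_mb - absorbed
    accept_s = absorbed / journal_mb_s + overflow / array_mb_s
    # Every block still has to be copied to its final place on the array.
    checkpoint_s = burst_mb / array_mb_s
    # data=journal writes each block twice: once to the journal, once in place.
    total_written_mb = 2 * burst_mb
    return accept_s, checkpoint_s, total_written_mb
```

With hypothetical figures of a 1000MB burst, a 4000MB journal writing at 40MB/sec and an array writing at 10MB/sec, burst_timing(1000, 4000, 40, 10) gives 25 seconds to accept the burst, 100 seconds of checkpoint work afterwards, and 2000MB written in total. Under this model the scheme only helps if bursts fit in the journal and the array gets enough quiet time between bursts to catch up.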

Does this seem like a viable option to improve or eliminate the server responsiveness problems? Or do any of the gurus out there have any better suggestions? We can't fit an NVRAM-based external journal device in the budget.

One other snag it seems we may run into is the fact that the / partition already has a journal (/.journal, I presume), since it's already an ext3 partition. Is it possible to tell the system we want the journal somewhere else instead? It strikes me that when we're ready to move to the external journal, we may have to mount the / partition as ext2, remove the existing journal, then create the new one and point the / partition to it with the e2fsprogs tools?
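For concreteness, the sequence being asked about might look roughly like the sketch below. It only builds the command lines and runs nothing. The helper name and the device names in the usage note are hypothetical, and the sketch assumes the filesystem is unmounted (or mounted read-only) when the commands are actually run.

```python
def external_journal_plan(fs_dev, journal_dev, block_size=4096):
    """Return the commands (as argument lists) for moving an ext3 journal.

    Nothing is executed here; the caller decides how and when to run them.
    """
    return [
        # 1. Drop the existing internal journal; the filesystem becomes ext2.
        ["tune2fs", "-O", "^has_journal", fs_dev],
        # 2. Format the new device as a journal device. Its block size
        #    should match the block size of the filesystem.
        ["mke2fs", "-O", "journal_dev", "-b", str(block_size), journal_dev],
        # 3. Attach the external journal, making the filesystem ext3 again.
        ["tune2fs", "-J", "device=" + journal_dev, fs_dev],
    ]
```

Calling external_journal_plan("/dev/sda2", "/dev/md0") and printing each entry joined with spaces would give three lines to review before running anything. For a root filesystem these steps would presumably have to be run from rescue media, since / cannot be unmounted while the system is running from it.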

Thanks in advance for all thoughts, opinions, and suggestions. I'll provide whatever other details necessary.

Thanks in advance,
