Opinions on new Fedora Core 2 install with LVM 2 and snapshots?

Bill Rugolsky Jr. brugolsky at telemetry-investments.com
Mon Jul 26 21:49:05 UTC 2004


On Mon, Jul 26, 2004 at 03:45:26PM -0400, Bryan J. Smith wrote:
> > We are currently using NFS/Ext3/LVM2/MD on a 2.6.8-rc1 kernel as our
> > backup NFS server,
> 
> That's going to be my usage, as a backup NFS server to a _real_ NetApp
> filer.  It's largely more for Windows users than UNIX clients, but I'll
> still need some production NFS support.
 
Well, that is also what we are doing.  We need on-site and off-site backup
of our NetApp filer, and can do it with a Linux system for $2K apiece.

> > and initial testing with snapshots under load uncovered some
> > performance problems that I need to track down.
> 
> What kind of memory-I/O do you have in your system?

P4 2.8GHz, 1GB RAM, dual SATA 250GB in MD RAID1.

My personal system is a dual Opteron 246, 4GB RAM, 4x200GB SATA, with
each drive split into 3 equal partitions, for playing around with various
MD configs.  I'm looking at tuning the whole NFS I/O path on the latter,
and then building a few multi-terabyte fileservers.  I want to experiment
with various configs first, e.g., filesystem LV on one RAID1 PV, journal
and/or snapshot LV on the other PV.  RAID6 also.  Once GFS clustering
stabilizes on 2.6, I suppose I'll start over with a cluster config ...
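That split -- data on one PV, journal and snapshot space pinned to the other -- looks roughly like this with the LVM2 tools.  All device names, VG/LV names, and sizes below are placeholders, not my actual layout:

```shell
# Two MD RAID1 arrays become two separate LVM2 physical volumes:
pvcreate /dev/md0 /dev/md1
vgcreate vg0 /dev/md0 /dev/md1

# Filesystem data lives on the first PV...
lvcreate -n data -L 180G vg0 /dev/md0

# ...while the journal LV is pinned to the second PV, so journal
# writes (and later snapshot copy-on-write I/O) stay off the data
# spindles:
lvcreate -n journal -L 400M vg0 /dev/md1

# Ext3 with an external journal on that second LV:
mke2fs -O journal_dev /dev/vg0/journal
mke2fs -j -J device=/dev/vg0/journal /dev/vg0/data
```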

> I'm hoping to do this on 1GB of RAM, but it's not my primary NFS server.
 
That should be fine.

I've been working from Arjan's Fedora test kernels, dropping the 4G/4G
and turning off highmem completely.  I've also added kexec and a few
other goodies.

> Yep, I saw that.  I also noticed the Red Hat 2.6.7 development kernels
> are now patching them in (or are 2.6.8RC-based?).

Arjan has been tracking the BitKeeper snapshots pretty closely.
 
> I can deal with performance issues.  If they get bad enough, I'll just
> not use snapshots and enable them later when they get the quirks worked out.
 
Well, of course, we want to get them fixed, and bug reports are useful. :-)

> I've done similar with 1GB PCI NVRAM boards, using it as an off-device
> full-data Ext3 journal.  Makes NFS v3 sync performance far better.
 
> I use Ext3 in meta-data journaling mode (ordered writes), so I don't
> see that much difference.  I was just mentioning XFS in case it is
> considered a better option, especially if SGI has a GPL'd option
> for LVM2 on Linux.  But I assume not.
 
FWIW, several commercial appliances apparently use XFS.  I feel no
compelling need to abandon Ext3; in my experience, the filesystem and
tools are extraordinarily robust, and performance has always been
adequate for my purposes.  If you want to do hardcore testing, you
need to choose one of the several methods to switch off writes
to the device at the block layer, and then loop randomly wrecking
and recovering the filesystem and looking for corruption.
(See Andrew Morton's test tools in Jeff Garzik's gkernel.sourceforge.net
repository.)
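The wreck-and-recover loop might be shaped like this, using device-mapper's "error" target to cut off the block device mid-run.  Everything here (vg0-scratch, /mnt/scratch, run_io_load) is a placeholder, and this is only a sketch of the idea, not a hardened harness:

```shell
DEV=vg0-scratch
dmsetup table $DEV > saved.table            # keep the real table around
SECTORS=$(blockdev --getsz /dev/mapper/$DEV)

while :; do
    mount /dev/mapper/$DEV /mnt/scratch
    run_io_load /mnt/scratch &              # hypothetical load generator
    LOAD=$!
    sleep $((RANDOM % 60))

    # Swap in the error target: every subsequent I/O fails, which
    # approximates yanking the power on the device.
    dmsetup suspend $DEV
    dmsetup load $DEV --table "0 $SECTORS error"
    dmsetup resume $DEV

    kill $LOAD; wait $LOAD
    umount -f /mnt/scratch

    # Restore the real table and see whether fsck can recover it.
    dmsetup suspend $DEV
    dmsetup load $DEV saved.table
    dmsetup resume $DEV

    # fsck exit status >= 4 means uncorrected errors -- corruption.
    fsck -fy /dev/mapper/$DEV
    [ $? -ge 4 ] && break
done
```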

> I _always_ use hardware RAID, so badblock handling is handled by
> the intelligent controller.  In this case, it's going to be a 3Ware
> Escalade 9000 series.
 
I like the 3ware controllers, but until their meta-data is supported by
dmraid or the like, I'll pass.

> > 2. Cron a job to snapshot and fsck the filesystem, so any
> > filesystem problems are revealed early.
> 
> Why do I need to fsck the filesystem?
 
Because every kernel has bugs, and hardware can be flakey.  Corruption
can occur irrespective of journaling.
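A minimal cron-able version of that snapshot-and-fsck check might look like this -- VG/LV names and the 2G copy-on-write allocation are placeholders:

```shell
# The snapshot is disposable, so fsck may repair it freely (-fy);
# the origin volume is never touched.
lvcreate -s -n fscksnap -L 2G /dev/vg0/data

# Any nonzero status -- even "errors corrected" -- is worth a mail
# here, since problems on the snapshot imply problems on the origin.
if ! fsck.ext3 -fy /dev/vg0/fscksnap; then
    echo "fsck reported problems on a snapshot of vg0/data" \
        | mail -s "fsck warning: vg0/data" root
fi

lvremove -f /dev/vg0/fscksnap
```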

> > 3. If using Ext3 with data journaling, specify a large journal when
> > creating the filesystem (e.g., mke2fs -j -J size=400 ...).
> 
> So you recommend Ext3 with full data journaling?
>
> I used to do that back in the 2.2 days with VA Linux kernel, and I
> might if I use a PCI NVRAM board.
> 
> But I've found Ext3 with ordered writes in 2.4 to be 100% reliable.
> Is it not for LVM2/snapshots?

Well, here's the theory: when doing synchronous NFS commits, full data
journaling only requires a sequential write to the journal; the data gets
written back to the filesystem asynchronously.  If it is on a separate
spindle or in NVRAM, it is decoupled from both the read traffic and the
asynchronous writeback.  With NFS, the latency of write acknowledgements
typically affects throughput, so improving one improves the other.

I haven't done much experimenting, but over the years folks have
posted mixed results on ext3-users and nfs mail lists with various
combinations of data journal mode and internal, external, or
NVRAM journals.
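In commands, the theory above comes down to something like the following.  /dev/nvram0 is a placeholder for whatever block device an NVRAM card exposes; an ordinary fast disk partition works the same way:

```shell
mke2fs -O journal_dev /dev/nvram0              # format the journal device
mke2fs -j -J device=/dev/nvram0 /dev/vg0/data  # ext3 using that journal

# Full data journaling: synchronous NFS commits become sequential
# journal writes; writeback to the data LV happens asynchronously.
mount -o data=journal /dev/vg0/data /export
exportfs -o rw,sync nfsclient:/export
```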
 
> > 4. Tune the filesystem and VM variables: flush time, readahead, etc.
> 
> Is there a good reference based on CPU, I/O, memory, etc...?
 
None that I'm aware of, but I know that you've been lurking on the nfs
and ext3-users lists for years -- search the archives. ;-p  Seriously,
there are quite a few performance discussions and tuning suggestions
over the years involving Neil Brown, Tom McNeal, Chuck Lever and others
mostly on the NFS side of things, Andrew Morton, Stephen Tweedie, and
Andreas Dilger mostly on the Ext3/VM side.

You should measure the difference between NFS async and sync operation.
If things are working correctly, 2.6 sync should not be too shabby.
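A few of the knobs in question, for concreteness -- the values are illustrative starting points for experimentation, not recommendations:

```shell
# VM writeback ("flush") behaviour, 2.6 sysctl names:
sysctl -w vm.dirty_background_ratio=5      # start background writeback sooner
sysctl -w vm.dirty_writeback_centisecs=250 # wake pdflush more often

# Per-device readahead, in 512-byte sectors:
blockdev --setra 1024 /dev/md0

# For the sync-vs-async comparison, export the same tree both ways
# and time an identical client-side write workload against each:
exportfs -o rw,sync  nfsclient:/export
# exportfs -o rw,async nfsclient:/export   # faster, but unsafe on a crash
```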

As for CIFS, I have no clue.

Now, I need to go take my own advice, when I find a few free hours ...

Regards,

	Bill Rugolsky





More information about the fedora-list mailing list