Opinions on new Fedora Core 2 install with LVM 2 and snapshots?
Bryan J. Smith
b.j.smith at ieee.org
Mon Jul 26 19:45:26 UTC 2004
[ Thank you very much for your response ]
"Bill Rugolsky Jr." wrote:
> There are fundamental differences between what a NetApp filer is
> doing, and what LVM2 snapshots provide.
Yeah, it's hard to beat WAFL's well-integrated design.
> In particular, when using LVM2 snapshots, kcopyd has to constantly
> move blocks from your filesystem LV to the snapshot LV. Device Mapper
> is much more sensible and efficient at this than LVM1,
So I don't even want to look at LVM1, good.
> but it is still non-trivial overhead, and ends up generating a lot
> of mixed read/write traffic.
That's what I figured.
> We are currently using NFS/Ext3/LVM2/MD on a 2.6.8-rc1 kernel as our
> backup NFS server,
That's going to be my usage, as a backup NFS server to a _real_ NetApp
filer. It's mostly for Windows users rather than UNIX clients, but I'll
still need some production NFS support.
> and initial testing with snapshots under load uncovered some
> performance problems that I need to track down.
How much memory and what kind of I/O subsystem do you have in your system?
I'm hoping to do this on 1GB of RAM, but it's not my primary NFS server.
> [Snapshots and mirroring were only recently added to the Device Mapper
> code in the Linus kernel tree.]
Yep, I saw that. I also noticed the Red Hat 2.6.7 development kernels
are now patching them in (or are they 2.6.8-rc based?).
> Either grab the most recent kernel from kernel.org, or an FC3 development
> kernel, and test extensively.
I can deal with performance issues. If they get bad enough, I'll just
skip snapshots for now and enable them later once the quirks are worked out.
> The NetApp WAFL filesystem encapsulates all meta-data in a tree structure,
> and uses persistent copy-on-write multi-rooted trees. When writing, it
> places data wherever it is convenient (i.e., in the free space), and then
> adjusts block pointers up toward the root of the tree. Every few seconds
> it checkpoints its state (i.e., takes a snapshot).
Yep. It doesn't use volume management separate from the filesystem;
WAFL is an "all-in-one" design, for great efficiency.
> [The NetApp also uses NVRAM to hold state that hasn't been flushed to
> disk.]
I've done similar with 1GB PCI NVRAM boards, using it as an off-device
full-data Ext3 journal. Makes NFS v3 sync performance far better.
> When one wants to save a snapshot, the filesystem tags it and maintains
> its allocation data, instead of releasing stale blocks back into the free
> pool.
Right.
> Based on what I've read of Reiser4, the design should allow a similar
> level of functionality to be incorporated at some point. Unfortunately,
> it is not done yet.
I've seen ReiserFS v4 promise a lot, but compatibility always seems to be
an issue. I'll stick with XFS.
> To summarize: LVM2 will do what you want (modulo some tuning and
> perhaps bug fixes), but it is not a NetApp.
Yeah, it's not WAFL. But if it works, that's what I want. I'm only
concerned about data integrity, not performance, since it is my backup
NFS server.
> IIRC, XFS does not do data journaling. So while it may be much
> faster than Ext3, you need to consider data integrity.
I use Ext3 in meta-data journaling mode (ordered writes), so I don't
see that much difference. I was just mentioning XFS in case it is
considered a better option, especially if SGI has GPL'd support
for it atop LVM2 on Linux. But I assume not.
> I haven't been following EVMS development, but you might want
> to look into the current state of affairs to find out if there
> is any functionality there that you need (e.g., badblock handling).
I _always_ use hardware RAID, so badblock handling is handled by
the intelligent controller. In this case, it's going to be a 3Ware
Escalade 9000 series.
> LVM2 installs work fine.
Good. That's my #1 issue. I can do snapshots later if need be, or limit
their usage to select filesystems.
> Some things you might want to do:
> 1. Script some infrastructure to monitor snapshot space usage.
I do that anyway for disk usage, so not much there.
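Something like this is what I'd cron for it -- a rough sketch, with the
alert threshold and any volume names being my own placeholders, not
anything from an actual setup:

```shell
#!/bin/sh
# Warn when any LVM2 snapshot's copy-on-write space passes a threshold.
THRESHOLD=80   # percent full at which to complain

# Pure helper, so the alert logic can be exercised without LVM present.
over_threshold() {    # usage: over_threshold <percent-used>
    # strip any decimal part lvs may print (e.g. "12.34" -> "12")
    [ "${1%%.*}" -ge "$THRESHOLD" ]
}

check_snapshots() {
    # 'lvs' reports snap_percent for snapshot LVs; blank for normal LVs
    lvs --noheadings -o lv_name,snap_percent 2>/dev/null |
    while read name pct; do
        [ -n "$pct" ] || continue
        if over_threshold "$pct"; then
            echo "WARNING: snapshot $name is ${pct}% full" >&2
        fi
    done
}

check_snapshots
```

A full snapshot is silently invalidated by LVM2, so catching it early
is the whole point of the exercise.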
> 2. Cron a job to snapshot and fsck the filesystem, so any
> filesystem problems are revealed early.
Why do I need to fsck the filesystem?
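If it does turn out to be worthwhile, I assume you mean something like
this -- the VG/LV names and snapshot size below are made up, and it
obviously needs root on a real LVM2 system:

```shell
#!/bin/sh
# Sketch of the snapshot-then-fsck idea; all names/sizes are placeholders.
VG=vg0
LV=data
SNAP=fscksnap

fsck_problems() {   # true if an e2fsck exit status means unfixed errors
    # e2fsck exit codes are a bitmask; 4 and up = errors left uncorrected
    [ "$1" -ge 4 ]
}

snapshot_and_check() {
    lvcreate --snapshot --size 2G --name "$SNAP" "/dev/$VG/$LV" || return 1
    # check the frozen snapshot read-only, not the live filesystem
    fsck.ext3 -fn "/dev/$VG/$SNAP"
    status=$?
    lvremove -f "/dev/$VG/$SNAP"
    if fsck_problems "$status"; then
        echo "WARNING: fsck found problems on $VG/$LV" >&2
        return 1
    fi
}

# snapshot_and_check   # uncomment to run (needs root + LVM2)
```

The appeal is that the fsck runs against a frozen image, so the live
filesystem never has to come down.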
> 3. If using Ext3 with data journaling, specify a large journal when
> creating the filesystem (e.g., mke2fs -j -J size=400 ...).
So you recommend Ext3 with full data journaling?
I used to do that back in the 2.2 days with the VA Linux kernel, and I
might if I use a PCI NVRAM board.
But I've found Ext3 with ordered writes in 2.4 to be 100% reliable.
Is it not for LVM2/snapshots?
I would _not_ use Ext3 with writeback, though; the small performance
gain isn't worth the potential data loss.
> 4. Tune the filesystem and VM variables: flush time, readahead, etc.
Is there a good reference based on CPU, I/O, memory, etc...?
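I haven't found a single good reference myself; the knobs I know of on
2.6 are sysctls along these lines -- the values are just starting
points I'd benchmark, not recommendations:

```
# /etc/sysctl.conf fragment for a 2.6 kernel -- tune and measure
vm.dirty_background_ratio = 5      # kick off background writeback sooner
vm.dirty_ratio = 20                # throttle writers once this % of RAM is dirty
vm.dirty_expire_centisecs = 1000   # consider dirty data "old" after 10s instead of 30s
```

Readahead is set per block device, e.g. `blockdev --setra 1024
/dev/sda` (sector count and device name are examples).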
> 5. Test whether an external journal in the form of an NVRAM card
> or additional disks would improve performance. (You can try with
> a ramdisk for test purposes).
I'd love to throw such a board in the system, but that only adds cost.
I'm hoping Ext3 with ordered writes (meta-data journaling) and NFS v3
async operation will work fine.
Do you see any issues?
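In case I do run that test, here's my sketch of the ramdisk experiment.
Device names are made up, and a ramdisk journal is strictly for
benchmarking -- it vanishes on reboot, which defeats the whole point of
a journal in production:

```shell
#!/bin/sh
# Benchmark-only sketch: put the Ext3 journal on a ramdisk to estimate
# what an NVRAM board would buy. Device names are placeholders; needs root.

journal_blocks() {  # journal size in MB -> number of 4KiB blocks
    echo $(( $1 * 1024 / 4 ))
}

make_external_journal() {
    # format the ramdisk as a dedicated journal device
    mke2fs -O journal_dev -b 4096 /dev/ram0 "$(journal_blocks 400)" || return 1
    # create the data filesystem pointing at the external journal
    mke2fs -j -J device=/dev/ram0 -b 4096 /dev/vg0/export
}

# make_external_journal   # uncomment to run the (destructive!) test
```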
--
Linux Enthusiasts call me anti-Linux.
Windows Enthusiasts call me anti-Microsoft.
They both must be correct because I have over a
decade of experience with both in mission critical
environments, resulting in a bigotry dedicated to
mitigating risk and focusing on technologies ...
not products or vendors
--------------------------------------------------
Bryan J. Smith, E.I. b.j.smith at ieee.org