[Linux-cluster] gfs2 v. zfs?

Tue Jan 25 10:01:28 UTC 2011

Hi,

On Mon, 2011-01-24 at 20:37 -0800, Wendy Cheng wrote:
> Comments in-line ...
> 
> On Mon, Jan 24, 2011 at 6:55 PM, Jankowski, Chris
> <Chris.Jankowski at hp.com> wrote:
> > A few comments, which might contrast uses of GFS2 and XFS in enterprise class production environments:
> >
> > 1.
> > SAN snapshot is not a panacea, as it is only crash consistent and only within a single LUN.
> > If you have your data or database spread over multiple LUNs each with its own filesystem,
> > then you are on your own.
> 
> It depends on the SAN box. Some products have aggregate level
> snapshots that can contain multiple LUNs.
> 
> However, the argument here is correct; that is, SAN snapshot is not a
> panacea. Apart from the fact that different SAN vendors may have different
> setups, snapshot restore could require specific knowledge of the filesystem
> involved (e.g. how the journal is replayed). So integration and test
> effort is required for the restore to work well.
> 
> >
> > 2.
> > Therefore, we still need at least OS level (filesystem level) consistent backup
> > if the application itself does not provide a hot backup mechanism, which very few do.
> > The consistent filesystem level backup requires freeze and thaw commands.
> > XFS offers them, GFS2 does not.
> 
> I seem to recall seeing freeze/thaw patches for GFS2 in the past? But for
> backup to work well, it requires more than freeze/thaw.
> 
Yes, GFS2 has freeze/thaw just like many local filesystems. It uses the
same interface. The main difference is that with GFS2 the freeze will
freeze all nodes when the freeze is run from a single node. The thaw
must be run from the same node which did the freeze.
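
To illustrate the interface, here is a minimal sketch of a freeze/thaw wrapper
using the generic FIFREEZE/FITHAW ioctls from <linux/fs.h>. The helper name
frozen_backup and its backup callback are hypothetical, the ioctl numbers are
computed assuming the asm-generic encoding (x86 and most other architectures),
and the caller needs CAP_SYS_ADMIN. Freeze and thaw are issued from the same
process, which keeps both on the same node as required above.

```python
import fcntl
import os

# asm-generic ioctl encoding: dir (2 bits) | size (14) | type (8) | nr (8).
# Assumption: some architectures (e.g. powerpc, mips) encode this differently.
_IOC_WRITE, _IOC_READ = 1, 2


def _iowr(type_char, nr, size):
    """Equivalent of the C macro _IOWR(type, nr, <type of given size>)."""
    return ((_IOC_READ | _IOC_WRITE) << 30) | (size << 16) | (ord(type_char) << 8) | nr


# <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def frozen_backup(mountpoint, backup):
    """Freeze the filesystem at mountpoint, call backup(), then thaw it.

    backup() must not write to the frozen filesystem, or it will block.
    Typically it would only trigger a snapshot on the storage side.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
        try:
            backup()
        finally:
            # Always thaw, even if the backup step raised an exception.
            fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

The fsfreeze(8) utility from util-linux wraps the same two ioctls for use
from shell scripts.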

Currently, due to the lack of supported cluster snapshots, SAN backup is our
suggested solution. Obviously there are some caveats with that, as
mentioned above.

> >
> > 3.
> > GFS2 provides only tar(1) as a backup mechanism.
> > Unfortunately, tar(1) does not cope efficiently with sparse files,
> > which many applications create.
> > As an exercise create a 10 TB sparse file with just one byte of non-null data at the end.
> > Then try to back it up to disk using tar(1).
> > The tar image will be correctly created, but it will take many, many hours.
> > Dump(8) would do the job in a blink, but is not available for GFS2 filesystem.
> > However, XFS does have an XFS-specific dump(8) command and will back up sparse files
> > efficiently.
> >
You don't need dump in order to do this (since dump reads directly from
the block device itself, that would be problematic on GFS/GFS2 anyway).
All that is required is a backup tool which supports the FIEMAP ioctl. I
don't know if that has made it into tar yet, I suspect probably not.
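
As a rough sketch of the idea: a backup tool only needs to learn where the
data is and skip the holes. The example below does not use FIEMAP itself; it
swaps in the simpler lseek(2) SEEK_DATA/SEEK_HOLE interface, which needs a
newer kernel and filesystem support (without that support the whole file is
reported as one data region). The function names make_sparse_file and
data_extents are hypothetical, and a small file stands in for the 10 TB one
from the exercise above.

```python
import errno
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of `size` bytes whose only written byte is the last one."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"x")


def data_extents(path):
    """Return a list of (offset, length) pairs for the regions holding data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.fstat(fd).st_size
        offset = 0
        while offset < end:
            try:
                # Next offset >= `offset` that contains data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            # Next hole after that data; EOF counts as an implicit hole.
            stop = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, stop - start))
            offset = stop
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.img")
        make_sparse_file(path, 64 << 20)  # 64 MiB logical size
        print(data_extents(path))
```

A tool built this way reads only the listed extents, so the time taken
depends on the amount of data actually written rather than on the logical
file size.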

> > 4.
> > GFS2 is very convenient to use, as by its nature it is clusterised.
> > However, there is a huge performance cost to pay for all this convenience.
> > This cost stems from serialization imposed by the distributed lock manager.
> >
That depends largely on the I/O pattern, hence my original question. It
can sometimes be difficult to arrange for the I/O to follow a pattern
which allows GFS2 to work at full efficiency, but doing so makes a big
difference to performance and retains the advantage of the unified
name space.

> > 5.
> > For these reasons, for the HA applications running on one node at a time,
> > I found that XFS on top of LVM gives me the best mix of performance and functionality:
> > - high performance
> > - efficient backup of sparse files
> > - backup consistency through freeze/thaw
> > - zero downtime backup through use of LVM snapshots
> > - short failover times due to efficient XFS transaction logs
> >
> > So, for this type of HA application (failover HA) and environment,
> > it makes perfect sense to use XFS in a cluster instead of GFS2.
> >
> > Having said that, GFS2 can, in principle, be engineered to be much better
> > for failover HA applications.
> >
> > It would require development of:
> > - GFS2 specific dump(8)
I don't agree that we need GFS2 specific dump. It is something that, to
date, nobody has requested. I suspect that there is another solution to
achieving what you are after.

> > - GFS2 specific freeze and thaw commands
These already exist.

> > - CLVM wide snapshots
That would be nice, but you need to ask the LVM team.

> > - more efficient DLM
Also nice, but I very much doubt that this has any effect on the case in
question. It is usually the disk I/O that causes the performance issues
rather than the DLM.

I hope that answers a few questions,

Steve.

> 
> You did a great summary here. By looking at the list, I would imagine
> CLVM snapshotting is probably the easiest, technically and politically.
> It's all up to GFS2 engineers to take the note.
> 
> >
> > It certainly is possible to do. Digital/Compaq/HP TruCluster Cluster File System (CFS) built on top of AdvFS had all of these features and much, much more by circa year 2000.
> >
> 
> Yep, I met a TruCluster developer 3 years ago. Based on his
> description, I was impressed. Not sure HP is still marketing it
> though.
> 
> Again, a great summary !
> 
> -- Wendy