[Linux-cluster] GFS Tunables

Thu Oct 16 20:21:05 UTC 2008

Wendy,

We have searched high and low for an alternative to file-to-file backups,
especially looking at block-level backups.  The only product we've found that
"supports" GFS is Bak Bone Replicator.  My first crack at installing it was
late last week.  The experience was worrisome.  The replicator service
inserts a kernel module, which by itself is livable; but in our particular
case, we found that while this module is loaded, the kernel returns different
error codes for things like nonexistent files.  If the kernel module turns
out to be the root cause of that behavior (we're still investigating),
that's unworkable.
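One way to pin down the changed error-code behavior is to stat a path that cannot exist, once with the replicator module loaded and once without, and compare the errno. A minimal Python sketch, assuming a stock kernel reports ENOENT for a missing file (the error POSIX specifies for this case); the function name and probe filename are hypothetical and not part of the Replicator product:

```python
import errno
import os
import sys
import tempfile
from typing import Optional


def missing_path_errno(directory: str) -> Optional[int]:
    """Stat a name that should not exist under `directory`; return its errno.

    Returns None if the stat unexpectedly succeeds.
    """
    # Hypothetical probe name; the PID makes an accidental collision unlikely.
    probe = os.path.join(directory, f"no-such-file-{os.getpid()}")
    try:
        os.stat(probe)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    code = missing_path_errno(target)
    name = errno.errorcode.get(code, str(code)) if code is not None else "no error"
    print(f"{target}: stat() on a missing path -> {name}")
    if code != errno.ENOENT:
        print("unexpected result: expected ENOENT")
```

Running it against the GFS mount point with and without the module loaded should show whether the module alone changes the result.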

As for LVM snapshotting ... I am under the impression that those features
are unavailable in GFS (and are slated for GFS2?  Which is not "production
ready", yet?)  It has certainly occurred to me to try that feature, if only
it were available.  Am I misinformed?  Perhaps I need some more education on
how exactly LVM mirroring will help me.  I am *attempting* to approximate a
traditional backup scheme, at least on this particular filesystem.  Am I
correct in believing that I could snapshot a volume (assuming the feature is
available) and run a traditional backup (using, say, rdiff-backup) in a
shorter time than I can now, where I'm running it straight off a live GFS
volume?
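For comparison with file-to-file tools, a block-level copy reads the volume sequentially and never calls getdents() or stat() on individual files. A minimal Python sketch of an incremental block copy, assuming the source is a snapshot or otherwise quiesced image (copying a live, read-write GFS device this way would likely not yield a consistent backup) and the destination is a regular file; the function name and 4 MiB chunk size are illustrative choices, not taken from any tool mentioned in this thread:

```python
import os

CHUNK = 4 * 1024 * 1024  # 4 MiB; an arbitrary illustrative chunk size


def block_sync(src_path: str, dst_path: str, chunk: int = CHUNK) -> int:
    """Copy src to dst chunk by chunk, rewriting only chunks that differ.

    Returns the number of bytes actually written to dst.
    """
    written = 0
    # Create an empty destination file on the first run.
    if not os.path.exists(dst_path):
        open(dst_path, "wb").close()
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            data = src.read(chunk)
            if not data:
                break
            # Compare against what the previous backup holds at this offset.
            dst.seek(offset)
            old = dst.read(len(data))
            if old != data:
                dst.seek(offset)
                dst.write(data)
                written += len(data)
            offset += len(data)
        # Drop any stale tail if the destination was larger than the source.
        dst.truncate(offset)
    return written
```

This still reads the whole source on every run; it saves write bandwidth on the backup target, not read time on the source, but it avoids the per-file directory and metadata operations entirely.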

--
Brandon

On Thu, Oct 16, 2008 at 10:50 AM, Wendy Cheng <s.wendy.cheng at gmail.com> wrote:

> Brandon Young wrote:
>
>> Hi all,
>>
>> I currently have a GFS deployment consisting of eight servers and several
>> GFS volumes.  One of my GFS servers is a dedicated backup server with a
>> second replica SAN attached to it through a second HBA.  My approach to
>> backups has been with tools such as rsync and rdiff-backup, run on a nightly
>> basis.  I am having a particular problem with one or two of my filesystems
>> taking a *very* long time to backup.  For example, I have /home living on
>> GFS.  Day-to-day performance is acceptable, but backups are hideously slow.
>>  Every night, I kick off an rdiff-backup of /home from my backup server,
>> which dumps the backup onto an XFS filesystem on the replica SAN.  This
>> backup can take days in some cases.
>>
>
> Not only GFS: "getdents()" has been more than annoying on many
> filesystems when the entry count within a directory is high - but, yes,
> GFS is particularly bloody slow with its directory reads. There have been
> efforts contributed by Red Hat POSIX and LIBC folks to have new
> standardized light-weight directory operations. Unfortunately I lost
> track of their progress ... On the other hand, integrating these new
> calls into GFS would take time anyway (if they are available), so it is
> unlikely to meet your need. There were also a few experimental GFS
> patches, but none of them made it into the production code.
>
> Unless other GFS folks can give you more ideas, I think your best bet at
> this moment is to think "outside" the box. That is, don't do
> file-to-file backup if at all possible. Check out other block-level backup
> strategies. Are Linux LVM mirroring and/or snapshots workable for you?
> Does your SAN vendor provide embedded features (e.g. NetApp SAN boxes
> offer snapshot, SnapMirror, SyncMirror, etc.)?
>
> -- Wendy
>
>
>> We have done some investigating, and it appears that getdents(2) calls
>> (which return the list of filenames present in a directory) are
>> spectacularly slow on GFS, irrespective of the size of the directory in
>> question.  In particular, with 'strace -r', I'm seeing a rate below 100
>> filenames per second.  The filesystem /home has at least 10 million files in
>> it, which, doing the math, means 29.5 hours just to do the getdents calls to
>> scan them, which is more than a third of wall-clock time.  And that's before
>> we even start stat'ing.
>>
>> I googled around a bit and I can't see any discussion of slow getdents
>> calls under GFS.  Is there any chance we have some sort of tunable turned
>> on/off that might be causing this?  I'm not sure which tunables to consider
>> tweaking, even.  This seems awfully slow, even with sub-optimal locking.  Is
>> there perhaps some tunable I can try tweaking to improve this situation?
>>  Any insights would be much appreciated.
>>
>> --
>> Brandon
>> ------------------------------------------------------------------------
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
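To reproduce the getdents measurement from the quoted message without strace, and to check the arithmetic, here is a minimal Python sketch. It assumes CPython's os.scandir(), which on Linux goes through libc's readdir() and hence getdents64(), and it deliberately never stat()s the entries. The function names are hypothetical. By the same arithmetic, 10 million entries at exactly 100 per second take about 27.8 hours, and the quoted 29.5 hours corresponds to roughly 94 entries per second.

```python
import os
import sys
import time


def listing_rate(directory: str) -> float:
    """Entries read per second for one pass over `directory`.

    Only the directory listing is timed: entries are counted, never stat()ed.
    """
    start = time.perf_counter()
    with os.scandir(directory) as entries:
        count = sum(1 for _ in entries)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def scan_hours(n_files: int, rate_per_sec: float) -> float:
    """Projected hours to list `n_files` entries at a constant rate."""
    if rate_per_sec <= 0:
        raise ValueError("rate must be positive")
    return n_files / rate_per_sec / 3600.0


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    rate = listing_rate(target)
    print(f"{target}: {rate:.0f} entries/s")
    if rate > 0:
        print(f"10 million entries at that rate: {scan_hours(10_000_000, rate):.1f} h")
```

For example, `scan_hours(10_000_000, 100)` returns about 27.8. Timing a single large directory (a hypothetical `/home/someuser`, say) gives a rough per-entry rate to feed into `scan_hours`; expect different numbers on cold and warm caches.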