[Linux-cluster] Diskless Shared-Root GFS/Cluster
jvantuyl at engineyard.com
Thu Feb 1 11:57:46 UTC 2007
We are talking about application servers.
One of the toughest things about clustering in general and GFS in
particular is the failure scenarios.
When you have any sort of cluster issue, if your root is on a shared
GFS, that GFS freezes in various ways until fencing happens. The
problem with this is that certain binaries that are on the same GFS
may need to be used to recover. How do you execute fence_apc to
fence a failed node when it is on a GFS that is hung waiting on that
same fencing operation?
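
One way to catch this dependency before it bites is to check which filesystem your recovery tools actually live on. The following is only a rough sketch: it assumes mount data in /proc/mounts format and assumes GFS mounts show up with the type "gfs" or "gfs2". The helper names and the sample paths are made up for illustration, and symlinks are not resolved.

```python
import os

# Assumption: filesystem types that mean "this mount is a clustered GFS".
GFS_TYPES = ("gfs", "gfs2")

# Illustrative sample in /proc/mounts format (device, mount point, fstype, ...).
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/vg0/shared /data gfs rw 0 0
"""

def fs_type_for(path, mounts_text):
    """Return the fstype of the mount containing `path`, or None.

    The longest matching mount point wins; on a tie the later entry
    wins, since it was mounted on top of the earlier one.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or (path + "/").startswith(prefix)
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type

def recovery_tools_on_gfs(tools, mounts_text):
    """Return the tools that would hang along with a frozen GFS."""
    return [t for t in tools if fs_type_for(t, mounts_text) in GFS_TYPES]
```

On a real node you would pass in the contents of /proc/mounts and the paths of your fence agents; anything the second function returns is a binary you cannot count on during a fencing stall.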
There are ways around this involving RAM disks and the like, but
eventually we just settled on having a minimal flash disk that would
get us onto our SAN (but not clustered). Only after we were on a
non-clustered FS on our SAN would we then start up our clustered
filesystem. This gave us the ability to move our nodes around
easily, an often overlooked benefit of a shared root that putting
your root FS on the SAN gives you as well. There's nothing like
booting up a dead node on spare hardware. This also gives you a
solid way to debug a damaged root system. With shared-root it's all
or nothing; not so with this configuration. Each node also keeps
its own syslog files and other per-node state, which would be one
more special case on a shared root. It's also easy to set up nodes
with slightly
different configurations (shared-root makes this another special
case). As for the danger of drive failure, a read-only IDE flash
disk (Google for Transcend) is simple, easy, and dead solid.
After consolidating your shared configuration files into /etc/shared
and placing appropriate symlinks that point into that directory, it
is a simple matter of rsync / csync / tsync / cron+scp to keep them
synchronized.
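
As a rough illustration of the rsync variant, something like the following could be run from cron on whichever node holds the master copy. It is a sketch under assumptions, not our actual script: the node names are hypothetical, it assumes passwordless ssh between nodes, and it uses only the standard rsync options -a, --delete, and -e.

```python
import subprocess

SHARED_DIR = "/etc/shared/"   # trailing slash: sync the contents, not the directory itself
NODES = ["node2", "node3"]    # hypothetical hostnames

def rsync_command(node, src=SHARED_DIR):
    """Build the rsync argv that mirrors `src` to the same path on `node`."""
    return ["rsync", "-a", "--delete", "-e", "ssh", src, f"{node}:{src}"]

def push_shared_config(nodes, run=subprocess.call):
    """Push to every node; return the nodes whose rsync exited non-zero."""
    return [n for n in nodes if run(rsync_command(n)) != 0]

if __name__ == "__main__":
    failed = push_shared_config(NODES)
    if failed:
        print("sync failed for: " + ", ".join(failed))
```

Note that --delete removes files on the target that no longer exist on the source, so it only makes sense if one node is treated as authoritative for /etc/shared.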
It is tempting to want to have a shared root to minimize management
requirements. It is tempting to want to play games with ramfs and
the like to provide a support system that will function when that
shared root is hung due to clustering issues. It is tempting to
think that having a shared GFS root is really useful.
However, if you value reliability and practicality, it's much easier
to script up an occasional rsync than it is to do so many acrobatics
for so little gain. For a cluster (and its apps) to be reliable at
all, it needs to be able to function, recover, and generally have a
stable operating environment. Putting GFS under the userspace that
drives it is asking for trouble.
On Jan 31, 2007, at 1:34 PM, isplist at logicore.net wrote:
> I'm thinking for application servers/cluster only, not workstation
> On Wed, 31 Jan 2007 11:10:55 -0800, Tom Mornini wrote:
>> We boot from flash drives, then pivot root to SAN storage.
>> I agree with no drives in servers, but shared root is a
>> whole different ball game if you mean everyone using a
>> single filesystem for root.
>> -- Tom Mornini, CTO
>> -- Engine Yard, Ruby on Rails Hosting
>> -- Reliability, Ease of Use, Scalability
>> -- (866) 518-YARD (9273)