[Linux-cluster] Diskless Shared-Root GFS/Cluster

Thu Feb 1 11:57:46 UTC 2007

We are talking about application servers.

One of the toughest things about clustering in general, and GFS in
particular, is handling the failure scenarios.

When you have any sort of cluster issue, if your root is on a shared  
GFS, that GFS freezes in various ways until fencing happens.  The  
problem with this is that certain binaries that are on the same GFS  
may need to be used to recover.  How do you execute fence_apc to  
fence a failed node when it is on a GFS that is hung waiting on that  
same fencing operation?
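
One way to spot this trap ahead of time is to check which filesystem
your recovery tools actually live on.  The following is only a rough
sketch, not something we run: the function names are made up for
illustration, the tool paths are assumptions (put in wherever your
fence agents are really installed), and it assumes the filesystem
type shows up in /proc/mounts as a string starting with "gfs".

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that holds `path`.

    `mounts_text` is text in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # A mount holds the path if it is the path itself or a parent of it.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type


def tools_on_gfs(tool_paths, mounts_text):
    """Return the tool paths that sit on a mount whose type starts with 'gfs'."""
    return [
        p for p in tool_paths
        if (fs_type_for(p, mounts_text) or "").startswith("gfs")
    ]
```

On a live node you would pass in the contents of /proc/mounts and the
paths of your fence agents.  Anything it returns is a binary that may
be unreachable at exactly the moment you need it.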

There are ways around this involving RAM disks and the like, but
eventually we just settled on having a minimal flash disk that would
get us onto our SAN (but not clustered).  Only after we were on a
non-clustered FS on our SAN would we then start up our clustered
filesystem.  This gave us the ability to move our nodes around
easily, an often overlooked benefit of a shared root that putting
your root FS on SAN gives you as well.  There's nothing like booting
up a dead node on spare hardware.  This also gives you a solid way to
debug a damaged root filesystem: with shared root it's all or
nothing, and with this configuration it is not.  Each node also has
its own syslog files and other per-node things, which are one more
special case on a shared root.  It's also easy to set up nodes with
slightly different configurations (shared root makes this another
special case).  As for the danger of drive failure, a read-only IDE
flash disk (Google for Transcend) is simple, easy, and dead solid.

After consolidating your shared configuration files into /etc/shared
and replacing the originals with symlinks that point into that
directory, it is a simple matter of rsync / csync / tsync / cron+scp
to keep them synchronized.
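
The consolidation step could look something like the sketch below.
This is illustrative only, not our actual script: the consolidate()
name and the example file list are assumptions, and it takes the
simple view that every shared file can be moved into one flat
directory with a symlink left behind at its old location.

```python
import os
import shutil


def consolidate(paths, shared_dir):
    """Move each file into shared_dir and leave a symlink at its old path.

    Returns the list of new locations inside shared_dir.
    """
    os.makedirs(shared_dir, exist_ok=True)
    moved = []
    for path in paths:
        if os.path.islink(path):
            # Already a symlink, so assume it was consolidated earlier.
            continue
        target = os.path.join(shared_dir, os.path.basename(path))
        if os.path.exists(target):
            # Two files with the same name would collide in the flat layout.
            raise FileExistsError(target)
        shutil.move(path, target)
        os.symlink(target, path)
        moved.append(target)
    return moved
```

For example, consolidate(["/etc/hosts", "/etc/resolv.conf"],
"/etc/shared") would leave /etc/hosts as a symlink to
/etc/shared/hosts.  After that, /etc/shared is the only directory the
periodic sync job has to copy between nodes.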

It is tempting to want to have a shared root to minimize management  
requirements.  It is tempting to want to play games with ramfs and  
the like to provide a support system that will function when that  
shared root is hung due to clustering issues.  It is tempting to  
think that having a shared GFS root is really useful.

However, if you value reliability and practicality, it's much easier
to script up an occasional rsync than it is to perform so many
acrobatics for so little gain.  For a cluster (and its apps) to be
reliable at all, it needs to be able to function and recover, and it
needs a stable operating environment.  Putting GFS under the
userspace that drives it is asking for trouble.

On Jan 31, 2007, at 1:34 PM, isplist at logicore.net wrote:

> I'm thinking for application servers/cluster only, not workstation  
> users.
>
>
> On Wed, 31 Jan 2007 11:10:55 -0800, Tom Mornini wrote:
>> We boot from flash drives, then pivot root to SAN storage.
>>
>> I agree with no drives in servers, but shared root is a
>> whole different ball game if you mean everyone using a
>> single filesystem for root.
>>
>> --
>> -- Tom Mornini, CTO
>> -- Engine Yard, Ruby on Rails Hosting
>> -- Reliability, Ease of Use, Scalability
>> -- (866) 518-YARD (9273)

-- 
Jayson Vantuyl
Systems Architect
Engine Yard
jvantuyl at engineyard.com
