[Linux-cluster] Alternative? Diskless Shared-Root GFS/Cluster

Jayson Vantuyl jvantuyl at engineyard.com
Thu Feb 1 12:26:36 UTC 2007


> Ok, might as well ask this... since I can't seem to find anything on
> it. How about just a central storage that can be split up into many
> small segments so that blades can boot over the network, then join
> the GFS cluster?
We use an IDE flash disk in each server.  It's just too easy to put a
read-only bootstrap image on the flash and boot off of that.  With
affordable 256MB flash disks you can even keep a capable repair
environment there in case things break.
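
Provisioning the flash is just a one-shot copy of a prebuilt image; a
minimal sketch, assuming the flash shows up as /dev/hdc (the image
name and device are examples, not our actual setup):

dd if=bootstrap.img of=/dev/hdc bs=1M   # write the bootstrap image to flash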

> I mean, all I want to do is to remove the drives since they really
> aren't being used. All of the work is being done on the GFS cluster
> once a machine is up and running. It barely does anything with its
> drive other than the OS of course, even logging is all remote.
Don't remove the drives; use IDE flash drives instead.  I think you
can also use USB thumb drives if your BIOS supports it.  A 256MB
flash for $26.30 is hard to beat.  Order directly from the
manufacturer at:

http://www.transcendusa.com/Products/ModDetail.asp?ModNo=26&LangNo=0

We put the boot loader, kernel, and a simple maintenance environment
on the flash.  We still boot our root off of the SAN.  Conveniently,
our SAN supports partitioning, so we keep a partition for each node
(mounted automatically using a LABEL= mount).  Once a node boots up,
we run CLVM with our GFS filesystems on top of it.  Quite handy
(and CLVM isn't really necessary for your case).
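
Roughly, that stack comes up like this on each node once the root is
mounted (a sketch only; the volume group and mount point names are
examples, not our actual names):

service clvmd start                # cluster LVM daemon
vgchange -aly                      # activate clustered volume groups
mount -t gfs /dev/vg0/shared /gfs  # GFS sitting on top of CLVM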

> Isn't there a simpler way of getting this done without having to get
> into whole new technologies? All of the blades have PXE boot
> capabilities; there must be some simple way of doing this?
I'd avoid this.  I've tried the PXE boot thing before, and the PXE
server just becomes one more single point of failure and maintenance
burden.  There's nothing like rebooting your cluster only to find
that the PXE server has a failed disk.  :(

Basically, with a SAN set up as follows:

/dev/san0p1 (FS for node 0, labeled root-0)
/dev/san0p2 (FS for node 1, labeled root-1)
...
/dev/san1 (CLVM / GFS / other stuff)
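
Labeling those per-node filesystems is a one-time provisioning step;
a minimal sketch, assuming ext3 roots (device names as above):

mke2fs -j /dev/san0p1       # ext3 root FS for node 0
e2label /dev/san0p1 root-0  # label it so linuxrc can mount it with -L
mke2fs -j /dev/san0p2
e2label /dev/san0p2 root-1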

Your boot flash doesn't need much more than a very tiny Linux system
(busybox is your friend), a file containing the node id (in this
case /node_id, holding just the node number), and a /linuxrc
containing:

#!/bin/sh
NODEID=`cat /node_id`
# SET UP SAN HERE IF NECESSARY
# Mount /proc explicitly; mounting by LABEL needs /proc/partitions
mount -t proc proc /proc
mount -o ro -L root-${NODEID} /newroot
cd /newroot
# the new root must contain an oldroot/ directory for pivot_root
pivot_root . oldroot/
# chroot . makes sure init runs with the new root, per pivot_root(8)
exec chroot . sbin/init
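
Building the initrd image for the flash is scriptable too; a rough
sketch (size, paths, and the busybox applet list are assumptions,
and you'll want whatever device nodes your SAN driver needs):

dd if=/dev/zero of=initrd.img bs=1k count=4096   # 4MB image
mke2fs -F -m0 initrd.img
mount -o loop initrd.img /mnt
mkdir /mnt/bin /mnt/dev /mnt/proc /mnt/newroot
cp busybox /mnt/bin/
for a in sh mount cat pivot_root chroot; do ln -s busybox /mnt/bin/$a; done
cp linuxrc /mnt/linuxrc && chmod +x /mnt/linuxrc
echo 0 > /mnt/node_id                            # per-node id
mknod /mnt/dev/console c 5 1                     # plus your SAN devices
umount /mnt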

Considering that your flash hardly ever changes, and you can script
the creation of the flash image and node partitions, this quickly
becomes very low maintenance.  If you want all the flash images to
be identical, grab the MAC address off of the first NIC and generate
the label from that instead of using a /node_id file, as sketched
below...
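
A minimal sketch of that variant in /linuxrc, assuming busybox's
ifconfig output format and partitions labeled with each node's MAC
(12 hex characters, which fits the 16-character ext2 label limit):

MAC=`ifconfig eth0 | awk '/HWaddr/ {print $5}' | tr -d ':'`
mount -o ro -L ${MAC} /newroot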

A shared root is a nice idea.  In practice, though, you just end up
creating a fragile custom environment that is hostile to a lot of
software and introduces new single points of failure and contention
(making it neither high-performance nor highly available).

-- 
Jayson Vantuyl
Systems Architect
Engine Yard
jvantuyl at engineyard.com

