[Linux-cluster] Advice on Storage Hardware

David Brieck Jr. dbrieck at gmail.com
Fri Nov 11 23:14:26 UTC 2005


On 11/11/05, Steven Dake <sdake at mvista.com> wrote:
> David,
> Check out wackamole.  It may be a good fit for your application
> requirements.

I looked around but wasn't able to find a website for this project.

>
> If you have static content, it may be possible to use a "shared-nothing"
> architecture in which each node contains an exact replica of the content
> to be served.  The replicas can be maintained with something like rsync,
> or the like.  This is likely to provide higher reliability (less
> components like scsi bus, fibre channel hardware which is finicky,
> cabling, etc to fail) and higher availability with wackamole (MTTR is
> virtually zero because if a node fails, it is taken out of the server
> pool).

We do have some static content; however, we also have a large number of
large images and PDFs for one of our offerings, as well as user-created
content, all of which would be difficult to keep in sync with rsync.
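
Just to give a sense of what the pure-rsync route would look like for us,
here's a rough sketch of the kind of push we'd have to keep running from a
master node (hostnames and paths below are made up, and it assumes
passwordless ssh between the nodes):

#!/usr/bin/env python
# Hypothetical sketch: push content from a master node to each frontend
# with rsync.  Hostnames and paths are illustrative only.
import subprocess

FRONTENDS = ["web1", "web2"]                      # hypothetical frontends
CONTENT_DIRS = ["/srv/www/images", "/srv/www/pdfs", "/srv/www/uploads"]

for host in FRONTENDS:
    for path in CONTENT_DIRS:
        # -a preserves ownership/perms/times, --delete drops files removed
        # on the master.  Every run walks the whole tree, which is what
        # makes frequently changing user uploads painful to sync this way.
        subprocess.check_call(
            ["rsync", "-a", "--delete", path + "/", "%s:%s/" % (host, path)])

Running that often enough to keep user uploads fresh on every frontend is
where the shared-nothing approach stops being attractive for us.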

The way I was planning on dealing with multiple webservers is to set up a
private webserver on one of the backend machines. That is where all our
code changes would be made and where background tasks would run. Any
configuration or code changes would instantly be available to the
front-facing webservers, since they'll mount the same partition. Logging
will have to be handled so that the nodes aren't writing to the same
files, I would imagine, but overall this seems like a fairly
straightforward way to add webservers without a ton of configuration
hassle.
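
On the logging point, the simplest thing I can think of is to have each
node write to a file name that includes its own hostname, so no two nodes
ever append to the same file on the shared mount. A quick sketch (the
mount point and file names are hypothetical):

# Hypothetical sketch: per-node log files on the shared GFS partition so
# the frontends never append to the same file.  The mount point is made up
# and the directory is assumed to already exist.
import logging
import socket

LOG_DIR = "/mnt/gfs/logs"                 # assumed shared GFS mount point
node = socket.gethostname()

logging.basicConfig(
    filename="%s/app-%s.log" % (LOG_DIR, node),
    format="%(asctime)s %(levelname)s %(message)s",
    level=logging.INFO)
logging.info("frontend %s up", node)

With Apache the same idea applies: point each node's access and error
logs at a hostname-suffixed path on the shared partition.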

>
> One problem you may have if you use GFS with a scsi array is that the
> array becomes a single point of failure.  If the array is not RAID,
> individual disks become a single point of failure.  Even with RAID 1, it
> is possible to suffer a catastrophic failure of two disks at the same
> time and result in offline time and restoration from tape (if your
> company does tape backups).  This failure scenario will annoy your
> customers surely :)

The array we are looking at would be fully populated (14 disks) and would
be set up as mirror plus stripe (RAID 10). As mentioned before, it would
be a dual-controller array, in the hope that if one controller died we'd
still be OK with the other controller on the separate machine.
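
For what it's worth, a rough back-of-the-envelope on the two-disk failure
scenario with that layout (the per-disk size below is hypothetical):

# Rough numbers for a 14-disk mirror+stripe (RAID 10) array: a second
# simultaneous failure is only fatal if it hits the mirror partner of the
# first dead disk.  Per-disk capacity here is hypothetical.
DISKS = 14
PAIRS = DISKS // 2                        # 7 mirror pairs
DISK_SIZE_GB = 146                        # hypothetical per-disk size

usable_gb = PAIRS * DISK_SIZE_GB          # striped across pairs: half the raw space
p_fatal_second = 1.0 / (DISKS - 1)        # 1 fatal disk out of the 13 still running

print("usable capacity: %d GB" % usable_gb)
print("chance a second failure is fatal: %.1f%%" % (100 * p_fatal_second))

In other words, a second simultaneous failure only takes the whole array
out if it happens to hit the partner of the first, roughly a 1-in-13
chance.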

We're willing to accept some level of risk, and to me an external array
with 14 drives, two controllers, and two hosts is only slightly riskier
than a real SAN, if only because it isn't something made specifically for
this purpose.

>
> The same problem applies to fibre channel, except now hubs/switches are
> SPOFs in addition to the other components unless storage I/Os are
> replicated using multipathing.
>
> With dual Ethernet, you add the possibility of replicating the network,
> but bonding is really a poor fit to serve your availability
> requirements.  What you really need is a redundant network protocol
> (such as Totem Redundant Ring Protocol available from the openais
> project sometime in the near future) merged with Wackamole's
> functionality.

Why wouldn't bonding be a good fit here? Are there problems with it that
I'm not seeing?
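
For context, what I had in mind was just the standard Linux bonding
driver in active-backup mode on each node, with something like the
following to keep an eye on which slave is carrying traffic (a sketch
only; the bond name is an assumption):

# Hypothetical sketch: report which slave an active-backup bond is
# currently using, via the bonding driver's /proc interface.
BOND = "/proc/net/bonding/bond0"          # assumed bond interface name

with open(BOND) as f:
    for line in f:
        if line.startswith("Currently Active Slave") or line.startswith("MII Status"):
            print(line.strip())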

Thanks for your input.



