[Linux-cluster] Re: GFS + GNBD Highly Available Storage solution without expensive SAN infrastructure.

Lon Hohberger lhh at redhat.com
Mon Dec 18 18:12:38 UTC 2006


On Sun, 2006-12-17 at 21:36 +0100, Jan Hugo Prins wrote:

> Hi Lon,
> 
> Thanks for your answer. Another question, though: what if I make just one 
> node of the storage array writable and use heartbeat to find out which 
> node to use and to set the master node? Is this possible, or is this also 
> something for the future?

The short answer is "I don't know."

The (very) long answer is:

You could try it; I don't think anyone has done it yet.

Here's something similar to what you describe that Leonardo de Mello
came up with a while ago, which I was supposed to make an attempt at
building in my spare (ha!) time.

Here is what the theoretical (never been done, untested, etc.) design
looks like... I hope I remember this correctly:

           This part is served by a floating IP address
           as part of a cluster service.
                  |
                  v

(disk servers)   (server cluster)        (clients)
  gnbd export -> gnbd import
                 RAID5 assemble 
                 gnbd export of RAID5 -> gnbd import

                                             ^
                                             |
                           Clients only import using the
                           floating IP address.  They do
                           not need to know where the
                           cluster service exists.
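
To make the picture a little more concrete, the plumbing at each tier
would look roughly like this (untested and from memory; device names,
export names, host names, and the address are all invented):

  # On each disk server: start the GNBD server daemon and export the
  # raw disk.  (From memory, -c would enable caching, so leaving it
  # off should keep the export uncached.)
  gnbd_serv
  gnbd_export -d /dev/sdb -e disk1

  # On the active node of the server cluster: import the raw exports,
  # build the RAID5 set out of them, and re-export the result.
  # (--create is first-time setup only; a failover would --assemble
  # the existing set instead.)
  gnbd_import -i diskserver1        # repeat for diskserver2, diskserver3
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/gnbd/disk1 /dev/gnbd/disk2 /dev/gnbd/disk3
  gnbd_export -d /dev/md0 -e raid5vol

  # On each client: import the assembled volume through the cluster
  # service's floating IP address.
  gnbd_import -i 192.168.1.100      # example address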

Here is the idea about how it should work:

If any of the disk servers fails, the node in the server cluster keeps
working (as if one disk had died in the RAID set).

If the RAID server fails, the service gets restarted by rgmanager (or
another resource manager) somewhere else in the cluster; the new owner
assembles the RAID volume, exports it, and the clients are expected to
reconnect seamlessly.
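
Purely as an illustration of that (untested; host names, export names,
and the md device are made up), the RAID-server role could be wrapped
in an LSB-style script and handed to rgmanager as a script resource
next to the floating IP, along these lines:

  #!/bin/bash
  # Sketch of an rgmanager "script" resource for the RAID-server role.
  DISK_SERVERS="diskserver1 diskserver2 diskserver3"
  MD=/dev/md0

  case "$1" in
  start)
      # Fencing of the previous owner is assumed to have happened
      # already via the cluster's normal fencing path.
      for s in $DISK_SERVERS; do
          gnbd_import -i "$s" || exit 1     # pull in the raw exports
      done
      mdadm --assemble "$MD" \
            /dev/gnbd/disk1 /dev/gnbd/disk2 /dev/gnbd/disk3 || exit 1
      gnbd_export -d "$MD" -e raid5vol      # re-export the assembled set
      ;;
  stop)
      gnbd_export -r raid5vol               # remove flag from memory;
                                            # check gnbd_export(8)
      mdadm --stop "$MD"
      ;;
  status)
      mdadm --detail "$MD" >/dev/null 2>&1
      ;;
  esac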

*Why* it might work:
(a) Only ONE server assembles the RAID volume at a time; ergo, only
*one* server is writing to the disk servers at a time.
(b) Clients *never* access the disk server gnbd exports directly; they
only access the assembled RAID set.
(c) Data exports/imports are all based on GNBD, so fencing is available
(part of assembling the RAID volume would require fencing the previous
owner).
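
On point (c), the fencing step before a takeover could be as simple as
the new owner calling the stock fencing tool against the old one (node
name invented here) before it assembles and re-exports the volume:

  fence_node raidserver1    # cut the previous owner off before
                            # assembling /dev/md0 on this node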

Obvious Caveats:
(a) In order to preserve data integrity, it is likely required that
synchronous I/O be used at all levels.
(b) There is an obvious bottleneck in the RAID server; how much of a
bottleneck this is in practice is unknown.
(c) Placing a load on even one of the disk servers will likely have a
dramatic negative impact on overall performance.
(d) Power consumption is *awful* given the data density.
(e) "Hot-swapping" a disk server will probably painful (maybe some
special tool could make this work more efficiently?).
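
For what it's worth (again untested; names invented), from the RAID
server's point of view swapping a dead disk server might look like an
ordinary md member swap:

  # After the replacement disk server is up and exporting its disk:
  mdadm /dev/md0 --fail /dev/gnbd/disk2 --remove /dev/gnbd/disk2
  gnbd_import -i diskserver4             # the replacement disk server
  mdadm /dev/md0 --add /dev/gnbd/disk4   # md rebuilds onto the new member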

If memory serves me right, in Leonardo's original architecture, there
are only two RAID servers (for redundancy), and they are separate from
the disk servers.  Each disk server is *only* acting as a disk server.
By specializing, the idea is to derive maximum performance given the
limitations of the architecture.

Now, nothing strictly prevents the above 3-tier model (disk servers,
server cluster, clients) from being collapsed into one cluster, but
performance will likely suffer.

I couldn't begin to speculate about the performance of such a system.
I've CC'd linux-cluster so that others who know more about the
performance and configuration implications can comment.  It is very
possible that the above design would not work.

If it were me, I would just buy a mid-range iSCSI-capable SAN array.
The up-front cost might seem like a lot.  However, you will save on
power, cooling, rack space, and manpower costs, which add up over time.
*shrug*

-- Lon