[Linux-cluster] DB Clustering Question

Fri Apr 15 16:46:09 UTC 2005

On Fri, 2005-04-15 at 10:19 -0400, Chris Darroch wrote:
> Matt wrote:
> 
> > What I really want to do is parallel processing with mySQL or
> > Postgresql.  If I can't do that, then simply having multiple SQL servers
> > share the same DB files is the next option.  Can anyone push me in the
> > right direction?
> 
>    I'm new to the list as well, but having just gone through the process
> of evaluating exactly this kind of problem, I have a few cents I can
> throw in.
> 
>    I think the very short answer to your question is that databases and
> multiple servers don't mix well at all, as a general rule, and if you
> need full transactional SQL support in a cluster, you're likely looking
> at a commerical solution.
> 
>    The fundamental problem is that transactional databases, of which
> SQL databases are a subset, need to ensure that all transactions occur
> atomically, and to do this, they need very robust, very fast locking
> subsystems.  For example, before updating a row in a table, a
> database process needs to be sure that it acquires a lock on that data
> first, so that other database processes handling other client requests
> don't read partially altered data.
> 
>    Now locking is hard enough to do when you have just one machine
> (either single CPU or multiple CPUs), but can be done quite effectively
> and efficiently through the use of in-memory mutexes and other such
> devices.  Oracle, for example, takes out a big chunk of shared memory,
> which all processes use to coordinate locking.
> 
>    Doing this in a cluster of machines is much, much more difficult.
> It's compounded by the problem that one or more machines could fail,
> or the network could fail in various ways, and the DB software must
> ensure that under no conditions does the data become corrupted.
> (See all the work involved in the GFS DLM, for example, involving
> handling "split brain" conditions and the such like.)
> 
>    Oracle RAC (Real Application Cluster) provides this functionality
> at considerable expense, for instance, by requiring that you have
> a high-speed interconnection network between your machines, and
> then by providing its own internal lock manager and cluster monitor
> and so forth.  Essentialy, many of the components of GFS are
> provided inside Oracle RAC, for its own purposes, but are unavailable
> to outside processes.  You can also run Oracle RAC on Linux, in
> various ways:
> 
> http://www.redhat.com/software/rha/gfs/
> http://www.oracle.com/technology/tech/linux/index.html
> http://www.veritas.com/van/articles/7655.jsp
> 
> If I understand the RedHat option correctly, Oracle relies on GFS
> to manage the shared storage in the cluster, but still uses its own
> lock manager, cluster monitor, etc., for its own internal cache
> management and transaction handling.  However, I haven't read the
> installation white paper, so I'm not sure about that.  (Note to
> RedHat folks: trying to register on the Web site leads to an
> access denied error for the /info/ page.)
> 
>    Open source SQL databases like PostgreSQL and MySQL just don't
> have this kind of feature, so far as I can determine.  MySQL provides a
> cluster mechanism over regular TCP, but as far as I could tell from the
> documentation, this works by keeping the entire database in RAM on each
> cluster node:
> 
> http://dev.mysql.com/doc/mysql/en/multi-hardware-software-network.html
> 
>    PostgreSQL can be run in a cluster by emulating a single operating
> system underneath it, using high-speed interconnections and special
> kernel modifications:
> 
> http://www.linuxlabs.com/clusgres.html
> 
>    I don't know much about Ingres myself, but I didn't see anything
> about clustering for that, either.
> 
>    It's perhaps worth noting that PostgreSQL and Oracle face special
> complexities regarding data consistency and locking because they
> provide MVCC (Multi-Version Concurrency Control), which means that
> each database client sees a "snapshot" of the entire database
> as it was when they began their transaction.  As long as their
> transaction remains active, the database retains previous versions
> of all data modified by all other active transactions, so that
> the snapshot remains accurate to a past point in time.  Only once
> the transaction has closed can the database clean up old versions of
> data.  This is subtly different from just providing row-level locking
> in a table; if one transaction is slowing reading through all the
> rows of a table while another one performs updates of selected rows,
> the old versions of the updated rows are kept around until the
> reader's transaction closes, in case they are needed to provide an
> accurate view of what the data in the table looked like when the
> reader's transaction began.  So that's all just to say that the
> business of locking and shuffling data around is especially complex
> for such databases, and doing it in a cluster even more so.
> 
>    What you are able to do with the available options depends partly
> on your requirements, obviously.  If you don't mind having multiple
> read-only copies of your database files, and allowing them to be
> somewhat out of date, there are various ways you could replicate
> your data files from a master read-write node to multiple read-only
> nodes.  You'd want to ensure that the copying process performed
> the necessary interactions with the master database to ensure that
> it never copied partially complete data files; performing a hot
> backup and then replicating those files to the read-only nodes would
> work.
> 
>    Another related option if you don't mind having read-only and
> slightly out-of-date copies is to use memcached:
> 
> http://www.danga.com/memcached/
> 
> This functions as a data cache between your client programs and
> the database, and spreads the data around to multiple machines.
> But obviously write requests need to go to the master database,
> and then be replicated to the caches, and there's a period of time
> when you might not read up-to-date data from the cache.  But this
> may be OK for your application.
> 
>    If you need true full transactional SQL support spread across a
> cluster, I believe you'll have to look at Oracle or another commerical
> solution like the ClusGres one I referenced above.  I'd love to
> stand correctly, though, if anyone knows more about this.
> 
> Chris.
> 

If the database is mainly used for reads, you should check out emic
networks' product. It will allow you to cluster mysql across multiple
boxes, and if a node fails, it doesn't matter. If you want to add more
boxes you can too. It's load balanced, and writes are atomic across the
cluster. Interestingly, it does NOT require an in-RAM database.

See http://www.emicnetworks.com/