[Linux-cluster] High availability mail server

Tue Oct 27 00:50:46 UTC 2009

On 26/10/2009 23:54, Ray Burkholder wrote:
>>
>> High avail. Mail? That's what MX records are for.  Performance, would
>> be a side effect of multiple MXs.  Having it "clustered" wouldn't make
>> mail deliver any quicker.  Why make something so simple into something
>> complex?
>>
>
> Mail delivery and MX records are easy.  But once mail is received, you have
> to get it to user's mail boxes, and users have to gain access to the
> repository.  The repository should be 'highly available' in some fashion:
> partitioned storage units, redundant storage, replicated storage, backup
> storage, or whatever.  I believe that is the hard bit:  making the
> repository 'highly available'.
>
> How do people do it?

Here are some options you have:

1) Use a NAS/NFS box for shared storage. This is not really a solution
for high availability per se, as the box becomes a SPOF unless you
mirror it somehow in real time. Performance over NFS will not be great,
even when well tuned, because of latency overheads.

2) Use a SAN with a clustered file system for shared storage. Again,
this is not really a solution for high availability unless the SAN
itself is mirrored, and performance will not be great, especially with
a lot of concurrent users, due to locking latencies.

3) Use a SAN with an exclusively mounted, non-shared file system (e.g.
ext3). Performance should be reasonably good in this case because there
are no locking latency overheads and caching remains efficient. Note,
however, that your cluster configuration must ensure that this ext3
volume is a service that can only be active on one machine at a time.
If it ends up accidentally multi-mounted, your data will be gone in a
matter of seconds.

2b) Split your user data up in such a way that a particular user will 
always hit a particular server (unless that server fails), and all the 
data for users on that server goes to a particular volume, or subtree of 
a cluster file system (e.g. GFS). This will ensure that all locks for 
that subtree can be cached on that server, to overcome the locking 
latency overheads.
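The per-user pinning in 2b boils down to a deterministic "which server owns this user" function that only moves a user when their server is down. Below is a minimal Python sketch of that idea, assuming a hypothetical `home_server()` helper, made-up server names, a made-up `/gfs/mail/<server>` subtree layout, and an `alive` set that would in practice come from the cluster manager. It only decides routing; it does not by itself stop two nodes touching the same subtree (that is still fencing's job).

```python
from __future__ import annotations

import hashlib


def home_server(user: str, servers: list[str], alive: set[str]) -> str:
    """Pick the server that should handle `user` (hypothetical helper).

    Each user hashes to a fixed preferred server. If that server is not in
    `alive`, walk forward through `servers` to the next live one, so a
    user's mailbox subtree is only ever worked on by one node at a time.
    """
    if not servers:
        raise ValueError("no servers configured")
    # A stable digest, not Python's per-process randomized hash(), so every
    # front end maps the same user to the same server.
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    for step in range(len(servers)):
        candidate = servers[(start + step) % len(servers)]
        if candidate in alive:
            return candidate
    raise RuntimeError("no live servers left")


def mail_volume(server: str) -> str:
    # Hypothetical layout: one GFS subtree per server, so that server can
    # keep caching all the locks for its own users' mail.
    return f"/gfs/mail/{server}"


servers = ["mail1", "mail2", "mail3"]
owner = home_server("alice", servers, alive=set(servers))
print(owner, mail_volume(owner))
```

Users whose preferred server is still up never move, so their locks stay cached where they were. The simple walk-forward rule does dump all of a failed node's users onto one neighbour; rendezvous or consistent hashing would spread them out instead, at the cost of a bit more code.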

In options 2 and 3 you could use DRBD instead of a SAN (in dual-primary
mode for option 2), which gives you the advantage of mirroring data
between servers without needing a SAN at all. This ought to cut your
budget to a small fraction of what a SAN setup would cost. Two birds
with one stone.

You could also use GlusterFS for your mirrored data storage (FUSE
based, backed by a normal file system rather than a raw block device).
Performance is similar to NFS, but be advised that it is still a bit
buggy, so you'll need to test it for your use case.

There is also another option that doesn't involve block-level or
file-system-level mirroring: DBMail. You can keep your mail storage in
an SQL database rather than in Maildir. Point it at MySQL, set up MySQL
replication, and you're good to go.

At this point you may be thinking about master-master replication and
sharing load between the servers. This would be unreliable due to the
race conditions inherent in MySQL's master-master replication. You
won't lose data, but mail clients assume that message IDs (IMAP UIDs)
always go up. That means if two messages get delivered in quick
succession, the client might see the later message, delivered to the
local server, but not the earlier message, which was delivered to the
other server and hasn't replicated yet. The next time it checks the
inbox for updates, it won't spot the other message, because that one
has a lower message ID! The client would have to purge its local caches
and resync to see the missing message. This means that with this
solution you would still have to run in fail-over mode (even if both
MySQL instances run at the same time to achieve real-time data
mirroring). The only way you could overcome this with MySQL is to use
NDB tables, but that brings you back to clustered storage performance
issues (performance on NDB tables is pretty atrocious compared to the
likes of MyISAM and InnoDB).
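The lost-message race described above can be reproduced with a toy model. This is a hedged Python sketch, not DBMail's or MySQL's actual code: it assumes the common master-master setup of `auto_increment_increment=2` with `auto_increment_offset` 1 and 2 on the two masters, simplifies the auto-increment counter to "next value in my series above the highest ID seen", and assumes the database row ID is what the IMAP server hands out as the UID. `Replica` and `Client` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One master in a simplified master-master pair (hypothetical model)."""
    name: str
    offset: int                                   # models auto_increment_offset
    increment: int = 2                            # models auto_increment_increment
    rows: dict = field(default_factory=dict)      # message id -> subject
    pending: list = field(default_factory=list)   # rows not yet shipped to the peer

    def _next_id(self) -> int:
        # Smallest id in this replica's series (offset, offset+increment, ...)
        # that is greater than every id it has seen so far.
        top = max(self.rows, default=0)
        n = self.offset
        while n <= top:
            n += self.increment
        return n

    def deliver(self, subject: str) -> int:
        msg_id = self._next_id()
        self.rows[msg_id] = subject
        self.pending.append((msg_id, subject))
        return msg_id

    def replicate_to(self, peer: "Replica") -> None:
        # Asynchronous replication: ship whatever has queued up so far.
        for msg_id, subject in self.pending:
            peer.rows[msg_id] = subject
        self.pending.clear()


class Client:
    """A mail client that only asks for ids above the highest one it has seen."""

    def __init__(self) -> None:
        self.seen = {}
        self.high_water = 0

    def poll(self, server: Replica) -> list:
        new = sorted(i for i in server.rows if i > self.high_water)
        for i in new:
            self.seen[i] = server.rows[i]
        if new:
            self.high_water = new[-1]
        return new

    def resync(self, server: Replica) -> list:
        # Purge the local cache and fetch everything again.
        self.seen.clear()
        self.high_water = 0
        return self.poll(server)


a, b = Replica("A", offset=1), Replica("B", offset=2)
client = Client()

a.deliver("welcome")            # id 1
a.replicate_to(b)
client.poll(a)                  # client sees [1]

b.deliver("earlier message")    # id 2, sits on B, not yet replicated
a.deliver("later message")      # id 3, delivered to A
client.poll(a)                  # client sees [3]; high-water mark is now 3

b.replicate_to(a)               # replication catches up: id 2 appears on A
missed = client.poll(a)         # asks for ids > 3, so id 2 is never fetched
recovered = client.resync(a)    # only a full resync finds it
print(missed, recovered)
```

Running only one master for deliveries (the fail-over mode mentioned above) removes the problem in this model, because every new ID then comes from a single counter and is visible on that server the moment it is assigned.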

Anyway, that should be enough to get you started.

Gordan