[Linux-cluster] High availability mail server

Gordan Bobic gordan at bobich.net
Mon Oct 26 09:05:23 UTC 2009


Samer ITS wrote:
> Dear Madi,
> 
> I have 2 years of experience working with Sun Cluster 3.x,
> so the concept is familiar. I want to know how Linux clustering works
> for a mail system, because I am still planning the project.
> 
> The cluster is needed for performance and high availability.
> 
> So I need your advice on the planning and the requirements (such as
> servers) for best performance.
> I already have SAN storage (an EMC Clariion CX300).

Unfortunately, the very nature of the way Maildir behaves (lots of files 
in few directories) puts it very much at odds with your intention of 
gaining performance through parallel operation on SAN-backed storage 
with a cluster file system (be it GFS1, GFS2 or OCFS2). You'd be better 
off with a NAS, but even so, you'll find that one machine with local 
storage will still outperform a pair of clustered machines running in 
parallel under heavy load, because the clustered machines will spend 
most of their time bouncing locks around. You may find that configuring 
them as fail-over, with an ext3 volume on the SAN that gets mounted 
_ONLY_ on the machine currently running the service, works faster.

The problem is that most people overlook why clustered file systems are 
so slow, given the apparently low ping times to the SAN and between 
machines on gigabit ethernet (or something faster). The usual, erroneous 
assumption is that since the ping time is typically < 0.1ms, it is 
negligible compared to the 4-8ms access time of mechanical disks. The 
problem is that 4-8ms is the wrong figure to compare against - if the 
machine really hit the disk for every data fetch, it would grind to a 
halt (think heavy-swapping sort of performance). Most of the working 
data set is expected to be in caches most of the time, and those are 
accessible in < 40ns (once all the latencies between the CPU, MCH and 
RAM are accounted for).
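
To put rough numbers on that (a back-of-the-envelope sketch in Python, 
using the ballpark figures above; the exact values will of course vary 
with the hardware):

    # Illustrative figures only, taken from the paragraph above.
    cache_hit_ns   = 40         # cached read, CPU/MCH/RAM latencies included
    disk_seek_ns   = 6000000    # ~6ms, middle of the 4-8ms mechanical seek range
    network_rtt_ns = 100000     # ~0.1ms ping to the SAN / between nodes

    print("network RTT vs disk seek: %dx faster" % (disk_seek_ns // network_rtt_ns))
    print("network RTT vs cache hit: %dx slower" % (network_rtt_ns // cache_hit_ns))

In other words, the round trip is a bargain next to a disk seek, but 
some three orders of magnitude more expensive than the cache hit it 
effectively replaces.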

The cluster file system pays this penalty on every access for which the 
lock isn't already cached locally (and if both machines are accessing 
the same data set randomly, the locks aren't going to be locally held 
most of the time).

This may well be fine when you are dealing with large-ish files and your 
workload is arranged so that accesses to a particular data subtree are 
typically executed on only one node at a time; but for cases such as a 
large Maildir being randomly accessed from multiple nodes, you'll find 
that performance tends to fall off a cliff pretty quickly as the number 
of users and concurrent accesses increases.
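
To illustrate the fall-off (again only a sketch; the lock-cache hit 
rates and the ~0.1ms miss penalty are assumptions, not measurements):

    # Effective per-access cost when a fraction of accesses miss the local
    # lock cache and pay a network round trip to fetch the lock first.
    cache_hit_ns = 40        # lock already held locally, data in cache
    lock_miss_ns = 100000    # ~0.1ms round trip to obtain the lock

    def effective_ns(lock_cache_hit_rate):
        miss_rate = 1.0 - lock_cache_hit_rate
        return lock_cache_hit_rate * cache_hit_ns + miss_rate * lock_miss_ns

    for rate in (0.99, 0.90, 0.50):
        print("%d%% of locks held locally: ~%d ns per access"
              % (rate * 100, effective_ns(rate)))

Going from 99% to 50% of locks held locally takes the average access 
from roughly 1us to roughly 50us, without the disks getting any busier - 
which is exactly the cliff described above.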

The only way you are likely to overcome this is by logically 
partitioning your data sets.
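
For example (a purely hypothetical sketch; the node names and the 
hashing scheme are made up for illustration), pinning each mailbox to a 
single node so that its Maildir subtree - and therefore its locks - is 
only ever touched by that node:

    # Hypothetical sketch: route each user to a fixed node so that any given
    # Maildir subtree is served by one machine and its locks stay locally
    # cached there.
    import hashlib

    NODES = ["mail1", "mail2"]    # made-up node names

    def node_for_user(username):
        digest = hashlib.md5(username.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    for user in ("alice", "bob", "carol"):
        print(user, "->", node_for_user(user))

Both the delivery path and the POP/IMAP front end would need to honour 
the same mapping (however you choose to implement it), otherwise you are 
back to two nodes hammering the same subtree.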

Gordan



