[Linux-cluster] High availability mail server

Mon Oct 26 12:52:38 UTC 2009

Gordan Bobic wrote:
> Samer ITS wrote:
>> Dear Madi,
>>
>> I have 2 years of working with sun cluster 3.x
>>
>> So the concept is there, and I want to know how Linux clustering works
>> for a mail system, because I am still planning the project.
>>
>> The cluster is needed for Performance & high availability.
>>
>> So I need your advice on the planning & the requirements (such as
>> servers) for best performance.
>> Already I have SAN storage (EMC Clariion - CX300)
> 
> Unfortunately, the very nature of the way Maildir behaves (lots of files 
> in few directories) puts it very much at odds with your intention of using 
> clustering for gaining performance through parallel operation using SAN 
> backed storage with a cluster file system (be it GFS1, GFS2 or OCFS2). 
> You'll be better off with a NAS, but even so, you'll find that one 
> machine with local storage will still outperform a pair of clustered 
> machines running in parallel under heavy load, because the clustered 
> machines will spend most of their time bouncing locks around. You may 
> find that configuring them as fail-over with an ext3 volume on the SAN 
> that gets mounted _ONLY_ on the machine that's currently running the 
> service works faster.
> 
> The problem is that most people overlook why clustered file systems are 
> so slow, given the apparently low ping times to the SAN and between 
> machines on gigabit ethernet (or something faster). The generally 
> erroneous assumption is that given that the ping time is typically < 
> 0.1ms, this is negligible compared to the 4-8ms access time of 
> mechanical disks. The problem is that 4-8ms is the wrong figure to be 
> comparing to - if the machine is really hitting the disk for every data 
> fetch, it is going to grind to a halt (think heavy swapping sort of 
> performance). Most of the working data set is expected to be in caches 
> most of the time, which is accessible in < 40ns (when all the latencies 
> between the CPU, MCH and RAM are accounted for).
> 
> The cluster file system pays the network round trip (up to 0.1ms, 
> against under 40ns for a cached access) on every access where a lock 
> isn't cached, and if both machines are accessing the same data set 
> randomly, the locks aren't going to be locally held most of the time.
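
To put rough numbers on that, here is a small Python sketch using only the
figures quoted above (0.1ms ping, 40ns cached access, 4ms disk access). The
cost model of one network round trip per uncached lock, and all the names in
it, are my own simplifying assumptions rather than a measurement of GFS or
OCFS2; real lock traffic may well cost more than a single round trip.

```python
# Figures quoted in the thread (all are upper or lower bounds, not measurements).
CACHE_HIT_NS = 40              # cached access, "< 40ns"
LOCK_ROUND_TRIP_NS = 100_000   # network ping, "< 0.1ms" expressed in ns
DISK_ACCESS_NS = 4_000_000     # low end of the "4-8ms" mechanical disk range


def slowdown(penalty_ns, baseline_ns=CACHE_HIT_NS):
    """How many times slower a penalty is than the baseline access time."""
    return penalty_ns / baseline_ns


def mean_access_ns(remote_lock_fraction):
    """Average access time if the given fraction of accesses needs a remote lock.

    Assumption: each uncached lock costs exactly one network round trip on
    top of the cached access itself.
    """
    return CACHE_HIT_NS + remote_lock_fraction * LOCK_ROUND_TRIP_NS


if __name__ == "__main__":
    print("round trip vs cache hit:", slowdown(LOCK_ROUND_TRIP_NS))
    print("round trip vs disk access:", slowdown(LOCK_ROUND_TRIP_NS, DISK_ACCESS_NS))
    for fraction in (0.0, 0.01, 0.1, 0.5):
        print(fraction, mean_access_ns(fraction), "ns per access")
```

Against the disk the round trip looks negligible (a fortieth of a seek), but
against a cached access it is 2500 times slower, which is the comparison
that matters here.
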
> 
> This may well be fine when you are dealing with large-ish files and your 
> workload is arranged in such a way that accesses to particular data 
> subtrees are typically executed on only one node at a time, but for cases 
> such as a large Maildir being randomly accessed from multiple nodes, 
> you'll find the performance will tend to fall off a cliff pretty quickly 
> as the number of users and concurrent accesses starts to increase.
> 
> The only way you are likely to overcome this is by logically 
> partitioning your data sets.
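
One way to read "logically partitioning" for mail is to give every mailbox a
single home node, so that all access to a given Maildir (and therefore its
locks) stays on one machine. A minimal Python sketch of that idea follows;
the node names and the `home_node` function are hypothetical, and the front
end that would actually route deliveries and IMAP/POP sessions by this
mapping is not shown.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["mail1", "mail2"]


def home_node(user, nodes=NODES):
    """Map a mailbox owner to exactly one node, deterministically.

    A stable hash is used instead of Python's built-in hash(), which is
    randomised per process and so would not agree between machines.
    """
    digest = hashlib.sha256(user.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    for user in ("alice@example.com", "bob@example.com", "carol@example.com"):
        print(user, "->", home_node(user))
```

Note that plain modulo hashing reassigns many users whenever the number of
nodes changes, so a static lookup table may be the better choice if you
expect to grow the cluster.
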
> 
> Gordan

Expanding further on Gordan's post:

If you really want to have performance and high availability, you might 
be better off with a simple two-node cluster using a shared, 
Primary/Secondary DRBD setup. This will make sure that your data is 
always duplicated on both nodes without taking the huge locking hit that 
Gordan is talking about. Then you can use Linux-HA/Heartbeat to handle 
fail-over in the case of the primary node dying. This setup should be 
fairly straightforward to set up.

If you want more help, be sure to ask more specific questions.

Madi