[Linux-cluster] I/O scheduler and performance

Wed Jul 5 04:15:09 UTC 2006

On Wed, 2006-07-05 at 12:43 +1000, RR wrote:
> All,
> 
> I'm not sure if what I'm about to ask is relevant to this discussion,
> but the brief explanation Wendy gave here (some of which I already
> knew) reinforced a question I have had in mind and would like to
> settle one way or the other.
> 
> The question is, does anyone on this list think that for an
> application which "MAY" have just as many simultaneous reads to the
> same filesystem (almost never in the same folders however) by multiple
> nodes as it has writes, using a database to store information instead
> of using a clustered file system à la GFS is a better solution?
> 
> In short, I can either store and read these sound files from a SAN
> with a layer of GFS running over it (in which case the performance
> will depend on the efficiency and nature of GFS implementation) OR I
> could store these same files in a database over an ODBC connection.
> Note that these databases are enterprise grade, clustered and attached
> to the SAN via high performance HBAs.
> 

I'm not a database person, but I suspect this is also workload dependent.
If your IO patterns are parallel in nature (IOs on node A are mostly
independent of those on node B, say reads/writes to different
directories), then a cluster filesystem such as GFS can scale and perform
very well, while a single database server could easily become a
bottleneck. If you use a clustered version of the database, you'll have
the very same problem as with GFS.
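
As a rough illustration of what a "parallel" layout could look like (a
sketch only, not taken from any real setup in this thread): each node
writes into its own directory, so nodes rarely touch the same directory
at the same time. The "by-node" layout, the node names, the file names
and the temporary directory standing in for the shared mount point are
all assumptions made up for this example.

```python
import os
import tempfile


def node_dir(root, node_name):
    """Return the directory reserved for one node's writes (hypothetical layout)."""
    return os.path.join(root, "by-node", node_name)


def store_sound_file(root, node_name, file_name, data):
    """Write data under the writing node's own directory and return the path."""
    target_dir = node_dir(root, node_name)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, file_name)
    with open(path, "wb") as f:
        f.write(data)
    return path


# A temporary directory stands in for the shared filesystem mount point.
root = tempfile.mkdtemp()
path_a = store_sound_file(root, "nodeA", "call-0001.wav", b"RIFF")
path_b = store_sound_file(root, "nodeB", "call-0001.wav", b"RIFF")
```

Any node can still read any file by path; the point is only that
concurrent writers end up in different directories.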

At the same time, be aware that, in reality, most databases need to run
on top of a filesystem, whether clustered or not. *And* GFS can be used
as a single-node filesystem with the lock_nolock protocol, where the
cluster locking overhead no longer exists.

-- Wendy