[Linux-cluster] Question about GFS2 and mmap

Scooter Morris scooter at cgl.ucsf.edu
Sun Jan 16 00:46:30 UTC 2011


We have a Red Hat cluster (currently 5.5) with 3 nodes, and are sharing a
number of GFS2 filesystems across all nodes.  One of the applications we
run is a standard bioinformatics application called BLAST that searches
large indexed files to find similar DNA (or protein) sequences.  BLAST
will typically mmap a fair amount of data into memory from the index 
files.  Normally, this significantly speeds up subsequent executions of 
BLAST.  This doesn't appear to work on GFS2, however, when I involve
other nodes.  For example, if I run BLAST three times on a single node,
the first execution is very slow, but subsequent executions are
significantly quicker.  If I then run it on another node in the cluster
(accessing the same data files over GFS2), the first execution is slow,
and subsequent executions are quicker.  This makes sense.  The problem
is that when I run it on multiple nodes in turn, subsequent runs on any
given node are no quicker.  It almost seems as if GFS2 is dropping the
cached in-memory copy (which is read-only) as soon as the file is
accessed on another node.  Is this the case?  If so, is there a reason
for this, or is it a bug?  If it's a known bug, is there a workaround?
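A small reproducer along these lines could help separate the filesystem behaviour from BLAST itself. This is a hypothetical sketch, not part of BLAST: it maps a file read-only with Python's standard `mmap` module and times one pass that reads a single byte from every page. A warm page cache should show up as a much shorter pass. The function name `time_mmap_pass` and whatever path you pass on the command line are placeholders.

```python
import mmap
import os
import sys
import time


def time_mmap_pass(path: str) -> tuple[float, int]:
    """Map `path` read-only and touch one byte per page.

    Returns (seconds for the pass, bytes mapped). Pages that are not already
    in the local page cache must be read from disk, so a cold cache gives a
    much longer pass than a warm one.
    """
    page = mmap.PAGESIZE
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            # mmap cannot map an empty file; nothing to time.
            return 0.0, 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = time.perf_counter()
            for off in range(0, size, page):
                _ = m[off]  # reading one byte faults the page in if uncached
            elapsed = time.perf_counter() - start
    return elapsed, size


if __name__ == "__main__":
    if len(sys.argv) > 1:
        secs, nbytes = time_mmap_pass(sys.argv[1])
        print(f"{nbytes} bytes mapped; one pass over all pages took {secs:.3f} s")
```

Running it a few times on one node, then on a second node against the same GFS2 file, then again on the first node, would show whether the first node's cached pages survive the other node's access.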

Any help would be appreciated!  This is a critical application for us.

Thanks in advance,

-- scooter



