[Linux-cluster] Question about GFS2 and mmap
Scooter Morris
scooter at cgl.ucsf.edu
Mon Jan 17 19:06:53 UTC 2011
Steven,
Thanks for getting back to me. Yes, I've checked, and noatime is
definitely set. While blast was running, I did a lockdump, and the
mmap()ed files had EX locks on them:
G: s:EX n:2/5497229 f:q t:EX d:EX/0 l:0 a:0 r:3
I: n:1055314/88699433 t:8 f:0x10 d:0x00000000 s:55237024/55237024
where inode 88699433 is one of the mapped files:
[root@crick blast]# ls -li /databases/mol/blast/db_current/nr.01.pin
88699433 -rw-r--r-- 1 rpcuser sacs 55237024 Jan 17 02:53
/databases/mol/blast/db_current/nr.01.pin
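(The match is easy to miss because the glock dump shows the inode number in
hex while ls -i shows decimal; a quick conversion confirms that 88699433
decimal is 5497229 hex, the `n:2/...` number in the dump:)

```shell
INUM=88699433                         # decimal inode, from: ls -i nr.01.pin
printf 'glock id: n:2/%x\n' "$INUM"   # hex, as shown in the dump (type 2 = inode)
```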
so that explains the behavior. What I don't understand is why they had
EX locks. I did an strace of blast, and what I see when the files
are mmap()ed is something like:
stat("/databases/mol/blast/db/nr.01.pin", {st_mode=S_IFREG|0644,
st_size=55237024, ...}) = 0
open("/databases/mol/blast/db/nr.01.pin", O_RDONLY) = 8
mmap(NULL, 55237024, PROT_READ, MAP_SHARED, 8, 0) = 0x2b9ec1a14000
Where /databases/mol/blast is the gfs2 filesystem. So, the files are
not opened read/write, and the mmap()ed segment is not read/write. It's
not clear why gfs2 would create an exclusive glock for this file. Does
this make any sense to you?
-- scooter
On 01/16/2011 07:32 AM, Steven Whitehouse wrote:
> Hi,
>
> On Sat, 2011-01-15 at 16:46 -0800, Scooter Morris wrote:
>> We have a RedHat cluster (5.5 currently) with 3 nodes, and are sharing a
>> number of gfs2 filesystems across all nodes. One of the applications we
>> run is a standard bioinformatics application called BLAST that searches
>> large indexed files to find similar dna (or protein) sequences. BLAST
>> will typically mmap a fair amount of data into memory from the index
>> files. Normally, this significantly speeds up subsequent executions of
>> BLAST. This doesn't appear to work on gfs2, however, when I involve
>> other nodes. For example, if I run blast three times on a single node,
>> the first execution is very slow, but subsequent executions are
>> significantly quicker. If I then run it on another node in the cluster
>> (accessing the same data files over gfs2), the first execution is slow,
>> and subsequent executions are quicker. This makes sense. The problem
>> is that when I run it on multiple nodes, subsequent runs on the same
>> node are no quicker. It almost seems as if gfs2 is flushing
>> the in-memory copy (which is read only) immediately when the file is
>> accessed on another node. Is this the case? If so, is there a reason
>> for this, or is it a bug? If it's a known bug, is there a workaround?
>>
>> Any help would be appreciated! This is a critical application for us.
>>
>> Thanks in advance,
>>
>> -- scooter
>>
> Are you sure that the noatime mount option has been used? I can't see
> why this wouldn't work if the BLAST processes are really only
> reading the files and not writing to them.
>
> GFS2 is able to tell the difference between read and write accesses to
> shared, writable mmap()ed files (unlike GFS, which has to assume that
> all accesses are write accesses). Some early versions of GFS2 had the
> same limitation, but anything recent enough to have ->page_mkwrite() in
> the source, and certainly 5.5, should be ok.
>
> You can use the glock dump to see what mode the glock associated with
> the mmap()ed inode is in. With RHEL6/Fedora/upstream you can use the
> tracepoints to watch the state dynamically during the operations. I'm
> afraid that isn't available on RHEL5. All you need to know is the inode
> number of the file in question and then look for a type 2 glock with the
> same number.
>
> Let us know if that helps narrow down the issue. BLAST is something that
> I'd like to see running well on GFS2,
>
> Steve.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster