[Linux-cluster] strange slowness of ls with 1 newly created file on gfs 1 or 2

Wed Jul 11 16:29:03 UTC 2007

On Tue, 2007-07-10 at 22:23 -0400, Wendy Cheng wrote:
> Pavel Stano wrote:
> 
> >and then run touch on node 1:
> >serpico# touch /d/0/test
> >
> >and ls on node 2:
> >dinorscio:~# time ls /d/0/
> >test
> >
> >  
> >
> 
> What did you expect from a cluster filesystem? When you touch a file 
> on node 1, it is a "create" that requires at least 2 exclusive locks 
> (the directory lock and the file lock itself, among many other things). 
> On a local filesystem such as ext3, disk activity is deferred by the 
> filesystem cache: "touch" writes the data into the cache and "ls" reads 
> it from the cache on the very same node, so these are all memory 
> operations. On a cluster filesystem, when you do an "ls" on node 2, 
> node 2 needs to ask node 1 to release the locks (a few ping-pong 
> messages between the two nodes and the lock managers over the network), 
> and the contents of node 1's cache need to be synced to the shared 
> storage. After node 2 gets the locks, it has to read the contents from 
> the disk.
> 
> I hope the above explanation is clear.
> 
> >and one last thing: I tried gfs2, but got the same result
> >
> >
> >  
> >
> -- Wendy

This seems a little bit odd to me. I'm running an RH 7.3 cluster
(pre-Red Hat Sistina GFS, lock_gulm, 1GB FC shared disk), and have been
since ~2002.

Here's the timing I get for the same basic test between two nodes:

[root@sbc1 root]# cd /mnt/gfs/workspace/cbarry/
[root@sbc1 cbarry]# mkdir tst
[root@sbc1 cbarry]# cd tst
[root@sbc1 tst]# time touch testfile

real    0m0.094s
user    0m0.000s
sys     0m0.000s
[root@sbc1 tst]# time ls -la testfile
-rw-r--r--    1 root     root            0 Jul 11 12:20 testfile

real    0m0.122s
user    0m0.010s
sys     0m0.000s
[root@sbc1 tst]#

Then immediately from the other node:

[root@sbc2 root]# cd /mnt/gfs/workspace/cbarry/
[root@sbc2 cbarry]# time ls -la tst
total 12
drwxr-xr-x    2 root     root         3864 Jul 11 12:20 .
drwxr-xr-x    4 cbarry   cbarry       3864 Jul 11 12:20 ..
-rw-r--r--    1 root     root            0 Jul 11 12:20 testfile

real    0m0.088s
user    0m0.010s
sys     0m0.000s
[root@sbc2 cbarry]#

Now, you cannot tell me 10 seconds is 'normal' for a clustered fs. That
just does not fly. My guess is DLM is causing problems.
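
For anyone who wants to repeat the comparison without relying on the
shell's "time", here is a minimal Python sketch of the same two steps:
create an empty file on one node, then list and stat the directory on the
other. The helper names (timed, touch, list_with_stat), the "testfile"
name, and the create/list command-line arguments are my own assumptions
for illustration; nothing here is GFS-specific, it only measures
wall-clock time for ordinary filesystem calls.

```python
import os
import sys
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def touch(path):
    """Create the file if it is missing and update its timestamps."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def list_with_stat(directory):
    """List a directory and lstat every entry, roughly what `ls -l` does."""
    entries = []
    for name in os.listdir(directory):
        info = os.lstat(os.path.join(directory, name))
        entries.append((name, info.st_size))
    return sorted(entries)


if __name__ == "__main__":
    # Hypothetical usage: <script> create|list DIRECTORY
    mode, directory = sys.argv[1], sys.argv[2]
    if mode == "create":
        _, elapsed = timed(touch, os.path.join(directory, "testfile"))
    else:
        entries, elapsed = timed(list_with_stat, directory)
        for name, size in entries:
            print(f"{size:>10} {name}")
    print(f"{mode}: {elapsed:.3f}s")
```

Run it with "create" and the shared directory on the first node, then
immediately with "list" and the same directory on the second node. The
second number is the one to compare against the 10 seconds reported in
this thread.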

-- 
Regards,
-C

Christopher Barry
Systems Engineer, Principal