[Linux-cluster] strange slowness of ls with 1 newly created file on gfs 1 or 2
Christopher Barry
Christopher.Barry at qlogic.com
Wed Jul 11 16:29:03 UTC 2007
On Tue, 2007-07-10 at 22:23 -0400, Wendy Cheng wrote:
> Pavel Stano wrote:
>
> >and then run touch on node 1:
> >serpico# touch /d/0/test
> >
> >and ls on node 2:
> >dinorscio:~# time ls /d/0/
> >test
>
> What have you expected from a cluster filesystem ? When you touch a file
> on node 1, it is a "create" that requires at least 2 exclusive locks
> (directory lock and the file lock itself, among many other things). On a
> local filesystem such as ext3, disk activities are delayed due to
> filesystem cache where "touch" writes the data into cache and "ls" reads
> it from cache on the very same node - all memory operations. On cluster
> filesystem, when you do an "ls" on node 2, node 2 needs to ask node 1 to
> release the locks (few ping-pong messages between two nodes and lock
> managers via network), the contents inside node 1's cache need to get
> synced to the shared storage. After node 2 gets the locks, it has to
> read contents from the disk.
>
> I hope the above explanation is clear.
>
> >and last thing, i try gfs2, but same result
>
> -- Wendy
This seems a little bit odd to me. I'm running a RH 7.3 cluster with
pre-Red Hat Sistina GFS, lock_gulm, and a 1GB FC shared disk, and have
been since ~2002.
Here's the timing I get for the same basic test between two nodes:
[root@sbc1 root]# cd /mnt/gfs/workspace/cbarry/
[root@sbc1 cbarry]# mkdir tst
[root@sbc1 cbarry]# cd tst
[root@sbc1 tst]# time touch testfile
real 0m0.094s
user 0m0.000s
sys 0m0.000s
[root@sbc1 tst]# time ls -la testfile
-rw-r--r-- 1 root root 0 Jul 11 12:20 testfile
real 0m0.122s
user 0m0.010s
sys 0m0.000s
[root@sbc1 tst]#
Then immediately from the other node:
[root@sbc2 root]# cd /mnt/gfs/workspace/cbarry/
[root@sbc2 cbarry]# time ls -la tst
total 12
drwxr-xr-x 2 root root 3864 Jul 11 12:20 .
drwxr-xr-x 4 cbarry cbarry 3864 Jul 11 12:20 ..
-rw-r--r-- 1 root root 0 Jul 11 12:20 testfile
real 0m0.088s
user 0m0.010s
sys 0m0.000s
[root@sbc2 cbarry]#
Now, you cannot tell me 10 seconds is 'normal' for a clustered fs. That
just does not fly. My guess is DLM is causing problems.
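
For anyone who wants to repeat the comparison on their own cluster, the
test above could be wrapped in a small script along these lines. This is
only a rough sketch: the script name gfs_probe.sh, the probe.<pid> file
name, and the directory in the usage comment are made up for
illustration, and the millisecond timer assumes GNU date (for %N). Run
"create" on one node, then "list" on the other right away.

```shell
#!/bin/sh
# Sketch: time a file create on one node and a directory listing on another.
# Usage (hypothetical script name and path):
#   node A:  sh gfs_probe.sh create /mnt/gfs/some/dir
#   node B:  sh gfs_probe.sh list   /mnt/gfs/some/dir

# Print the current time in milliseconds (assumes GNU date's %N).
now_ms() {
    echo $(( $(date +%s%N) / 1000000 ))
}

# Run a command, discard its output, and print the elapsed milliseconds.
elapsed_ms() {
    start=$(now_ms)
    "$@" >/dev/null 2>&1
    end=$(now_ms)
    echo $(( end - start ))
}

# Create an empty file named probe.<pid> in directory $1 and report the time.
probe_create() {
    echo "create: $(elapsed_ms touch "$1/probe.$$") ms"
}

# List directory $1 and report the time.
probe_list() {
    echo "list: $(elapsed_ms ls -la "$1") ms"
}

# Only act when a mode is given on the command line.
case "${1:-}" in
    create) probe_create "$2" ;;
    list)   probe_list "$2" ;;
esac
```

The interesting number is the "list" time on the second node taken
immediately after the "create" on the first, since that is the case
where the locks have to move between nodes.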
--
Regards,
-C
Christopher Barry
Systems Engineer, Principal