[Linux-cluster] Why GFS is so slow? What it is waiting for?

Wendy Cheng s.wendy.cheng at gmail.com
Tue May 13 23:22:05 UTC 2008


Ja S wrote:
> Hi, Wendy:
>
> Thanks for your prompt and kind explanation. It is
> very helpful. Following your comments, I ran
> another test. See below:
>  
> # stat abc/
>   File: `abc/'
>   Size: 8192            Blocks: 6024       IO Block: 4096   directory
> Device: fc00h/64512d    Inode: 1065226     Links: 2
> Access: (0770/drwxrwx---)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2008-05-08 06:18:58.000000000 +0000
> Modify: 2008-04-15 03:02:24.000000000 +0000
> Change: 2008-04-15 07:11:52.000000000 +0000
>
> # cd abc/
> # time ls | wc -l 
> 31764
>
> real    0m44.797s
> user    0m0.189s
> sys     0m2.276s
>
> The real time in this test is much shorter than in the
> previous one. However, it is still fairly long. As
> you said, the ‘ls’ command only reads the single
> directory file. In my case, the directory file itself
> is only 8192 bytes. The time spent on disk IO should
> be included in “sys 0m2.276s”. Although DLM needs time
> to look up the location of the corresponding master
> lock resource and to process locking, the system
> should not take about 42 seconds to complete the “ls”
> command. So, what is the hidden issue, and is there a
> way to identify possible bottlenecks?
>
>   
IIRC, disk IO wait time is excluded from "sys", so you really can't
conclude that the lion's share of your wall (real) time is due to DLM
locking. We can't know for sure unless you provide the relevant profiling
data (try learning OProfile and/or SystemTap to see exactly where
your system is waiting). Latency issues like this are tricky.
It would be foolish to conclude anything just from reading the command
output without knowing the surrounding configuration and/or runtime
environment.
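
As a rough first check before any real profiling, the gap between "real"
and "user" + "sys" tells you how long the command was off the CPU
(blocked or waiting to run). It cannot tell you whether that time went to
disk, network, or locking. A minimal sketch; offcpu_seconds is a made-up
helper name, not part of any GFS or cluster tool:

```shell
#!/bin/sh
# offcpu_seconds REAL USER SYS
# Prints real - (user + sys) in seconds: the time the command spent
# not running on a CPU (blocked, or waiting to be scheduled).
# Hypothetical helper for illustration only.
offcpu_seconds() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.3f\n", r - (u + s) }'
}

# Figures taken from the "time ls | wc -l" run quoted above.
offcpu_seconds 44.797 0.189 2.276
```

For the run quoted above this prints 42.332, i.e. roughly 42 seconds of
waiting that still have to be attributed by a profiler.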

If small-file read latency is important to you, did you turn off the storage
device's readahead? Did you try different Linux kernel elevator
algorithms? Did you make sure your other network traffic didn't block
DLM traffic? Be aware that latency and bandwidth are two different things. A
big, fat network link doesn't automatically imply a quick response
time, though it may carry more bandwidth.
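
One way to see what a node is currently using is to read the block
queue settings from sysfs. This is only a sketch: it assumes a kernel
that exposes /sys/block/<dev>/queue, it reports the kernel's block-layer
readahead (not any readahead setting inside the storage array itself),
and both DEV and the helper name active_elevator are placeholders:

```shell
#!/bin/sh
# active_elevator LINE
# Given the contents of /sys/block/<dev>/queue/scheduler, for example
# "noop anticipatory deadline [cfq]", print the bracketed (active) entry.
# Hypothetical helper for illustration only.
active_elevator() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# DEV is a placeholder; substitute the block device backing the GFS volume.
DEV=${DEV:-sda}
SCHED=/sys/block/$DEV/queue/scheduler

# Only report if this device actually exists on the node.
if [ -r "$SCHED" ]; then
    echo "elevator:  $(active_elevator "$(cat "$SCHED")")"
    echo "readahead: $(cat "/sys/block/$DEV/queue/read_ahead_kb") KB"
fi
```

For the network side, something like "ping -c 10 <peer-node>" gives a
round-trip time between cluster nodes. Treat it as a rough proxy only,
since ICMP latency is not the same thing as DLM message latency.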

-- Wendy



