[Linux-cluster] Slowness above 500 RRDs

Tue Jun 12 16:17:09 UTC 2007

On Tue, Jun 12, 2007 at 05:43:24PM +0200, Ferenc Wagner wrote:
> David Teigland <teigland at redhat.com> writes:
> 
> > On Tue, Jun 12, 2007 at 04:01:04PM +0200, Ferenc Wagner wrote:
> >
> >> with -l0:
> >> 
> >> filecount=500
> >>   iteration=0 elapsed time=5.966146 s
> >>   iteration=1 elapsed time=0.582058 s
> >>   iteration=2 elapsed time=0.528272 s
> >>   iteration=3 elapsed time=0.936438 s
> >>   iteration=4 elapsed time=0.528147 s
> >> total elapsed time=8.541061 s
> >>
> >> Looks like the bottleneck isn't the explicit locking (be it plock
> >> or flock), but something else, like the built-in GFS locking.
> >
> > I'm guessing that these were run with a single node in the cluster?
> > The second set of numbers (with -l0) wouldn't make much sense
> > otherwise.
> 
> Yes, you guessed right.  For some reason I thought it a good idea to
> reveal this only at the end.  (Sorry.)
> 
> > In the end I expect that flocks are still going to be the fastest
> > for you.
> 
> They really seem to be faster, but since the [fp]locking time is
> negligible, it doesn't buy much.
> 
> > I think if you add nodes to the cluster, the -l0 numbers will go up
> > quite a bit.
> 
> Let's see.  Mounting on one more node (and switching on the third):
> 
> # cman_tool services
> type             level name     id       state       
> fence            0     default  00010001 none        
> [1 2 3]
> dlm              1     clvmd    00020001 none        
> [1 2 3]
> dlm              1     test     000a0001 none        
> [1 2]
> gfs              2     test     00090001 none        
> [1 2]

!?!? But now you're using the old RHEL4-generation stuff -- gfs_controld
is completely irrelevant there.  The analysis changes entirely between
the RHEL4 and RHEL5 (old/new) generations of infrastructure.

Dave