[Linux-cluster] GFS cluster / DLM locking - Mostly idle but high load

Wed Oct 17 07:39:04 UTC 2007

On Wednesday 17 October 2007 09:24:09 Gordan Bobic wrote:
> On Wed, 17 Oct 2007, Nikolas Lam wrote:
> >> I have a cluster (3 nodes at the moment, may grow up to 16) for handling
> >> a lot of small files (Maildir). When I test the system by sending around
> >> 3-5 messages/second I see the load on the cluster nodes go up to about
> >> 20-30, even though the CPUs on the cluster are about 90% idle at all
> >> times.
> >>
> >> I am guessing that this is due to the clustered machines waiting for DLM
> >> locks to be established, which causes a lot of processes to be fighting
> >> to run, but since they don't get to run very soon, they back up and
> >> cause the load averages to go up.
> >>
> >> Assuming the DLM runs over the interface specified by IP and MAC in
> >> cluster.conf, it is running over gigabit ethernet.
> >>
> >> Are there any configuration changes or tuning parameters I can apply to
> >> DLM to alleviate this condition? The machine I'm running the test from
> >> (the one sending messages) is about 1/4 of the spec of each of the
> >> cluster nodes, and it's running a load average of about 0.4. It seems
> >> crazy that a single low-spec node should be able to completely overwhelm
> >> a cluster 12x its spec several times over.
> >
> > I don't know a lot about GFS, but since no one else has replied yet: my
> > understanding is that it's not suitable for an application like the one
> > you describe (many small files being opened frequently). I think GFS2,
> > which is still a tech preview, has been redesigned to improve this
> > situation.
>
> Indeed, I am aware that GFS2 is still broken, but I seem to be getting no
> worse performance out of GFS than I get out of NFS. The only penalty is
> the high load, but the throughput is actually similar. The advantage that
> makes GFS win is that I don't need an arbitrating server to handle the NFS
> exports, which makes the clustering and redundancy a bit tidier.
Gordan,
in your testing, did you also try adjusting the DLM hash table sizes
(rsbtbl_size/lkbtbl_size)? I would be quite interested to know whether this
improves your performance. Do you have a lot of small files?
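
To check what the tables are currently set to before experimenting, something
like the following might help. It is only a minimal sketch: it assumes the
tunables are exposed as plain files under /sys/kernel/config/dlm/cluster,
which may differ on your kernel or distribution, and the helper names
read_table_sizes and write_table_size are made up for illustration. Writing
needs root, and new values are assumed to apply only to lockspaces created
afterwards, so they would probably have to be set before the GFS filesystem
is mounted.

```python
from pathlib import Path

# ASSUMPTION: location of the DLM tunables (configfs). Adjust for your system.
DLM_CONFIG_DIR = Path("/sys/kernel/config/dlm/cluster")
TABLE_NAMES = ("rsbtbl_size", "lkbtbl_size")


def read_table_sizes(config_dir=DLM_CONFIG_DIR, names=TABLE_NAMES):
    """Return {name: int or None}; None means the file is missing or unreadable."""
    sizes = {}
    for name in names:
        try:
            sizes[name] = int((Path(config_dir) / name).read_text().strip())
        except (OSError, ValueError):
            sizes[name] = None
    return sizes


def write_table_size(name, value, config_dir=DLM_CONFIG_DIR):
    """Write a new table size (hypothetical helper; requires root on a real node)."""
    if value <= 0:
        raise ValueError("table size must be positive")
    (Path(config_dir) / name).write_text(f"{value}\n")


if __name__ == "__main__":
    # Print the current values, or a note if the assumed path does not exist.
    for name, size in read_table_sizes().items():
        print(f"{name}: {size if size is not None else 'not available'}")
```

Running it on each node before and after a change gives a simple record of
which values were in effect for a given test run.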


-- 
Gruss / Regards,

Marc Grimme
http://www.atix.de/               http://www.open-sharedroot.org/