[Linux-cluster] GFS or Web Server Performance issues?

isplist at logicore.net isplist at logicore.net
Wed Nov 28 16:54:53 UTC 2007


Come on folks, you're making me feel like I should give up or something  :)

 From Gordan;

>I think a part of the problem is perception.

Perception can only be based on what the marketing says it will do. I can't
say I have once seen anything that says it won't scale performance-wise by
virtue of what it is, a cluster. I looked at SSI and other types of clusters,
and this seemed to be the key for my LAMP-based services.

>leads to _LOWER_ performance on I/O bound processes. If it's CPU bound,

Sure, there is a performance cost from each node, but I would guess it's an
acceptable cost so long as I can work out the I/O side of things. I'm guessing
a lot of folks have come up with all sorts of good ways of handling this;
otherwise, no one would be using these tools.

>then sure, it'll help. But on I/O it'll likely do you harm. It's more
>about redundancy and graceful degradation than performance. There's no way
>of getting away from the fact that a cluster has to do more work than a
>single node, just because it has to keep itself in sync.

When I started learning about the RH cluster suite and GFS, it was because the 
hype was that I could build a highly scalable, highly available environment 
where I could share data in a way I had not been able to before. 

That sharing data part has been true and I love what I can achieve with that 
alone. I am now at the performance stage and need to get the most I can out of
what I have. Time to ask questions so that I can have some basic starting 
points. 

Even though each node has to do more work and the cluster costs me in 
performance overall, that seems to be pretty much like any other application 
out there. Applications cost part of the machine's resources; that's just the
way things are.

>about redundancy and graceful degradation than performance. There's no way
>of getting away from the fact that a cluster has to do more work than a
>single node, just because it has to keep itself in sync.

Got it. 
However, I've already spent months learning about the cluster suite and GFS.
Much harder has been all of the networking involved: the fibre channel
switches, the fibre channel storage and its endless needs. The list goes on
and on, but I don't want to bore you.

The bottom line is, I have a working cluster, sharing GFS space. I know it's 
costing me some resources from each node, I understand this. However, there's 
plenty left to work with :). I could use some input on where to start to get 
the most performance I can out of what I have. 

>The only way clustering will give you scaleable performance benefit is
>with partitioned (as opposed to shared) data. Shared data clustering is
>about convenience and redundancy, not about performance.

I agree but this is a very general statement. In my case, I have a LAMP 
application which benefits more from having shared GFS space. I might move to 
purely distributed at some point but for now, I'd prefer to find out what I 
can do with what I've built so far.

So, I'm looking for help. I am of course willing to give whatever information
I can provide in order to get that help. Since I don't know the answer, I
can't ask the right questions just yet, so the question is basic: where do I
start looking for performance enhancements now that my cluster is ready?

 From Wendy;

>Be aware that cluster management and its associated performance tuning is
>really not a trivial task. It is kind of hard to give a "catch-all"
>advice in a mailing list, particularly we have been participating the
>discussions on our spare time basis.

I don't think anyone who bothers to take this on would think that it's trivial
or anything less than something they have to spend some time on.

Asking 'catch-all' questions is often the only way to get the ball rolling, to
invite additional questions which often lead to more meaningful help. At
least, in all of the endless fact-finding meetings I've been in, we'll often
start with some basics and that turns into more relevant things.

So my question is, again, the same :). Now that I have my cluster up and
running, I would still like to ask those on the list for thoughts, input, and
ideas on where they started looking for performance enhancements.

I have a basic, non-cluster/GFS list of course:

I'll need to work on fine-tuning my web servers, my storage and even my
networking. The only thing is, are there cluster/GFS-related things I should
be aware of when doing this, any tips or input that others might have?

Mike





