[Linux-cluster] GFS Tuning - it's just slow, too slow for production

Steven Whitehouse swhiteho at redhat.com
Thu Mar 4 15:39:41 UTC 2010


Hi,

On Thu, 2010-03-04 at 09:17 -0600, Alan A wrote:
> Application is single threaded application that handles cgi-bin calls
> from Apache, opens up file for writing and writes data. We can have up
> to 200 concurrent sessions on single application instance hitting the
> GFS mount. We noticed major slowdown once we pass 30 concurrent
> users. 
> 
> We can run 10 instances of this application on 8 threaded server
> without any problem in non-GFS environment, yet I can't get 40 users
> due to GFS slowing down Apache page refresh.
> 
Ok, I suspect that what is going on is that the files are being accessed
randomly from the nodes, and that's causing glocks to bounce between
nodes a lot. This is a common situation and can usually be solved by
carefully partitioning the workload between the cluster nodes.

Creating or removing any file in a directory will result in an exclusive
glock request, and that means that only one node can work in that
directory for a period of time. As a result, if you can arrange for file
creations/deletions to be split among a number of directories (better
still if the workload from each node can be, at least mostly, confined
to a private set of directories) that will speed things up enormously.
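To make that concrete, here is a minimal sketch of the per-node directory
idea; the mountpoint, the spool layout and the file names are illustrative
assumptions, not something from GFS itself:

```shell
#!/bin/sh
# Sketch (assumed layout): give each cluster node a private subdirectory
# so that creates/deletes from different nodes never contend for the
# same directory glock.
GFS_MOUNT=${GFS_MOUNT:-/tmp/gfs-example}   # on a real cluster: the GFS mountpoint
NODE=$(uname -n)                           # this node's hostname

SPOOL="$GFS_MOUNT/spool/$NODE"             # this node's private directory
mkdir -p "$SPOOL"

# File creation from this node now serialises only within $SPOOL,
# instead of against every other node working in a shared directory:
FILE="$SPOOL/session-$$.dat"
printf 'session data' > "$FILE"
```

The point of the `$NODE` component is simply that no two nodes ever
create or remove files in the same directory, so the exclusive glock on
each spool directory stays cached on its owning node.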

The same goes for accesses to "regular" files. If you can arrange
(assuming that file access is decided according to user) for a certain
group of users to "mostly" use just a single cluster node, that will
also speed things up a lot. Sometimes you can do this kind of thing via
DNS or other similar tricks. The net result is that the information is
much more likely to be cached on the node in question, and as a result
the overall I/O rate goes down and the performance improves.
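One simple way to get a "mostly one node per user" mapping is a stable
hash of the username; the node names and the four-node count below are
assumptions for illustration:

```shell
#!/bin/sh
# Sketch (assumed cluster): deterministically map each user to one of N
# nodes, so a given user's files are served - and cached - mostly on a
# single node.
NODES="node1 node2 node3 node4"
N=4

pick_node() {
    user="$1"
    # Stable hash: sum of the username's byte values, modulo node count.
    sum=$(printf '%s' "$user" | od -An -tu1 | tr -s ' ' '\n' \
          | awk '{s += $1} END {print s}')
    idx=$(( sum % N + 1 ))
    echo "$NODES" | tr ' ' '\n' | sed -n "${idx}p"
}

pick_node alice   # always returns the same node for "alice"
```

The same idea can sit behind a DNS view, a load-balancer rule, or a
redirect in the cgi-bin front end; the mechanism matters less than the
mapping being stable per user.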

Using noatime prevents reads from turning into writes too, so that also
makes a big difference.
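For reference, noatime is just a mount option; the device path and
mountpoint below are assumptions for illustration:

```shell
# Illustrative /etc/fstab entry for a GFS filesystem mounted noatime
# (device and mountpoint are made up for this example):
#   /dev/vg0/gfslv  /mnt/gfs  gfs  defaults,noatime  0 0

# Or add noatime to an already-mounted filesystem without unmounting:
mount -o remount,noatime /mnt/gfs
```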

Steve.





