[Linux-cluster] Problem with write performance

Pena, Francisco Javier francisco_javier.pena at roche.com
Mon Jan 18 14:29:34 UTC 2010


Hello Arturo,

The following Red Hat KB article may be useful in your case: http://kbase.redhat.com/faq/docs/DOC-6533 . The glock_purge and demote_secs tunables have been quite useful to us in some cases, and your case looks similar.

The only drawback of this method is that you will need to write a script somewhere to set the tunables, as they are not persistent.
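
For example, something along these lines could go in /etc/rc.local (or a small init script) on each node. The mount point and the values below are just placeholders, and it is worth confirming with "gfs2_tool gettune <mountpoint>" that both tunables actually exist on your kernel version before relying on them:

#!/bin/sh
# Re-apply the non-persistent GFS2 glock tunables after each boot.
# /mnt/gfs2data and the values below are only examples -- adjust for your setup.
gfs2_tool settune /mnt/gfs2data glock_purge 50    # purge up to 50% of unused glocks on each scan
gfs2_tool settune /mnt/gfs2data demote_secs 200   # demote glocks that have been unused for 200 seconds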

Regards,

Javier

________________________________________
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Arturo Gonzalez Ferrer
Sent: Saturday, January 16, 2010 9:35 PM
To: linux clustering
Subject: [Linux-cluster] Problem with write performance

Hello,

We recently set up a RHEL 5.4 cluster of 3 nodes for a high-availability Moodle website, where the sessions and data are shared on a GFS2 volume.

We have found that, while read performance has been consistently good, there is a problem with writes: the system's performance degrades under certain conditions. We think it may be related to our backup procedure:

We do an NFS export of the GFS2 volume from one of the nodes, so that we can back up the volume externally every night from a Veritas backup client. The next morning we find that write performance has decreased a lot, to the point that the volume is practically unusable for some big files and for the operation of cloning an existing course (zipping and unzipping the course data into a new folder). After some experiments with the writes and clone operations, we found a way to improve the issue, but we think there should be a better way. What we did was add the following entry to the crontab on every node:

0-59/10 * * * * sync; echo 3 > /proc/sys/vm/drop_caches

This way the lock caches are cleaned every ten minutes. We have not noticed it hurting the performance of the system, and it does improve write performance somewhat, at least making the volume usable.
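As far as I understand, "echo 3" drops the page cache as well as the dentry and inode caches; since the glocks are held by cached inodes, a gentler variant that keeps the page cache might be enough (the ten-minute schedule is just our current choice):

*/10 * * * * sync; echo 2 > /proc/sys/vm/drop_caches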

Do you think this could be an option? Do you have a better explanation for it? Any other ideas about what we could do?

Since then we have been having problems with the Apache service, which sometimes stops (not very often) on one of the nodes. I think it could be related to this maintenance of the VM caches... but I'm not sure.

Best regards,
Arturo.
