[Linux-cluster] Performance degradation after reboot
kadlec at mail.kfki.hu
Tue Feb 3 15:44:10 UTC 2009
Due to a major power restructuring we had to shut down our GFS cluster on
Saturday. Since then we have been suffering a serious performance
degradation. Previously the system load was usually less than 1, with
spikes of 3-4. Yesterday we had 180(!), without any apparent reason: the
network interfaces are OK (settings just right, no errors or packet loss),
no settings were modified, and usage of the cluster did not change. GFS is
over AoE: the Coraid boxes are just fine, no RAID degradation.
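To catch the moments when the load climbs, something like the following could be run on each node. It is only a sketch: the helper name load_exceeds, the threshold of 4 and the log path are made up for illustration, and it assumes the standard Linux /proc/loadavg format.

```shell
# Hypothetical helper: succeed if the 1-minute load average exceeds a threshold.
#   $1 = threshold
#   $2 = file to read (optional, defaults to /proc/loadavg)
load_exceeds() {
    # The first field of /proc/loadavg is the 1-minute load average.
    awk -v limit="$1" '{ exit !($1 > limit) }' "${2:-/proc/loadavg}"
}

# Example use (commented out so that sourcing this file does not block):
# while sleep 60; do
#     load_exceeds 4 && echo "$(date) $(cat /proc/loadavg)" >> /tmp/load.log
# done
```

Timestamps logged this way on several nodes can then be compared to see whether the spikes occur at the same time across the cluster.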
At startup, ntpd on some systems could not set the system clock as it
was off by more than 180s. We fixed that and rebooted the systems one by
one just in case, but it did not help.
Even stranger, when the init script issues the command
    gfs_tool settune /gfs/home statfs_fast 1
it takes quite a long time, around 15-20s.
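The delay can be measured per node with a small wrapper such as the one below. This is a minimal sketch: time_cmd is a hypothetical helper name, and the resolution is whole seconds only, which is enough for a 15-20s delay.

```shell
# Hypothetical helper: run a command, discard its output, and print
# the elapsed wall-clock time in whole seconds.
time_cmd() {
    tc_start=$(date +%s)
    "$@" >/dev/null 2>&1
    tc_rc=$?
    tc_end=$(date +%s)
    echo "$((tc_end - tc_start))"
    # Pass the exit status of the wrapped command back to the caller.
    return $tc_rc
}

# Example use on a cluster node (commented out; needs the GFS mount):
# time_cmd gfs_tool settune /gfs/home statfs_fast 1
```

Running it on every node in turn would show whether the delay is the same everywhere or limited to some machines.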
What could go wrong, on a nicely working system? Might there be
filesystem inconsistencies that can produce such a slowdown and we should
The GFS parameters that are tuned:
scand_secs 3 [This one was added today.]
Any ideas would be useful.
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary