[Linux-cluster] Performance degradation after reboot

Kadlecsik Jozsef kadlec at mail.kfki.hu
Tue Feb 3 15:44:10 UTC 2009


Due to a major power restructuring we had to shut down our GFS cluster on
Saturday. Since then we have been suffering a serious performance
degradation. Previously the system load was usually less than 1, with
spikes of 3-4. Yesterday we had 180(!), without any apparent reason:
network interfaces are OK (settings just right, no errors or packet loss),
no settings were modified, and usage of the cluster did not change. GFS is
over AoE: the Coraid boxes are just fine, no RAID degradation.

At startup, ntpd on some systems could not set the system clock because it
was off by more than 180s. We fixed that and rebooted the systems one by
one just in case, but it did not help.

What is even stranger, when the init script issues the command

gfs_tool settune /gfs/home statfs_fast 1

it takes quite a lot of time, around 15-20s.
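
To put a number on that delay, the command can be timed from the shell. The
following is only a minimal sketch, assuming a POSIX shell and a date(1)
that supports +%s (as on Linux); time_cmd is a hypothetical helper name and
not part of gfs_tool.

```shell
#!/bin/sh
# Sketch: run a command and print how long it took, in whole seconds.
# time_cmd is an illustrative helper name, not part of gfs_tool.
time_cmd() {
    start=$(date +%s)
    # Send the command's own stdout to stderr so that only the
    # elapsed time ends up on stdout.
    "$@" >&2
    status=$?
    end=$(date +%s)
    echo "$((end - start))"
    # Hand the command's exit status back to the caller.
    return "$status"
}
```

Once the function is defined, it could be used like this on each node:

    time_cmd gfs_tool settune /gfs/home statfs_fast 1

Comparing the printed values across nodes might show whether the delay is
cluster-wide or limited to some machines.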

What could have gone wrong on a system that was working nicely? Might
there be filesystem inconsistencies that can produce such a slowdown, and
should we run gfs_fsck?

The gfs parameters which are tuned:

statfs_slots 128
statfs_fast 1
demote_secs 30
glock_purge 50
scand_secs 3	[This one was added today.]
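
For reference, the tunables listed above could be applied in one pass with
the same "gfs_tool settune" form used in the init script. This is a sketch
under assumptions, not the actual init script: apply_tunables and the
GFS_TOOL variable are illustrative names, and setting GFS_TOOL=echo gives a
dry run that only prints the arguments.

```shell
#!/bin/sh
# Sketch: apply the tuned GFS parameters to one mount point.
# GFS_TOOL and apply_tunables are illustrative names (assumptions);
# set GFS_TOOL=echo to print the arguments instead of running gfs_tool.
GFS_TOOL=${GFS_TOOL:-gfs_tool}

apply_tunables() {
    mnt=$1
    while read -r name value; do
        # Skip empty lines in the list.
        [ -n "$name" ] || continue
        # </dev/null keeps the tool from consuming the parameter list.
        "$GFS_TOOL" settune "$mnt" "$name" "$value" </dev/null || return 1
    done <<EOF
statfs_slots 128
statfs_fast 1
demote_secs 30
glock_purge 50
scand_secs 3
EOF
}
```

With the function defined, "apply_tunables /gfs/home" would set all five
values in turn and stop at the first failure.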

Any ideas would be useful.

Best regards,
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary