[Linux-cluster] GFS directory freezing unexpectedly under pressure ...

tam_annie at aliceposta.it tam_annie at aliceposta.it
Sun Nov 4 19:40:20 UTC 2007


Well, I really think so: I have always used 'gfs' in my commands, never 
'gfs2'. gfs_fsck runs fine on that filesystem, while gfs2_fsck returns:

[root at orarac1 ~]# gfs2_fsck /dev/vg_share/lv_share_1
Initializing fsck
Old gfs1 file system detected.

However, the same question you're now asking me crossed my own mind 
when I wrote the post: nobody seems to have problems with GFS1, only 
with GFS2 (which is not yet production-ready); maybe I'm messing 
something up ...
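As a sanity check on the superblock itself (using the "SB ondisk format" line from the gfs_tool df output further down as sample input): format 1309 identifies GFS1, while GFS2 reports a different number (1801, if I recall correctly).

```shell
# Parse the on-disk format number out of a gfs_tool df line.
# Sample line copied from the gfs_tool df output below.
df_line='  SB ondisk format = 1309'
fmt=$(echo "$df_line" | awk -F' = ' '{print $2}')
if [ "$fmt" = "1309" ]; then echo "gfs1"; else echo "not-gfs1"; fi
```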

What deepens my doubts is that, to my surprise, my gfs kernel module 
is using the gfs2 kernel module:

[root at orarac1 ~]# lsmod
Module                  Size  Used by
....
gfs                   302204  2
lock_dlm               55385  3
gfs2                  522965  2 gfs,lock_dlm
dlm                   131525  24 lock_dlm
configfs               62301  2 dlm
vmnet                 106288  3
vmmon                 176716  0
sunrpc                195977  1
ipv6                  410017  22
cpufreq_ondemand       40401  2
dm_mirror              60993  0
dm_mod                 93841  6 dm_mirror
video                  51273  0
sbs                    49921  0
i2c_ec                 38593  1 sbs
i2c_core               56129  1 i2c_ec
button                 40545  0
battery                43849  0
asus_acpi              50917  0
acpi_memhotplug        40133  0
ac                     38729  0
parport_pc             62313  0
lp                     47121  0
parport                73165  2 parport_pc,lp
k8_edac                49537  0
edac_mc                58657  1 k8_edac
shpchp                 70765  0
bnx2                  119057  0
pcspkr                 36289  0
serio_raw              40517  0
sg                     69737  0
qla2400               242944  0
qla2300               159360  0
usb_storage           116257  0
cciss                  92361  4
ext3                  166609  2
jbd                    93873  1 ext3
ehci_hcd               65229  0
ohci_hcd               54493  0
uhci_hcd               57433  0
qla2xxx               309664  7 qla2400,qla2300
sd_mod                 54081  9
scsi_mod              184057  5 sg,usb_storage,cciss,qla2xxx,sd_mod
qla2xxx_conf          334856  1
intermodule            37508  2 qla2xxx,qla2xxx_conf
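For reference, this is how I read the listing: the fourth field of each lsmod line is the "Used by" list, and the sample line below (copied from the output above) shows that gfs itself links against gfs2.

```shell
# Extract the "Used by" field from an lsmod-style line.
# Sample line copied verbatim from the lsmod output above.
line='gfs2                  522965  2 gfs,lock_dlm'
users=$(echo "$line" | awk '{print $4}')
echo "$users"
```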

Is it possible that I'm unintentionally using GFS through GFS2 (or 
something like that), and that my configuration is unstable simply 
because the gfs2 kernel module is not yet production-ready? Maybe I 
need to change something ...
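A minimal check of what is actually mounted, using a sample line in place of the real /proc/mounts (the device-mapper path here is hypothetical): the third field is the filesystem type the kernel registered, which should read 'gfs' rather than 'gfs2'.

```shell
# Look up the filesystem type of /share from /proc/mounts-style input.
# The sample line stands in for the real file; the device path is made up.
sample='/dev/mapper/vg_share-lv_share_1 /share gfs rw,noatime,noquota 0 0'
fstype=$(echo "$sample" | awk '$2 == "/share" {print $3}')
echo "$fstype"
```

On the live system the same pipeline would be run against /proc/mounts itself instead of the sample variable.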

I'd greatly appreciate it if you could share your working GFS1 
configuration with me.
Thank you
Tyzan


[root at orarac1 ~]# gfs_tool df
/share:
  SB lock proto = "lock_dlm"
  SB lock table = "orarac:gfs_share_1"
  SB ondisk format = 1309
  SB multihost format = 1401
  Block size = 4096
  Journals = 10
  Resource Groups = 796
  Mounted lock proto = "lock_dlm"
  Mounted lock table = "orarac:gfs_share_1"
  Mounted host data = "jid=1:id=196610:first=0"
  Journal number = 1
  Lock module flags = 0
  Local flocks = FALSE
  Local caching = FALSE
  Oopses OK = FALSE

  Type           Total          Used           Free           use%
  ------------------------------------------------------------------------
  inodes         30             30             0              100%
  metadata       13197          12193          1004           92%
  data           52080821       6084973        45995848       12%


 

Gordan Bobic wrote:
> Are you sure you are using GFS1 and not GFS2? I've experienced that 
> problem with GFS2, but not with GFS1.
>
> Gordan
>
> tam_annie at aliceposta.it wrote:
>> Hi everybody,
>>
>>    when my GFS (v. 1) filesystems experience some heavy load (ex. 
>> vmware virtual machine OS installation, oracle rman backup using gfs 
>> filesystems as flash recovery area), they "freeze" unexpectedly.
>> More precisely, it's not the whole gfs filesystem that freezes, but 
>> only the directory affected by the load: I can't even ls the contents 
>> of that directory, and everything touching it seems to hang 
>> hopelessly. I can't find any related errors in my logs; the output of 
>> the cluster utilities (clustat, group_tool -v, cman_tool nodes) looks 
>> absolutely normal ... (no fencing is occurring): I can even keep 
>> working in the other directories of the same gfs!!!
>> The only way out I've found is to restart the cluster. I can 
>> reproduce the problem deterministically, but I don't know how to 
>> debug it.
>>
>>   I noted that the problem arises on both my 2-node and my 1-node 
>> cluster, whether I mount gfs with 'noquota,noatime' or not.
>>
>>   Your help is my hope:
>>   thank you in advance!
>>   Tyzan
>>
>> ___________________________________________________________________________________________________ 
>>
>> Linux xxxxxxxxxxxxxxxx 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 
>> EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>>
>> lvm2-cluster-2.02.16-3.el5
>> kmod-gfs-0.1.16-5.2.6.18_8.1.8.el5
>> gfs2-utils-0.1.25-1.el5
>> gfs-utils-0.1.11-3.el5
>> cman-2.0.64-1.0.1.el5
>> rgmanager-2.0.24-1.el5
>>
>>
>>   [root at orarac1 ~]# gfs_tool gettune /share
>> ilimit1 = 100
>> ilimit1_tries = 3
>> ilimit1_min = 1
>> ilimit2 = 500
>> ilimit2_tries = 10
>> ilimit2_min = 3
>> demote_secs = 300
>> incore_log_blocks = 1024
>> jindex_refresh_secs = 60
>> depend_secs = 60
>> scand_secs = 5
>> recoverd_secs = 60
>> logd_secs = 1
>> quotad_secs = 5
>> inoded_secs = 15
>> quota_simul_sync = 64
>> quota_warn_period = 10
>> atime_quantum = 3600
>> quota_quantum = 60
>> quota_scale = 1.0000   (1, 1)
>> quota_enforce = 1
>> quota_account = 1
>> new_files_jdata = 0
>> new_files_directio = 0
>> max_atomic_write = 4194304
>> max_readahead = 262144
>> lockdump_size = 131072
>> stall_secs = 600
>> complain_secs = 10
>> reclaim_limit = 5000
>> entries_per_readdir = 32
>> prefetch_secs = 10
>> statfs_slots = 64
>> max_mhc = 10000
>> greedy_default = 100
>> greedy_quantum = 25
>> greedy_max = 250
>> rgrp_try_threshold = 100
>>
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
