[Linux-cluster] GFS 6.1 crashed (glock.c)

Dirk Haller haller at atix.de
Tue Feb 27 15:16:21 UTC 2007


Hello list,

We are running a two-node GFS 6.1 cluster, and today GFS suddenly crashed on 
one node.

Please have a look at the following log messages:

----
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: fatal: 
assertion "FALSE" failed
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   function = 
xmote_bh
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   file 
= /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   time = 
1172575419
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: about to 
withdraw from the cluster
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: waiting for 
outstanding I/O
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: telling LM to 
withdraw
----
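
For readers comparing this with other logs: the "time =" field in the
assertion message looks like a Unix epoch value. A minimal sketch, assuming
it is seconds since 1970-01-01 UTC (the post does not say so explicitly),
converts it to a readable timestamp:

```python
from datetime import datetime, timezone

# Value copied from the "time = 1172575419" line of the assertion message.
# Assumption: it is a Unix timestamp in seconds.
assert_time = 1172575419

utc = datetime.fromtimestamp(assert_time, tz=timezone.utc)
print(utc.isoformat())  # 2007-02-27T11:23:39+00:00
```

That is one hour behind the 12:23:39 syslog stamp, which would fit nodes
logging in UTC+1. The time zone is an assumption; it is not stated in the
post.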

We cannot reproduce the problem because we do not know what triggers it.
I found an older post on this list describing the same problem, but it gives 
neither a real solution nor an explanation of why this happens.

The cluster's operating system is RHEL4 U4 (x86_64). The kernel version is 
2.6.9-42.0.3.ELsmp, and the following GFS RPMs are installed and in use:
GFS-6.1.6-1
GFS-kernel-2.6.9-60.3
GFS-kernel-smp-2.6.9-60.3
GFS-kernheaders-2.6.9-60.3

Any hints on how to look deeper into this problem, or even a solution, would 
be great.
For more details, please have a look at the attached crash log.

Thanks in advance! 

-- 
Gruss / Regards Dirk Haller
-------------- next part --------------
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   function = xmote_bh
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   time = 1172575419
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   function = xmote_bh
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   time = 1172575419
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> Checking 172.23.50.51, Level 0
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> 172.23.50.51 present on bond1
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> Link for bond1: Detected
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> Link detected on bond1
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Trying to acquire journal lock...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Looking at journal...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Acquiring the transaction lock...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replaying journal...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replayed 0 of 0 blocks
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: replays = 0, skips = 0, sames = 0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Journal replayed in 1s
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node2  lock_dlm: withdraw abandoned memory
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  GFS: fsid=ozeane:lt_atlantik.1: withdrawn
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  GFS: fsid=ozeane:lt_atlantik.1: ret = 0x00000003
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Done
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node2  general protection fault: 0000 [1] SMP
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  CPU 0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  Modules linked in: nfsd exportfs lockd nfs_acl sg cpqci(U) mptctl mptbase netconsole netdump i2c_dev i2c_core sunrpc ext3 jbd button battery ac ohci_hcd hw_random shpchp floppy md5 ipv6 lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) bonding(U) dm_round_robin dm_multipath qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod dm_snapshot dm_mirror dm_mod tg3 e1000
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  Pid: 17539, comm: lock_dlm1 Tainted: P      2.6.9-42.0.3.ELsmp
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  RIP: 0010:[<ffffffffa013debc>] <ffffffffa013debc>{:gfs:run_queue+477}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  RSP: 0018:00000100e5891db8  EFLAGS: 00010202
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  RAX: 000000000006000f RBX: 000001006f426920 RCX: 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  RDX: ffffffffa017e9c0 RSI: 0000000000000001 RDI: 000001006f4268c8
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  RBP: 000001006d604420 R08: ffffffff803e1fe8 R09: 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  R10: 0000000100000000 R11: ffffffff8011e884 R12: 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  R13: 560a11000001000a R14: ffffff0000481000 R15: 000001006f4268c8
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  FS:  0000002a96a970e0(0000) GS:ffffffff804e5180(0000) knlGS:00000000f61d1bb0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  CR2: 0000002a96a86880 CR3: 0000000000101000 CR4: 00000000000006e0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  Process lock_dlm1 (pid: 17539, threadinfo 00000100e5890000, task 00000100e8ed17f0)
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  Stack: 0000000000000000 000001006f4268f4 000001006d604420 000001006f4268f4
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2         000001006f4268c8 ffffff0000481000 0000000000000003 ffffffffa013facf
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2         0000000000000001 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2  Call Trace:<ffffffffa013facf>{:gfs:xmote_bh+953} <ffffffffa0141426>{:gfs:gfs_glock_cb+194}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2         <ffffffffa01a8a75>{:lock_dlm:dlm_async+1989} <ffffffff80133dfe>{__wake_up_common+67}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2         <ffffffff80133dad>{default_wake_function+0} <ffffffff8014b4f4>{keventd_create_kthread+0}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2         <ffffffffa01a82b0>{:lock_dlm:dlm_async+0}
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   function = xmote_bh
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   time = 1172575419
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   function = xmote_bh
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:47 node1 clurgmgrd: [15224]: <debug> Link detected on bond1
Feb 27 12:23:48 node1 clurgmgrd: [15224]: <notice> Using atlantik as NetBIOS name (service atlantik)
Feb 27 12:23:48 node1 clurgmgrd: [15224]: <debug> Checking Samba instance "atlantik"
Feb 27 12:23:48 node1 clurgmgrd: [15224]: <debug> Checking 172.23.50.52, Level 0
Feb 27 12:23:48 node1 smbd[10164]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[31559]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[10164]:   Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 smbd[31559]:   Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 smbd[10164]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[31559]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[10164]:   Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 smbd[31559]:   Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Trying to acquire journal lock...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Looking at journal...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Acquiring the transaction lock...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replaying journal...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replayed 0 of 0 blocks
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: replays = 0, skips = 0, sames = 0
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Journal replayed in 1s
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Done
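
The attached log is easier to read once the interleaved netdump lines are
set aside. The sketch below is one way to do that. It assumes the hex word
in "Got strange package from ip 0x..." is an IPv4 address in network byte
order; the decoded values fall in the same 172.23.50.x range that clurgmgrd
checks above, which suggests this reading but does not prove it. The names
decode_ip and split_log are hypothetical helpers, not part of any cluster
tool.

```python
import ipaddress
import re

# Matches the netdump noise lines seen in the attached log.
NOISE = re.compile(
    r"netdump\[\d+\]: Got strange package from ip (0x[0-9a-fA-F]{8})$"
)

def decode_ip(hex_ip):
    """Turn a hex word such as '0xac173242' into dotted-quad form.

    Assumption: the word is an IPv4 address in network byte order.
    """
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))

def split_log(lines):
    """Return (kept_lines, sender_ips) for an iterable of log lines."""
    kept, senders = [], set()
    for line in lines:
        line = line.rstrip()
        match = NOISE.search(line)
        if match:
            senders.add(decode_ip(match.group(1)))
        else:
            kept.append(line)
    return kept, senders

# Four lines copied from the attached log.
sample = [
    "Feb 27 12:23:43 node2  GFS: fsid=ozeane:lt_atlantik.1:   function = xmote_bh",
    "Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242",
    "Feb 27 12:23:46 node1  GFS: fsid=ozeane:lt_atlantik.0: jid=1: Done",
    "Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243",
]
kept, senders = split_log(sample)
print(sorted(senders))  # ['172.23.50.66', '172.23.50.67']
```

Under that assumption, 0xac173242 appears next to the node2 lines and
0xac173243 next to the node1 lines.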

