[Linux-cluster] Panic

andre at hudat.com
Sun Oct 1 19:13:44 UTC 2006


I have the following panic on two nodes, hours apart. Each node is in a
different state (as in states of the US). No, I am not running a cluster
over a WAN; these are two separate clusters in two different locations.
Files are written on one cluster, and a script copies each file to the
other cluster with scp. Both machines are running the latest RHEL4 with
the latest GFS updates. This just started happening; it has happened twice
since Friday morning. Any hints? What is happening with clvmd here? What
does the "global conflict" message mean?

--
Andre

Oct  1 13:26:47 fs1.fl.apexrad.com kernel: purged 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd mark waiting requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd marked 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd recover event 5 done
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd move flags 0,0,1 ids 2,5,5
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd process held requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd processed 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd resend marked requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd resent 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: clvmd recover event 5 finished
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 total nodes 1
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 rebuild resource directory
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 rebuilt 0 resources
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 recover event 4 done
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 move flags 0,0,1 ids 0,4,4
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 process held requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 processed 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 recover event 4 finished
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 move flags 1,0,0 ids 4,4,4
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 move flags 0,1,0 ids 4,7,4
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 move use event 7
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 recover event 7
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 add node 2
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 total nodes 2
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 rebuild resource directory
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 rebuilt 6 resources
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 purge requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 purged 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 mark waiting requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 marked 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 recover event 7 done
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 move flags 0,0,1 ids 4,7,7
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 process held requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 processed 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 resend marked requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 resent 0 requests
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01 recover event 7 finished
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 245-253 ex
1 own 4158637196, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 254-26c ex
1 own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 26d-27b ex
1 own 4158637196, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 27c-28b ex
1 own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 28c-29b ex
1 own 4158637196, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 29c-2ac ex
1 own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 2ad-2b9 ex
1 own 4158637196, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 2ba-2c7 ex
1 own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-ff ex 0
own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 c8-2c7 ex
0 own 4158638348, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 200-2c7 ex
0 own 4158638348, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1 ex 0
own 4101191756, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158638828, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-2c7 ex 0
own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-fff ex 0
own 4158638348, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 2c8-fff ex
0 own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158638348, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-3f ex 0
own 4158636236, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158638348, pid 444u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-fff ex 1
own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
70000-7ffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
80000-8ffff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
90000-9ffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
a0000-affff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
b0000-bffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
c0000-cffff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
d0000-dffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
e0000-effff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
f0000-fffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
100000-10ffff ex 1 own 4101191756, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
110000-11ffff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
120000-12ffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
130000-13ffff ex 1 own 4158638828, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
140000-14ffff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
150000-15ffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
160000-16ffff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
170000-17ffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
180000-18ffff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
190000-19ffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1a0000-1affff ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1b0000-1bffff ex 1 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c0000-1c2aa7 ex 1 own 4158636236, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-ff ex 0
own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 200-2ff ex
0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c28a8-1c2aa7 ex 0 own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c26da-1c27d9 ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 44b-54a ex
0 own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c0ba5-1c0ca4 ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c0780-1c087f ex 0 own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c12a8-1c13a7 ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c277d-1c287c ex 0 own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c276a-1c2869 ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c10eb-1c11ea ex 0 own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 1c04-1d03
ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 fe00-ffff
ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1 ex 0
own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-fff ex 0
own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0
1c1aa8-1c2aa7 ex 0 own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-fff ex 0
own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-3f ex 0
own 4158637196, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: 4424 global conflict 0 0-1ff ex 0
own 4158638348, pid 296u
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lock_dlm:  Assertion failed on line 428 of file /usr/src/build/765787-i686/BUILD/gfs-kernel-2.6.9-58/smp/src/dlm/lock.c
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lock_dlm:  assertion:  "!error"
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lock_dlm:  time = 185852977
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: lvol01: num=2,684f0dd err=-22
cur=3 req=5 lkf=44
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: ------------[ cut here ]------------
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: kernel BUG at /usr/src/build/765787-i686/BUILD/gfs-kernel-2.6.9-58/smp/src/dlm/lock.c:428!
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: invalid operand: 0000 [#1]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: SMP
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: Modules linked in: nfs nfsd exportfs lockd nfs_acl autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc dm_mirror button battery ac uhci_hcd ehci_hcd hw_random e1000 floppy sg ext3 jbd dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: CPU:    0
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: EIP:    0060:[<f8df7779>]    Not
tainted VLI
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: EFLAGS: 00010246
(2.6.9-42.ELsmp)
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: EIP is at do_dlm_lock+0x134/0x14e
[lock_dlm]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: eax: 00000001   ebx: ffffffea
ecx: d18c5dc0   edx: f8dfc221
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: esi: f8df7798   edi: c387e600
ebp: e194f780   esp: d18c5dbc
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: ds: 007b   es: 007b   ss: 0068
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: Process rmdir (pid: 23174,
threadinfo=d18c5000 task=f64f0b30)
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: Stack: f8dfc221 20202020 32202020
20202020 20202020 34383620 64643066 32200018
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:        20202020 e194f780 00000001
00000003 e194f780 f8df7828 00000005 f8dff940
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:        f8919000 f8eba936 00000000
00000001 d16c1dd4 d16c1db8 f8919000 f8eb08fe
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: Call Trace:
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8df7828>]
lm_dlm_lock+0x49/0x52 [lock_dlm]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eba936>]
gfs_lm_lock+0x35/0x4d [gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eb08fe>]
gfs_glock_xmote_th+0x130/0x172 [gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eaffbd>]
rq_promote+0xc8/0x147 [gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eb01a9>] run_queue+0x91/0xc1
[gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eb11b9>]
gfs_glock_nq+0xcf/0x116 [gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eb18f5>] nq_m_sync+0x44/0x64
[gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8eb1a5e>]
gfs_glock_nq_m+0x149/0x15d [gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<f8ec87d9>]
gfs_rmdir+0x6a/0x168 [gfs]
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<c0168a55>]
vfs_rmdir+0x1a3/0x1f1
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<c0168b44>] sys_rmdir+0xa1/0xf4
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<c011ae55>]
do_page_fault+0x0/0x5c6
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  [<c02d4703>]
syscall_call+0x7/0xb
Oct  1 13:26:47 fs1.fl.apexrad.com kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 4c c3 df f8 e8 32 b1 32 c7 83 c4 38 68 21 c2 df f8 e8 25 b1 32 c7 <0f> 0b ac 01 5e c1 df f8 68 23 c2 df f8 e8 e0 a8 32 c7 83 c4 20
Oct  1 13:26:47 fs1.fl.apexrad.com kernel:  <0>Fatal exception: panic in 5
seconds
Oct  1 13:49:59 fs1.fl.apexrad.com syslogd 1.4.1: restart.
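
One detail can be read off the trace without knowing the DLM internals:
the err=-22 on the lock_dlm line is a negated errno, and errno 22 is
EINVAL on Linux. The "ebx: ffffffea" register value is the same -22 when
read as a 32-bit signed integer. Below is a minimal Python sketch of that
decoding. It assumes the key=value layout seen on that one log line, and
the helper names are hypothetical, not part of any GFS or DLM tooling.

```python
import errno
import re

# Copied from the lock_dlm assertion output in the log above.
LINE = "lvol01: num=2,684f0dd err=-22 cur=3 req=5 lkf=44"


def parse_fields(line):
    """Return the key=value pairs found on the line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def errno_name(err):
    """Map a negative kernel-style return value to its symbolic errno name."""
    return errno.errorcode.get(-err, "unknown")


def to_signed32(value):
    """Interpret an unsigned 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


fields = parse_fields(LINE)
err = int(fields["err"])
print(err, errno_name(err))
print(to_signed32(0xFFFFFFEA))
```

This only names the error code. It does not say why the lock request was
rejected, which still needs someone who knows lock_dlm.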
