[Linux-cluster] bug in kernel or module cman?

Flagman at incomtel.ru Flagman at incomtel.ru
Tue Jul 11 11:48:40 UTC 2006


Hello, Linux-cluster.

My GFS mountpoints in the cluster hang periodically (approximately once every 2 weeks), and I see this in my logs:


Jun 30 23:16:26 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:16:26 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:16:26 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 30 23:16:30 cluster kernel: CMAN: sending membership request
Jun 30 23:16:31 cluster kernel: CMAN: got node node0
Jun 30 23:16:31 cluster kernel: CMAN: got node node1
Jun 30 23:17:01 cluster kernel: CMAN: Master died after JOINCONF, we must leave the cluster
Jun 30 23:17:01 cluster kernel: CMAN: we are leaving the cluster.
Jun 30 23:18:04 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:18:04 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:18:04 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 30 23:18:05 cluster kernel: CMAN: sending membership request
Jun 30 23:18:06 cluster kernel: CMAN: got node node0
Jun 30 23:18:06 cluster kernel: CMAN: got node node1
Jun 30 23:18:36 cluster kernel: CMAN: Master died after JOINCONF, we must leave the cluster
Jun 30 23:18:36 cluster kernel: CMAN: we are leaving the cluster.
Jun 30 23:19:05 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 30 23:19:05 cluster kernel: CMAN: sending membership request
Jun 30 23:19:06 cluster kernel: CMAN: got node node1
Jun 30 23:19:06 cluster kernel: CMAN: got node node0
Jun 30 23:19:27 cluster kernel: CMAN: node node0 has been removed from the cluster : Inconsistent cluster view
Jun 30 23:22:39 cluster kernel: CMAN: removing node node1 from the cluster : No response to messages
Jun 30 23:22:39 cluster kernel: ------------[ cut here ]------------
Jun 30 23:22:39 cluster kernel: kernel BUG at /home/Compile/GFS/cluster-1.02.00/cman-kernel/src/membership.c:3151!
Jun 30 23:22:39 cluster kernel: invalid opcode: 0000 [#1]
Jun 30 23:22:39 cluster kernel: Modules linked in: nfs lock_dlm dlm cman lock_harness nfsd exportfs lockd nfs_acl sunrpc ipt_REJECT ipt_multiport iptable_nat ip_nat ip_conntrack iptable_filter lm75 microcode dm_mod button battery ac uhci_hcd ehci_hcd i2c_i801 e1000 ext3 jbd 3w_xxxx
Jun 30 23:22:39 cluster kernel: CPU:    0
Jun 30 23:22:39 cluster kernel: EIP:    0060:[<f8aa95e6>]    Tainted: GF     VLI
Jun 30 23:22:39 cluster kernel: EFLAGS: 00010246   (2.6.16.20-grsec #8)
Jun 30 23:22:39 cluster kernel: eax: 00000000   ebx: 00000080   ecx: f8ab9000   edx: 00000080
Jun 30 23:22:39 cluster kernel: esi: d3352f64   edi: d3352fa0   ebp: 00000000   esp: d3352f58
Jun 30 23:22:39 cluster kernel: ds: 007b   es: 007b   ss: 0068
Jun 30 23:22:39 cluster kernel: Process cman_memb (pid: 7952, threadinfo=d3352000 task=c530e2b0)
Jun 30 23:22:39 cluster kernel: Stack: <0>f2b45920 f8aa12bc f8aaa9f9 f5458dc0 f8aa0712 00000001 f2b45920 f8aaaa9d
Jun 30 23:22:39 cluster kernel:        c530e2b0 f8aa12e5 f8aad021 00000000 00000000 00000000 c530e2b0 c01473ba
Jun 30 23:22:39 cluster kernel:        00100100 00200200 0100001e 00000001 c01473ba 00100100 00200200 00000001
Jun 30 23:22:39 cluster kernel: Call Trace:
Jun 30 23:22:39 cluster kernel:  [<f8aaa9f9>]
Jun 30 23:22:39 cluster kernel:  [<f8aaaa9d>]
Jun 30 23:22:39 cluster kernel:  [<f8aad021>]
Jun 30 23:22:39 cluster kernel:  [<c01473ba>]
Jun 30 23:22:39 cluster kernel:  [<c01473ba>]
Jun 30 23:22:39 cluster kernel:  [<f8aac631>]
Jun 30 23:22:39 cluster kernel:  [<c0131005>]
Jun 30 23:22:39 cluster kernel: Code: 1d f8 15 aa f8 8b 0d f4 15 aa f8 ba 01 00 00 00 eb 15 8b 04 91 85 c0 74 0d 83 78 1c 02 75 07 89 06 8b 40 14 eb 0f 42 39 da 7c e7 <0f> 0b 4f 0c 93 38 ab f8 31 c0 5b 5e c3 a3 3c 22 aa f8 b8 cc 15

------------------ 
And another one:

Jul 10 12:48:46 cluster kernel: grsec: From 83.166.231.248: denied resource overstep by requesting 57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:4524] uid/euid:0/0 gid/egid:0/0
Jul 10 12:48:46 cluster kernel: grsec: From 83.166.231.248: denied resource overstep by requesting 57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:4524] uid/euid:0/0 gid/egid:0/0
Jul 10 12:48:46 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jul 10 12:48:48 cluster kernel: CMAN: sending membership request
Jul 10 12:48:48 cluster kernel: CMAN: sending membership request
Jul 10 12:48:48 cluster kernel: CMAN: got node node1
Jul 10 12:53:42 cluster kernel: CMAN: removing node node1 from the cluster : No response to messages
Jul 10 12:53:42 cluster kernel: ------------[ cut here ]------------
Jul 10 12:53:42 cluster kernel: kernel BUG at /home/Compile/GFS/cluster-1.02.00/cman-kernel/src/membership.c:3151!
Jul 10 12:53:42 cluster kernel: invalid opcode: 0000 [#1]
Jul 10 12:53:42 cluster kernel: Modules linked in: nfs gnbd lock_dlm dlm cman lock_harness nfsd exportfs lockd nfs_acl sunrpc ipt_REJECT ipt_multiport iptable_nat ip_nat ip_conntrack iptable_filter lm75 microcode dm_mod button battery ac uhci_hcd ehci_hcd i2c_i801 e1000 ext3 jbd 3w_xxxx
Jul 10 12:53:42 cluster kernel: CPU:    0
Jul 10 12:53:42 cluster kernel: EIP:    0060:[<f8aa95e6>]    Tainted: GF     VLI
Jul 10 12:53:42 cluster kernel: EFLAGS: 00010246   (2.6.16.20-grsec #8)
Jul 10 12:53:42 cluster kernel: eax: 00000000   ebx: 00000080   ecx: f8ab9000   edx: 00000080
Jul 10 12:53:42 cluster kernel: esi: c0722f64   edi: c0722fa0   ebp: 00000000   esp: c0722f58
Jul 10 12:53:42 cluster kernel: ds: 007b   es: 007b   ss: 0068
Jul 10 12:53:42 cluster kernel: Process cman_memb (pid: 31173, threadinfo=c0722000 task=d41e8910)
Jul 10 12:53:42 cluster kernel: Stack: <0>f6e45bc0 f8aa12bc f8aaa9f9 f6e630c0 f8aa0712 00000003 f6e45bc0 f8aaaa9d
Jul 10 12:53:42 cluster kernel:        d41e8910 f8aa12e5 f8aad021 00000000 00000000 00000000 d41e8910 c01473ba
Jul 10 12:53:42 cluster kernel:        00100100 00200200 0100001e 00000003 c01473ba 00100100 00200200 00000001
Jul 10 12:53:42 cluster kernel: Call Trace:
Jul 10 12:53:42 cluster kernel:  [<f8aaa9f9>]
Jul 10 12:53:42 cluster kernel:  [<f8aaaa9d>]
Jul 10 12:53:42 cluster kernel:  [<f8aad021>]
Jul 10 12:53:42 cluster kernel:  [<c01473ba>]
Jul 10 12:53:42 cluster kernel:  [<c01473ba>]
Jul 10 12:53:42 cluster kernel:  [<f8aac631>]
Jul 10 12:53:42 cluster kernel:  [<c0131005>]
Jul 10 12:53:42 cluster kernel: Code: 1d f8 15 aa f8 8b 0d f4 15 aa f8 ba 01 00 00 00 eb 15 8b 04 91 85 c0 74 0d 83 78 1c 02 75 07 89 06 8b 40 14 eb 0f 42 39 da 7c e7 <0f> 0b 4f 0c 93 38 ab f8 31 c0 5b 5e c3 a3 3c 22 aa f8 b8 cc 15
Jul 10 13:03:02 cluster kernel:  releasing gnbd class
Jul 10 13:03:02 cluster kernel: releasing gnbd class
Jul 10 13:03:05 cluster last message repeated 126 times

In practice, every request to the GFS mountpoint hangs forever, waiting for something, and 100% of CPU time goes to the wait state.
When this happens, the servers with imported GNBDs cannot be soft-rebooted or shut down; only a hard reset or power-off helps.
The dump I provided is from the main cluster node, which hosts the hard disks with the partition that I share over GNBD with GFS.

BTW, my kernel is patched with the grsecurity patch (as you can see at the top of the provided logs).
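
For reference, the grsec lines report a stack limit of 4194304 bytes in some attempts and 8388608 bytes in another. A minimal sketch for checking which stack limit a process actually inherits is below; it uses Python's standard `resource` module, and the helper names `fmt` and `stack_limits` are only illustrative. It shows the ordinary rlimit values and says nothing about how grsecurity itself enforces them.

```python
import resource

MIB = 1024 * 1024


def fmt(value):
    """Render an rlimit value in bytes and MiB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes (%.1f MiB)" % (value, value / MIB)


def stack_limits():
    """Return the (soft, hard) RLIMIT_STACK pair for the current process."""
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print("RLIMIT_STACK soft:", fmt(soft))
    print("RLIMIT_STACK hard:", fmt(hard))
```

Run from the same shell that starts cman_tool, this prints the limit that child processes of that shell inherit.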

What is the solution? Why does cman_tool require a stack size of over 50 MB, and sometimes over 100 MB?
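
To make the sizes in the grsec messages easier to compare, here is a small sketch that pulls the requested size and the limit out of a "denied resource overstep" line and converts both to MiB. The regular expression is written against the lines quoted in this mail only, and the names `OVERSTEP`, `parse_overstep` and `describe` are made up for this example.

```python
import re

# Matches the part of the grsec message quoted in the logs above, e.g.
# "requesting 100339712 for RLIMIT_STACK against limit 4194304".
OVERSTEP = re.compile(
    r"requesting (?P<requested>\d+) for (?P<resource>RLIMIT_\w+) "
    r"against limit (?P<limit>\d+)"
)

MIB = 1024 * 1024


def parse_overstep(line):
    """Return (resource, requested_bytes, limit_bytes), or None if no match."""
    m = OVERSTEP.search(line)
    if m is None:
        return None
    return m.group("resource"), int(m.group("requested")), int(m.group("limit"))


def describe(line):
    """Summarise one overstep line with sizes in MiB, or None if no match."""
    parsed = parse_overstep(line)
    if parsed is None:
        return None
    resource_name, requested, limit = parsed
    return "%s: requested %.1f MiB, limit %.1f MiB" % (
        resource_name, requested / MIB, limit / MIB)


if __name__ == "__main__":
    sample = ("denied resource overstep by requesting 100339712 "
              "for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085]")
    print(describe(sample))
```

For the three requests in the logs this gives roughly 95.7 MiB, 106.6 MiB and 55.3 MiB, against limits of 4 MiB and 8 MiB.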

-- 
Best regards,
 Flagman                          mailto:Flagman at incomtel.ru