[Linux-cluster] kernel bug at fs/dlm/lowcomms.c:647!

Welterlen Benoit Benoit.Welterlen at bull.net
Mon Oct 18 15:33:20 UTC 2010


Hi all,


I'm running some tests on OCFS2 with a 2.6.32-100 kernel (Oracle) or
RHEL6/Fedora, and I hit a crash (kernel BUG) in fs/dlm/lowcomms.c, as you
can see below. I have a crash dump if you need more information. I'm lost
and need help figuring out where to look to debug this problem.

Thanks

Regards,

Benoit



Kernel 2.6.32-100.0.19.el5 on an x86_64
chili0 login:
------------[ cut here ]------------
kernel BUG at fs/dlm/lowcomms.c:647!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
CPU 34
Modules linked in: ocfs2(U) ocfs2_nodemanager(U) nfsd(U) exportfs(U) 
sctp(U) libcrc32c(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U) 
configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U) 
ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) 
sunrpc(U) ipv6(U) scsi_dh_emc(U) dm_round_robin(U) dm_multipath(U) 
iTCO_wdt(U) iTCO_vendor_support(U) mlx4_core(U) i2c_i801(U) igb(U) 
pcspkr(U) i2c_core(U) ioatdma(U) dca(U) ahci(U) uhci_hcd(U) ehci_hcd(U) 
lpfc(U) scsi_transport_fc(U) scsi_tgt(U) [last unloaded: ocfs2_nodemanager]
Pid: 27062, comm: dlm_recv/34 Not tainted 2.6.32-100.0.19.el5 #1 bullx super-node
RIP: 0010:[<ffffffffa02406c3>]  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
RSP: 0018:ffff880c77c6bc60  EFLAGS: 00010246
RAX: 0000000000000030 RBX: ffff8810774b8d30 RCX: ffff88087c4548f8
RDX: 0000000000000030 RSI: ffff880876dce000 RDI: ffffffff81398045
RBP: ffff880c77c6be50 R08: ffff000000000000 R09: ffff880c77c6b900
R10: ffff880c77c6b8f0 R11: 0000000000000030 R12: 0000000000000030
R13: ffff8810774b8d20 R14: ffff880c7caa00c0 R15: ffffffffa023ecca
FS:  0000000000000000(0000) GS:ffff88048e600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000fcb078 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dlm_recv/34 (pid: 27062, threadinfo ffff880c77c6a000, task ffff880c7caa00c0)
Stack:
  ffff880c77c6bc70 ffffffff8122fa24 ffff880c77c6bc90 ffffffff8122faca
<0> ffff88048e414ec0 0000100000000002 0000000000000000 ffffffff00000000
<0> 0000000000000000 0000000000000000 ffffffffa024bb20 0000000000000030
Call Trace:
  [<ffffffff8122fa24>] ? cpumask_next+0x19/0x1b
  [<ffffffff8122faca>] ? cpumask_next_and+0x20/0x32
  [<ffffffffa023ecca>] ? process_recv_sockets+0x0/0x28 [dlm]
  [<ffffffffa023ecea>] process_recv_sockets+0x20/0x28 [dlm]
  [<ffffffff81071802>] worker_thread+0x14d/0x1ed
  [<ffffffff81075a7c>] ? autoremove_wake_function+0x0/0x3d
  [<ffffffff810716b5>] ? worker_thread+0x0/0x1ed
  [<ffffffff810756d3>] kthread+0x6e/0x76
  [<ffffffff81012dea>] child_rip+0xa/0x20
  [<ffffffff81075665>] ? kthread+0x0/0x76
  [<ffffffff81012de0>] ? child_rip+0x0/0x20
Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c 
24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b 
eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55
RIP  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
  RSP <ffff880c77c6bc60>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-100.0.19.el5 (mockbuild@ca-build9.us.oracle.com) 
(gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Sep 17 
17:51:41 EDT 2010
Command line: ro root=/dev/mapper/vg_chili0-lv_root 
rd_LVM_LV=vg_chili0/lv_root rd_LVM_LV=vg_chili0/lv_swap rd_NO_LUKS 
rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 
KEYBOARDTYPE=pc KEYTABLE=fr-pc cgroup_disable=memory selinux=0 
pcie_aspm=off nmi_watchdog=0 console=ttyS1,115200 maxcpus=1 
reset_devices memmap=exactmap memmap=640K@0K memmap=195948K@33408K 
elfcorehdr=229356K memmap=308K#1993940K memmap=16K#2077704K 
memmap=4K#2077748K memmap=4K#2077764K memmap=44K#2077768K 
memmap=72K#2077812K memmap=4K#2077884K memmap=4K#2077888K 
memmap=4K#2077892K memmap=4K#2078024K memmap=2716K#2078052K 
memmap=1024K#69204860K memmap=128K#69205884K
KERNEL supported cpus:
   Intel GenuineIntel
   AMD AuthenticAMD
   Centaur CentaurHauls
BIOS-provided physical RAM map:
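One place to start is the Code: line of the oops. The byte shown in <..> is the first byte of the instruction at RIP; here it is 0f 0b, i.e. ud2, the instruction BUG() compiles to on x86 (hence "invalid opcode"). The seven bytes right before it, 41 83 7d 08 00 75 04, decode to cmpl $0x0,0x8(%r13) followed by jne +4, so the BUG fires when the 32-bit value at offset 8 of the structure pointed to by R13 is zero. The Python sketch below checks exactly that byte pattern; it is a narrow pattern match for this one case, not a disassembler (the kernel's scripts/decodecode, fed the oops text on stdin, does the real decoding), and split_oops_code and bug_on_zero_offset are hypothetical helper names. Assumption, to be verified against fs/dlm/lowcomms.c of this exact kernel: if struct connection starts with a struct socket pointer followed by a uint32_t nodeid, offset 8 is con->nodeid, and line 647 would be a BUG_ON(con->nodeid == 0), i.e. dlm_recv received data on a connection that has no node ID assigned.

```python
# Raw "Code:" line from the oops above, with the email line wraps joined.
CODE_LINE = (
    "Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c "
    "24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b "
    "eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55"
)


def split_oops_code(line):
    """Split an oops "Code:" line into (bytes_before_rip, bytes_from_rip).

    The kernel wraps the first byte at RIP in <..>."""
    before, from_rip = [], []
    seen_marker = False
    for tok in line.split("Code:", 1)[1].split():
        if tok.startswith("<") and tok.endswith(">"):
            seen_marker = True
            tok = tok[1:-1]
        (from_rip if seen_marker else before).append(int(tok, 16))
    if not seen_marker:
        raise ValueError("no <..> marker in Code: line")
    return before, from_rip


def bug_on_zero_offset(before, from_rip):
    """Recognise only this sequence (not a general x86 decoder):

        41 83 7d XX 00   cmpl $0x0,XX(%r13)
        75 04            jne  +4 (skip ud2 and the following 2-byte jmp)
        0f 0b            ud2  <- RIP, what BUG() emits

    Returns the displacement XX, or None if the bytes differ."""
    if from_rip[:2] != [0x0F, 0x0B] or len(before) < 7:
        return None
    tail = before[-7:]
    if tail[:3] == [0x41, 0x83, 0x7D] and tail[4:] == [0x00, 0x75, 0x04]:
        # ASSUMPTION: offset 8 maps to con->nodeid only if struct connection
        # begins with "struct socket *sock; uint32_t nodeid;" in this kernel.
        return tail[3]
    return None


if __name__ == "__main__":
    before, from_rip = split_oops_code(CODE_LINE)
    print(hex(bug_on_zero_offset(before, from_rip)))  # 0x8
```

With the Code: line from this oops it reports offset 0x8; the usual 64 bytes are present (43 before RIP, 21 from RIP onward).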

Here is the configuration:

[root@chili1 ~]# crm configure show
node chili0
node chili1
primitive IPaddr-dhcp ocf:Bull:IPaddr \
         params ip="11.1.0.20" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive IPaddr-dns ocf:Bull:IPaddr \
         params ip="11.1.0.21" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive IPaddr-monitoring-master ocf:Bull:IPaddr \
         params ip="11.1.0.22" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive IPaddr-mysql ocf:Bull:IPaddr \
         params ip="11.1.0.23" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive IPaddr-nfs ocf:Bull:IPaddr \
         params ip="11.1.0.24" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive IPaddr-postgresql ocf:Bull:IPaddr \
         params ip="11.1.0.25" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive IPaddr-tftp ocf:Bull:IPaddr \
         params ip="11.1.0.26" \
         op monitor on-fail="restart" interval="30" \
         meta migration-threshold="1"
primitive dhcp-dhcp-server lsb:dhcpd \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive dlm ocf:pacemaker:controld \
         op monitor interval="120s"
primitive dns-dns-server lsb:named \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive fs-BCM-MCO ocf:Bull:Filesystem \
         params device="-L HA_MNGT:MCO" directory="/BCM/MCO" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive fs-BCM-conf ocf:Bull:Filesystem \
         params device="-L HA_MNGT:CONF" directory="/BCM/conf" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive fs-BCM-console ocf:Bull:Filesystem \
         params device="-L HA_MNGT:CONSOLE" directory="/BCM/console" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive fs-BCM-data ocf:Bull:Filesystem \
         params device="-L HA_MNGT:RRDDBs" directory="/BCM/data" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive fs-BCM-log ocf:Bull:Filesystem \
         params device="-L HA_MNGT:LOGs" directory="/BCM/log" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive fs-BCM-storage ocf:Bull:Filesystem \
         params device="-L HA_MNGT:STORAGE" directory="/BCM/storage" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive monitoring-master-errorManager lsb:errorManager \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive monitoring-master-eventManager lsb:eventManager \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive monitoring-master-nagios lsb:nagios \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive monitoring-master-powerManager lsb:powerManager \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive monitoring-master-syslog-ng lsb:syslog-ng-monitoring \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive mysql-fs-DBs ocf:Bull:Filesystem \
         params device="-L HA_MNGT:MYSQLDBs" directory="/var/lib/mysql" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive mysql-mysqld ocf:heartbeat:mysql \
         params binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive nfs-nfs-server ocf:heartbeat:nfsserver \
         params nfs_init_script="/etc/init.d/nfs" nfs_notify_cmd="/usr/sbin/sm-notify" nfs_shared_infodir="/BCM/log/nfs-server-logs" nfs_ip="11.1.0.24" \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60"
primitive o2cb ocf:ocfs2:o2cb \
         op monitor interval="120s"
primitive postgresql-clusterdb ocf:heartbeat:pgsql \
         params pgdata="/var/lib/pgsql/data" \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
primitive postgresql-fs-DBs ocf:Bull:Filesystem \
         params device="-L HA_MNGT:PGSQLDBs" directory="/var/lib/pgsql/data" fstype="ocfs2" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40"
primitive restofencechili0 stonith:fence_ipmilan \
         params ipaddr="11.1.0.10" login="super" passwd="pass" pcmk_host_check="none" action="diag" \
         meta target-role="Stopped"
primitive restofencechili1 stonith:fence_ipmilan \
         params ipaddr="11.1.0.11" login="super" passwd="pass" pcmk_host_check="none" action="diag" \
         meta target-role="Stopped"
primitive syslog-ng-syslog-ng lsb:hasyslog-ng \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60" \
         op monitor interval="20" timeout="40" on-fail="restart" \
         meta migration-threshold="3"
primitive tftp-tftp-server lsb:xinetd \
         op start interval="0" timeout="120" \
         op stop interval="0" timeout="120" \
         op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
         meta migration-threshold="1"
group dhcp IPaddr-dhcp dhcp-dhcp-server \
         meta target-role="Started" migration-threshold="1"
group dns IPaddr-dns dns-dns-server \
         meta target-role="Started" migration-threshold="1"
group monitoring-master IPaddr-monitoring-master monitoring-master-syslog-ng monitoring-master-nagios monitoring-master-errorManager monitoring-master-eventManager monitoring-master-powerManager \
         meta target-role="Started" migration-threshold="1"
group mysql IPaddr-mysql mysql-mysqld \
         meta target-role="Started" migration-threshold="1"
group nfs IPaddr-nfs nfs-nfs-server \
         meta target-role="Started" migration-threshold="1"
group postgresql IPaddr-postgresql postgresql-clusterdb \
         meta target-role="Started" migration-threshold="1"
group tftp IPaddr-tftp tftp-tftp-server \
         meta target-role="Started" migration-threshold="1"
clone clone-dlm dlm \
         meta target-role="Started" globally-unique="false" interleave="true"
clone clone-fs-BCM-MCO fs-BCM-MCO \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-fs-BCM-conf fs-BCM-conf \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-fs-BCM-console fs-BCM-console \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-fs-BCM-data fs-BCM-data \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-fs-BCM-log fs-BCM-log \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-fs-BCM-storage fs-BCM-storage \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-mysql-fs-DBs mysql-fs-DBs \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-o2cb o2cb \
         meta target-role="Started" globally-unique="false" interleave="true"
clone clone-postgresql-fs-DBs postgresql-fs-DBs \
         meta interleave="true" ordered="false" target-role="Started" \
         meta target-role="Started"
clone clone-syslog-ng syslog-ng-syslog-ng \
         meta interleave="true" ordered="false" target-role="Stopped" \
         meta target-role="Stopped"
location forbiddenloc-restofencechili0 restofencechili0 -inf: chili0
location forbiddenloc-restofencechili1 restofencechili1 -inf: chili1
location loc1-group-dhcp dhcp +100: chili0
location loc1-group-dns dns +100: chili1
location loc1-group-monitoring-master monitoring-master +100: chili0
location loc1-group-mysql mysql +100: chili1
location loc1-group-nfs nfs +100: chili1
location loc1-group-postgresql postgresql +100: chili1
location loc1-group-tftp tftp +100: chili0
location loc1-restofencechili0 restofencechili0 +inf: chili1
location loc1-restofencechili1 restofencechili1 +inf: chili0
colocation coloc-clone-fs-BCM-MCO-o2cb inf: clone-fs-BCM-MCO clone-o2cb
colocation coloc-clone-fs-BCM-conf-o2cb inf: clone-fs-BCM-conf clone-o2cb
colocation coloc-clone-fs-BCM-console-o2cb inf: clone-fs-BCM-console clone-o2cb
colocation coloc-clone-fs-BCM-data-o2cb inf: clone-fs-BCM-data clone-o2cb
colocation coloc-clone-fs-BCM-log-o2cb inf: clone-fs-BCM-log clone-o2cb
colocation coloc-clone-fs-BCM-storage-o2cb inf: clone-fs-BCM-storage clone-o2cb
colocation coloc-clone-mysql-fs-DBs-o2cb inf: clone-mysql-fs-DBs clone-o2cb
colocation coloc-clone-postgresql-fs-DBs-o2cb inf: clone-postgresql-fs-DBs clone-o2cb
colocation coloc-fs-BCM-MCO-monitoring-master +inf: monitoring-master clone-fs-BCM-MCO
colocation coloc-fs-BCM-MCO-nfs +inf: nfs clone-fs-BCM-MCO
colocation coloc-fs-BCM-conf-monitoring-master +inf: monitoring-master clone-fs-BCM-conf
colocation coloc-fs-BCM-conf-nfs +inf: nfs clone-fs-BCM-conf
colocation coloc-fs-BCM-console-nfs +inf: nfs clone-fs-BCM-console
colocation coloc-fs-BCM-data-monitoring-master +inf: monitoring-master clone-fs-BCM-data
colocation coloc-fs-BCM-data-nfs +inf: nfs clone-fs-BCM-data
colocation coloc-fs-BCM-log-monitoring-master +inf: monitoring-master clone-fs-BCM-log
colocation coloc-fs-BCM-log-nfs +inf: nfs clone-fs-BCM-log
colocation coloc-mysql-fs-DBs-mysql +inf: mysql clone-mysql-fs-DBs
colocation coloc-postgresql-fs-DBs-postgresql +inf: postgresql clone-postgresql-fs-DBs
colocation o2cb-with-dlm inf: clone-o2cb clone-dlm
order order-clone-fs-BCM-MCO-o2cb inf: clone-o2cb clone-fs-BCM-MCO
order order-clone-fs-BCM-conf-o2cb inf: clone-o2cb clone-fs-BCM-conf
order order-clone-fs-BCM-console-o2cb inf: clone-o2cb clone-fs-BCM-console
order order-clone-fs-BCM-data-o2cb inf: clone-o2cb clone-fs-BCM-data
order order-clone-fs-BCM-log-o2cb inf: clone-o2cb clone-fs-BCM-log
order order-clone-fs-BCM-storage-o2cb inf: clone-o2cb clone-fs-BCM-storage
order order-clone-mysql-fs-DBs-o2cb inf: clone-o2cb clone-mysql-fs-DBs
order order-clone-postgresql-fs-DBs-o2cb inf: clone-o2cb clone-postgresql-fs-DBs
order order-monitoring-master inf: clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf monitoring-master
order order-mysql inf: clone-mysql-fs-DBs mysql
order order-nfs inf: clone-fs-BCM-console clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf nfs
order order-postgresql inf: clone-postgresql-fs-DBs postgresql
order start-o2cb-after-dlm inf: clone-dlm clone-o2cb
property $id="cib-bootstrap-options" \
         dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
         cluster-infrastructure="openais" \
         expected-quorum-votes="2" \
         stonith-enabled="true" \
         no-quorum-policy="ignore" \
         default-resource-stickiness="5000" \
         last-lrm-refresh="1286452453"
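The order constraints above are meant to bring the stack up as clone-dlm, then clone-o2cb, then the OCFS2 filesystem clones, then the services living on them. A minimal Python sketch for checking that this is what the constraints actually say: it reads the order lines (copied from the configuration with the email-wrapped lines joined), treats a multi-resource order such as order-nfs as a simple chain A then B then C (assumed here to match crm shell's sequential resource sets), and runs Kahn's topological sort with alphabetical tie-breaking. Scores, colocations and groups are ignored, so this checks ordering only, not placement; order_edges and start_order are hypothetical helper names.

```python
import heapq
from collections import defaultdict

# The "order" lines from the configuration above, one constraint per line.
ORDERS = """\
order order-clone-fs-BCM-MCO-o2cb inf: clone-o2cb clone-fs-BCM-MCO
order order-clone-fs-BCM-conf-o2cb inf: clone-o2cb clone-fs-BCM-conf
order order-clone-fs-BCM-console-o2cb inf: clone-o2cb clone-fs-BCM-console
order order-clone-fs-BCM-data-o2cb inf: clone-o2cb clone-fs-BCM-data
order order-clone-fs-BCM-log-o2cb inf: clone-o2cb clone-fs-BCM-log
order order-clone-fs-BCM-storage-o2cb inf: clone-o2cb clone-fs-BCM-storage
order order-clone-mysql-fs-DBs-o2cb inf: clone-o2cb clone-mysql-fs-DBs
order order-clone-postgresql-fs-DBs-o2cb inf: clone-o2cb clone-postgresql-fs-DBs
order order-monitoring-master inf: clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf monitoring-master
order order-mysql inf: clone-mysql-fs-DBs mysql
order order-nfs inf: clone-fs-BCM-console clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf nfs
order order-postgresql inf: clone-postgresql-fs-DBs postgresql
order start-o2cb-after-dlm inf: clone-dlm clone-o2cb
"""


def order_edges(text):
    """Yield (first, then) pairs from crm 'order' lines.

    ASSUMPTION: 'order ID SCORE: A B C' is read as the chain A -> B -> C."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "order" or not parts[2].endswith(":"):
            continue
        chain = parts[3:]
        yield from zip(chain, chain[1:])


def start_order(edges):
    """Kahn's topological sort with alphabetical tie-breaking.

    Raises ValueError if the constraints contain a cycle."""
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        if b not in succ[a]:  # the same pair can appear in several constraints
            succ[a].add(b)
            indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    heapq.heapify(ready)
    result = []
    while ready:
        n = heapq.heappop(ready)
        result.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                heapq.heappush(ready, m)
    if len(result) != len(nodes):
        raise ValueError("order constraints contain a cycle")
    return result


if __name__ == "__main__":
    for name in start_order(order_edges(ORDERS)):
        print(name)
```

A cycle (for example an order line added in the wrong direction) makes start_order raise instead of returning a sequence, which is a quick way to spot a constraint that can never be satisfied.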



