[Linux-cluster] Cluster NFS causes kernel bug

Ward, Timothy - SSD Timothy.Ward at itt.com
Wed Sep 12 22:03:20 UTC 2007


I have successfully set up Apache and Samba as cluster services.  I am
now trying to set up NFS, but I am encountering a kernel bug.  Any ideas
where I should start looking to fix this?

Thanks,
Tim


System
------
node1# uname -a
Linux node1.cluster.com 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:39:22 EDT
2006 x86_64 x86_64 x86_64 GNU/Linux


FC6 64bit RPMs
--------------
rpm -ivh fc6_rpm/openais-0.80.1-3.x86_64.rpm

rpm -ivh fc6_rpm/perl-Net-Telnet-3.03-5.noarch.rpm
rpm -ivh fc6_rpm_more/xen-libs-3.0.3-9.fc6.x86_64.rpm
rpm -ivh fc6_rpm_more/bridge-utils-1.1-2.x86_64.rpm
rpm -ivh --nodeps fc6_rpm_more/libvirt-0.2.3-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_more/libvirt-python-0.2.3-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_more/python-virtinst-0.95.0-1.fc6.noarch.rpm
rpm -ivh fc6_rpm_more/xen-3.0.3-9.fc6.x86_64.rpm
rpm -ivh fc6_rpm_updates/cman-2.0.60-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_updates/gfs2-utils-0.1.25-1.fc6.x86_64.rpm
rpm -ivh --force fc6_rpm_updates/device-mapper-1.02.13-1.fc6.x86_64.rpm
rpm -ivh --force fc6_rpm_updates/lvm2-2.02.17-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_updates/lvm2-cluster-2.02.17-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm/rgmanager-2.0.8-1.fc6.x86_64.rpm

Luci
rpm -ivh conga/python-imaging-1.1.6-3.fc6.x86_64.rpm
rpm -ivh conga/zope-2.9.7-2.fc6.x86_64.rpm
rpm -ivh conga/plone-2.5.3-1.fc6.x86_64.rpm
rpm -ivh conga/luci-0.9.3-2.fc6.x86_64.rpm

Ricci
rpm -ivh --nodeps conga/oddjob-libs-0.27-8.x86_64.rpm
rpm -ivh conga/oddjob-0.27-8.x86_64.rpm
rpm -ivh conga/modcluster-0.9.3-2.fc6.x86_64.rpm
rpm -ivh conga/ricci-0.9.3-2.fc6.x86_64.rpm
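
For reference, the daemons were then enabled and started on each node
roughly as follows (a sketch from memory, assuming the stock init scripts
shipped with these packages; luci runs only on the management host, where
it is initialized with luci_admin init before service luci start):

node1# chkconfig cman on; chkconfig rgmanager on; chkconfig ricci on
node1# service cman start
node1# service rgmanager start
node1# service ricci start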


/etc/cluster/cluster.conf
-------------------------
<cluster config_version="49" name="test1">
   <clusternodes>
      <clusternode name="node1.cluster.com" nodeid="1" votes="1">
         <fence>
            <method name="1">
               <device name="simnps1" port="1" switch="1"/>
            </method>
         </fence>
      </clusternode>
      <clusternode name="node2.cluster.com" nodeid="2" votes="1">
         <fence>
            <method name="1">
               <device name="simnps1" port="2" switch="1"/>
            </method>
         </fence>
      </clusternode>
      <clusternode name="node3.cluster.com" nodeid="3" votes="1">
         <fence/>
      </clusternode>
   </clusternodes>
   <fencedevices>
      <fencedevice agent="fence_apc" ipaddr="172.20.1.12" login="root"
name="nps1" passwd=""/>
   </fencedevices>
   <rm>
      <failoverdomains>
         <failoverdomain name="web0" ordered="1" restricted="1">
            <failoverdomainnode name="node1.cluster.com" priority="1"/>
            <failoverdomainnode name="node2.cluster.com" priority="2"/>
         </failoverdomain>
      </failoverdomains>
      <resources>
         <script file="/etc/rc.d/init.d/httpd" name="httpd_init"/>
         <ip address="172.20.1.10" monitor_link="1"/>
         <fs device="/dev/sdb1" force_fsck="0" force_unmount="0"
fsid="51920" fstype="ext3" mountpoint="/mnt/disk0" name="disk0"
self_fence="0"/>
         <smb name="ssdfwmsa" workgroup="ACDADM"/>
         <ip address="172.20.1.14" monitor_link="1"/>
         <ip address="172.20.1.16" monitor_link="1"/>
      </resources>
      <service autostart="1" domain="web0" exclusive="0" name="apache0"
recovery="relocate">
         <script ref="httpd_init"/>
         <ip ref="172.20.1.10"/>
         <fs ref="disk0"/>
      </service>
      <service autostart="1" domain="web0" exclusive="0" name="samba0"
recovery="relocate">
         <ip ref="172.20.1.14"/>
         <smb ref="ssdfwmsa"/>
      </service>
      <service autostart="1" domain="web0" exclusive="0" name="nfs3"
recovery="relocate">
         <ip ref="172.20.1.16"/>
      </service>
   </rm>
   <fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
   <cman/>
</cluster>
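
Note that the nfs3 service above only carries the floating IP; the export
itself still comes from the node's static /etc/exports.  For completeness,
the usual way to move the export into rgmanager would be the
nfsexport/nfsclient resource agents, roughly as below (sketched against the
existing disk0 fs resource purely as an illustration; the directory actually
being exported is /dyntest, which sits on the GFS2 volume seen in the trace):

      <service autostart="1" domain="web0" exclusive="0" name="nfs3" recovery="relocate">
         <fs ref="disk0">
            <nfsexport name="exports">
               <nfsclient name="world" options="rw,sync" target="*"/>
            </nfsexport>
         </fs>
         <ip ref="172.20.1.16"/>
      </service>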


Commands
--------
node1# clusvcadm -e nfs3
test1# mount 172.20.1.16:/dyntest /mnt/dyntest
test1# ll /mnt/dyntest

<command hangs and kernel bug logged>
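
A few sanity checks that can be run before the mount, using the hostnames
and paths above, to confirm the problem is past rgmanager and the export
itself:

node1# clustat                       # service:nfs3 should show as started on node1
node1# exportfs -v                   # confirm /dyntest is actually exported
test1# showmount -e 172.20.1.16      # export list as seen from the client
test1# rpcinfo -p 172.20.1.16        # portmapper, mountd and nfs registered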


/var/log/messages
-----------------
Sep 12 14:22:05 node1 clurgmgrd[2751]: <notice> Starting disabled
service service:nfs3
Sep 12 14:22:05 node1 clurgmgrd: [2751]: <info> Adding IPv4 address
172.20.1.16 to eth1
Sep 12 14:22:05 node1 avahi-daemon[2555]: Registering new address record
for 172.20.1.16 on eth1.
Sep 12 14:22:07 node1 clurgmgrd[2751]: <notice> Service service:nfs3
started
Sep 12 14:22:15 node1 mountd[29364]: authenticated mount request from
10.32.144.169:761 for /dyntest (/dyntest)
Sep 12 14:22:21 node1 kernel: original: gfs2_glock_nq_atime+0x152/0x2a2
[gfs2]
Sep 12 14:22:21 node1 kernel: pid : 29354
Sep 12 14:22:21 node1 kernel: lock type : 2 lock state : 1
Sep 12 14:22:21 node1 kernel: new: gfs2_getattr+0x2b/0x63 [gfs2]
Sep 12 14:22:21 node1 kernel: pid : 29354
Sep 12 14:22:21 node1 kernel: lock type : 2 lock state : 1
Sep 12 14:22:21 node1 kernel: ----------- [cut here ] --------- [please
bite here ] ---------
Sep 12 14:22:21 node1 kernel: Kernel BUG at fs/gfs2/glock.c:1193
Sep 12 14:22:21 node1 kernel: invalid opcode: 0000 [1] SMP
Sep 12 14:22:21 node1 kernel: last sysfs file:
/fs/gfs2/test1:gfslv/lock_module/recover_done
Sep 12 14:22:21 node1 kernel: CPU 0
Sep 12 14:22:21 node1 kernel: Modules linked in: nfsd exportfs lockd
nfs_acl ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack
nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge
autofs4 hidp rfcomm l2cap bluetooth md5 sctp lock_dlm gfs2 dlm configfs
sunrpc dm_mirror dm_multipath dm_mod video sbs i2c_ec button battery
asus_acpi ac ipv6 parport_pc lp parport sg amd_rng i2c_amd8111 ide_cd
i2c_amd756 i2c_core serio_raw pcspkr cdrom e1000 shpchp tg3 floppy
k8_edac edac_mc qla2xxx scsi_transport_fc sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
Sep 12 14:22:21 node1 kernel: Pid: 29354, comm: nfsd Not tainted
2.6.18-1.2798.fc6 #1
Sep 12 14:22:21 node1 kernel: RIP: 0010:[<ffffffff88450641>]
[<ffffffff88450641>] :gfs2:gfs2_glock_nq+0x106/0x1f2
Sep 12 14:22:21 node1 kernel: RSP: 0018:ffff8100ea0816d0  EFLAGS:
00010282
Sep 12 14:22:21 node1 kernel: RAX: 0000000000000020 RBX:
ffff8100ea081cc0 RCX: ffffffff806aea40
Sep 12 14:22:21 node1 kernel: RDX: 0000000000000000 RSI:
0000000000000046 RDI: ffffffff80556ef0
Sep 12 14:22:21 node1 kernel: RBP: ffff8100ea081710 R08:
00000000ffffffff R09: 0000000000000400
Sep 12 14:22:21 node1 kernel: R10: 0000000000000000 R11:
0000000000000000 R12: ffff8100d8d98830
Sep 12 14:22:21 node1 kernel: R13: ffff8100d8d98830 R14:
0000000000000000 R15: ffff8100e3c3b000
Sep 12 14:22:21 node1 kernel: FS:  00002aaaab0146f0(0000)
GS:ffffffff80609000(0000) knlGS:00000000f7fd46d0
Sep 12 14:22:21 node1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
Sep 12 14:22:21 node1 kernel: CR2: 0000000000669358 CR3:
0000000068f92000 CR4: 00000000000006e0
Sep 12 14:22:21 node1 kernel: Process nfsd (pid: 29354, threadinfo
ffff8100ea080000, task ffff8100f41ee7d0)
Sep 12 14:22:21 node1 kernel: Stack:  0000000100000000 ffff8100cc8ba01c
ffff8100ea0819b0 ffff8100d4539678
Sep 12 14:22:21 node1 kernel:  ffff8100ea0817b0 0000000000000001
ffff8100cc8ba000 ffffffff8845c747
Sep 12 14:22:21 node1 kernel:  ffff8100ea081710 ffff8100ea081710
ffff8100d8d98830 ffff8100f41ee7d0
Sep 12 14:22:21 node1 kernel: Call Trace:
Sep 12 14:22:21 node1 kernel:  [<ffffffff8845c747>]
:gfs2:gfs2_getattr+0x33/0x63
Sep 12 14:22:21 node1 kernel:  [<ffffffff886a4d0b>]
:nfsd:encode_post_op_attr+0x3f/0x213
Sep 12 14:22:21 node1 kernel:  [<ffffffff886a5492>]
:nfsd:encode_entry+0x21d/0x51b
Sep 12 14:22:21 node1 kernel:  [<ffffffff886a57a0>]
:nfsd:nfs3svc_encode_entry_plus+0x10/0x12
Sep 12 14:22:21 node1 kernel:  [<ffffffff8845a2c2>]
:gfs2:filldir_func+0x22/0x86
Sep 12 14:22:21 node1 kernel:  [<ffffffff8844abd1>]
:gfs2:do_filldir_main+0x126/0x16d
Sep 12 14:22:21 node1 kernel:  [<ffffffff8844b102>]
:gfs2:gfs2_dir_read+0x426/0x485
Sep 12 14:22:21 node1 kernel:  [<ffffffff8845aa6f>]
:gfs2:gfs2_readdir+0x9e/0xc4
Sep 12 14:22:21 node1 kernel:  [<ffffffff802350b2>]
vfs_readdir+0x77/0xa9
Sep 12 14:22:21 node1 kernel:  [<ffffffff8869ce5e>]
:nfsd:nfsd_readdir+0x6d/0xc5
Sep 12 14:22:21 node1 kernel:  [<ffffffff886a4621>]
:nfsd:nfsd3_proc_readdirplus+0xf8/0x211
Sep 12 14:22:21 node1 kernel:  [<ffffffff886990e9>]
:nfsd:nfsd_dispatch+0xd7/0x198
Sep 12 14:22:21 node1 kernel:  [<ffffffff883e2437>]
:sunrpc:svc_process+0x42e/0x6ec
Sep 12 14:22:21 node1 kernel:  [<ffffffff88699662>]
:nfsd:nfsd+0x1b5/0x32b
Sep 12 14:22:21 node1 kernel:  [<ffffffff8025cea5>] child_rip+0xa/0x11
Sep 12 14:22:22 node1 kernel: DWARF2 unwinder stuck at
child_rip+0xa/0x11
Sep 12 14:22:22 node1 kernel: Leftover inexact backtrace:
Sep 12 14:22:22 node1 kernel:  [<ffffffff886994ad>] :nfsd:nfsd+0x0/0x32b
Sep 12 14:22:22 node1 kernel:  [<ffffffff886994ad>] :nfsd:nfsd+0x0/0x32b
Sep 12 14:22:22 node1 kernel:  [<ffffffff8025ce9b>] child_rip+0x0/0x11
Sep 12 14:22:22 node1 kernel:
Sep 12 14:22:22 node1 kernel:
Sep 12 14:22:22 node1 kernel: Code: 0f 0b 68 0d 80 46 88 c2 a9 04 48 8b
75 18 49 8b 84 24 90 00
Sep 12 14:22:22 node1 kernel: RIP  [<ffffffff88450641>]
:gfs2:gfs2_glock_nq+0x106/0x1f2
Sep 12 14:22:22 node1 kernel:  RSP <ffff8100ea0816d0>
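
The BUG() at fs/gfs2/glock.c:1193 appears to be the recursion check in
gfs2_glock_nq: the "original:"/"new:" lines above show the same nfsd thread
(pid 29354) requesting the same glock twice, first from gfs2_glock_nq_atime
in the readdir path and again from gfs2_getattr while nfsd encodes the
READDIRPLUS attributes.  To map the faulting offsets back to source lines,
something like the following should work, assuming the matching
kernel-debuginfo package is installed (the .debug path below is the usual
Fedora layout and may differ):

node1# gdb /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/gfs2/gfs2.ko.debug
(gdb) list *(gfs2_glock_nq+0x106)
(gdb) list *(gfs2_glock_nq_atime+0x152)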
