[Linux-cluster] GFS 6.0 crashing x86_64 machine

micah nerren mnerren at paracel.com
Mon Aug 2 20:59:55 UTC 2004


On Mon, 2004-08-02 at 08:46, Adam Manthei wrote:
> On Mon, Aug 02, 2004 at 07:48:02AM -0700, micah nerren wrote:
> > 
> > The system crashes. At the console, there are tons of system calls being
> > listed, and at the bottom of the screen:
> > 
> > Code: 39 d0 75 f8 85 c9 74 10 8b 44 24 14 39 d0 74 08 8b 44 24 14
> > Console Shuts up:
> >    pid: 3547, lock_gulmd Not tainted
> > RIP: 0010
> > 
> > 
> > So... Any ideas on what may be causing this? 
> 
> Those "tons of system calls being listed" are really quite useful if not
> necessary to tell you what the problem is.  My gut feeling is that there is
> a stack overrun that is happening.


OK, here is a capture of the crash occurring. Note that the message is
slightly different from the one I posted before: the end changes, but
the calls it is making look very similar. I also upgraded the kernel to
the latest from RHEL 3 WS and upgraded GFS to GFS-6.0.0-7.src.rpm. It is
still crashing.

Here is my entire boot log, from power-on to the crash at mount. Prior to
the crash, I did the following:

logged in as root via ssh
depmod -a
modprobe lock_gulm
modprobe gfs

(The pool module was already loaded at boot time.)

mount -t gfs /dev/pool/pool_gfs01 /mnt/gfs
CRASH

I hope this helps!!

Micah


////////////////////////

Bootdata ok (command line is ro root=LABEL=/ noapic console=ttyS0,38400)
Linux version 2.4.21-15.0.3.ELsmp (bhcompile at thor.perf.redhat.com) (gcc
version 3.2.3 20030502 (Red Hat Linux 3.2.3-37)) #1 SMP Tue Jun 29
17:46:55 EDT 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
 BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff80000 (usable)
 BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
kernel direct mapping tables upto 10100000000 @ 8000-d000
Scanning NUMA topology in Northbridge 24
Node 0 using interleaving mode 1/0
No NUMA configuration found
Faking a node at 0000000000000000-000000007ff80000
Bootmem setup node 0 0000000000000000-000000007ff80000
found SMP MP-table at 000f69a0
hm, page 000f6000 reserved twice.
hm, page 000f7000 reserved twice.
hm, page 0009b000 reserved twice.
hm, page 0009c000 reserved twice.
setting up node 0 0-7ff80
On node 0 totalpages: 524160
zone(0): 4096 pages.
zone(1): 520064 pages.
zone(2): 0 pages.
ACPI: Unable to locate RSDP
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: AMD      <6>Product ID: HAMMER       <6>APIC at: 0xFEE00000
Processor #0 15:5 APIC version 16
Processor #1 15:5 APIC version 16
I/O APIC #2 Version 17 at 0xFEC00000.
I/O APIC #3 Version 17 at 0xFC000000.
I/O APIC #4 Version 17 at 0xFC001000.
Processors: 2
Kernel command line: ro root=LABEL=/ noapic console=ttyS0,38400
Initializing CPU#0
time.c: Detected 1.193182 MHz PIT timer.
time.c: Detected 1403.229 MHz TSC timer.
Console: colour VGA+ 80x25
Calibrating delay loop... 2798.38 BogoMIPS
Memory: 2034216k/2096640k available (1797k kernel code, 0k reserved,
1862k data, 224k init)
Dentry cache hash table entries: 262144 (order: 10, 4194304 bytes)
Inode cache hash table entries: 131072 (order: 9, 2097152 bytes)
Mount cache hash table entries: 256 (order: 0, 4096 bytes)
Buffer cache hash table entries: 131072 (order: 8, 1048576 bytes)
Page-cache hash table entries: 524288 (order: 10, 4194304 bytes)
CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2
way)
CPU: L2 Cache: 1024K (64 bytes/line/8 way)
Machine Check Reporting enabled for CPU#0
POSIX conformance testing by UNIFIX
mtrr: v2.02 (20020716))
CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2
way)
CPU: L2 Cache: 1024K (64 bytes/line/8 way)
CPU0: AMD Opteron(tm) Processor 240 stepping 01
per-CPU timeslice cutoff: 5119.55 usecs.
task migration cache decay timeout: 10 msecs.
Booting processor 1/1 rip 6000 page 00000100077e2000
Initializing CPU#1
Calibrating delay loop... 2804.94 BogoMIPS
CPU: L1 I Cache: 64K (64 bytes/line/2 way), D cache 64K (64 bytes/line/2
way)
CPU: L2 Cache: 1024K (64 bytes/line/8 way)
Machine Check Reporting enabled for CPU#1
CPU1: AMD Opteron(tm) Processor 240 stepping 01
Total of 2 processors activated (5603.32 BogoMIPS).
Using local APIC timer interrupts.
Detected 12.528 MHz APIC timer.
cpu: 0, clocks: 2004614, slice: 668204
CPU0<T0:2004608,T1:1336400,D:4,S:668204,C:2004614>
cpu: 1, clocks: 2004614, slice: 668204
CPU1<T0:2004608,T1:668192,D:8,S:668204,C:2004614>
checking TSC synchronization across CPUs: passed.
time.c: Using PIT based timekeeping.
Starting migration thread for cpu 0
Starting migration thread for cpu 1
ACPI: Subsystem revision 20030619
PCI: Using configuration type 1
ACPI: System description tables not found
    ACPI-0084: *** Error: acpi_load_tables: Could not get RSDP,
AE_NOT_FOUND
    ACPI-0134: *** Error: acpi_load_tables: Could not load tables:
AE_NOT_FOUND
ACPI: Unable to load the System Description Tables
PCI: Probing PCI hardware
PCI: Using IRQ router default [1022/746b] at 00:07.3
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 1919M
PCI-DMA: Disabling IOMMU.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
aio_setup: num_physpages = 131040
aio_setup: sizeof(struct page) = 104
Hugetlbfs mounted.
Total HugeTLB memory allocated, 0
IA32 emulation $Id: sys_ia32.c,v 1.56 2003/04/10 10:45:37 ak Exp $
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT
SHARE_IRQ SERIAL_PCI SERIAL_ACPI enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 256 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
AMD8111: IDE controller at PCI slot 00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
AMD_IDE: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03) UDMA100
controller on pci00:07.1
    ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:pio, hdd:pio
hda: WDC WD600JB-00CRA1, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 117231408 sectors (60022 MB) w/8192KiB Cache, CHS=116301/16/63,
UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
 hda: hda1 hda2 hda3
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
Initializing IPsec netlink socket
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Red Hat nash version 3.5.13 starSCSI subsystem driver Revision: 1.00
ting
Loading scsi_mod.o module
Loading sd_mod.o module
Loadinqla2x00_set_info starts at address = ffffffffa00230c0
g qla2300.o moduqla2x00: Found  VID=1077 DID=2312 SSVID=1077 SSDID=101
scsi(0): Found a QLA2312  @ bus 2, device 0x1, irq 5, iobase
0xffffff0000013000
le
scsi(0): Allocated 4096 SRB(s).
scsi(0): Configure NVRAM parameters...
scsi(0): 64 Bit PCI Addressing Enabled.
qla2x00_nvram_config ZIO enabled:intr_timer_delay=3
scsi(0): Verifying loaded RISC code...
scsi(0): Verifying chip...
scsi(0): Waiting for LIP to complete...
scsi(0): Cable is unplugged...
scsi-qla0-adapter-node=200000e08b17cf0f\;
scsi-qla0-adapter-port=210000e08b17cf0f\;
qla2x00: Found  VID=1077 DID=2312 SSVID=1077 SSDID=101
scsi(1): Found a QLA2312  @ bus 2, device 0x1, irq 10, iobase
0xffffff0000015000
scsi(1): Allocated 4096 SRB(s).
scsi(1): Configure NVRAM parameters...
scsi(1): 64 Bit PCI Addressing Enabled.
qla2x00_nvram_config ZIO enabled:intr_timer_delay=3
scsi(1): Verifying loaded RISC code...
scsi(1): Verifying chip...
scsi(1): Waiting for LIP to complete...
scsi(1): LOOP UP detected.
scsi(1): Port database changed.
scsi(1): Topology - (F_Port), Host Loop address 0xffff
qla2x00_configure_fcports(1): LOOP READY
scsi-qla1-adapter-node=200100e08b37cf0f\;
scsi-qla1-adapter-port=210100e08b37cf0f\;
scsi-qla1-tgt-0-di-0-port=22000004cffd1447\;
scsi-qla1-tgt-1-di-0-port=22000004cffd1411\;
scsi-qla1-tgt-2-di-0-port=22000004cffd0254\;
scsi-qla1-tgt-3-di-0-port=22000004cffcec36\;
scsi(1) qla2x00_isr MBA_PORT_UPDATE ignored
scsi0 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 2 device 1
irq 5
        Firmware version:  3.02.24, Driver version 6.07.02-RH2

scsi1 : QLogic QLA2312 PCI to Fibre Channel Host Adapter: bus 2 device 1
irq 10
        Firmware version:  3.02.24, Driver version 6.07.02-RH2

  Vendor: SEAGATE   Model: ST336607FC        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: SEAGATE   Model: ST336607FC        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: SEAGATE   Model: ST336607FC        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: SEAGATE   Model: ST336607FC        Rev: 0006
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(1:0:0:0): Enabled tagged queuing, queue depth 64.
scsi(1:0:1:0): Enabled tagged queuing, queue depth 64.
scsi(1:0:2:0): Enabled tagged queuing, queue depth 64.
scsi(1:0:3:0): Enabled tagged queuing, queue depth 64.
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi1, channel 0, id 1, lun 0
Attached scsi disk sdc at scsi1, channel 0, id 2, lun 0
Attached scsi disk sdd at scsi1, channel 0, id 3, lun 0
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
 sda: sda1 sda2
SCSI device sdb: 71687372 512-byte hdwr sectors (36704 MB)
 sdb: sdb1 sdb2
SCSI device sdc: 71687372 512-byte hdwr sectors (36704 MB)
 sdc: sdc1 sdc2
SCSI device sdd: 71687372 512-byte hdwr sectors (36704 MB)
 sdd: sdd1 sdd2
Loading jbd.o module
Journalled Block Device driver loaded
Loading ext3.o module
Mounting /proc filesystem
Creating block devices
Creating root device
Mounting root filesystem
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
spurious 8259A interrupt: IRQ7.
Freeing unused kernel memory: 224k freed
INIT: version 2.85 booting
		Welcome to Rocks
		Press 'I' to enter interactive startup.
Unmounting initrd:  [  OK  ]
Configuring kernel parameters:  [  OK  ]
Setting clock  (utc): Mon Aug  2 20:42:30 GMT 2004 [  OK  ]
Setting hostname frontend-0.public:  [  OK  ]
Initializing USB controller (usb-ohci):  [  OK  ]
Mounting USB filesystem:  [  OK  ]
Initializing USB HID interface:  [  OK  ]
Initializing USB keyboard:  [  OK  ]
Initializing USB mouse:  [  OK  ]
Checking root filesystem
/: clean, 158083/7061504 files, 1680159/14116410 blocks
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/hda2 
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
Activating swap partitions:  [  OK  ]
Finding module dependencies:  [  OK  ]
Checking filesystems
/boot: clean, 69/25584 files, 75707/102280 blocks
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/hda1 
[  OK  ]
Mounting local filesystems:  [  OK  ]
Enabling local filesystem quotas:  [  OK  ]
Enabling swap space:  [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Applying iptables firewall rules: [  OK  ]
Setting network parameters:  [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:  [  OK  ]
Bringing up interface eth1:  [  OK  ]
Starting system logger: [  OK  ]
Starting kernel logger: [  OK  ]
Starting portmapper: [  OK  ]
Starting NFS statd: [  OK  ]
Starting pool:  Pool v6.0.0 (built Aug  2 2004 18:51:15) installed
[  OK  ]
Starting ganglia-restore-rrds:  [  OK  ]
Starting ccsd:  [  OK  ]
Starting GANGLIA gmetad: [  OK  ]
Initializing random number generator:  [  OK  ]
Starting Ganglia Receptor: [  OK  ]
Starting lock_gulmd:  [  OK  ]
modprobe: Can't locate module pvfs
Starting PVFS daemon: (pvfsd.c, 683): Could not setup device /dev/pvfsd.
(pvfsd.c, 684): Did you remember to load the pvfs module?
(pvfsd.c, 453): pvfsd: setup_pvfsdev() failed
[FAILED][  OK  ]
Mounting other filesystems:  [  OK  ]
Publishing login files via 411...[  OK  ]
Starting automount:[  OK  ]
Starting named: [  OK  ]
Starting sshd:[  OK  ]
Starting xinetd: [  OK  ]
ntpd: Synchronizing with time server: [FAILED]
Starting ntpd: [  OK  ]
Starting NFS services:  [  OK  ]
Starting NFS quotas: [  OK  ]
Starting NFS daemon: [  OK  ]
Starting NFS mountd: [  OK  ]
Starting dhcpd: [  OK  ]
Starting GANGLIA gmond: [  OK  ]
Starting MySQL:  [  OK  ]
Starting httpd: [  OK  ]
Starting crond: [  OK  ]
Starting xfs: [  OK  ]
Starting atd: [  OK  ]
Starting firstboot:  [  OK  ]
   starting sge_qmaster
starting program: /opt/gridengine/bin/amd64linux/sge_commd
using service "sge_commd"
bound to port 535
Reading in complexes:
	Complex "host".
	Complex "queue".
Reading in execution hosts.
Reading in administrative hosts.
Reading in submit hosts.
Reading in usersets:
	Userset "defaultdepartment".
	Userset "deadlineusers".
Reading in queues:
	Queue "compute-0-0.q".
Reading in parallel environments:
	PE "make".
	PE "mpich".
	PE "mpi".
Reading in scheduler configuration
cant load sharetree (cant open file sharetree: No such file or
directory), starting up with empty sharetree
   starting sge_schedd
Turn off kernel logging to console: [  OK  ]
/wet^H^H^HUnable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
 printing rip:
ffffffff8024a875
PML4 78215067 PGD 77f93067 PMD 0 
Oops: 0002
CPU 1 
Pid: 4027, comm: mount Not tainted
RIP: 0010:[<ffffffff8024a875>]{net_rx_action+213}
RSP: 0018:0000010078051048  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffff80607ae8 RCX: ffffffff80607c88
RDX: ffffffff80607ae8 RSI: 0000010078986080 RDI: ffffffff80607ad0
RBP: ffffffff80607968 R08: 0000000080e76a9c R09: 0000000000e780e7
R10: 000000000100007f R11: 0000000000000000 R12: ffffffff80607ae8
R13: ffffffff80607ac0 R14: 00000000000071c2 R15: 0000000000000001
FS:  0000002a955764c0(0000) GS:ffffffff805d98c0(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000079d2000 CR4: 00000000000006e0

Call Trace: [<ffffffff8024a84d>]{net_rx_action+173} 
       [<ffffffff8012a72e>]{do_softirq+174}
[<ffffffff80267cf0>]{ip_finish_output2+0} 
       [<ffffffff80267cc0>]{dst_output+0}
[<ffffffff802b5915>]{do_softirq_thunk+53} 
       [<ffffffff802533a7>]{.text.lock.netfilter+165}
[<ffffffff80267cc0>]{dst_output+0} 
       [<ffffffff80265fbb>]{ip_queue_xmit+1019}
[<ffffffff80262ee0>]{ip_rcv_finish+0} 
       [<ffffffff802630f0>]{ip_rcv_finish+528}
[<ffffffff80252e51>]{nf_hook_slow+305} 
       [<ffffffff80262ee0>]{ip_rcv_finish+0}
[<ffffffff80277faf>]{tcp_transmit_skb+1295} 
       [<ffffffff80278ac6>]{tcp_write_xmit+198}
[<ffffffff8026de83>]{tcp_sendmsg+4051} 
       [<ffffffff8028e795>]{inet_sendmsg+69}
[<ffffffff802407ae>]{sock_sendmsg+142} 
       [<ffffffffa017c4b1>]{:lock_gulm:do_tfer+369}
[<ffffffffa017ebd4>]{:lock_gulm:.rodata.str1.1+467} 
       [<ffffffffa017c595>]{:lock_gulm:xdr_send+37}
[<ffffffffa017b498>]{:lock_gulm:xdr_enc_flush+56} 
       [<ffffffffa017951d>]{:lock_gulm:lg_lock_login+301} 
       [<ffffffffa0175ff9>]{:lock_gulm:lt_login+57}
[<ffffffffa0172164>]{:lock_gulm:gulm_core_login_reply+164} 
       [<ffffffffa01826a0>]{:lock_gulm:core_cb+0}
[<ffffffffa01780eb>]{:lock_gulm:lg_core_handle_messages+315} 
       [<ffffffffa0178713>]{:lock_gulm:lg_core_login+323} 
       [<ffffffffa017253a>]{:lock_gulm:cm_login+122}
[<ffffffffa0172bde>]{:lock_gulm:start_gulm_threads+174} 
       [<ffffffffa0172f08>]{:lock_gulm:gulm_mount+616}
[<ffffffffa014c940>]{:gfs:gfs_glock_cb+0} 
       [<ffffffff801277bb>]{release_task+763}
[<ffffffffa01313e3>]{:lock_harness:lm_mount_Rsmp_ad6c5c21+355} 
       [<ffffffffa014c940>]{:gfs:gfs_glock_cb+0}
[<ffffffffa0151ff9>]{:gfs:gfs_mount_lockproto+313} 
       [<ffffffff8013d8d2>]{do_anonymous_page+1234}
[<ffffffff8013d94f>]{do_no_page+95} 
       [<ffffffff801a5103>]{do_page_fault+627}
[<ffffffff801109d6>]{error_exit+0} 
       [<ffffffff80184cb3>]{create_elf_tables+211}
[<ffffffff802b5798>]{strnlen_user+56} 
       [<ffffffff80184f47>]{create_elf_tables+871}
[<ffffffffa013d37b>]{:gfs:gfs_read_super+1307} 
       [<ffffffffa0171b00>]{:gfs:gfs_fs_type+0}
[<ffffffff80164c0c>]{get_sb_bdev+588} 
       [<ffffffffa0171b00>]{:gfs:gfs_fs_type+0}
[<ffffffff80164ec9>]{do_kern_mount+121} 
       [<ffffffff8017baa1>]{do_add_mount+161}
[<ffffffff8017bdb9>]{do_mount+345} 
       [<ffffffff80154b40>]{__get_free_pages+16}
[<ffffffff8017c1d5>]{sys_mount+197} 
       [<ffffffff80110177>]{system_call+119} 
Process mount (pid: 4027, stackpage=10078051000)
Stack: 0000010078051048 0000000000000018 ffffffff8024a84d
0000012a80445d20 
       0000000000000001 ffffffff80606c60 0000000000000001
000000000000000a 
       0000000000000001 0000000000000002 ffffffff8012a72e
ffffffff80267cf0 
       0000000000000246 0000000000000000 0000000000000003
ffffffff80445d20 
       ffffffff80267cc0 0000000000000000 ffffffff802b5915
0000000000000043 
       0000000000000006 000001007a05309e 000001007c97bd80
0000000000000000 
       0000000000000300 ffffffff8049c688 0000000000000001
ffffffff806077c0 
       ffffffff802533a7 ffffffff80267cc0 ffffffff80445d20
0000000000000002 
       000001007c97bd80 ffffffff805abcd0 000001007a0530ac
000001007c97bd80 
       0000010078986080 0000000000000000 0000010078986080
000001007c97bde8 
Call Trace: [<ffffffff8024a84d>]{net_rx_action+173} 
       [<ffffffff8012a72e>]{do_softirq+174}
[<ffffffff80267cf0>]{ip_finish_output2+0} 
       [<ffffffff80267cc0>]{dst_output+0}
[<ffffffff802b5915>]{do_softirq_thunk+53} 
       [<ffffffff802533a7>]{.text.lock.netfilter+165}
[<ffffffff80267cc0>]{dst_output+0} 
       [<ffffffff80265fbb>]{ip_queue_xmit+1019}
[<ffffffff80262ee0>]{ip_rcv_finish+0} 
       [<ffffffff802630f0>]{ip_rcv_finish+528}
[<ffffffff80252e51>]{nf_hook_slow+305} 
       [<ffffffff80262ee0>]{ip_rcv_finish+0}
[<ffffffff80277faf>]{tcp_transmit_skb+1295} 
       [<ffffffff80278ac6>]{tcp_write_xmit+198}
[<ffffffff8026de83>]{tcp_sendmsg+4051} 
       [<ffffffff8028e795>]{inet_sendmsg+69}
[<ffffffff802407ae>]{sock_sendmsg+142} 
       [<ffffffffa017c4b1>]{:lock_gulm:do_tfer+369}
[<ffffffffa017ebd4>]{:lock_gulm:.rodata.str1.1+467} 
       [<ffffffffa017c595>]{:lock_gulm:xdr_send+37}
[<ffffffffa017b498>]{:lock_gulm:xdr_enc_flush+56} 
       [<ffffffffa017951d>]{:lock_gulm:lg_lock_login+301} 
       [<ffffffffa0175ff9>]{:lock_gulm:lt_login+57}
[<ffffffffa0172164>]{:lock_gulm:gulm_core_login_reply+164} 
       [<ffffffffa01826a0>]{:lock_gulm:core_cb+0}
[<ffffffffa01780eb>]{:lock_gulm:lg_core_handle_messages+315} 
       [<ffffffffa0178713>]{:lock_gulm:lg_core_login+323} 
       [<ffffffffa017253a>]{:lock_gulm:cm_login+122}
[<ffffffffa0172bde>]{:lock_gulm:start_gulm_threads+174} 
       [<ffffffffa0172f08>]{:lock_gulm:gulm_mount+616}
[<ffffffffa014c940>]{:gfs:gfs_glock_cb+0} 
       [<ffffffff801277bb>]{release_task+763}
[<ffffffffa01313e3>]{:lock_harness:lm_mount_Rsmp_ad6c5c21+355} 
       [<ffffffffa014c940>]{:gfs:gfs_glock_cb+0}
[<ffffffffa0151ff9>]{:gfs:gfs_mount_lockproto+313} 
       [<ffffffff8013d8d2>]{do_anonymous_page+1234}
[<ffffffff8013d94f>]{do_no_page+95} 
       [<ffffffff801a5103>]{do_page_fault+627}
[<ffffffff801109d6>]{error_exit+0} 
       [<ffffffff80184cb3>]{create_elf_tables+211}
[<ffffffff802b5798>]{strnlen_user+56} 
       [<ffffffff80184f47>]{create_elf_tables+871}
[<ffffffffa013d37b>]{:gfs:gfs_read_super+1307} 
       [<ffffffffa0171b00>]{:gfs:gfs_fs_type+0}
[<ffffffff80164c0c>]{get_sb_bdev+588} 
       [<ffffffffa0171b00>]{:gfs:gfs_fs_type+0}
[<ffffffff80164ec9>]{do_kern_mount+121} 
       [<ffffffff8017baa1>]{do_add_mount+161}
[<ffffffff8017bdb9>]{do_mount+345} 
       [<ffffffff80154b40>]{__get_free_pages+16}
[<ffffffff8017c1d5>]{sys_mount+197} 
       [<ffffffff80110177>]{system_call+119} 

Code: 48 89 18 48 89 43 08 8b 85 90 01 00 00 85 c0 79 08 03 85 94

Kernel panic: Fatal exception
In interrupt handler - not syncing
 
NMI Watchdog detected LOCKUP on CPU0, eip ffffffff801a5419, registers:
CPU 0 
Pid: 3532, comm: lock_gulmd Not tainted
RIP: 0010:[<ffffffff801a5419>]{.text.lock.fault+7}
RSP: 0018:000001007ba7b978  EFLAGS: 00000086
RAX: 000000000000000f RBX: ffffffff806077e8 RCX: 0000000000000000
RDX: ffffffff803042e0 RSI: ffffffff803042e0 RDI: ffffffff8024a875
RBP: ffffffff80607668 R08: ffffffff803042d0 R09: 0000000000e780e7
R10: 000000000100007f R11: 0000000000000000 R12: 0000010037dcbc00
R13: 0000000000000000 R14: 0000000000000002 R15: 000001007ba7ba58
FS:  0000002a95576ce0(0000) GS:ffffffff805d9840(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace:  <EOE> [<ffffffff80252e51>]{nf_hook_slow+305} 
       [<ffffffff80262ee0>]{ip_rcv_finish+0}
[<ffffffff802630f0>]{ip_rcv_finish+528} 
       [<ffffffff80262d70>]{ip_local_deliver_finish+0}
[<ffffffff801109d6>]{error_exit+0} 
       [<ffffffff8024a875>]{net_rx_action+213}
[<ffffffff8024a84d>]{net_rx_action+173} 
       [<ffffffff8012a72e>]{do_softirq+174}
[<ffffffff80267cf0>]{ip_finish_output2+0} 
       [<ffffffff80267cc0>]{dst_output+0}
[<ffffffff802b5915>]{do_softirq_thunk+53} 
       [<ffffffff802533a7>]{.text.lock.netfilter+165}
[<ffffffff80267cc0>]{dst_output+0} 
       [<ffffffff80265fbb>]{ip_queue_xmit+1019}
[<ffffffff80277faf>]{tcp_transmit_skb+1295} 
       [<ffffffff80278ac6>]{tcp_write_xmit+198}
[<ffffffff8026de83>]{tcp_sendmsg+4051} 
       [<ffffffff8028e795>]{inet_sendmsg+69}
[<ffffffff802407ae>]{sock_sendmsg+142} 
       [<ffffffff802418e3>]{sys_sendto+195}
[<ffffffff80154d14>]{free_pages+132} 
       [<ffffffff801714c8>]{__poll_freewait+136}
[<ffffffff80110177>]{system_call+119} 
       
Process lock_gulmd (pid: 3532, stackpage=1007ba7b000)
Stack: 000001007ba7b978 0000000000000018 0000000000100000
0000000000000000 
       00000100079c4c80 ffffffff803e89a0 0000000000000000
00000100000fdea0 
       ffffffff803e8d00 00000100079bf000 00000100079d6400
0000000000000042 
       00000100079de280 ffffff0000000000 000000fffffff000
0000000000000000 
       00000100079d7a80 0000000000000000 0000000000000000
0000000000000000 
       0000000000000000 0000000000000000 0000000000000000
0000000000000000 
       0000010078050d48 0000000000000000 00000000006d9994
0000000000000003 
       0000000000000000 0000000000000000 0000000100000000
ffffffffffffffff 
       ffffffffffffffff ffffffffffffffff ffffffffffffffff
ffffffffffffffff 
       ffffffffffffffff ffffffffffffffff ffffffffffffffff
ffffffffffffffff 
Call Trace:  <EOE> [<ffffffff80252e51>]{nf_hook_slow+305} 
       [<ffffffff80262ee0>]{ip_rcv_finish+0}
[<ffffffff802630f0>]{ip_rcv_finish+528} 
       [<ffffffff80262d70>]{ip_local_deliver_finish+0}
[<ffffffff801109d6>]{error_exit+0} 
       [<ffffffff8024a875>]{net_rx_action+213}
[<ffffffff8024a84d>]{net_rx_action+173} 
       [<ffffffff8012a72e>]{do_softirq+174}
[<ffffffff80267cf0>]{ip_finish_output2+0} 
       [<ffffffff80267cc0>]{dst_output+0}
[<ffffffff802b5915>]{do_softirq_thunk+53} 
       [<ffffffff802533a7>]{.text.lock.netfilter+165}
[<ffffffff80267cc0>]{dst_output+0} 
       [<ffffffff80265fbb>]{ip_queue_xmit+1019}
[<ffffffff80277faf>]{tcp_transmit_skb+1295} 
       [<ffffffff80278ac6>]{tcp_write_xmit+198}
[<ffffffff8026de83>]{tcp_sendmsg+4051} 
       [<ffffffff8028e795>]{inet_sendmsg+69}
[<ffffffff802407ae>]{sock_sendmsg+142} 
       [<ffffffff802418e3>]{sys_sendto+195}
[<ffffffff80154d14>]{free_pages+132} 
       [<ffffffff801714c8>]{__poll_freewait+136}
[<ffffffff80110177>]{system_call+119} 
       

Code: f3 90 7e f5 e9 c8 fd ff ff 90 90 90 90 90 90 90 90 90 90 90

console shuts up ...
NM I Watchdog detected LOCKUP on CPU1, eip ffffffff8011a948, registers:
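
Since the traces in this capture are long and the earlier one "looked
very similar", it can help to reduce each one to a list of
(module, symbol) pairs before comparing them. The following is only a
minimal sketch in Python, assuming the console capture has been saved
as text; the names parse_frames and frames_per_module are made up for
this illustration and are not part of any GFS or kernel tool. Bear in
mind that a 2.4 trace lists addresses found on the stack, so some
entries may be stale rather than real callers.

```python
import re
from collections import Counter

# One frame looks like "[<ffffffffa017c4b1>]{:lock_gulm:do_tfer+369}"
# (module symbol) or "[<ffffffff8024a84d>]{net_rx_action+173}" (core kernel).
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{16})>\]\{(?::([\w.]+):)?([\w.]+)\+(\d+)\}"
)


def parse_frames(text):
    """Return (address, module, symbol, offset) for every frame in text."""
    frames = []
    for addr, module, symbol, offset in FRAME_RE.findall(text):
        # Frames without a ":module:" prefix belong to the core kernel.
        frames.append((addr, module or "kernel", symbol, int(offset)))
    return frames


def frames_per_module(text):
    """Count how many frames each module contributes to the trace."""
    return Counter(module for _, module, _, _ in parse_frames(text))


if __name__ == "__main__":
    # A few frames copied from the mount trace in this capture.
    sample = (
        "[<ffffffff802407ae>]{sock_sendmsg+142} \n"
        "       [<ffffffffa017c4b1>]{:lock_gulm:do_tfer+369}\n"
        "[<ffffffffa017ebd4>]{:lock_gulm:.rodata.str1.1+467} \n"
        "       [<ffffffffa0151ff9>]{:gfs:gfs_mount_lockproto+313}\n"
    )
    for addr, module, symbol, offset in parse_frames(sample):
        print(f"{module:12s} {symbol}+{offset}")
    print(dict(frames_per_module(sample)))
```

Running parse_frames over each "Call Trace:" section separately and
comparing the symbol lists shows quickly whether two crashes go through
the same gfs, lock_gulm, and TCP send path.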
  



