[K12OSN] Server Crash K12LTSP 4.2.0-2 smp

Mary Jo Spencer webace98 at gmail.com
Wed May 17 21:10:22 UTC 2006


Greetings, everyone. I have been lurking on this mailing list for quite some
time. I started last spring with three grade 3 classrooms set up with banks of
K12LTSP thin clients, and this year we expanded to 15 classrooms in grades 3-5.
We have been using the same server all along, a dual-Xeon PowerEdge 2600 with
SCSI drives, though I did add memory to bring it to a total of 5.9 GB.
Originally, the techs upgraded the kernel to 2.6.10-1.770_FC3smp and updated
some of the other K12LTSP software.

I have been working on setting up a second server and had almost everything
ready, but before I could deploy it, our main server crashed in the middle of
the day. At the time of the crash about 4.2 GB of memory was in use and CPU
usage was about 20% (down from 5.2 GB and 65% utilization, which had held for
a while about half an hour earlier). All the thin clients froze, showing a
black screen with some colored blocks. The server would not respond to its
directly attached keyboard or mouse, so I just powered it down and restarted
it. Since restarting and going through the disk check it has been OK, but I am
worried about it crashing again because I don't know why it crashed in the
first place.

Here is what it shows in /var/log/messages:

May 11 10:13:08 SMS-K12LTSP kernel: Unable to handle kernel paging request at virtual address 00004330
May 11 10:13:08 SMS-K12LTSP kernel:  printing eip:
May 11 10:13:08 SMS-K12LTSP kernel: c02a469d
May 11 10:13:08 SMS-K12LTSP kernel: *pde = 138be001
May 11 10:13:08 SMS-K12LTSP kernel: Oops: 0000 [#1]
May 11 10:13:08 SMS-K12LTSP kernel: SMP
May 11 10:13:08 SMS-K12LTSP kernel: Modules linked in: ipt_MASQUERADE iptable_nat ip_conntrack ip_tables nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc video button battery ac md5 ipv6 uhci_hcd hw_random e1000 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod

According to the System.map file in /boot, c02a469d corresponds to inet_ioctl.
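
The lookup described above (matching the EIP from the oops against System.map) can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual System.map layout of one "address type name" entry per line. The two sample entries use invented addresses and a hypothetical second symbol; they are not taken from this kernel's real map.

```python
import bisect


def load_text_symbols(lines):
    """Parse System.map-style lines ("address type name") into a sorted list
    of (address, name) pairs, keeping only text (code) symbols."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        addr_hex, sym_type, name = parts[0], parts[1], parts[2]
        if sym_type not in ("T", "t"):
            continue  # only code symbols can contain an instruction pointer
        symbols.append((int(addr_hex, 16), name))
    symbols.sort()
    return symbols


def lookup_eip(symbols, eip):
    """Return (name, offset) of the closest symbol at or below eip, or None."""
    addresses = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addresses, eip) - 1
    if i < 0:
        return None  # eip is below the first known symbol
    start, name = symbols[i]
    return name, eip - start


# Hypothetical System.map excerpt: these addresses are invented for illustration.
sample_map = [
    "c02a4600 T inet_ioctl",
    "c02a4780 T hypothetical_next_symbol",
]

symbols = load_text_symbols(sample_map)
name, offset = lookup_eip(symbols, 0xc02a469d)  # EIP value from the oops
print(f"{name}+0x{offset:x}")  # prints inet_ioctl+0x9d
```

Run against the real System.map for the crashed kernel, the same lookup gives the containing function and the offset into it, which is more specific than the function name alone.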

Here is some of what showed up in dmesg after the server booted:

Linux version 2.6.10-1.770_FC3smp (bhcompile at porky.build.redhat.com) (gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)) #1 SMP Thu Feb 24 14:20:06 EST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 0000000000100000 - 00000000f7fd0000 (usable)
 BIOS-e820: 00000000f7fd0000 - 00000000f7fdfc00 (ACPI data)
 BIOS-e820: 00000000f7fdfc00 - 00000000f7fff000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec90000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 00000001c8000000 (usable)
6400MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 1867776
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:16
  HighMem zone: 1638400 pages, LIFO batch:16
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP (v000 DELL                                  ) @ 0x000fdc20
ACPI: RSDT (v001 DELL   PE2600   0x00000001 MSFT 0x0100000a) @ 0x000fdc34
ACPI: FADT (v001 DELL   PE2600   0x00000001 MSFT 0x0100000a) @ 0x000fdc64
ACPI: MADT (v001 DELL   PE2600   0x00000001 MSFT 0x0100000a) @ 0x000fdcd8
ACPI: SPCR (v001 DELL   PE2600   0x00000001 MSFT 0x0100000a) @ 0x000fdd96
ACPI: DSDT (v001 DELL   PE2600   0x00000001 MSFT 0x0100000a) @ 0x00000000
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
Processor #7 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
<<<<3 more of these for the two CPUs>>>>
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
ACPI: IOAPIC (id[0x0a] address[0xfec81000] gsi_base[72])
IOAPIC[2]: apic_id 10, version 32, address 0xfec81000, GSI 72-95
ACPI: IOAPIC (id[0x0b] address[0xfec82000] gsi_base[120])
IOAPIC[3]: apic_id 11, version 32, address 0xfec82000, GSI 120-143
ACPI: IOAPIC (id[0x0c] address[0xfec82800] gsi_base[144])
IOAPIC[4]: apic_id 12, version 32, address 0xfec82800, GSI 144-167
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high edge)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: IRQ10 used by override.
Enabling APIC mode:  Flat.  Using 5 I/O APICs
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec80000)
mapped IOAPIC to ffffa000 (fec81000)
mapped IOAPIC to ffff9000 (fec82000)
mapped IOAPIC to ffff8000 (fec82800)
Initializing CPU#0
CPU 0 irqstacks, hard=c03ca000 soft=c03aa000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2393.112 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 7273988k/7471104k available (1780k kernel code, 64932k reserved, 718k data, 204k init, 6422336k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 4734.97 BogoMIPS (lpj=2367488)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps:  bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps:        bfebf3ff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 2.40GHz stepping 09
per-CPU timeslice cutoff: 1462.91 usecs.
task migration cache decay timeout: 2 msecs.
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c03cb000 soft=c03ab000
Initializing CPU#1
Calibrating delay loop... 4767.74 BogoMIPS (lpj=2383872)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps:  bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps:        bfebf3ff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Xeon(TM) CPU 2.40GHz stepping 09
Booting processor 2/6 eip 3000
CPU 2 irqstacks, hard=c03cc000 soft=c03ac000
Initializing CPU#2
Calibrating delay loop... 4767.74 BogoMIPS (lpj=2383872)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
CPU: After vendor identify, caps:  bfebfbff 00000000 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After all inits, caps:        bfebf3ff 00000000 00000000 00000080
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
<<<more stuff re: detecting second CPU>>>>>>>
Total of 4 processors activated (19038.20 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
checking TSC synchronization across 4 CPUs: passed.
Brought up 4 CPUs
checking if image is initramfs... it is
Freeing initrd memory: 1022k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfc6ce, last bus=11
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20041105
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI2.P2PA._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI2.P2PB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI3.P2PC._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI3.P2PD._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI4.P2PE._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI4.P2PE.ZION._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI4.P2PF._PRT]
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 9 *11 12 14)
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 9 11 12 14) *0, disabled.
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 9 11 12 14) *0, disabled.
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 9 11 12 14) *0, disabled.
ACPI: PCI Interrupt Link [LNK8] (IRQs 3 4 *5 6 7 9 11 12 14)
ACPI: PCI Interrupt Link [LNK9] (IRQs 3 4 5 6 7 9 11 12 14) *0, disabled.
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 *11 12 14)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 11 12 14) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 11 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
** PCI interrupts are no longer routed automatically.  If this
** causes a device to stop working, it is probably because the
** driver failed to call pci_enable_device().  As a temporary
** workaround, the "pci=routeirq" argument restores the old
** behavior.  If this argument makes the device work again,
** please email the output of "lspci" to bjorn.helgaas at hp.com
** so I can fix the driver.
pnp: 00:0a: ioport range 0x800-0x87f could not be reserved
pnp: 00:0a: ioport range 0x880-0x8bf has been reserved
pnp: 00:0a: ioport range 0x8c0-0x8df has been reserved
pnp: 00:0a: ioport range 0xc00-0xc1f has been reserved
pnp: 00:0a: ioport range 0xca2-0xca7 has been reserved
pnp: 00:0a: ioport range 0xc20-0xc2f has been reserved
apm: BIOS not found.
audit: initializing netlink socket (disabled)
audit(1147342842.883:0): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key C1F2FA57F7EECD64
- User ID: Red Hat, Inc. (Kernel Module GPG key)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 76 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
divert: not allocating divert_blk for non-ethernet device lo
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH3: IDE controller at PCI slot 0000:00:1f.1
PCI: Enabling device 0000:00:1f.1 (0005 -> 0007)
ACPI: PCI interrupt 0000:00:1f.1[A]: no GSI - using IRQ 0
ICH3: chipset revision 2
ICH3: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: HL-DT-STCD-RW/DVD-ROM GCC-4243N, ATAPI CD/DVD-ROM drive
elevator: using anticipatory as default io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: ImPS/2 Logitech Wheel Mouse on isa0060/serio1
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 32768 buckets, 512Kbytes
TCP: Hash tables configured (established 262144 bind 43690)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI wakeup devices:
PCI0 PCI1 PCI2 P2PA P2PB PCI3 P2PC P2PD PCI4 P2PE P2PF
ACPI: (supports S0 S4 S5)
CPU0:
 domain 0: span 00000003
  groups: 00000001 00000002
  domain 1: span 0000000f
   groups: 00000003 0000000c
<<<more cpu stuff like this up to CPU3>>>
  Freeing unused kernel memory: 204k freed
SCSI subsystem initialized
megaraid cmm: 2.20.2.3 (Release Date: Thu Dec  9 19:02:14 EST 2004)
megaraid: 2.20.4.1 (Release Date: Thu Nov  4 17:44:59 EST 2004)
megaraid: probe new device 0x1028:0x000e:0x1028:0x0123: bus 8:slot 8:func 0
ACPI: PCI interrupt 0000:08:08.0[A] -> GSI 120 (level, low) -> IRQ 169
megaraid: fw version:[2.48] bios version:[1.06]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
  Vendor: PE/PV     Model: 1x6 SCSI BP       Rev: 1.1
  Type:   Processor                          ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[0]: scanning scsi channel 2 [virtual] for logical drives
  Vendor: MegaRAID  Model: LD0 RAID1 39900R  Rev: 2.48
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 286515200 512-byte hdwr sectors (146696 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 286515200 512-byte hdwr sectors (146696 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 sda: sda1 sda2
Attached scsi disk sda at scsi0, channel 2, id 0, lun 0
device-mapper: 4.3.0-ioctl (2004-09-30) initialised: dm-devel at redhat.com
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: dm-0: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 494579
ext3_orphan_cleanup: deleting unreferenced inode 524510
ext3_orphan_cleanup: deleting unreferenced inode 524509
EXT3-fs: dm-0: 3 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
Attached scsi generic sg0 at scsi0, channel 0, id 6, lun 0,  type 3
Attached scsi generic sg1 at scsi0, channel 2, id 0, lun 0,  type 0
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Intel(R) PRO/1000 Network Driver - version 5.5.4-k2-NAPI
Copyright (c) 1999-2004 Intel Corporation.
ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 24 (level, low) -> IRQ 177
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt 0000:03:01.0[A] -> GSI 28 (level, low) -> IRQ 185
ip_tables: (C) 2000-2002 Netfilter core team
divert: allocating divert_blk for eth1
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ip_tables: (C) 2000-2002 Netfilter core team
hw_random hardware driver 1.0.0 loaded
USB Universal Host Controller Interface driver v2.2
ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 193
uhci_hcd 0000:00:1d.0: UHCI Host Controller
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: irq 193, io base 0xbce0
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
usb 1-2: new low speed USB device using uhci_hcd and address 2
NET: Registered protocol family 10
Disabled Privacy Extensions on device c032a200(lo)
IPv6 over IPv4 tunneling driver
divert: not allocating divert_blk for non-ethernet device sit0
ip_tables: (C) 2000-2002 Netfilter core team
hiddev96: USB HID v1.10 Device [American Power Conversion Back-UPS RS 1500 FW:8.g9 .D USB FW:g9] on usb-0000:00:1d.0-2
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
eth0: no IPv6 routers present
ACPI: Power Button (FF) [PWRF]
ibm_acpi: ec object not found
EXT3 FS on dm-0, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 2031608k swap on /dev/VolGroup00/LogVol01.  Priority:-1 extents:1
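
The zone lines near the top of the dmesg output report memory in pages. A small Python sketch converting them to megabytes, assuming the standard 4 KB page size on i386, shows how they add up: the DMA and Normal zones together give the 896 MB of LOWMEM, and the HighMem zone gives the 6400 MB of HIGHMEM that the kernel reports.

```python
import re

# Zone lines copied from the dmesg output above.
ZONE_LINES = [
    "  DMA zone: 4096 pages, LIFO batch:1",
    "  Normal zone: 225280 pages, LIFO batch:16",
    "  HighMem zone: 1638400 pages, LIFO batch:16",
]

PAGE_SIZE_KB = 4  # assumption: standard 4 KB pages on i386


def zone_sizes_mb(lines, page_kb=PAGE_SIZE_KB):
    """Map each zone name to its size in MB, computed from its page count."""
    sizes = {}
    for line in lines:
        m = re.search(r"(\w+) zone: (\d+) pages", line)
        if m:
            sizes[m.group(1)] = int(m.group(2)) * page_kb // 1024
    return sizes


sizes = zone_sizes_mb(ZONE_LINES)
lowmem = sizes["DMA"] + sizes["Normal"]
total = sum(sizes.values())
print(sizes)  # {'DMA': 16, 'Normal': 880, 'HighMem': 6400}
print(f"LOWMEM: {lowmem} MB, total: {total} MB")  # LOWMEM: 896 MB, total: 7296 MB
```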

So, does anyone have an idea why it crashed? Is 5.9 GB of memory too much for
this kernel? Are there other configuration settings that should be changed?

I have now deployed the second server (mounting /home from server1), which
should help keep the load down on the main server, but what else can I do to
keep it from crashing again?

Thanks,  Mary Jo Spencer, Technology Coordinator, Stratham Memorial School,
Stratham, New Hampshire