2 different FC2 issues: >4GB of RAM, and dead USB ports
John Burk
jburk at mainframe.ca
Sat Feb 5 01:16:40 UTC 2005
You'll have to excuse the lack of continuity in this message; it is
cut and pasted from a much longer message posted to the IBM xcat
(XtremeClusterAdminTool) users' forum. I didn't want to drown everyone
in off-topic noise.
------------------------------------------------------
Issue 1 - FC2 and >4GB of ram panics the kernel
------------------------------------------------------
We're now able to install rhfc2 on the new blades. Installation
finishes and the blades reboot to a login prompt, but the kernel panics
and crashes within a few minutes. Pulling out 1GB of RAM, which leaves
4GB, stops the panics. Installing rhes3 on the blades with 5GB
installed also produces _no_ kernel panics, so it's not bad RAM. It
seems to be related to rhfc2 and >4GB of RAM.
Kernel version on the blades is 2.6.5-1.358smp, release version is
"Fedora Core release 2 (Tettnang)".
Should I bother re-compiling a kernel on a blade with 4GB and rhfc2
installed, and then get xcat to install it onto the blades as a postscript?
Or is there something else I should try first?
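
For anyone trying to reproduce this, a minimal sketch of a check for
which blades actually report more than 4GB to the running kernel. The
function name mem_over_4gb and its optional file argument are made up
for illustration; it only reads the MemTotal line of /proc/meminfo,
which is reported in kB and is usually a little lower than the
installed RAM.

```shell
#!/bin/sh
# Report whether a node's total RAM, as seen by the kernel, exceeds 4GB.
# The optional argument (a file in /proc/meminfo format) is an assumption
# added so the check can be tried against a saved copy; by default the
# real /proc/meminfo is read.
mem_over_4gb() {
    meminfo=${1:-/proc/meminfo}
    # MemTotal is given in kB; 4GB = 4 * 1024 * 1024 kB.
    awk '/^MemTotal:/ {
        if ($2 > 4 * 1024 * 1024) print "over 4GB:", $2, "kB"
        else print "4GB or less:", $2, "kB"
    }' "$meminfo"
}

# Usage on a blade: uname -r; mem_over_4gb
```

Running it next to uname -r on each blade gives the kernel version and
memory size side by side, which should make it easier to confirm that
the panics only show up on the >4GB nodes.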
------------------------------------------------------
Issue 3 - FC2 kernel kills the USB ports on the blades, rendering the
local console unresponsive. Not xcat-related at all. An install of rhes3
on the blades does not exhibit this behavior, so it's definitely
FC2-related.
This is a strange one, and I can't figure out what's causing it...
I experienced the same issue when I built my 336 mgmt node. I had a
PS/2 keyboard and USB mouse during the OS installation, and the mouse
would be responsive up to the grub stage. Once the kernel booted, the
LED on the optical mouse would go out, and the mouse would be dead. If
I was using a USB keyboard, it would be dead, too. All the USB ports
were dead.
I couldn't troubleshoot it within an hour of poking around on the
redhat/fedora user forums, so I bought rhes3 for the 336 and made all my
problems go away. Now I see the same problem on my new blades; I knew
the fix on the 336 was too easy :-) But running Redhat Enterprise
would be too expensive for a 300-blade installation, even if I only
bought rhws.
[root@r01c01b01 configs]# dmesg | grep -i someth
uhci_hcd 0000:00:1d.0: host controller process error, something bad happened!
uhci_hcd 0000:00:1d.1: host controller process error, something bad happened!
[root@r01c01b01 configs]# lspci | grep -i usb
00:1d.0 USB Controller: Intel Corp. 6300ESB USB Universal Host Controller (rev 02)
00:1d.1 USB Controller: Intel Corp. 6300ESB USB Universal Host Controller (rev 02)
/var/log/messages on the xcat mgmt node shows:
=====================================
Feb 4 11:43:19 r01c01b01 kernel: uhci_hcd 0000:00:1d.0: host controller process error, something bad happened!
Feb 4 11:43:19 r01c01b01 kernel: uhci_hcd 0000:00:1d.0: host controller halted, very bad!
Feb 4 11:43:22 r01c01b01 kernel: uhci_hcd 0000:00:1d.0: host controller process error, something bad happened!
Feb 4 11:43:22 r01c01b01 kernel: uhci_hcd 0000:00:1d.0: host controller halted, very bad!
Feb 4 11:43:24 r01c01b01 kernel: uhci_hcd 0000:00:1d.0: host controller process error, something bad happened!
Feb 4 11:43:24 r01c01b01 kernel: uhci_hcd 0000:00:1d.0: host controller halted, very bad!
Feb 4 11:43:24 r01c01b01 kernel: usb 1-2: control timeout on ep0out
Feb 4 11:43:26 r01c01b01 kernel: usb 1-2: device not accepting address 2, error -110
=====================================
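
With 300 blades logging to one place, a per-controller tally may be
easier to read than the raw log. A rough sketch, assuming the messages
look like the excerpt above; the function name summarize_uhci_errors is
made up for illustration, and it counts every "host controller" line
from uhci_hcd, grouped by PCI address.

```shell
#!/bin/sh
# Count uhci_hcd "host controller" error lines per PCI address.
# Reads the files named as arguments, or stdin if none are given.
summarize_uhci_errors() {
    awk '/uhci_hcd/ && /host controller/ {
        for (i = 1; i < NF; i++) {
            if ($i == "uhci_hcd") {
                dev = $(i + 1)        # e.g. "0000:00:1d.0:"
                sub(/:$/, "", dev)    # drop the trailing colon
                count[dev]++
            }
        }
    } END {
        for (d in count) print d, count[d]
    }' "$@" | sort
}

# Usage on the mgmt node: summarize_uhci_errors /var/log/messages
```

Each output line is a PCI address followed by the number of matching
messages, so a controller that errors on every blade should stand out.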
--
John Burk
Sr. Technical Director
Mainframe Entertainment
604.628.1019