F7: Trying to figure out why kernel crashes with journal commit I/O error

George N. White III aa056 at chebucto.ns.ca
Mon Oct 8 15:04:44 UTC 2007


On Mon, 8 Oct 2007, Gilbert Sebenste wrote:

> Hello all,
>
> I am having an absolutely vexing problem that maybe somebody might shed some 
> light on.
>
> I just got 2 new computers, both running F7. They each have one Seagate 750 
> GB SATA 3 Gb/s, 7200 RPM, 16 MB drive. Each machine has 4 GB of RAM, Core 2 
> quad 6700 motherboard from ASUS.
>
> OK. I run the computers pretty hard. But I have two Pentium 4's that work just 
> as hard, all getting a 20 MB/sec peak (1 MB/sec avg) weather feed from the 
> National Weather Service, flawlessly for months until I install new kernels 
> on them and reboot.

The P4 has been around for years, so that type of system has been pretty 
well tested.

> OK, within 12 hours after startup of the new machine running identical 
> software that the other slower machines are running with the exact same data 
> feed, I get
>
> kernel: journal commit I/O error
>
> I can log in, but can't do commands. A manual power-down (shutdown -r now 
> won't work) and reboot clears it fine.
>
> First I suspected a hard drive error on both machines. But then
> replacement hard drives came in. It seemed to stop the problem for a few 
> days, so I closed a bugzilla I had. Nope, this weekend, it went back to 
> crashing every 4-18 hours.
>
> I tried to cut the read-writes in half, to no effect, by reducing the
> amount of data/files coming in.
>
> I have:
>
> Replaced the hard drive 3 times with new ones (to no avail)
>
> Reduced the read/writes by around half
>
> Turned off legacy USB support, which also caused my keyboard and mouse to 
> stop working with errors (that's been cleared and is OK)
>
> Filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=318661
>
> Tonight, I tried using the original kernel that came with F7
> (2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc7).
> As of two hours into this, so far so good, but I'm not confident.
>
> Two other machines, Pentium 4's at 3 GHz with ASUS motherboards, purr like 
> kittens.
>
> Has anyone seen anything like this, or know what could be the problem?
>
> As always, grateful for any help, and thanks for reading this!

Don't assume the problem is related to your heavy disk I/O.  Try some 
other workloads.  I like to run a suite of benchmarks on new hardware.
They often reveal problems with the initial setup, and are helpful
later on when something seems broken, e.g., why did the last kernel
update cause disk I/O to slow by 50%?
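
As a rough illustration of the kind of baseline check meant here, the
sketch below times a sequential write with an fsync at the end.  It is
not taken from any particular benchmark suite: the function name, the
64 MB default size, and the use of the system temp directory are all
assumptions made for the example.  Recording the figure on a fresh
install gives you something to compare against after a kernel update.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, chunk_mb=4):
    """Write total_mb of random data in chunk_mb pieces and return MB/s.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_mb:
                f.write(chunk)
                written += chunk_mb
            # Force the data to disk so the page cache does not hide
            # the real write cost.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        os.remove(path)
    return written / elapsed


if __name__ == "__main__":
    rate = write_throughput_mb_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Point the directory argument at the filesystem you actually care about;
the temp directory may live on a different disk or in RAM.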

Are you using x86_64 kernels?  I suspect most people with similar 
workloads will be using x86_64, so you may be encountering problems 
specific to code that hasn't been thoroughly exercised on i386 kernels.  In 
the past, there have been problems with RH's 4k stack size, particularly 
during error handling, that can mask the real source of the problem.
If you are really stuck with 32-bit kernels, you might try the 16k
versions from linuxant.
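
If you are not sure which kernel flavour a box is running, a quick
check along these lines reports it.  This is only a sketch, assuming
Python is installed; the function name is made up for the example, and
the values are the same ones "uname -m" and "uname -r" print.

```python
import platform


def kernel_summary():
    """Return the machine architecture and kernel release as strings."""
    return {
        # e.g. "x86_64" for a 64-bit kernel, "i686" for a 32-bit one
        "machine": platform.machine(),
        # the running kernel's release string
        "release": platform.release(),
    }


if __name__ == "__main__":
    for key, value in kernel_summary().items():
        print(f"{key}: {value}")
```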



-- 
George N. White III  <aa056 at chebucto.ns.ca>



