F7: Trying to figure out why kernel crashes with journal commit I/O error
Howard Wilkinson
howard at cohtech.com
Mon Oct 8 07:47:25 UTC 2007
Gilbert Sebenste wrote:
> Hello all,
>
> I am having an absolutely vexing problem that maybe somebody might
> shed some light on.
>
> I just got 2 new computers, both running F7. They each have one
> Seagate 750 GB SATA 3 Gb/s, 7200 RPM, 16 MB drive. Each machine has 4
> GB of RAM, Core 2 quad 6700 motherboard from ASUS.
>
> OK. I run the computers pretty hard. But I have two Pentium 4's who
> work just as hard, all getting a 20 MB/sec peak (1 MB/sec avg) weather
> feed from the National Weather Service, flawlessly for months until I
> install new kernels on it and reboot.
>
> OK, within 12 hours after startup of the new machine running identical
> software that the other slower machines are running with the exact
> same data feed, I get
>
> kernel: journal commit I/O error
>
> I can log in, but can't do commands. A manual power-down (shutdown -r
> now won't work) and reboot clears it fine.
>
> First I suspected a hard drive error on both machines. But then
> replacement hard drives came in. It seemed to stop the problem for a
> few days, so I closed a bugzilla I had. Nope, this weekend, it went
> back to crashing every 4-18 hours.
>
> I tried to cut the read-writes in half, to no effect, by reducing the
> amount of data/files coming in.
>
> I have:
>
> Replaced the hard drive 3 times with new ones (to no avail)
>
> Reduced the read/writes by around half
>
> Turned off legacy USB support, which also caused my keyboard and mouse
> to stop working with errors (that's been cleared and is OK)
>
> Filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=318661
>
> Tonight, I tried using the original kernel that came with F7
> (2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc-7).
> As of two hours into this, so far so good, but I'm not confident.
>
> Two other machines, Pentium 4's at 3 GHZ with ASUS motherboards, purr
> like a kitten.
>
> Has anyone seen anything like this, or know what could be the problem?
>
> As always, grateful for any help, and thanks for reading this!
>
> Gilbert
>
> *******************************************************************************
>
> Gilbert Sebenste
> ********
> (My opinions only!)
> ******
> *******************************************************************************
>
>
I would suspect a hardware issue with the motherboards as my first port
of call. I had a similar problem with a new Pentium 4 board recently,
where the ATA disc interface went offline every 18 hours or so, but
having replaced the disc with a SATA drive the system purrs for weeks.

Secondly, the kernel version may be important: Core 2 Quad processors
are newish, so a later kernel SHOULD have better support. Maybe try a
development kernel on one of the machines, e.g. 2.6.23.-----

Thirdly, have you run a full fsck on the drives after they fail? Reboot
into single-user mode and run fsck -f. You may find that the problem is
a disc structure corruption ... then you have to find out why.
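
As a starting point for the "find out why" part, here is a rough sketch
that pulls out the kernel log lines logged just before the first
"journal commit I/O error" message, since a disc or controller error
often shows up there first. The function name and the default of five
lines of context are my own assumptions, not anything Fedora ships.

```python
from collections import deque

# Message text as quoted in the original report.
NEEDLE = "journal commit I/O error"


def lines_before_error(lines, needle=NEEDLE, before=5):
    """Return up to `before` lines preceding the first line that contains
    `needle`, followed by the matching line itself.

    Returns an empty list if `needle` never appears.
    (Hypothetical helper; the name and defaults are illustrative.)
    """
    recent = deque(maxlen=before)  # rolling window of the latest lines
    for line in lines:
        line = line.rstrip("\n")
        if needle in line:
            return list(recent) + [line]
        recent.append(line)
    return []
```

It takes any iterable of lines, so an open log file can be passed in
directly, for example the syslog file (assumed here to be
/var/log/messages) once the machine has been rebooted.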

You do not say which journalling file system you are using. Is this
ext3, jfs, reiserfs, ...?
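
If it is easier to read this off the machine, the sketch below lists
the mounted journalling file systems from text in /proc/mounts format
(device, mount point, type, options, ...). The set of file system names
and the function name are my assumptions for illustration; extend the
set as needed.

```python
# Assumed set of journalling file system types to look for.
JOURNALLING = {"ext3", "jfs", "reiserfs", "xfs"}


def journalling_mounts(mounts_text):
    """Return (device, mount_point, fstype) tuples for every entry in
    /proc/mounts-style text whose type is in JOURNALLING.
    (Hypothetical helper, not part of any library.)
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fstype = fields[:3]
        if fstype in JOURNALLING:
            found.append((device, mount_point, fstype))
    return found
```

On a running system the input would be the contents of /proc/mounts,
e.g. journalling_mounts(open("/proc/mounts").read()).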

Finally, have you run memtest86+ on these machines? A memory dropout
could be going unnoticed (especially if they do not have ECC memory).

Not sure if this will help, but I hope it is not just noise....
--
Howard Wilkinson
Coherent Technology Limited
23 Northampton Square,
United Kingdom, EC1V 0HL
Phone: +44(20)76907075
Fax:
Mobile: +44(7980)639379
Email: howard at cohtech.com