F7: Trying to figure out why kernel crashes with journal commit I/O error

Gilbert Sebenste sebenste at weather3.admin.niu.edu
Mon Oct 8 15:22:37 UTC 2007


Hello Howard,

>> OK, within 12 hours after startup of the new machine running identical 
>> software that the other slower machines are running with the exact same 
>> data feed, I get
>> 
>> kernel: journal commit I/O error
>> 
>> I can log in, but can't do commands. A manual power-down (shutdown -r now 
>> won't work) and reboot clears it fine.
>> 
>> First I suspected a hard drive error on both machines. But then
>> replacement hard drives came in. It seemed to stop the problem for a few 
>> days, so I closed a bugzilla I had. Nope, this weekend, it went back to 
>> crashing every 4-18 hours.
>> 
>> I tried to cut the read-writes in half, to no effect, by reducing the
>> amount of data/files coming in.
>> 
>> I have:
>> 
>> Replaced the hard drive 3 times with new ones (to no avail)
>> 
>> Reduced the read/writes by around half
>> 
>> Turned off legacy USB support, which also caused my keyboard and mouse to 
>> stop working with errors (that's been cleared and is OK)
>> 
>> Filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=318661
>> 
>> Tonight, I tried using the original kernel that came with F7
>> (2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc-7).
>> As of two hours into this, so far so good, but I'm not confident.
>> 
>> Two other machines, Pentium 4's at 3 GHz with ASUS motherboards, purr like 
>> a kitten.
>> 
>> Has anyone seen anything like this, or know what could be the problem?
>> 
>> As always, grateful for any help, and thanks for reading this!
>> 
>> Gilbert
>> 
>> *******************************************************************************
>> Gilbert Sebenste                                                     ********
>> (My opinions only!)                                                  ******
>> *******************************************************************************
>> 
> I would suspect a hardware issue with the motherboards as my first port of 
> call. I have had a similar problem with a new Pentium 4 board recently, where 
> the ATA disc interface offlined every 18 hours or so, but having replaced it 
> with a SATA drive the system purrs for weeks.

On two new PC's? Showing identical symptoms? I find that hard to believe.
But on the other hand...

> Secondly, the kernel version may be important - Core 2 Quad processors are 
> newish, so later kernels SHOULD have better support. Maybe try a development 
> kernel on one of the machines e.g. 2.6.23.-----

This is what I am wondering...if it *is* the kernel, udev, or something 
like that. This thing has 2 gb/sec throughput...it shouldn't be doing 
this.
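
One way to tell a kernel problem from a hardware one is to look at what was logged just before each journal error. The following is only a minimal sketch: the log path (/var/log/messages), the function names and the amount of context are assumptions for illustration, and only the error string itself comes from this thread. If the filesystem holding the log has already stopped accepting writes, the interesting lines may never have reached disk.

```python
from collections import deque

# The message quoted in this thread; everything else here is illustrative.
JOURNAL_ERROR = "journal commit I/O error"


def context_before_errors(lines, context=5):
    """For each journal error line, collect the `context` lines logged before it."""
    recent = deque(maxlen=context)  # rolling window of the preceding lines
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if JOURNAL_ERROR in line:
            hits.append((line, list(recent)))
        recent.append(line)
    return hits


def report(path="/var/log/messages", context=5):
    """Print each journal error with the lines that led up to it.

    The default path is an assumption; pass whichever file syslog writes to.
    """
    with open(path, errors="replace") as log:
        for error_line, before in context_before_errors(log, context):
            print("\n".join(before))
            print(">>> " + error_line)
            print("-" * 40)
```

Calling report() as root after a reboot would show whether disk or controller messages come before the journal error, or whether it appears with no warning at all.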

> Finally, have you run a full FSCK on the drives after they fail? Reboot into 
> single-user mode and run fsck -f. You may find that the problem is a disc 
> structure corruption ... then you have to find out why.

I need to do that...thanks for the reminder.

> You do not say which journalling file system you are using - is this ext3, 
> jfs, reiserfs, ...

ext3.

> Finally, have you run memtest86+ on these machines? There may be memory 
> dropout going unnoticed (especially if they do not have ECC memory).

Not yet. But I can tell you "top" gives the full 4 GB it says I have. Of 
course, that doesn't mean much. Again, I find it very difficult to believe 
that two machines will have this problem. That said, I'm not ruling out 
anything.

> Not sure if this will help but hope it is not just noise....

No, it helped, thanks. Any other suggestions, I'll take them.

*******************************************************************************
Gilbert Sebenste                                                     ********
(My opinions only!)                                                  ******
*******************************************************************************



