F7: Trying to figure out why kernel crashes with journal commit I/O error
Gilbert Sebenste
sebenste at weather3.admin.niu.edu
Mon Oct 8 15:22:37 UTC 2007
Hello Howard,
>> OK, within 12 hours after startup of the new machine running identical
>> software that the other slower machines are running with the exact same
>> data feed, I get
>>
>> kernel: journal commit I/O error
>>
>> I can log in, but can't run commands. A manual power-down (shutdown -r now
>> won't work) and reboot clears it fine.
>>
>> First I suspected a hard drive error on both machines. But then
>> replacement hard drives came in. It seemed to stop the problem for a few
>> days, so I closed a bugzilla I had. Nope, this weekend, it went back to
>> crashing every 4-18 hours.
>>
>> I tried to cut the read-writes in half, to no effect, by reducing the
>> amount of data/files coming in.
>>
>> I have:
>>
>> Replaced the hard drive 3 times with new ones (to no avail)
>>
>> Reduced the read/writes by around half
>>
>> Turned off legacy USB support, which also caused my keyboard and mouse to
>> stop working with errors (that's been cleared and is OK)
>>
>> Filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=318661
>>
>> Tonight, I tried using the original kernel that came with F7
>> (2.6.21-1.3194.fc7) instead of the latest (2.6.22.9-91.fc7).
>> As of two hours into this, so far so good, but I'm not confident.
>>
>> Two other machines, Pentium 4s at 3 GHz with ASUS motherboards, purr like
>> a kitten.
>>
>> Has anyone seen anything like this, or know what could be the problem?
>>
>> As always, grateful for any help, and thanks for reading this!
>>
>> Gilbert
>>
>> *******************************************************************************
>> Gilbert Sebenste
>> ********
>> (My opinions only!) ******
>> *******************************************************************************
>>
> I would suspect a hardware issue with the motherboards as my first port of
> call. I have had a similar problem with a new Pentium 4 board recently where
> the ATA disc interface offlined every 18 hours or so, but having replaced it
> with a SATA drive, the system purrs for weeks.
On two new PCs? Showing identical symptoms? I find that hard to believe.
But on the other hand...
> Secondly the kernel version may be important - core 2 quad processors are
> newish so later kernels SHOULD have better support. Maybe try a development
> kernel on one of the machines e.g. 2.6.23.-----
This is what I am wondering...if it *is* the kernel, udev, or something
like that. This thing has 2 gb/sec throughput...it shouldn't be doing
this.
> Finally, have you run a full FSCK on the drives after they fail - reboot into
> single mode and run fsck -f. You may find that the problem is a disc
> structure corruption ... then you have to find out why.
I need to do that...thanks for the reminder.
> You do not say which journalling file system you are using - is this ext3,
> jfs, reiserfs, ...
ext3.
> Finally, have you run memtest86+ on these machines - possible memory dropout
> going unnoticed (especially if they do not have ECC memory)
Not yet. But I can tell you "top" gives the full 4 GB it says I have. Of
course, that doesn't mean much. Again, I find it very difficult to believe
that two machines will have this problem. That said, I'm not ruling out
anything.
> Not sure if this will help, but I hope it is not just noise...
No, it helped, thanks. Any other suggestions, I'll take them.
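
In case it is useful to anyone chasing the same message, here is a minimal
sketch of pulling out the log lines that appear just before each
"journal commit I/O error" entry, since an ATA or controller error logged
shortly beforehand would point toward hardware rather than ext3. The function
names, the five-line window, and the idea of reading /var/log/messages are my
assumptions for illustration, not something confirmed in this thread. Also
note that if the disk has already gone offline, the relevant lines may never
have been written to the log at all.

```python
from collections import deque

# Message text as quoted in the original report.
MARKER = "journal commit I/O error"


def context_before_errors(lines, marker=MARKER, before=5):
    """Return (error_line, preceding_lines) pairs for lines containing marker.

    `before` is the number of earlier lines kept as context (an arbitrary
    choice for this sketch).
    """
    recent = deque(maxlen=before)  # rolling window of the most recent lines
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            hits.append((line, list(recent)))
        recent.append(line)
    return hits


def scan_file(path, marker=MARKER, before=5):
    """Scan a log file on disk; the path is supplied by the caller."""
    with open(path, errors="replace") as fh:
        return context_before_errors(fh, marker=marker, before=before)
```

Calling scan_file("/var/log/messages") as root and printing each pair would be
one way to use it, assuming the kernel messages are logged to that file on
your system.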
*******************************************************************************
Gilbert Sebenste ********
(My opinions only!) ******
*******************************************************************************