
Re: Nasty ext3 errors 2.4.18

On Dec 14, 2002  14:24 -0000, Glen Cumming wrote:
> I've included some of the debug output from the kernel below;
> there is a lot of it, so I've only included the different types of errors
> reported (with times when the problems started)

> [KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_branches: Read
> failure, inode=3567502, block=-1576348012]
> :
> :
> [KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing
> blocks not in datazone - block = 1874129395, count = 1]
> [KMSG:<0>Assertion failure in do_get_write_access() at
> transaction.c:589: "handle->h_buffer_credits > 0"]

The assertion is easy enough to explain - ext3 reserves a fixed number
of journal blocks for each operation, but handling the many errors uses
up that reservation, so it runs out of reserved blocks before it can
complete the operation.  I'm not sure whether that should be "fixed" or
not (e.g. we could try to extend the transaction if we hit such an error
case), because having so many errors is just asking for further
corruption down the road.
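
To make the credit accounting concrete, here is a rough sketch of the
idea.  It is a toy model, not the kernel code: the toy_* names, the
struct and the numbers are all made up for illustration, and the only
thing taken from the log above is the condition being checked
(handle->h_buffer_credits > 0).  I am also assuming here that each extra
block touched while dealing with an error costs one credit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a journal handle; only the credit counter is modelled. */
struct toy_handle {
    int buffer_credits;   /* blocks this handle may still modify */
};

/*
 * Hypothetical helper: claim one credit before modifying a block.
 * Returns false at the point where the real code would hit the
 * "h_buffer_credits > 0" assertion.
 */
static bool toy_get_write_access(struct toy_handle *h)
{
    assert(h != NULL);
    if (h->buffer_credits <= 0)
        return false;
    h->buffer_credits--;
    return true;
}

/*
 * Modify `planned` blocks plus `extra` blocks touched while handling
 * errors (an assumption of this sketch).  Returns how many of those
 * modifications found no credit left.
 */
static int toy_run_operation(int reserved, int planned, int extra)
{
    struct toy_handle h = { .buffer_credits = reserved };
    int uncovered = 0;

    for (int i = 0; i < planned + extra; i++) {
        if (!toy_get_write_access(&h))
            uncovered++;
    }
    return uncovered;
}
```

With 8 credits reserved and 6 planned modifications the operation fits.
If error handling touches 5 more blocks, 3 of the 11 modifications find
no credit left, which is the situation the assertion is reporting.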

> The only other thing to note is that there was a panic on kswapd a
> number of hours earlier - but I've seen these on other systems running
> 2.4.18 and they don't seem to cause any problems (I think).

Well, any kind of prior error is usually a bad sign, because it could
mean memory corruption, not enough free memory to do operations, etc.

> As I've mentioned I've seen the same behavior before on other systems,
> the specs for all of them are:
> Abit ST6 Motherboard with 1.2 Gig Celeron
> 2 x disks (varying sizes and makes)
> 128Meg Ram
> AGP Graphics Card
> Ethernet
> Bt848 capture cards (2-3 depending on customer)

One option is always to disable DMA on the IDE chipset in case that is a
source of problems.  Not to deny the possibility that the error is in
ext3, but it is also possible that the problem is in the capture cards
or drivers, or bad interaction on the PCI bus or something.

> I'm really pulling my hair out - I don't know why they are doing this -
> these are all on customer sites (they never go wrong in the office, each
> one that has gone bad has been in a different environment, i.e. warm,
> cold, no power spikes or anything reported) - and at the moment as you
> can imagine we are not flavor of the month so I really need to come up
> with a bullet-proof plan (one customer is on his second box, which did
> the same as the first after 2 days - it ran in our office for 2 weeks no
> problems!)

It may be load related, if you are not stressing the box as much as the
customers are...  Is it possible to configure only a single capture card
in a box for some period of time?

Cheers, Andreas
Andreas Dilger
