misleading error message

Rick Stevens rstevens at vitalstream.com
Tue Jul 11 02:10:11 UTC 2006


On Sun, 2006-07-09 at 12:47 -0400, Chuck Kollars wrote:
> The error message "There was an error installing [package].  This can
> indicate media failure ..." can apparently mean more than one thing.  It
> can mean what it says: typically a problem with the CD drive or the CD. 
> It can _also_ mean some other subtle problem with the hardware was
> encountered during installation.  You can even encounter both problems
> on the same system, which is quite confusing (just one error message 
> provoked by two totally separate problems).  Many people never see the
> message, but a few people see it so often they can't get past it, 
> especially when they're trying to "repurpose" very old hardware.
> 
> I wish Red Hat's install provided just a little more information to help
> disambiguate this error message.  First, when testing CD & media, please
> provide some measure of the quality of the CD drive & media (perhaps a
> scale of 0-10) rather than just a PASS/FAIL indication.

There is no way to "rate" the media on a scale.  It either works or it
doesn't.  It isn't an analog medium like a tape or LP, where you can
measure something like a signal-to-noise ratio.
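
To show why the answer is yes/no, here is a rough Python sketch of a
digest comparison.  It assumes the check boils down to hashing what
reads back from the disc and comparing it to a known-good value; MD5,
the function name and the chunk size are my choices for illustration,
not a description of the installer's actual code.

```python
import hashlib

def digest_matches(path, expected_hex, chunk_size=1 << 20):
    """Hash the file at `path` and compare it with a known-good hex digest.

    The result is a plain True/False: a single flipped bit changes the
    digest completely, so there is nothing to grade on a 0-10 scale.
    """
    h = hashlib.md5()  # assumed algorithm, for illustration only
    with open(path, "rb") as f:
        # Read in chunks so a full CD image never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()
```

You'd point it at the image file (or the raw CD device) along with the
digest published next to the download.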

>   And second,
> when the error message occurs tell us just a little more, such as which
> devices had interrupts outstanding or which devices had IO operations 
> that weren't completed.

That's all available from the console window and the "dmesg" command.
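
If the log is long, something like the following Python sketch can pull
out the lines worth reading.  The keyword list and the function names
are my own guesses for illustration, not anything the kernel or
anaconda defines, so tune them to what your log actually prints.

```python
import re
import subprocess

# Assumed keywords; adjust to whatever your kernel log really shows.
SUSPECT = re.compile(r"dma|irq|interrupt|timeout|i/o error|hd[a-d]:",
                     re.IGNORECASE)

def suspicious_lines(log_text):
    """Return the log lines that mention DMA, IRQ or I/O trouble."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]

def read_kernel_log():
    """Return the output of the dmesg command (may need root)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout
```

Running print("\n".join(suspicious_lines(read_kernel_log()))) from a
shell on the installed (or rescue) system prints just the matches.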

> I first encountered this message because I had _two_ flaky CD drives and
> some too-cheap CD blanks.  Swapping out a suspect CD drive but still
> seeing the same error was quite misleading.  I eventually fetched a
> _third_ CD drive and nailed that problem.  Don't skip the media check;
> CD and CD drive problems are the most common (but not the only) cause 
> of this error.

As is bad DMA hardware.

> 
> But even though my media problem was fixed, the error message wouldn't
> go away.  In the absence of information, superstition took over: move the
> ribbon cables, put hard drives on different IDE channels to avoid any 
> possible interference, swap masters and slaves, put the cabinet cover on
> to reduce possible interference from outside, temporarily power off all 
> the fans (except the CPU) to eliminate any possible electrical noise, 
> replace the ribbon cables, switch to "cable select" or explicit jumpers,
> turn off Plug'n'Play, check every one of the BIOS settings, remove 
> possible antenna wires such as a CAT5 jumper cable, turn off the 
> electrostatic air filter that was in the room, switch to auto partition 
> or disk druid, switch to text or GUI, remove unnecessary cards to reduce 
> chances of power brownout, reseat all connectors, reseat the RAM, blow a 
> brand new BIOS, force all disk partitions to be reformatted, replace the 
> CMOS battery, etc.  I doubt any of that hurt  ...but none of it solved 
> the problem and none of it was necessary (in fact it risked introducing 
> yet another problem).

The FIRST thing you do on questionable hardware is make sure your
machine has the latest BIOS installed.  Badly handled DMA code in the
BIOS is a HUGE problem.

If you still have issues, try using "linux ide=nodma" on the boot line.
Turning off DMA on the CD drive can cure a raft of woes.  Don't blame
Linux or anaconda...blame the BIOS writers.
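
Once the system is up you can confirm the option really reached the
kernel.  A minimal Python sketch, relying on the kernel exposing its
boot line in /proc/cmdline; the function names are mine.

```python
def has_boot_option(cmdline, option="ide=nodma"):
    """True if `option` appears as a whole whitespace-separated token."""
    return option in cmdline.split()

def running_cmdline(path="/proc/cmdline"):
    """Return the boot line the running kernel was started with."""
    with open(path) as f:
        return f.read()
```

has_boot_option(running_cmdline()) should come back True if the option
took.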

> Something about changing a ribbon cable did have a noticeable effect
> (delaying the problem a couple more minutes further into the install), 
> but in hindsight I think that was just a fluke not related to the real 
> problem.  After more fiddling it seemed the problem was somehow related 
> to a mouse interrupt and a CD drive interrupt happening at the same time, 
> but even knowing that didn't suggest to me what to do about it.

Lovely.  Yet another BIOS issue.  Interrupt mapping is also a problem
with some BIOSes.  Why in the hell a programmer would decide that
mouse interrupts should have a higher priority than disk interrupts is
something that still astounds me.
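
To see which IRQ each device ended up on, you can parse
/proc/interrupts.  This Python sketch assumes the older layout (IRQ
number, one counter per CPU, a controller name such as XT-PIC, then a
comma-separated device list); newer kernels add columns, so treat it
as a starting point.  The function name is mine.

```python
def irq_devices(interrupts_text):
    """Map IRQ number -> list of device names from /proc/interrupts text."""
    table = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # header line, or named rows such as NMI/ERR
        fields = rest.split()
        # Skip the per-CPU counters, then the controller name.
        while fields and fields[0].isdigit():
            fields.pop(0)
        names = " ".join(fields[1:])
        table[int(label)] = [n.strip() for n in names.split(",") if n.strip()]
    return table
```

Feed it open("/proc/interrupts").read() and look at where the mouse and
the IDE channels landed, and whether anything is sharing a line with
them.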

> What finally worked was to remove a 256MB DIMM. The BIOS recognized the
> DIMM, and `memtest` said it worked fine on that motherboard.  But the 
> manufacturer's documentation for my hardware said only up to 128MB DIMMs 
> were supported.  So I took out the 256MB DIMM and tried the install 
> again and voilà.

Memtest is good, but not perfect.  When you REALLY pound on that RAM,
you can find that the BIOS wasn't refreshing it often enough.

----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-     If you can't beat your computer at chess...try kickboxing!     -
----------------------------------------------------------------------




