use/cause of ide=nodma boot parameter?

Nifty Hat Mitch mitch48 at sbcglobal.net
Sat Sep 18 00:41:42 UTC 2004


On Wed, Sep 15, 2004 at 09:50:13AM +0200, Pavel R wrote:

> John Best wrote:
> 
> >Through my difficult time trying to get started with FC2, I found that I
> >needed to use the ide=nodma boot parameter to get things to work.   Failure
> >to do so resulted in a system hang while loading the appropriate disk/io
> >drivers during boot/install.  My question to the group is..
> >
> >Is this caused by a misconfiguration of the IDE environment or other HW
> >problem?  Or is this just something that happens?  Does use of this
> >parameter cause/create any other issues (disable functionality, increase
> >risk, compatibility?)
> > 
> >
> I think that it just happens. Some drives do not work reliably with DMA. 
> This setting is supposed to be harmless in terms of 
> compatibility/functionality.
> 
> >Computer is ....

This is not something that 'just happens'; it is a bug of unknown
nature that has yet to be isolated and fixed for this specific
hardware and specific set of drivers.

Turning off DMA is generally safe.  It does cost processor cycles,
since the CPU has to move the disk data itself, so disk I/O gets
slower.

I cannot speak to the specifics of your problem but in general DMA can
interact with cache memory in complex ways or interact with kernel
task queues in bad ways if the coherency model of the hardware is
mismatched to the code/driver.

Various chip sets manage memory and cache differently. Some chip sets
have latency/race issues to memory as well.

So to speculate: if both cache and memory hold data or a flag for
address 0xABC and a DMA engine modifies memory at address 0xABC, how
can the processor get the new data without the driver author
triggering some magic to update the cache?  Some memory is DMA
coherent, some is not....
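
To make that question concrete, here is a toy Python model, not real
driver code.  The class and function names (ToyCpu, dma_write,
invalidate) are made up for illustration, and the "cache" is just a
dictionary that the DMA write never updates.  It only shows the shape
of the problem: the processor keeps reading its cached copy until
something invalidates it.

```python
# Toy model only: real caches, chip sets and DMA engines are far more complex.

class ToyCpu:
    """A CPU with a tiny cache that DMA writes do not update (non-coherent)."""

    def __init__(self, memory):
        self.memory = memory   # shared "main memory": a plain dict
        self.cache = {}        # private cached copies, keyed by address

    def read(self, addr):
        # A cache hit returns the cached copy, even if memory has changed.
        if addr not in self.cache:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def invalidate(self, addr):
        # Stand-in for the "magic" a driver must trigger so that the
        # next read goes back to main memory.
        self.cache.pop(addr, None)


def dma_write(memory, addr, value):
    # The DMA engine writes main memory directly and never sees the cache.
    memory[addr] = value


ADDR = 0xABC
memory = {ADDR: 0}
cpu = ToyCpu(memory)

first = cpu.read(ADDR)       # CPU reads the flag and caches it
dma_write(memory, ADDR, 1)   # device changes the flag behind the cache
stale = cpu.read(ADDR)       # CPU still sees its old cached copy
cpu.invalidate(ADDR)         # driver forces a re-read
fresh = cpu.read(ADDR)       # now the CPU sees what the device wrote

print(first, stale, fresh)
```

On DMA coherent memory the hardware does the invalidate step for
you; on memory that is not coherent the driver has to.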

For some chip sets the DMA engines are shared, and if the DMA engine is
not available when the disk needs to DMA data, then either the data is
lost or the system stalls.

If the DMA engine can modify data without synchronization with the
processor, those data regions would be bad for mutual exclusion flags
or other critical code paths.  Data in registers, or saved on a stack,
is not guaranteed equal to the data in memory that the register was
loaded from.

Then there are issues with cache line size and data alignment...
DMA and read-modify-write is a tangle.

Device driver authors commonly use a set of standard functions that are
hardware aware and specific.  If a driver author establishes
additional code paths for data or device registers then there can be a
loss of correctness and bad things happen (like a hang).

By turning off DMA, cache memory consistency from the processor side
is simpler to manage and sharing of DMA hardware can be eliminated.

One hardware problem might be a DMA engine with a four cache line deep
FIFO to memory.  If the engine reports done (interrupts) when the last
cache line is in the FIFO, but before the memory functional block has
pushed that data out to memory, a modern gigahertz processor can be
well down multiple speculative execution paths before the fourth cache
line of DMAed bits gets committed to memory.
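
The same speculation as a toy Python sketch.  ToyDmaEngine and its
methods are hypothetical names, and the four-line FIFO is the made-up
example from the paragraph above, not a description of any real chip
set.  The engine raises "done" as soon as the last line enters the
FIFO, so a processor that trusts the interrupt can read memory before
that line has landed.

```python
from collections import deque

# Toy model only: a DMA engine that posts cache lines through a FIFO.

class ToyDmaEngine:
    def __init__(self, memory):
        self.memory = memory   # shared "main memory": a plain dict
        self.fifo = deque()    # lines accepted but not yet in memory
        self.done = False      # what the interrupt would report

    def transfer(self, base, lines):
        # Accept every line into the FIFO, then report done at once.
        for offset, line in enumerate(lines):
            self.fifo.append((base + offset, line))
        self.done = True       # premature: the FIFO may still hold data

    def drain_one(self):
        # The memory block commits one queued line to main memory.
        if self.fifo:
            addr, line = self.fifo.popleft()
            self.memory[addr] = line

    def drain_all(self):
        while self.fifo:
            self.drain_one()


memory = {}
engine = ToyDmaEngine(memory)
engine.transfer(0, ["line0", "line1", "line2", "line3"])

# The memory block is slow: only three of the four lines are committed
# by the time the processor reacts to the "done" interrupt.
for _ in range(3):
    engine.drain_one()

early_read = memory.get(3)   # fourth line is not in memory yet
engine.drain_all()           # a proper sync point waits for the FIFO
late_read = memory.get(3)    # now the data is really there

print(engine.done, early_read, late_read)
```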

It is important to think of a DMA engine as a co-processor that
requires full synchronization, little different from a second processor.

The larger the cache, the larger the impact of such problems.

Above I said that there was a bug....
Now the trick is to report that bug in a productive way.

Review the previous posts and draft a bug so folks can help you flesh
it out.  When a couple of the better folks here have had a go at it
then file the final draft.

The bug needs to be rich in facts.  
 unique descriptive subject.
 you see...
 your hardware is...
 your software is...
 your active drivers are...
 your work around is/are...
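
If it helps to keep the draft organized, the checklist above can be
turned into a small fill-in-the-blanks script.  This is only a sketch:
format_bug_report and its field names are invented here, and the
sample values are taken from this thread, with everything unknown left
as a placeholder for you to fill in.

```python
def format_bug_report(subject, symptoms, hardware, software, drivers, workaround):
    """Lay out a facts-only bug report draft as plain text."""
    sections = [
        ("Subject", subject),
        ("What I see", symptoms),
        ("Hardware", hardware),
        ("Software", software),
        ("Active drivers", drivers),
        ("Workaround", workaround),
    ]
    lines = []
    for title, body in sections:
        lines.append(f"{title}:")
        lines.append(f"  {body}")
        lines.append("")          # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


report = format_bug_report(
    subject="FC2 hangs loading IDE drivers unless booted with ide=nodma",
    symptoms="System hangs while loading the disk/io drivers during boot/install.",
    hardware="(fill in: motherboard, chip set, drives)",
    software="Fedora Core 2 (fill in: kernel version)",
    drivers="(fill in: loaded modules)",
    workaround="Booting with the ide=nodma parameter avoids the hang.",
)
print(report)
```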

In a bug report do not speculate as I have above.
Try not to diagnose the problem in the initial report.
(You might in a subsequent conversation if you feel it can help.)
That is, in the initial report skip the "I think" and "I suspect" stuff.
Just the facts... save long data sets that might clutter the
initial report (stuff like "lspci -vvv") for attachments or a
follow-up message.

Offer to help... and then help.


-- 
	T o m  M i t c h e l l 
	In the USA, vote informed, second Tuesday Nov 2004.




