Swap problem with F9 PPC

Tue May 27 18:30:59 UTC 2008

Rick Stevens wrote:
> Dave Jones wrote:
>> On Mon, May 19, 2008 at 04:27:07PM -0700, Rick Stevens wrote:
>>  
>>  > Uh, oh.  Methinks I see an issue here.  First, I didn't know you could
>>  > use labels for swap partitions (since they don't have a real filesystem
>>  > on them).
>> They've supported labels for quite a while now. See the -L parameter
>> to mkswap.
> 
> Yeah, I see that and I realized it.  I guess I'm too old-school.  I
> used to hate labeling filesystems and using them in fstab, but with
> Linux' propensity now to identify drives in absolutely no predictable
> way whatsoever, it's almost a requirement.
> 
> If there are any kernel coders listening, would you guys PLEASE try to
> be consistent in the sequence you scan for drives?  Obviously, you know
> what interface is being used for each drive (since it ends up in /proc).
> You could at least PUBLISH what the sequence is.

The sequence is, essentially, random.  The main factors that contribute 
to the order the kernel registers drives in (off the top of my head, so 
possibly not complete), in order from most influential to, er..., most 
equally influential once the biggies are sorted out, are listed below. 
(By way of explanation I'll only consider two drives; I'll leave 
figuring out how the interaction becomes more complicated with more 
devices as an exercise for the reader.)

1) module load order -- if two drives are controlled by two different 
types of hardware, they'll almost certainly wind up in the order that we 
load the modules.
2) physical controller -- two drives on the same physical bus will 
generally be *probed* in the bus-defined probe order.  Note that on some 
buses, that order isn't deterministic or necessarily reproducible. 
Depending on the bus and controller, this may mean that both are probed 
before *either* responds.
3) disk response time -- if you've got two drives on the same bus, and 
it supports transactions that don't require keeping the bus locked, and 
the first drive probed is much slower to respond than the second, then 
they'll sometimes wind up with whichever one is faster being registered 
first.  There are a couple of reasons this can happen:
  a) latency due to the disk hardware being built on the cheap,
     or just old.  I.e. the circuitry on one disk is just plain
     slow.
  b) spin-up delays and other effects seen while powering up.
  c) complicated bus topologies that cause the delivery of the
     message to vary (think multipath SAN here).
  d) bugs ;)  One example is ata_piix vs ahci (which operate on
     what is essentially the same hardware but in two different modes),
     combined with firmware setup bugs that cause us to see one device
     or another depending on which is loaded first.

The current philosophy in the kernel is to handle the response from 
each drive in a non-blocking thread, so the kernel registers a drive 
whenever it responds.  The resulting order doesn't usually fluctuate 
much with your standard "just a couple of SATA devices on the same 
controller" PC, but on larger systems it can fluctuate a great deal.
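
To make the "registered when it responds" point concrete, here's a toy
model in Python.  It is not how the kernel does it; the fixed probe
interval, the delay numbers and the function name assign_names are all
made up for illustration.  Probes go out in bus order, and names are
handed out in the order the responses arrive.

```python
from string import ascii_lowercase

# Hypothetical gap between issuing successive probes on the bus.
PROBE_INTERVAL_MS = 1


def assign_names(probes):
    """Toy model: hand out sdX names in order of response arrival.

    probes is a list of (label, response_delay_ms) in bus probe order.
    Returns a dict mapping "sda", "sdb", ... to the drive labels.
    """
    arrivals = []
    for index, (label, delay_ms) in enumerate(probes):
        issued_at = index * PROBE_INTERVAL_MS
        # A drive "registers" when its response arrives, not when probed.
        arrivals.append((issued_at + delay_ms, index, label))
    # Earliest response first; ties fall back to probe order.
    arrivals.sort()
    return {
        "sd" + ascii_lowercase[position]: label
        for position, (_, _, label) in enumerate(arrivals)
    }


if __name__ == "__main__":
    # The slow drive is probed first but answers last.
    print(assign_names([("slow old disk", 500), ("fast new disk", 20)]))
```

With two equally quick drives the probe order wins, which is roughly
why the small-PC case looks stable.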

I'm sure I've left some stuff out, but you can see why it's fairly 
variable.  It's possible, from a sort of mathematical point of view, to 
make it sort of invariant, but that stands in direct opposition to doing 
it quickly.  Hence the move to "UUID=" wherever possible.
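
As a rough sketch of what "UUID=" buys you: on systems where udev
populates /dev/disk/by-uuid (an assumption here, as is the helper name
device_for_uuid), each UUID is a symlink to whatever device node the
filesystem landed on this boot, so you can look the node up instead of
guessing it.

```python
import os


def device_for_uuid(uuid, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the device node a UUID currently points at, or None.

    Assumes udev maintains symlinks named after each filesystem UUID
    in by_uuid_dir; the directory is a parameter so it can be swapped.
    """
    link = os.path.join(by_uuid_dir, uuid)
    if not os.path.islink(link):
        return None
    # Follow the symlink to the real node, e.g. /dev/sda2 on one boot
    # and /dev/sdc2 on the next.
    return os.path.realpath(link)


if __name__ == "__main__":
    by_uuid = "/dev/disk/by-uuid"
    if os.path.isdir(by_uuid):
        for name in sorted(os.listdir(by_uuid)):
            print(name, "->", device_for_uuid(name))
```

The UUID stays put while the sdX name on the right-hand side is free to
move around between boots.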

> Example: An F7 machine with one SATA drive and two IDE drives (both on
> the same IDE controller).  On one F7 kernel, the kernel picks them up
> as:
> 
>     sda: IDE primary
>     sdb: IDE slave
>     sdc: SATA
> 
> which is also the sequence the BIOS sees them.  On the next bloody
> kernel release:
> 
>     sda: SATA
>     sdb: IDE primary
>     sdc: IDE slave

This almost certainly means we're loading the drivers in the opposite 
order between the two... can you show us "lsmod" from each of the two OSes?
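
If it helps with the comparison, something like the following Python
sketch would pull the relative load order of the storage drivers out of
saved "lsmod" output.  It assumes lsmod lists the most recently loaded
module first, and the function name load_order and the set of module
names you pass in are only placeholders.

```python
def load_order(lsmod_text, interesting):
    """Return the modules from `interesting` in the order they loaded.

    Assumes lsmod prints the most recently loaded module first, so the
    listing is reversed to get oldest-first.
    """
    names = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        if fields[0] in interesting:
            names.append(fields[0])
    names.reverse()
    return names


if __name__ == "__main__":
    import sys

    # Usage: lsmod | python load_order.py ata_piix ahci
    print(load_order(sys.stdin.read(), set(sys.argv[1:])))
```

Run it against the output from each install; if the two lists come out
in opposite orders, that's the module load order difference.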

> What the hell?  Makes configuring grub or using grub-install a right
> pain for a tyro.  I can sort it out, but some of my less skilled friends
> had tons of problems with it.  F8 seemed sorta consistent, but I'm
> concerned that F9 may screw it up yet again with the 2.6.5-series of
> kernels.

I'm not sure why that would have changed between F8 and F9; there are 
reasons it would have between F7 and F8 (the switch to libata being the 
most likely candidate).

> C'mon.  Consistency and predictability are good qualities, guys.

Yeah, but sometimes they're pretty hard to attain.

-- 
   Peter