Silicon RAID controller

Peter Jones pjones at redhat.com
Mon Sep 26 16:01:33 UTC 2005


On Mon, 2005-09-26 at 09:57 +0200, Heinz Mauelshagen wrote:

> The names are persistent as well, which protects you from nasty
> name changes leading to, e.g., filesystems mounted on different
> mountpoints after a reboot.

Long term, we really shouldn't rely on names, non-unique labels, or
paths to device files during system boot (or in many other cases) at
all.  The whole "persistent device node name" idea is the wrong
approach.  Names are good for sysadmins referring to objects that have
been discovered and are "part of" a running system, but horrible for
discovery (among other tasks) in the face of multipath, raid, lvm,
shared storage, and moving drives between machines.

I'm working on a more comprehensive way of doing this, since there are a
lot of requirements that the current methods don't satisfy very well.
Basically, we need to store more than one "level" of data describing
the FS (a sketch of such a record in code follows the list):

. VPD-based UUID/WWID for the drive itself (a hint; explained in
  detail below)
. partition number (another hint)
. uuid of a joined raid set
. uuid of a volume group + name of an LV (an LV name is unique within
  a VG, but a VG name is not unique.  lvm tools don't _really_ allow
  for this yet without some shenanigans, but it is critical that they
  provide this functionality eventually)
. uuid of an fs/swap area/etc (for verification more than discovery)
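
To make that concrete, here's a minimal sketch of such a per-FS
record, as Python; the structure and every field name here are
hypothetical, not any existing tool's format:

  from dataclasses import dataclass, field
  from typing import List, Optional

  @dataclass
  class FsRecord:
      """One filesystem we expect to mount, plus the layered hints
      describing each of its expected containers.  Illustrative only."""
      fs_uuid: str                # uuid of the fs/swap area (verify)
      mountpoint: str             # where it should end up
      drive_wwids: List[str] = field(default_factory=list)  # VPD hints
      partition: Optional[int] = None  # partition number (another hint)
      raid_uuid: Optional[str] = None  # uuid of a joined raid set
      vg_uuid: Optional[str] = None    # uuid of the volume group
      lv_name: Optional[str] = None    # unique within the VG only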

So for each FS we need to mount, we have its data, and the data of each
of its expected containers.  Then the process for discovery during
system boot, and possibly later as well, becomes something like:

Look for VPDs we know we're allowed to access
see if we can build containers with them
if so, do so, and look for filesystems
  if we find the filesystems: mount, done
  else iterate over the drives without the VPDs we're looking for
    for each, try to see if we can build containers
      if so, proceed as above
    iterate until we're out of new drives or FSes to mount
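
As a rough illustration, here's the same loop as Python; every
callable is a hypothetical hook standing in for real probing/assembly
code, so treat it as a sketch of the control flow, not an
implementation:

  def discover_and_mount(records, list_drives, build_containers,
                         find_fs, mount):
      """Hypothetical hooks:
           list_drives()          -> iterable of (wwid, device) pairs
           build_containers(devs) -> newly assembled raid/VG devices
           find_fs(devs, uuid)    -> device carrying that fs uuid, or None
           mount(device, record)  -> mount it where the record says
      """
      pending = {r.fs_uuid: r for r in records}
      hints = {w for r in records for w in r.drive_wwids}
      drives = dict(list_drives())

      # Pass 1: only drives whose VPD uuid/wwid we expected to see.
      tried = {w: d for w, d in drives.items() if w in hints}
      attempt(tried.values(), pending, build_containers, find_fs, mount)

      # Pass 2: widen the search one unhinted drive at a time, so we
      # never touch storage we don't need.
      for wwid, dev in drives.items():
          if not pending:
              break
          if wwid in tried:
              continue
          tried[wwid] = dev
          attempt(tried.values(), pending, build_containers, find_fs,
                  mount)
      return pending  # anything left here was never found

  def attempt(devs, pending, build_containers, find_fs, mount):
      """Assemble whatever containers we can from devs, then mount
      and cross off every pending FS we can verify by uuid."""
      devs = list(devs)
      devs += list(build_containers(devs))
      for fs_uuid, record in list(pending.items()):
          dev = find_fs(devs, fs_uuid)
          if dev is not None:
              mount(dev, record)
              del pending[fs_uuid]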

Which gives us some nice features:
  resiliency in the case of a pv move, parted's "move", or a
    backup+restore to a new device
  resiliency against having two devices with the same label in
    the same box (drives moved in for recovery from another system,
    shared SCSI, a crappy SAN)
  no scanning of on-disk data that we don't have to read, in many
    cases (so, for example, we don't spin up a whole SAN just
    looking for "/")
    
(I'm ignoring the network filesystem/cluster LVM case, because it is
both simpler than this and completely orthogonal.  Also ignoring
multipath here, but it's fairly straightforward in many ways as well.)

So basically, "/dev/sda1" should only really be used by root when
performing extreme measures, by OS installers, and by similar tools.  In
the general case nothing should care about it, or about the location of
a filesystem.  At the same time, physical identifiers are _only_ hints;
raid uuids should be used to assemble raids, vg uuids to assemble VGs,
and filesystem uuids for mounting.  That's obviously not an incredibly
simple state for us to get to, though ;)
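
For the layers where the tools already cooperate, the uuid-first rule
maps onto today's commands roughly like this.  mdadm's --uuid option
and mount's UUID= source are real; the Python glue around them is an
illustrative sketch (and, as noted above, the VG-by-uuid step still
has no clean equivalent):

  import subprocess

  def assemble_and_mount(raid_uuid, fs_uuid, mountpoint):
      """Assemble a raid set and mount a filesystem purely by uuid;
      no /dev/sdXN path is ever named.  Illustrative glue only."""
      # Scan all devices, assembling only the md array whose
      # superblocks carry this uuid.
      subprocess.run(["mdadm", "--assemble", "--scan",
                      "--uuid", raid_uuid], check=True)
      # Mount by filesystem uuid rather than by device node.
      subprocess.run(["mount", f"UUID={fs_uuid}", mountpoint],
                     check=True)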

I'll let you know how progress goes.

(and yes, I know that there are those among you who would argue for
using udev for much of this.  I don't want to hear about that ;))
-- 
  Peter



