starting Fedora Server SIG

Doug Ledford dledford at redhat.com
Wed Nov 19 20:25:22 UTC 2008


On Wed, 2008-11-19 at 13:24 -0600, Les Mikesell wrote:
> Doug Ledford wrote:
> No, what I'm saying is that to manage servers you need a way to identify 
> NICs that is not arbitrary.  And that changing the convention, 
> particularly in arbitrary ways, is expensive.  All previously working 
> procedures have to be re-invented and re-tested.  And this is 
> particularly a problem when the differences in new behavior are subtle 
> and only appear randomly after the copied disk is moved to its remote 
> location.

As I pointed out, mac addresses aren't arbitrary.  For example, on
certain boxen that have two embedded nics, the box will label one as the
first and one as the second interface, and generally will tell you the mac
address for each.  Depending on the kernel version or kernel pci
options, they might be found in the right order or swapped.  But
assigning them by mac always gets it right.
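
As a rough illustration of assigning by mac, here is a minimal Python
sketch that looks up whatever name the kernel gave to the interface with
a given mac address. It assumes the usual sysfs layout, where each entry
under /sys/class/net has an "address" file; the function names are
made up for this example and are not part of any existing tool.

```python
import os

def interfaces_by_mac(sysfs_net="/sys/class/net"):
    """Return {mac_address: interface_name} for every interface under sysfs_net."""
    result = {}
    for name in sorted(os.listdir(sysfs_net)):
        addr_file = os.path.join(sysfs_net, name, "address")
        try:
            with open(addr_file) as fh:
                mac = fh.read().strip().lower()
        except OSError:
            continue  # entry with no readable address file, skip it
        if mac:
            result[mac] = name
    return result

def name_for_mac(mac, sysfs_net="/sys/class/net"):
    """Return the current kernel name for the interface with this mac, or None."""
    return interfaces_by_mac(sysfs_net).get(mac.lower())
```

A setup script could then call name_for_mac() with the mac address
printed on the chassis label and configure whichever name comes back,
whether that turns out to be eth0 or eth1 on a given boot.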

> Agreed, but when the kernel hardware detection order was predictable, 
> this was simple.  Now it isn't.

The detection order was only predictable in the sense that it didn't
change, not that it was always right.  The 2.4 kernel got some of them
swapped, we just never corrected it in the 2.4 kernel series.  Note:
this isn't to refute what Jesse wrote, BIOS updates could cause a flip
flop, but we didn't alter sorting order in the kernel in the 2.4 series.
On the other hand, I *did* alter sorting order for aic7xxx adapters
during the 2.4 kernel series, but I included a kernel module option to
revert the change as needed.

> > Whether you
> > customize the disk by cloning and editing, or by using something like
> > cobbler to clone the install via a profile and then have cobbler
> > customize the addresses based upon its database is merely
> > implementation.  And that's my point, there are better implementations
> > to be had.
> 
> Errr, doesn't having to build server to run cobbler before you can 
> install your real server make this a circular argument too?  Assuming I 
> wanted a cobbler server at every remote location instead of shipping a 
> pre-configured disk, how would I build it when it needs a cobbler server 
> first?

No, this goes to the statement I made about some amount of setup cost
that then is made up for in time savings in the future.  And you only
need one cobbler server.  You can install the image on a local machine
and then ship the disc off (or if you can't, the concept of a local
install box used to create disc images destined for a remote box that
has a different mac address than the local install box wouldn't be a
hard thing to add to cobbler, but since you haven't tried it and haven't
brought up this need, it might not be in there yet).

> >>  and ship a box of them to remote locations where anyone could 
> >> insert them into a chassis and have them come up working.  Now it 
> >> doesn't work and I either have to know every target mac address and 
> >> match up the disks or talk some 'hands-on' operator through the setup 
> >> with a console on a crash cart to get the address assignment done.  Is 
> >> that an improvement?
> > 
> > Failure to use tools that automate this sort of thing is not a valid
> > indictment of the infrastructure that's been put in place.
> 
> What tools don't involve the bootstrap problem - or are suitable for 
> isolated remote servers?  Or maintaining a diverse set of OS's?

See above.  And cobbler has experimental support for SuSE and debian
systems.  I'm sure the authors would correct any bugs you might run
across in trying to use cobbler with those.  As for non-linux OSes,
especially windows, cloning is probably the right call.

> > You could have done a small
> > post script during a kickstart install to rectify this.  One loop to
> > modify all the device labels of filesystems to a unique label based
> > upon, say, hostname + mount point, eg. firewall-10.0.1-root as a
> > filesystem label combined with modifying the entries in fstab, then a
> > final line to rebuild the initrd image.  This sort of thing can be
> > automated easily in cobbler such that the default kickstart template
> > need not know about each machine's name/purpose, you can use variable
> > substitution to do what you want.
> 
> Lovely.  I just sit around waiting for extra work like that - especially 
> version-specific stuff.

Like I said, it's a one time setup cost that is amortized over its
reuse.

>   And it misses the point that I want to be able 
> to shove a disk into chassis slots in a certain order and know what to 
> call a partition on a particular physical disk regardless of where it 
> was used before.

The unique filesystem labels I mentioned previously would achieve this
result too.
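
To make the labelling idea concrete, here is a minimal Python sketch of
the fstab half of it: build a label from hostname + mount point and
rewrite the LABEL= entries to match. The function names and the
"web1" hostname are invented for the example, and the sketch only
rewrites text; actually relabelling the filesystems (for instance with
e2label) and rebuilding the initrd are left out.

```python
def make_label(hostname, mount_point):
    """Build a label such as 'web1-root' or 'web1-var-log'."""
    suffix = mount_point.strip("/").replace("/", "-") or "root"
    return f"{hostname}-{suffix}"

def relabel_fstab(fstab_text, hostname):
    """Rewrite LABEL= entries for mounted filesystems.

    Returns (new_text, renames) where renames maps old label -> new label.
    Comments, swap entries and non-LABEL lines are passed through untouched.
    """
    renames = {}
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if (len(fields) >= 2
                and fields[0].startswith("LABEL=")
                and fields[1].startswith("/")):
            old = fields[0][len("LABEL="):]
            new = make_label(hostname, fields[1])
            renames[old] = new
            fields[0] = "LABEL=" + new
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n", renames
```

One caveat: ext2/ext3 labels hold at most 16 characters, so a long
hostname would need to be shortened before it is used this way.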

>   Plus,  the concepts are wildly different when you use 
> md devices (and probably LVM's too but I've avoided those completely).

md devices are 100% discovered by label, not by disk partition.  It scans
the partitions and assembles based upon the labels therein.

> > Actually, it has created less work for those people that utilized the
> > tools that have been created to automate these things.  In your case,
> > you already mention having to go in and hand edit network settings any
> > time you clone a disk for a new machine.  That's not 0 work.
> 
> It is the minimal amount of work to get a correct setup.  The 
> information needed to set the hostname and IP addresses has to be known 
> and entered somewhere.

Yes, and it has to be redone every time you reimage a disk for that
machine.  On the other hand, when you enter the same information in
cobbler, it can (optionally) enter the information in your dhcpd.conf or
dnsmasq.conf, enter the information into your named zone file, and the
machine is now permanently in the database so any time you reimage that
same machine, it's all there ready to go.
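
To show the general idea of entering the information once and
generating config from it, here is a small Python sketch that turns a
system record into a dhcpd.conf host entry. This is not cobbler's code
or its template; the function name and the sample values are
hypothetical, and only the standard dhcpd "host" stanza syntax is
assumed.

```python
def dhcpd_host_stanza(hostname, mac, ip):
    """Render one dhcpd.conf host entry pinning an IP address to a mac address."""
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac.lower()};\n"
        f"    fixed-address {ip};\n"
        f"}}\n"
    )

if __name__ == "__main__":
    # Hypothetical record; in practice this would come from the database.
    print(dhcpd_host_stanza("web1", "00:16:3E:AA:BB:01", "10.0.1.21"))
```

The same record could feed a named zone file or a kickstart template,
which is where the saving on each reimage comes from.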

> > Yet, with
> > things like cobbler, there is a certain amount of work to get things set
> > up initially, but once that's done, the amount of work goes down.
> 
> Please start from scratch here.  How does that cobbler server get 
> installed?  How does it make it less work to get from the person who 
> knows the hostname and IP addresses to the machine than entering it 
> directly?  What if you want to replace the OS with a platform cobbler 
> won't install?

See above.

> > Your
> > real complaint is that your work has gone up *because* you choose not to
> > make use of these tools or better methods of doing things.
> 
> Yes, I choose not to use them because they aren't appropriate for my 
> usage.  They require a large amount of infrastructure work and only 
> serve a specific OS - and probably only one or a few versions before you 
> have to rebuild your infrastructure again.

Not true.

> > I don't know
> > what to say to that.  If you're going to do things the 1980s way and no
> > other, then I'm not sure there's anything that anyone can do to make
> > your life easier.
> 
> What can possibly be easier than typing 'dd'? 

Not having to type dd and then edit the results to get the right image?

>  I like unix-like systems 
> because everything is a file or a stream of bytes and those don't take 
> specialized tools to manage.

Nor do they fill in the right information for you.

>   If you want a copy of something, you copy 
> it, including the raw disk containing the whole system.  If it becomes 
> something that takes specialized tools to touch every specialized device 
> in its own special way, I won't be interested any more.
> 
> Actually I use drbl and clonezilla to make most copies because it really 
> is easier than typing 'dd', but that's a practical, not a philosophical 
> choice - the effect is the same.  The tool simply has to be able to 
> handle multiple OS versions and the bulk of our systems are still 
> running windows.
> 
> >>>> For people who have already automated these processes, try not to screw 
> >>>> it up too badly.  If the way it is done now didn't work, we wouldn't 
> >>>> have an internet.
> >>> I'm sure it won't screw existing setups unless there is an overwhelming
> >>> compelling reason.
> >> But the changes you mentioned already have.
> > 
> > Not for those of us doing things in any way other than the old way.
> 
> How do you deal with overlap?  How much human time did it take to 
> maintain working services on a large set of machines across the changes 
> from, say RH7.3 (probably the first really reliable version) up through 
> current?  How much of that 'other way' is useful in a mixed OS environment?

In a mixed linux environment, pretty much all of it.  For other OSes,
not so much.

> >> I don't want dynamic devices on my servers.  I want to know exactly what 
> >> they are and how they are named by the OS.  And I want a hundred of them 
> >> with image-copied disks to all work the same way.
> > 
> > But that's the fallacy of your argument, things *didn't* work that way,
> > ever.  At least not under linux.  A device failure could cause sdb to
> > become sda,
> 
> Ummm, OK - so are you implying that having a label on a partition on sda 
> is useful in that circumstance?  Things that break just have to be 
> replaced before they work again.

Except that this isn't the only situation.  You brought up the other one
yourself in that putting more than one disk in a machine is a valid use
case too.  As I pointed out, unique device labels eliminate the need to
know for certain if the two disks you added went in as sdb and sdc or
sdc and sdb.

>   The way md devices work is sort-of ok, 
> if you've handled the special case for booting, but they worked that way 
> all along.   I'll agree that linux got most of the things it didn't copy 
> from sysvr4 wrong in the first place including scsi drive naming, but 
> changing 'detection order' naming to 'labels likely to be duplicates' 
> isn't a good fix.

Unique labels are, though.

> > or a BIOS or kernel update could cause eth0 and eth1 to flip
> > flop. 
> 
> Kernel updates didn't do that until the 2.6 series.  And bios updates 
> usually don't take me by surprise.
> 
> > The changes that were made were to deal with real world
> > situations that you get to ignore because you tightly control your
> > setups. 
> 
> Yes, I'm running servers.  You know - the big use for Linux...  I have 
> old systems and new systems running simultaneously.  I want procedures 
> that don't require changing everything at once or training operators to 
> know the difference between versions for concepts that have not really 
> changed - like mounting a disk partition or assigning an IP address to 
> an interface.
> 
> > If you embraced some of these changes and worked *with* them
> > instead of disabling them, then you might be able to loosen up some of
> > that control and find that things still work like they are supposed to.
> 
> I have very little interest in converting to procedures that only work 
> with one or a few versions of one distribution of one OS.

As I mentioned before, this isn't an accurate assessment of the support
cobbler provides.

>   I'd be 
> _slightly_ more interested if there were a clear development path toward 
> those procedures as there once was before the RHEL/fedora split.  For 
> example, back in the old days I could work on procedures and local 
> applications on RHX.0 and not have too many surprises by the time it was 
> production-ready around X.2.  With current fedora, there's no way to 
> know what to expect to flow into an EL version or prepare for it.
> 
> >>   Some tools to deal 
> >> with the changes being made could help with this but so far I haven't 
> >> seen any.
> > 
> > I'm sorry, but you must not have looked very hard.  The tools are there,
> > and they do a damn fine job.
> 
> Which tool besides clonezilla is good for cross platform work?  Are 
> there even tools for a specific purpose like replacing a set of RHEL3 
> servers with RHEL5 equivalents, maintaining the existing IP addresses on 
> several interfaces each?  I eventually came up with something to scarf 
> all the old ones from the running systems along with the corresponding 
> mac addresses and included them in the clonezilla image with a script to 
> patch things up but it wasn't pretty.

Yes.  Cobbler allows you to retarget a machine.  If a specific system
was registered as a rhel3 system you can simply change its parent
profile to rhel5 and it will preserve all the ethernet information, etc.

> > I really think this all boils down to one simple fact: Fedora is
> > supposed to be bleeding edge, and that includes improving upon old, tired
> > ways of doing things for better ways, and that seems to be anathema to
> > you.  I hate to say it, but it sounds like what you really want is not
> > Fedora, but OpenSolaris.
> 
> Agreed - I'd switch in a second if someone packaged it with drivers for 
> all my hardware and the same userland we've been using for years.  You 
> can see a history of understanding servers there that is missing in Linux.
> 
> > I'm afraid that as long as you want to
> > maintain your setup the same as it has been for decades, that Fedora and
> > the direction Fedora is heading in is going to continue to frustrate
> > you.
> 
> When something has been working for decades why would anyone want to 
> change it?  And if it hasn't been working, why even look at a unix-like 
> system in the first place?

Seriously?  It has to be 100% right the first time or move on?  Don't
try to fix anything that's not right?

> > And I really hate to say that, because I *want* Fedora to be all
> > things to all people, but realistically it can't.  And in this
> > particular conflict, it's a case of "we have real world problems from
> > some users so we fix those problems with a better way of doing things"
> > versus your case of "in my particular world, these problems don't exist,
> > so don't change things around" and I just don't know how to rectify
> > those two positions.
> 
> The way to do it is to have the kernel and the hardware use predictable 
> but unfriendly conventions for the 'real' names that connect drivers to 
> hardware

Done.  For eth0 on my machine lspci reports:
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

I can use that to go to /sys/bus/pci/devices/0000:04:00.0 and in there I
find a directory called net and in that directory is a directory named to
match the system name of the device (in my case eth0).  So no matter
what device name this would have gotten, if I want the ethernet device
in the particular pci slot this device occupies, going into the net
directory gives me that name.  A similar convention can be used to get
to any scsi device by pci device, then scsi controller, then host
number, channel number, target number, lun.
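
A script can do the same lookup. The following is a minimal Python
sketch, assuming the sysfs layout described above (a "net" directory
under the pci device whose entries are the interface names); the
function name is invented for the example, and older kernels may lay
this directory out differently.

```python
import os

def net_names_for_pci(pci_addr, sysfs_pci="/sys/bus/pci/devices"):
    """Return the interface names the kernel attached to a pci device.

    pci_addr is the full address including the domain, e.g. '0000:04:00.0'.
    Returns an empty list if the device has no net directory.
    """
    net_dir = os.path.join(sysfs_pci, pci_addr, "net")
    try:
        return sorted(os.listdir(net_dir))
    except OSError:
        return []
```

On the machine above, net_names_for_pci("0000:04:00.0") would be
expected to return a list containing eth0, whatever order the drivers
happened to load in.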

>  and some optional intermediate user level daemon that maps them 
> to a friendly name in case there is a human involved instead of a script 
> - like the old /dev/cdrom symlink.  In any case you need to look at some 
> worst-case scenarios before applying any change to decide if it really 
> helps anything or not.

Done.  udev does this.

> Will any of the changes involving friendly identifiers for partitions 
> help me when I connect a new unformatted drive?  Will any help with 
> mounts that are done over nfs or cifs?  What about iscsi?  If I have to 
> identify a new raw disk myself to make the partitions and filesystems 
> when adding it, why do you think I need different terminology to 
> identify the partitions  after that step is done?

See above about putting two disks in a machine to do whatever to them,
one of your use case scenarios.

> Likewise with network interfaces: when what I want is some particular 
> vlan from a trunk, will the changes help with that?

Going by mac address helps, certainly.  The eth names can flip flop all
day long, but in the end you know that cable X  with vlan Z is plugged
into the eth port with mac Y and you can set things up that way.

> > In the end, Fedora is going to be true to its
> > goals, including being bleeding edge and fixing things that are broken,
> > and I don't think it's even possible to stop the march of that progress
> > for the sake of your particular setup and working habits.  We can
> > attempt to, but sometimes things simply must be changed in order to deal
> > with other people's reality.
> 
> OK, but if you make server management harder or more specialized as a 
> result of changes that only matter to desktop clients, don't expect 
> anyone to run them.  I think you were on to something with OpenSolaris.

Harder, I don't think so.  Different, yes.  Requires learning, yes.  But
I don't think it's truly harder.  If anything, it requires less
specialized knowledge and custom hackery to get things done.

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
