[K12OSN] network reset ignores MACs

Petre Scheie petre at maltzen.net
Mon Nov 13 14:34:42 UTC 2006


You mention putting aliases in /etc/modules.conf; did you really use that file, or is 
that a typo for /etc/modprobe.conf?  I ask because 1) I thought modules.conf went 
away as of FC4, replaced by modprobe.conf; and 2) recently, on a test server, the 
onboard 100Mb NIC was being seen as eth0 and the 1Gb add-in NIC was showing up as 
eth1, and I wanted them the other way around.  I *thought* it would be just a matter 
of changing the aliases in /etc/modprobe.conf.  But after I changed modprobe.conf, 
neither NIC worked properly, and both still behaved as though they had their old 
settings.  I eventually got it to work by changing the aliases via 
system-config-network, with the resulting /etc/modprobe.conf looking like...what it 
looked like when I edited the file by hand.  Perhaps system-config-network makes some 
other changes, too (?).
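
For anyone who wants to compare the hand-edited file with the one the tool wrote, here is a minimal Python sketch that pulls the "alias ethN driver" lines out of modprobe.conf-style text and shows what the swapped mapping would look like. The driver names (e100, e1000) and the sample contents are placeholders, not what was on my test server, and the helper names are made up for illustration. It only looks at the aliases; it does not tell you whether system-config-network touched anything else, such as the ifcfg-ethX files that James's post says hold the MAC addresses.

```python
import re

# Matches lines such as "alias eth0 tg3"; comment lines and other
# directives are skipped because they do not fit the pattern.
ALIAS_RE = re.compile(r"^\s*alias\s+(eth\d+)\s+(\S+)")


def parse_eth_aliases(conf_text):
    """Return {interface: driver} for every 'alias ethN driver' line."""
    aliases = {}
    for line in conf_text.splitlines():
        match = ALIAS_RE.match(line)
        if match:
            aliases[match.group(1)] = match.group(2)
    return aliases


def swap_aliases(aliases, first, second):
    """Return a copy of the mapping with the drivers of two interfaces exchanged."""
    swapped = dict(aliases)
    swapped[first], swapped[second] = aliases[second], aliases[first]
    return swapped


# Hypothetical file contents; the driver names are placeholders.
sample = """\
# onboard 100Mb NIC first, add-in 1Gb NIC second
alias eth0 e100
alias eth1 e1000
alias scsi_hostadapter ata_piix
"""

current = parse_eth_aliases(sample)
print(current)
print(swap_aliases(current, "eth0", "eth1"))
```

Running the same parse over the file before and after system-config-network, and diffing the two results, would at least confirm whether the alias lines really are identical.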

Petre

James P. Kinney III wrote:
> To reply to my lengthy post with a solution:
> 
> The hardware is an HP DL385 system. The onboard Tigon NICs are where the
> problem is. I have not resolved whether the issue is with the chipset
> that controls the PCI bus or the Tigon chips themselves. 
> 
> If I removed all references to the ethX ordering from all files and just
> loaded the drivers, the tigon NICs ALWAYS presented themselves as eth0
> and eth1. OK. That makes sense as they are earlier in the PCI bus tree
> and the kernel will find them before the add on cards.
> 
> The solution to this situation was to simply renumber the ethX devices
> and let the Tigons be eth0 and eth1. The file /etc/modules.conf was
> set up with "alias eth0 tg3" and "alias eth1 tg3", and we still put the MAC
> addresses into the udev rules in a new file called
> 06-net_persistent_name.rules in /etc/udev/rules.d/ . The system has been
> rebooted and the network restarted multiple times with no more
> renumbering of devices, and the modules unload from a stopped network
> with one call to rmmod tg3 && rmmod e1000 .
> 
> Apparently the tg3 driver doesn't like being remapped to a higher number. The
> e1000's are much more flexible. I did not have this issue on my test
> system which uses Broadcom NICs on the mainboard and e1000's for the
> add-on NICs.
> 
> Whew!
> 
> On Thu, 2006-11-09 at 23:50 -0500, James P. Kinney III wrote:
>> The hardware:
>> x2 dual-core Opteron 285
>> x2 Tigon Gbit NICs on mainboard
>> x2 dual port Intel e1000 cards in PCI-X slots
>>
>> K12LTSP v. 5 w/ 2.6.18 kernel 64 bit
>>
>> The issue:
>>
>> During the setup of the clients, printers and teachers' machines behind
>> the K12LTSP server, I need to add the teachers' machines on a static NAT
>> for LANDesk (yeah, I know...) access by the central office. There is one
>> ethernet port used to access the school LAN on this type of machine (of
>> the remaining 5, 1 is a dedicated connection to the NFS server
>> for /home, the other 4 are bonded for serving the thin clients through
>> 24 port Gbit switches to up to 20 classrooms per server). I am using
>> virtual ethernet ports and iptables rules. 
>>
>> The only way I can find to get a virtual port to start is to restart the
>> network service entirely. OK, it's down for just a few seconds. That's
>> an improvement over the current setup :)
>>
>> What I am seeing is that "service network restart" is not "clean". On
>> restart, we have had issues with ethx becoming ethy or ethz. Yet we have
>> the MAC addresses listed in /etc/sysconfig/network-scripts/ifcfg-ethx.
>>
>> So we added rules in /etc/udev/rules.d (generated a new rules file and
>> put in 'KERNEL=="eth*", SYSFS{address}=="MAC ADDRESS HERE", NAME="ethx"'
>> for each of the 6 ports). We even ordered the aliases
>> in /etc/modprobe.conf to be in the proper order.
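
A rough Python sketch of generating one such rule per port, in case it helps anyone with many servers to set up. It writes the KERNEL match with "==", since in udev rule syntax "==" compares and a single "=" assigns. The MAC addresses below are invented, the function name is hypothetical, and lowercasing the address is an assumption based on sysfs reporting it in lowercase hex. Newer udev releases use ATTR{address} in place of SYSFS{address}, so adjust for your version.

```python
import re

# Six colon-separated pairs of lowercase hex digits.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def make_rule(mac, name):
    """Build one rule line that pins the given MAC address to an interface name."""
    mac = mac.lower()  # assumption: sysfs shows the address in lowercase
    if not MAC_RE.match(mac):
        raise ValueError("not a MAC address: %r" % mac)
    return 'KERNEL=="eth*", SYSFS{address}=="%s", NAME="%s"' % (mac, name)


# Hypothetical MAC addresses; replace with the real ones from ifconfig.
ports = [
    ("00:16:35:AA:BB:01", "eth0"),
    ("00:16:35:AA:BB:02", "eth1"),
]

rules = [make_rule(mac, name) for mac, name in ports]
print("\n".join(rules))
```

The printed lines could then be saved as a rules file under /etc/udev/rules.d/.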
>>
>> If we take down the network and then try to remove the modules with
>> rmmod, the command returns no errors, but the module is still showing
>> loaded with lsmod. We noticed that it took exactly as many rmmod calls
>> as there had been network restarts to remove the modules. So it looked
>> like each network restart added a link to the modules. Yet the module
>> still showed a 0 in the "used by" column.
>>
>> Then we did all this again after a reboot and realized that what we had
>> seen was an anomaly of counting. It took 2 rmmods for the tg3 and 4 for
>> the e1000, that is, one per NIC: we have 2 Tigon (tg3) and 4 Intel NICs
>> (e1000). Yet with the networking off, it seems there should be no more
>> connection to the modules and they should just unload (I'm sending this
>> to the kernel list shortly, after I grep through the list for anything
>> similar).
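
To check "is it really unloaded yet?" without eyeballing lsmod each time, a small Python sketch along these lines could parse the lsmod text and report which of the NIC modules are still listed. The sample output and its sizes are made up, and the function names are hypothetical; on a real box you would feed it the actual output of lsmod.

```python
def loaded_modules(lsmod_text):
    """Return {module: use_count} parsed from the text that lsmod prints."""
    modules = {}
    for line in lsmod_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3:
            # Columns are: module name, size, use count, then optional users.
            modules[fields[0]] = int(fields[2])
    return modules


def still_loaded(lsmod_text, names):
    """Return the subset of names that lsmod still lists, in the given order."""
    loaded = loaded_modules(lsmod_text)
    return [name for name in names if name in loaded]


# Hypothetical lsmod output taken after one rmmod of each driver:
# e1000 is gone, tg3 is still listed with a use count of 0.
sample = """\
Module                  Size  Used by
tg3                   103429  0
bonding                82153  0
"""

print(loaded_modules(sample))
print(still_loaded(sample, ["tg3", "e1000"]))
```

A wrapper could repeat rmmod until still_loaded() comes back empty, though that only automates the workaround and does not explain why several calls are needed.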
>>
>> So why is this an issue?
>>
>> Because for reasons unknown, sometimes when the network is restarted
>> during the teacher machine integration with the K12LTSP servers, the
>> networking setup ignores the device numbers and MAC addresses and comes
>> up a random mess. It doesn't clear up by just stopping networking and
>> removing the modules and restarting networking. We have to keep pulling
>> out the modules until they are _really_ unloaded and then restart
>> networking.
>>
>> But the strangest part of all is when ifconfig will change and show that
>> what was an e1000 NIC is now a tg3 NIC. dmesg shows the tg3 is now
>> trying to be the ethx of 2 of the e1000 NICs. And no "wrong module"
>> errors appear anywhere.
>>
>> We are _very_ close to tagging the hardware as flaky. It was not my
>> hardware choice but it's what I have to work with (HP DL385) and I have
>> 33 to install.
>> _______________________________________________
>> K12OSN mailing list
>> K12OSN at redhat.com
>> https://www.redhat.com/mailman/listinfo/k12osn
>> For more info see <http://www.k12os.org>
>>



