[K12OSN] network reset ignores MACs

James P. Kinney III jkinney at localnetsolutions.com
Mon Nov 13 02:06:57 UTC 2006


Replying to my own lengthy post with a solution:

The hardware is an HP DL385 system. The onboard Tigon NICs are where the
problem is. I have not resolved whether the issue is with the chipset
that controls the PCI bus or the Tigon chips themselves. 

If I removed all references to the ethX ordering from all files and just
loaded the drivers, the Tigon NICs ALWAYS presented themselves as eth0
and eth1. OK. That makes sense, as they are earlier in the PCI bus tree
and the kernel finds them before the add-on cards.

The solution was simply to renumber the ethX devices and let the Tigons
be eth0 and eth1. /etc/modprobe.conf was set up with "alias eth0 tg3"
and "alias eth1 tg3", and we still put the MAC addresses into udev
rules, in a new file called 06-net_persistent_name.rules in
/etc/udev/rules.d/. The system has since been rebooted and the network
restarted multiple times with no more renumbering of devices, and once
the network is stopped the modules unload with a single
"rmmod tg3 && rmmod e1000".
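For reference, the two pieces of configuration described above can be
generated with a small script. This is a minimal sketch, assuming the
six-port layout from the report quoted below (two tg3 ports as eth0 and
eth1, four e1000 ports as eth2 to eth5). The MAC addresses are
hypothetical placeholders, and the rules use the SYSFS{address} key
from the original post; newer udev releases match on ATTR{address}
instead.

```shell
#!/bin/sh
# Sketch: print /etc/modprobe.conf aliases and udev naming rules for a
# fixed name -> driver -> MAC mapping. The MACs are PLACEHOLDERS
# (hypothetical); substitute the real addresses of each port.

# One entry per line: interface name, driver module, MAC address
NICS="eth0 tg3 02:00:00:00:00:01
eth1 tg3 02:00:00:00:00:02
eth2 e1000 02:00:00:00:00:03
eth3 e1000 02:00:00:00:00:04
eth4 e1000 02:00:00:00:00:05
eth5 e1000 02:00:00:00:00:06"

# Alias lines for /etc/modprobe.conf, onboard tg3 ports first
print_aliases() {
    printf '%s\n' "$NICS" | while read -r name driver mac; do
        printf 'alias %s %s\n' "$name" "$driver"
    done
}

# Lines for /etc/udev/rules.d/06-net_persistent_name.rules
print_udev_rules() {
    printf '%s\n' "$NICS" | while read -r name driver mac; do
        printf 'KERNEL=="eth*", SYSFS{address}=="%s", NAME="%s"\n' \
            "$mac" "$name"
    done
}

print_aliases
print_udev_rules
```

Keeping the aliases and the udev names in the same order as the PCI
scan (onboard Tigons first) is the point of the fix: the names the
rules assign then match what the kernel would choose anyway.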

Apparently the tg3 driver doesn't like having its interface numbers
remapped when the interfaces are brought up. The e1000 driver is much
more flexible. I did not have this issue on my test system, which uses
Broadcom NICs on the mainboard and e1000s for the add-on NICs.

Whew!
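To confirm after each reboot or network restart that no renumbering has
crept back in, each interface's address in sysfs can be compared with
the value it was pinned to. This is a minimal sketch and not part of
the setup described above; the function name check_nic is hypothetical.

```shell
#!/bin/sh
# Sketch: check that an interface still carries the MAC address it was
# pinned to. Prints "<name>: ok" on success; otherwise reports the
# problem on stderr and returns 1.
check_nic() {
    name="$1"
    # sysfs reports MAC addresses in lowercase hex, so normalise input
    want="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    file="/sys/class/net/$name/address"
    if [ ! -r "$file" ]; then
        echo "$name: no such interface" >&2
        return 1
    fi
    have="$(cat "$file")"
    if [ "$have" != "$want" ]; then
        echo "$name: expected $want, found $have" >&2
        return 1
    fi
    echo "$name: ok"
}
```

For example, "check_nic eth0 02:00:00:00:00:01 && check_nic eth1
02:00:00:00:00:02", with placeholder MACs replaced by the real ones,
run right after "service network restart".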

On Thu, 2006-11-09 at 23:50 -0500, James P. Kinney III wrote:
> The hardware:
> x2 dual-core Opteron 285
> x2 Tigon Gbit NICs on mainboard
> x2 dual port Intel e1000 cards in PCI-X slots
> 
> K12LTSP v. 5 w/ 2.6.18 kernel 64 bit
> 
> The issue:
> 
> During the setup of the clients, printers and teachers' machines behind
> the K12LTSP server, I need to add the teachers' machines to a static NAT
> for LANDesk (yeah, I know...) access by the central office. There is one
> Ethernet port used to access the school LAN on this type of machine (of
> the remaining 5, 1 is a dedicated connection to the NFS server
> for /home, and the other 4 are bonded to serve the thin clients through
> 24-port Gbit switches to up to 20 classrooms per server). I am using
> virtual Ethernet ports and iptables rules.
> 
> The only way I can find to get a virtual port to start is to restart the
> network service entirely. OK, it's down for just a few seconds. That's
> an improvement over the current setup :)
> 
> What I am seeing is that "service network restart" is not "clean". On
> restart, we have had issues with ethX becoming ethY or ethZ, even though
> the MAC addresses are listed in /etc/sysconfig/network-scripts/ifcfg-ethX.
> 
> So we added rules in /etc/udev/rules.d (we generated a new rules file and
> put in "KERNEL="eth*",SYSFS{address}=="MAC ADDRESS HERE",NAME="ethx""
> for all 6 ports). We even put the aliases in /etc/modprobe.conf in the
> proper order.
> 
> If we take down the network and then try to remove the modules with
> rmmod, the command returns no errors, but lsmod still shows the module
> as loaded. We noticed that it took exactly as many rmmod calls as there
> had been network restarts to remove the modules. So it looked like each
> network restart added a link to the modules. Yet the module still
> showed 0 in the "Used by" column.
> 
> Then we did all this again after a reboot and realized that the match
> with the number of restarts was a counting coincidence: it took 2 rmmod
> calls for tg3 and 4 for e1000, and we have 2 Tigon (tg3) and 4 Intel
> (e1000) NICs. Yet with networking off, it seems there should be no
> remaining references to the modules and they should just unload (I'm
> sending this to the kernel list shortly, after I grep through its
> archives for anything similar).
> 
> So why is this an issue?
> 
> Because for reasons unknown, sometimes when the network is restarted
> during the teacher machine integration with the K12LTSP servers, the
> networking setup ignores the device numbers and MAC addresses and comes
> up a random mess. It doesn't clear up by just stopping networking and
> removing the modules and restarting networking. We have to keep pulling
> out the modules until they are _really_ unloaded and then restart
> networking.
> 
> But the strangest part of all is when ifconfig changes to show that
> what was an e1000 NIC is now a tg3 NIC. dmesg shows tg3 trying to claim
> the ethX names of 2 of the e1000 NICs, and no "wrong module" errors
> appear anywhere.
> 
> We are _very_ close to tagging the hardware as flaky. It was not my
> hardware choice but it's what I have to work with (HP DL385) and I have
> 33 to install.
> _______________________________________________
> K12OSN mailing list
> K12OSN at redhat.com
> https://www.redhat.com/mailman/listinfo/k12osn
> For more info see <http://www.k12os.org>
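The workaround described in the quoted report, calling rmmod repeatedly
until the module is really gone, could be scripted roughly as follows.
This is a minimal sketch: the function name unload_fully and the limit
of 10 attempts are hypothetical choices. It checks /proc/modules (the
same data lsmod displays) and assumes it runs as root with networking
already stopped.

```shell
#!/bin/sh
# Sketch: call rmmod on a module until it no longer appears in
# /proc/modules, giving up after 10 attempts (an arbitrary limit).
unload_fully() {
    mod="$1"
    tries=0
    # /proc/modules lists one loaded module per line, name first
    while grep -q "^$mod " /proc/modules 2>/dev/null; do
        if [ "$tries" -ge 10 ]; then
            echo "$mod still loaded after $tries rmmod calls" >&2
            return 1
        fi
        rmmod "$mod" 2>/dev/null
        tries=$((tries + 1))
    done
    echo "$mod unloaded after $tries rmmod call(s)"
}
```

Typical use would be "service network stop", then
"unload_fully tg3 && unload_fully e1000", then "service network start".
With the renumbering fix in place, each call should report a single
rmmod.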
-- 
James P. Kinney III          
CEO & Director of Engineering 
Local Net Solutions,LLC        
770-493-8244                    
http://www.localnetsolutions.com

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7