[K12OSN] network reset ignores MACs

James P. Kinney III jkinney at localnetsolutions.com
Fri Nov 10 04:50:16 UTC 2006


The hardware:
x2 dual-core Opteron 285
x2 Tigon Gbit NICs on mainboard
x2 dual port Intel e1000 cards in PCI-X slots

K12LTSP v. 5 w/ 2.6.18 kernel 64 bit

The issue:

During the setup of the clients, printers and teachers' machines behind
the K12LTSP server, I need to put the teachers' machines on a static NAT
for LANDesk (yeah, I know...) access by the central office. One
ethernet port on this type of machine is used to access the school LAN.
Of the remaining 5, 1 is a dedicated connection to the NFS server
for /home and the other 4 are bonded for serving the thin clients through
24-port Gbit switches to up to 20 classrooms per server. I am using
virtual ethernet ports and iptables rules.
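
For illustration, a minimal sketch of what a one-to-one static NAT for a
single teacher machine could look like. The interface name and both
addresses are hypothetical placeholders, and the school-LAN address is
assumed to already be configured on a virtual port such as eth0:1. The
function only prints the iptables commands so they can be reviewed before
anything is run as root.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# The interface name and all addresses are hypothetical placeholders.

# static_nat_rules WAN_IF LAN_SIDE_IP TEACHER_IP
static_nat_rules() {
    wan_if=$1; lan_side_ip=$2; teacher_ip=$3
    # Inbound: traffic for the school-LAN address goes to the teacher machine.
    echo "iptables -t nat -A PREROUTING -i ${wan_if} -d ${lan_side_ip} -j DNAT --to-destination ${teacher_ip}"
    # Outbound: the teacher machine appears as the school-LAN address.
    echo "iptables -t nat -A POSTROUTING -o ${wan_if} -s ${teacher_ip} -j SNAT --to-source ${lan_side_ip}"
}

static_nat_rules eth0 192.0.2.10 10.0.0.10
```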

The only way I can find to get a virtual port to start is to restart the
network service entirely. OK, it's down for just a few seconds. That's
an improvement over the current setup :)

What I am seeing is that "service network restart" is not "clean". On
restart, we have had issues with ethx becoming ethy or ethz, even though
we have the MAC addresses listed in
/etc/sysconfig/network-scripts/ifcfg-ethx.

So we added rules in /etc/udev/rules.d (we generated a new rules file and
put in "KERNEL=="eth*",SYSFS{address}=="MAC ADDRESS HERE",NAME="ethx""
for each of the 6 ports). We even put the aliases
in /etc/modprobe.conf in the proper order.
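
A small sketch of how such a rules file could be generated, one line per
port. The MAC addresses are made-up placeholders and the helper name is
hypothetical. Two details are easy to get wrong in udev rules: match keys
use "==" while NAME uses "=", and the comparison is case sensitive, with
sysfs reporting the MAC in lower case.

```shell
#!/bin/sh
# Sketch only: the MAC addresses below are made-up placeholders.

# udev_rule MAC NAME: print one rule line in the SYSFS{} syntax.
udev_rule() {
    # sysfs reports the MAC in lower case, so normalize before matching.
    mac=$(echo "$1" | tr 'A-F' 'a-f')
    printf 'KERNEL=="eth*", SYSFS{address}=="%s", NAME="%s"\n' "$mac" "$2"
}

udev_rule 00:11:22:33:44:55 eth0
udev_rule 00:11:22:33:44:56 eth1
```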

If we take down the network and then try to remove the modules with
rmmod, the command returns no errors, but the module still shows as
loaded in lsmod. We noticed that it took exactly as many rmmod calls as
there had been network restarts to remove the modules. So it looked like
each network restart added a link to the modules. Yet the module still
showed a 0 in the "used by" column.

Then we did all this again after a reboot and realized that what we had
seen was an anomaly of counting. It took 2 rmmods for the tg3 and 4 for
the e1000. We have 2 Tigon (tg3) and 4 Intel (e1000) NICs. Yet with the
networking off, it seems there should be no more connection to the
modules and they should just unload (I'm sending this to the kernel list
shortly, after I grep through the list for anything similar).

So why is this an issue?

Because for reasons unknown, sometimes when the network is restarted
during the teacher machine integration with the K12LTSP servers, the
networking setup ignores the device numbers and MAC addresses and comes
up a random mess. It doesn't clear up by just stopping networking and
removing the modules and restarting networking. We have to keep pulling
out the modules until they are _really_ unloaded and then restart
networking.
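
The "keep pulling until really unloaded" step can be sketched as a loop
that checks /proc/modules after each rmmod. The helper names and the
retry limit of 10 are hypothetical, and this only automates the
workaround; it does not explain why more than one rmmod is needed.

```shell
#!/bin/sh
# Sketch only: helper names and the retry limit are hypothetical.

# module_loaded NAME [FILE]: true if NAME is listed in /proc/modules
# (FILE can be overridden for testing).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# unload_fully NAME: repeat rmmod until the module is gone or we give up.
unload_fully() {
    tries=0
    while module_loaded "$1"; do
        tries=$((tries + 1))
        if [ "$tries" -gt 10 ]; then
            echo "$1 still loaded after 10 attempts" >&2
            return 1
        fi
        rmmod "$1"
    done
}

# Intended use (as root):
#   service network stop
#   unload_fully tg3 && unload_fully e1000
#   service network start
```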

But the strangest part of all is that the ifconfig output changes and
shows that what was an e1000 NIC is now a tg3 NIC. dmesg shows the tg3
driver now trying to be the ethx of 2 of the e1000 NICs. And no "wrong
module" errors appear anywhere.
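
One way to capture the name-to-driver mapping after each restart is to
read it from sysfs. This is a sketch that assumes the usual layout, where
/sys/class/net/<if>/device/driver is a symlink whose last path component
is the driver name; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes /sys/class/net/<if>/device/driver is a symlink
# whose last path component is the driver name.

# nic_drivers [SYSFS_NET_DIR]: print "<interface> <MAC> <driver>" per ethN.
nic_drivers() {
    net_dir=${1:-/sys/class/net}
    for dev in "$net_dir"/eth*; do
        # Skip the unexpanded glob when no ethN interface exists.
        [ -e "$dev/address" ] || continue
        mac=$(cat "$dev/address")
        driver=$(basename "$(readlink "$dev/device/driver")")
        echo "$(basename "$dev") $mac $driver"
    done
}

nic_drivers
```

Saving the output before and after a "service network restart" and
diffing the two files would show exactly which MAC moved to which name
and which driver claims it.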

We are _very_ close to tagging the hardware as flaky. It was not my
hardware choice but it's what I have to work with (HP DL385) and I have
33 to install.
-- 
James P. Kinney III          
CEO & Director of Engineering 
Local Net Solutions,LLC        
770-493-8244                    
http://www.localnetsolutions.com

GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7