[K12OSN] LTSP - No clients can connect!

Phydeaux reb at taco.com
Thu Oct 28 19:40:10 UTC 2010


We're running LTSP on Fedora 13. We started having problems with some clients not
being able to connect and realized we had errors in our DHCP config.  At the same
time, kernel failures on the server were causing applications to stop working.
We rebooted and everything seemed fine, so we applied the latest Fedora 13 updates,
which included a new glibc and kernel. Clients could connect after this, but only
about half the time; the other half they failed.  Now we're in a state where no new
client connections ever succeed.  The boot appears to get stuck at the point where
the clients mount the root filesystem.

We tried to fix the DHCP configuration issues (we were assigning static IPs from
within the dynamic pool) and things went downhill.  The IPs are now assigned
correctly and the boot goes smoothly until just after the video mode changes. At
that point, we see this:

scsi6 : pata_atiixp
scsi7 : pata_atiixp
ata7: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xfa00 irq 14
ata8: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xfa08 irq 15
r8169 Gigabit Ethernet Driver 2.3LK-NAPI loaded
r8169 0000:02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
eth0 RTL8168c/8111c at 0xf80fc000, 6c:f0:49:7a:96:ac, XID 1c4000c0 IRQ 26 r8169
eth0: link down
r8169 eth0: link down

No root device found

No root device found

Boot has failed, sleeping forever.

We tried rebuilding the client images. The i386 clients are and were using Fedora
12. The x86_64 clients are and were using Fedora 13.  We tried going back to older
kernels for each of these.  We couldn't tie any single change to an improvement,
but at some point one or two clients managed to boot.  Then, no more.

Something simple is messed up, but we're not sure what.  We've looked at...

- ltsp dhcpd.conf
- lts.conf  -- it's unchanged
- NFS daemon is running
- iptables -- was running, we turned it off
- Lots of other stuff.
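
Part of the server-side checking above can be scripted. The following is only a
rough sketch, assuming the client root filesystem is served over NFS on the
standard TCP port 2049. The server address is a made-up placeholder and the
function name is purely illustrative. It confirms only that something accepts
connections on the port, not that the export itself is usable.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Hypothetical server address; 2049 is the standard NFS port.
LTSP_SERVER = "192.168.0.254"
CHECKS = {"nfs": 2049}

if __name__ == "__main__":
    for name, port in CHECKS.items():
        state = "open" if tcp_port_open(LTSP_SERVER, port) else "unreachable"
        print(f"{name} ({port}/tcp): {state}")
```

Running this from another machine on the client network, rather than on the
server itself, would also exercise any firewall rules in between.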

We believe the static IP assignments are working as we see no more warning messages
in the log.  The clients are getting both a vmlinuz.ltsp and the initial ramdisk
just fine. Things look fine on the server, but no clients connect.  The network
driver clearly works long enough to download these items, but then we see the "link
down" message.
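
The static-versus-dynamic overlap mentioned earlier can be double-checked with a
few lines of Python. This is a minimal sketch under stated assumptions: the pool
boundaries, host names and addresses below are invented placeholders, not values
from our dhcpd.conf, and would need to be copied in by hand from the real "range"
and "fixed-address" entries.

```python
import ipaddress


def hosts_in_dynamic_pool(pool_start, pool_end, static_hosts):
    """Return sorted names of static hosts whose address lies inside the dynamic range."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return sorted(
        name
        for name, addr in static_hosts.items()
        if start <= ipaddress.ip_address(addr) <= end
    )


# Hypothetical values for illustration only.
POOL_START = "192.168.0.100"
POOL_END = "192.168.0.250"
STATIC_HOSTS = {
    "ws001": "192.168.0.20",   # outside the pool: fine
    "ws002": "192.168.0.120",  # inside the pool: conflict
}

if __name__ == "__main__":
    for name in hosts_in_dynamic_pool(POOL_START, POOL_END, STATIC_HOSTS):
        print(f"{name} has a fixed address inside the dynamic range")
```

An empty result would only confirm that the addressing is consistent; it says
nothing about why the link drops after the initial ramdisk loads.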

We were hesitant to reboot, as some of the clients were still working. After we
finally did, the ones that were up but logged out were still able to log in.
Unfortunately, most of the clients still can't connect.



