[K12OSN] More feedback on Fedora 10 + LTSP
James P. Kinney III
jkinney at localnetsolutions.com
Tue Apr 21 20:54:09 UTC 2009
The basic applications have not changed much between the two OS
versions you are running.
The KERNEL is radically different, however. The CentOS version has a
totally different scheduler than the F10 version, plus a bazillion other
changes that have a huge effect on LTSP kernel usage. Because of the sound
changes, it may be a challenge to get the CentOS kernel src.rpm and
compile it for the F10 OS environment, but it is certainly worth a try.
Also, sadly, I have not found a guide to the tuning params written for
non-kernel-hackers. lwn.net should be consulted for details on how each
kernel has evolved, as well as the distro kernel specs (yes: read the spec
files for each kernel src.rpm to see what patches were
backported/migrated/merged/etc., as sometimes those cause problems).
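One way to make the spec-file comparison less tedious is to pull the patch declarations out of each kernel spec and diff the two lists. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the patches are declared with the usual "PatchN: filename" lines. It reports which patch file names appear in one spec but not the other; it says nothing about what a patch does or whether it is the one causing trouble.

```python
import re

# Matches spec-file declarations such as "Patch123: linux-2.6-foo.patch".
# "%patchN" application lines start with "%" and are therefore not matched.
PATCH_LINE = re.compile(r"^Patch(\d*)\s*:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def list_patches(spec_text):
    """Return the set of patch file names declared in an RPM spec file."""
    return {name for _number, name in PATCH_LINE.findall(spec_text)}


def compare_specs(spec_a, spec_b):
    """Return (only_in_a, only_in_b) as sorted lists of patch file names."""
    patches_a = list_patches(spec_a)
    patches_b = list_patches(spec_b)
    return sorted(patches_a - patches_b), sorted(patches_b - patches_a)
```

To use it, install both kernel src.rpm packages, read each kernel spec file into a string, and pass the two strings to compare_specs(). Patch file names often differ between distros even when the change is the same, so treat the output as a starting point for reading, not a verdict.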
Note also that CentOS kernels are by default tuned more for large
installations than the Fedora kernel is.
Again, I strongly suspect you are having a serious kernel issue and
recommend compiling your own.
The bridge interface is also a serious performance hog. It doesn't seem to
support gigabit traffic at all; it manages more like 230-400 Mbps. So
ditching the bridge and going straight to the physical NIC is the best,
fastest route to a speedup. Dig on the Brandon elementary pta site for
a write-up by William Fragakis on how to turn off the bridge process.
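To check what the bridge actually delivers on a given server, the byte counters in /proc/net/dev can be sampled twice and converted to Mbps. This is a rough sketch with illustrative function names, assuming the standard /proc/net/dev layout (receive bytes in the first column after the interface name, transmit bytes in the ninth). Run it while a large transfer to a client is in progress.

```python
import time


def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def mbps(before, after, seconds):
    """Return (rx, tx) in megabits per second between two counter samples."""
    rx = (after[0] - before[0]) * 8 / seconds / 1e6
    tx = (after[1] - before[1]) * 8 / seconds / 1e6
    return rx, tx


def sample(iface, seconds=5.0, path="/proc/net/dev"):
    """Measure throughput on one interface over a short interval."""
    with open(path) as f:
        first = parse_net_dev(f.read())[iface]
    time.sleep(seconds)
    with open(path) as f:
        second = parse_net_dev(f.read())[iface]
    return mbps(first, second, seconds)
```

For example, sample("ltsbr0") measures the bridge, and the same call with the physical NIC name gives a comparison once the bridge is removed. On a 32-bit kernel the byte counters wrap at 4 GB, so keep the interval short and discard any negative result.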
Lastly: you mentioned hyperthreading CPUs. If they are _really_
hyperthreaded and not full multi-core, turn _off_ the hyperthreading.
The F10 kernel threading will thrash royally doing stupid task swapping,
as it seems not to understand that the second CPU is a fake one. There
is a kernel /proc flag to help with this but I can't find it right now.
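To tell real cores from hyperthreaded ones, compare the "siblings" and "cpu cores" fields in /proc/cpuinfo: when a physical package reports more siblings than cores, hyperthreading is active. The sketch below uses an illustrative function name and assumes an x86 kernel new enough to report both fields; it returns None when they are missing. It only detects hyperthreading. It is not the /proc flag mentioned above, and turning hyperthreading off is normally done in the BIOS.

```python
def hyperthreading_status(cpuinfo_text):
    """Return True or False if it can be inferred, or None if fields are missing."""
    siblings = None
    cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            # More logical CPUs than cores in a package means SMT is on.
            return siblings > cores
    return None
```

Read /proc/cpuinfo into a string and pass it in. A result of True on the dual Xeon server would mean that half of the logical CPUs the scheduler sees are hyperthreads.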
Also be sure to turn off EVERY process not actively used. Some are more
of a drain on CPU throughput than others (avahi, a zillion python
processes running desktop applets for notification). System monitor
should be removed!! It's a HOG, and removing it means students can't
load it up.
On Tue, 2009-04-21 at 13:23 -0400, David Hopkins wrote:
> Let me first say that this is going to sound like a rant in places.
> Not much I can do about that, but ... FC10+LTSP5 has not been
> performing well at all. I am currently at a loss to explain why.
> However, since I have to have sound working, LTSP5 seems to be the
> only way to ensure that sound works correctly. I have CentOS+LTSP4.2
> and that works well for everything except sound. So, the only option I
> see is to get a distribution that is using LTSP5 working. Again, just
> to be clear, I am using identical hardware for the comparison and
> using the same login accounts, same file server, same dns, same
> authentication server, etc. All hardware is 32bit, both server and
> clients. (I don't even want to deal/worry with the 64bit server/32bit
> client possible issues at the moment).
> Now, here is what I and the elementary school tech teacher observed
> today. The following is her write-up.
> "Things did not go so well this morning. When all 10 computers were
> in use at the same time, the delay between mouse and screen was
> significant. . . The point of the lesson was to improve mouse
> skills--not possible when there is a lag between their mouse movements
> and action on their monitors. We muddled through the first group of
> 10 students, and when the 2nd group began the exercise, I allowed the
> first 10 students to open Tux Paint. I thought because Tux Paint is
> running local, this would work. Big Mistake! The delay for everyone
> increased dramatically, making it virtually impossible to complete the
> mouse task in Starfall. When I tried to "QUIT" foxfire on 3
> computers, it took 3-4 minutes to return to the desktop. Although I
> was circulating the room, trying to assist students, I glanced at the
> load several times--I never saw it rise above 6. It mostly hovered
> between 4 and 5. It took more than 5 minutes to successfully close
> the website from 10 computers. During that time, I had 10 students
> just waiting.
> When my second class arrived, I did not even try to use the website.
> We used Tux Paint today. However, shortly after we got started, I
> "banned" students from selecting a new piece of paper . . . The few
> who had tried feature had their monitor hung-up for more than a
> minute. That task used to respond immediately. There is also a
> terrific feature that allows students to select any color from the
> rainbow . . . but choosing that feature takes more than 1 full minute
> to accomplish."
> This is on a system where with CentOS+LTSP4.2 I could run 25 systems
> simultaneously without issues. She was trying to use 10.
> Notice that the load average never exceeded 6. This is dual
> hyperthreaded Xeon so a load average of 4 would mean 100% utilization
> although that is a bit misleading as load averages of 6-8 perform
> quite well on all my other systems. Also, the system was never using
> swap. In fact, memory usage never exceeded 5GB.
> So, where is the bottleneck? The starfall activity is flash-based (it
> was the Earth Day activity). I know that FF3+flash is going to load
> the system. But this issue is not as severe with FF2+Flash 9, except
> that you don't get sound half the time. FF3+Flash10 seems to really
> slow down. Also, it seems that network traffic is significantly
> higher with FC10+LTSP5 using ldm than with gdm. Can I switch back to
> gdm as the default manager or is ldm it? I have the LDM_DIRECTX set
> to TRUE so that ssh is only used for login/logout. And, login/logout
> now takes 30+ secs compared to about 2 seconds for CentOS+LTSP4.2.
> For the local apps, launching FF3 can take over a minute. And then it
> will be sluggish, even when the local hardware isn't using swap.
> I have this suspicion that it is a network bandwidth issue. The only
> difference there is that LTSP5 uses the ltsbr0 bridge setup while
> LTSP4.2 does not. To test this, I should be able to delete the bridge
> and set up LTSP5 in the same dual NIC scenario as with LTSP4.2,
> correct? Though I am not sure I have the skills to do so without
> breaking something else. It might be as easy as deleting the ltsbr0
> entry and then defining the IP address for the currently-slaved NIC to
> be what the ltsbr0 was defined as.
> I haven't had a chance to look at the stats from the switch (Amer.com,
> SS2R24G4i ) but since I never changed the switch, only the OS, I don't
> see why there would suddenly be an issue.
> As for the Tuxpaint issue. That is truly baffling. I have the same
> version of Tuxpaint running on an older server and it is very
> responsive. There is a hardware difference for the server ... the one
> that runs very well has CPU's with only 70% the speed of the newer
> server. The other difference is again CentOS+LTSP4.2 (using gdm) vs
> FC10+LTSP5 (and ldm).
> So, something looks like it 'just isn't right' except I'm not getting
> any disk I/O errors, I'm not getting a huge spike in the load ... the
> system just isn't responsive.
> At this point the teacher has really reached her limit as have I. A
> single login with a single client works fine. Add a few more and I
> get the above. I want LTSP5 to work but I can't stay with it given the
> current performance issues. And I have to start planning now for next
> fall. If upgrading to FC10+LTSP5 means all my current hardware is not
> acceptable, then I have a huge issue. I know that all my current
> hardware works with FC10+LTSP5, but the performance I'm seeing is
> horrible. I have been advocating/using K12LTSP since 2003, I really
> want this to work, but right now to say I am depressed with FC10+LTSP5
> would be an understatement.
> So ... help? I'll be back at the school tonight to try and determine
> what might be happening. And once there, sitting behind the state
> firewalls, access to IRC is blocked as is all other chat capabilities.
> Dave Hopkins
> Newark Charter School
> K12OSN mailing list
> K12OSN at redhat.com
> For more info see <http://www.k12os.org>
James P. Kinney III
CEO & Director of Engineering
Local Net Solutions,LLC
GPG ID: 829C6CA7 James P. Kinney III (M.S. Physics)
<jkinney at localnetsolutions.com>
Fingerprint = 3C9E 6366 54FC A3FE BA4D 0659 6190 ADC3 829C 6CA7