
Re: NGPT



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Perez-Gonzalez, Inaky wrote:

> The main discussion point on M:N vs 1:1 is that thread switch is faster in
> M:N [switch to other thread's context] than on 1:1 [switch to kernel, select
> next, switch to thread]

That's not 100% accurate.


Only thread-to-thread switching is faster. Context switching is at least as fast. And if you want fairness in an m-on-n implementation, context switching needs either a) kernel support via something like scheduler activations, or b) some means to force user-level context switching. This makes the already heavy user-level scheduler even heavier. (I ignore the fact that in Linux's case the two schedulers are not collaborating and therefore often make decisions which unnecessarily slow down the process.)


The real question is: in which situations is an m-on-n implementation faster? Only when it can be recognized at user level that one thread cannot continue and another one should take over. Which situations are these?


1. Blocking on I/O. First it must be noted that an m-on-n implementation without special kernel support (NGPT falls into this category) has to wrap every system call which might block, like this:

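   /* Remember the current file status flags, then force non-blocking mode. */
   oldmode = fcntl (fd, F_GETFL, 0);
   fcntl (fd, F_SETFL, oldmode | O_NONBLOCK);

   /* Fails with EAGAIN instead of blocking if no data is available. */
   result = __libc_read (fd, buf, n);

   /* Restore the flags the application expects to see. */
   fcntl (fd, F_SETFL, oldmode);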

The only alternative to this (without kernel support) is to permanently set the non-blocking bit. But this incorrectly changes the semantics, and the program might fail or perform incorrectly. The thread library would also have to create a new data structure for every file descriptor describing the real state of the non-blocking bit, since the user code might use non-blocking mode itself.
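To illustrate the bookkeeping this alternative implies, here is a minimal sketch of a per-descriptor shadow table. All names (shadow_adopt, shadow_getfl, shadow_setfl, SHADOW_MAX_FD) are hypothetical and not taken from NGPT or any other library; the fixed-size table and the missing locking are simplifying assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

#define SHADOW_MAX_FD 1024   /* assumption: small fixed table, no locking */

/* What the application believes about each descriptor's non-blocking bit. */
static bool user_wants_nonblock[SHADOW_MAX_FD];

/* The library takes over fd: remember the application's view, then force
   O_NONBLOCK in the kernel. */
static int shadow_adopt(int fd)
{
    assert(fd >= 0 && fd < SHADOW_MAX_FD);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    user_wants_nonblock[fd] = (flags & O_NONBLOCK) != 0;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Stand-in for fcntl(fd, F_GETFL): report the application's view of the
   non-blocking bit, not the kernel's. */
static int shadow_getfl(int fd)
{
    assert(fd >= 0 && fd < SHADOW_MAX_FD);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    flags &= ~O_NONBLOCK;
    if (user_wants_nonblock[fd])
        flags |= O_NONBLOCK;
    return flags;
}

/* Stand-in for fcntl(fd, F_SETFL, flags): record what the application asked
   for, but keep O_NONBLOCK set in the kernel. */
static int shadow_setfl(int fd, int flags)
{
    assert(fd >= 0 && fd < SHADOW_MAX_FD);
    user_wants_nonblock[fd] = (flags & O_NONBLOCK) != 0;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Even with such a table the semantics still leak: the file status flags live in the open file description, so a descriptor inherited across fork() or duplicated with dup() sees the forced O_NONBLOCK as well.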

Back to blocking. A thread performing blocking I/O must therefore perform the above mode switching, enter the kernel, and return with EAGAIN. The thread library can notice this and then store the register set of the current thread in some background memory, load the register set of a runnable thread, and start running it.
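The sequence described above can be sketched as a complete wrapper. This is only an illustration under stated assumptions: wrapped_read and its would_block out-parameter are hypothetical names, plain read() stands in for the library-internal read entry point, and the actual switch to another user-level thread is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper. Returns the read() result. When the descriptor had
   no data, returns -1 and sets *would_block to 1 so that the caller (the
   user-level scheduler) can switch to another thread. */
static ssize_t wrapped_read(int fd, void *buf, size_t n, int *would_block)
{
    assert(would_block != NULL);
    *would_block = 0;

    int oldmode = fcntl(fd, F_GETFL, 0);                   /* fcntl call 1 */
    if (oldmode == -1)
        return -1;
    if (fcntl(fd, F_SETFL, oldmode | O_NONBLOCK) == -1)    /* fcntl call 2 */
        return -1;

    ssize_t result = read(fd, buf, n);                     /* the real work */
    int saved_errno = errno;

    fcntl(fd, F_SETFL, oldmode);                           /* fcntl call 3 */

    if (result == -1 &&
        (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK))
        *would_block = 1;

    errno = saved_errno;
    return result;
}
```

Note that every call pays for four kernel entries, whether or not the read would actually have blocked.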

In addition, the library somehow has to keep track of the file descriptor and the desire to read from it. If all threads block, the library has to watch all the possible blocking reasons to find out which one goes away. This happens using a select/poll call somewhere in the guts of the thread library. This means we need another system call to wait, and a pretty expensive one. It is far more expensive to wait on a number of file descriptors than it is to wait inside the kernel for a single read() to complete.
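A rough sketch of that wait step, assuming the library keeps one record per thread blocked on a read. The struct blocked_reader type, the thread ids, and the function names are invented for illustration; only poll() itself is a real interface.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_BLOCKED 64   /* assumption: small fixed upper bound */

/* Hypothetical bookkeeping: one entry per user-level thread blocked on a read. */
struct blocked_reader {
    int fd;
    int thread_id;
};

/* Waits until at least one blocked descriptor becomes readable. Stores the ids
   of the threads that can run again in ready[] and returns how many there are
   (0 on timeout, -1 if poll() fails). */
static int wait_for_runnable(const struct blocked_reader *blocked, size_t count,
                             int *ready, int timeout_ms)
{
    assert(count <= MAX_BLOCKED);
    struct pollfd pfds[MAX_BLOCKED];

    /* The whole descriptor set is rebuilt and handed to the kernel each time. */
    for (size_t i = 0; i < count; i++) {
        pfds[i].fd = blocked[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int n = poll(pfds, (nfds_t)count, timeout_ms);
    if (n <= 0)
        return n;

    int nready = 0;
    for (size_t i = 0; i < count; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR))
            ready[nready++] = blocked[i].thread_id;
    return nready;
}

/* Usage: two "threads" blocked on two pipes; only the second pipe gets data,
   so only thread 22 should be reported as runnable. */
static int demo_wait(void)
{
    int a[2], b[2];
    if (pipe(a) == -1 || pipe(b) == -1)
        return -1;
    struct blocked_reader blocked[2] = { { a[0], 11 }, { b[0], 22 } };
    int ready[2] = { -1, -1 };
    int n = -1;
    if (write(b[1], "x", 1) == 1)
        n = wait_for_runnable(blocked, 2, ready, 1000);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return n == 1 ? ready[0] : -1;
}
```

After this call returns, each woken thread still has to repeat its original read, so the wait adds a system call rather than replacing one.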

If you tally up these costs alone you will most likely end up with a cost which is already higher than the very fast context switching times in Linux. Context switching between threads in the same process is extremely fast, and since the blocked thread is already in the kernel no additional jump to ring 0 is needed.

The scale tips further if you also consider the running costs. The above-mentioned code sequence, and a lot more code like it, has to run all the time even if the thread is not going to block, since the user-level library cannot know in advance. It always has to assume that every syscall which can block will block. Programs therefore pay dearly with a general performance degradation just because of possible blocking.

Other OSes, like Solaris, used a special kernel mode for a process where the O_NONBLOCK bit doesn't have to be set. The kernel would perform an upcall to notify the thread library about the blocking. This solves the problem of having to use these three fcntl() calls, but NGPT does not use this.


2. Blocking on synchronization primitives. If the synchronization primitives are written in a sane way it is possible to determine whether an operation would block or not entirely at user level. The thread library could then switch between threads, still without kernel support. That is indeed faster than a context switch.
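As a sketch of what "determine at user level" can mean, assume a lock whose whole state is a single atomic word. The ul_mutex type and its functions are hypothetical; the example uses C11 atomics, which are only one possible way to write such a primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-level mutex: state 0 = unlocked, 1 = locked. */
struct ul_mutex {
    atomic_int state;
};

/* Returns true if the lock was acquired. A false result means the caller
   would have to block, which the thread library learns without entering
   the kernel and can answer by switching to another user-level thread. */
static bool ul_mutex_try_acquire(struct ul_mutex *m)
{
    assert(m != NULL);
    int expected = 0;
    return atomic_compare_exchange_strong(&m->state, &expected, 1);
}

/* Releases the lock. Waking a waiter is deliberately left out here. */
static void ul_mutex_release(struct ul_mutex *m)
{
    assert(m != NULL);
    atomic_store(&m->state, 0);
}
```

The uncontended path is one atomic instruction; the hard part is everything that has to happen after try_acquire returns false.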


But there is a downside. The blocked thread and the sync object have to be tracked. When the sync object is ready to allow the operation to succeed it is necessary to make the thread available for scheduling. But how to notice that the sync object is ready?

For many sync objects it is possible to attach a list to each of them which represents the waiters. When the sync object gets unlocked the library could mark one of the waiters as runnable. This sounds nice and easy but it isn't since it completely ignores inter-process sync objects (see PTHREAD_PROCESS_SHARED). In this case the unlock operation might happen in another process. It is true that the thread library can recognize sync objects with PTHREAD_PROCESS_SHARED set but the more complicated case nevertheless must be implemented. How?
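The simple intra-process case described above might look like the following sketch. Every type and function name here is made up for illustration, and the code assumes the library serializes access to these structures itself. It deliberately does nothing for PTHREAD_PROCESS_SHARED objects: an unlock performed in another process never reaches this list, which is exactly the gap described above.

```c
#include <assert.h>
#include <stddef.h>

enum ul_state { UL_BLOCKED, UL_RUNNABLE };

/* Hypothetical user-level thread descriptor. */
struct ul_thread {
    int id;
    enum ul_state state;
    struct ul_thread *next_waiter;
};

/* Hypothetical sync object with a FIFO list of waiters attached. */
struct ul_sync {
    int locked;
    struct ul_thread *waiters_head;
    struct ul_thread *waiters_tail;
};

/* Called when thread t finds the object locked: park it on the waiter list. */
static void ul_sync_add_waiter(struct ul_sync *s, struct ul_thread *t)
{
    t->state = UL_BLOCKED;
    t->next_waiter = NULL;
    if (s->waiters_tail != NULL)
        s->waiters_tail->next_waiter = t;
    else
        s->waiters_head = t;
    s->waiters_tail = t;
}

/* Unlocks the object and marks the longest-waiting thread runnable so it can
   retry. Returns that thread, or NULL if nobody was waiting. */
static struct ul_thread *ul_sync_unlock(struct ul_sync *s)
{
    assert(s->locked);
    s->locked = 0;

    struct ul_thread *t = s->waiters_head;
    if (t == NULL)
        return NULL;

    s->waiters_head = t->next_waiter;
    if (s->waiters_head == NULL)
        s->waiters_tail = NULL;
    t->next_waiter = NULL;
    t->state = UL_RUNNABLE;
    return t;
}
```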

It would be possible to constantly monitor all the sync objects which are a blocking reason. But who does it? Active scanning (reading memory) is expensive; it's practically busy waiting. With the Linux kernel's futex mechanism it is possible to get a file descriptor for the futex which is used to implement the sync object. But then we are back to using select/poll somewhere.



This has already gotten very long. There are many, many more problems an m-on-n implementation has to overcome, and each solution adds yet more overhead to the normal mode of operation. The cost to processes where the m-on-n features are not used is not zero.

And note that we considered an m-on-n implementation initially, and therefore this was thoroughly investigated. The results were clear, though: for Linux there are almost no advantages and a huge number of disadvantages, including thread library code which is orders of magnitude larger and more complicated.

When time allows I'll write down our findings, but right now I have more important things to do. If you want, perform some performance tests yourself. In fact, I'd love to get the results of such tests.

- -- - ---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


iD8DBQE9i9Tz2ijCOnn/RHQRAkPtAJ9hoWq+Z3+XwuY1RXyYRmH+3k0IAACgqguc
VOtO4PcBimDbHoqNPLLkRSQ=
=zcmm
-----END PGP SIGNATURE-----




