Re: Fast thread-local storage for OpenGL drivers

On Sun, Feb 23, 2003 at 06:44:10PM -0800, Gareth Hughes wrote:
> > In fact, we put this feature there with GL in mind...
> Did you inform the OpenGL vendors who were interested in this issue of this
> fact?  Have you documented it anywhere, particularly in Ulrich Drepper's
> "ELF Handling For Thread-Local Storage" document?  The current version of
> this document clearly states that the Local Exec TLS model "can only be used
> for code in the executable itself and to access variables in the executable
> itself".  Perhaps you can see why I was still under the impression that it
> would not work for a dynamically loadable shared library.

I believe all this was said during the huge OpenGL thread in May 2002.
Certainly the idea of supporting dlopen of a limited number of libraries
using the IE/LE TLS models came up at that time.

For the dispatch tables I even remember suggesting the following:
a) do the normal "awx" section entries with the LE model, i.e.
.section openGL_wtext, "awx"
.globl Foo
Foo:
   movl %gs:__gl_dispatch@ntpoff, %eax
   jmpl *offset_Foo(%eax)
b) in addition to that, you can build a static (.a) library with the above
   5 lines per .o file's source, plus .hidden Foo, which would make
   apps/libraries using openGL even faster (they wouldn't hop through the
   PLT, which costs one memory load and an indirect jump through the loaded
   value), at the expense of making offset_Foo part of the openGL ABI
   (which, as far as I understand, it already is anyway because of the
   binary modules).
c) or you could inline the dispatch sequence directly at the call sites.
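
For anyone who'd rather see (a)-(c) in C, here is a rough sketch, not
libGL's actual code, assuming GCC's __thread extension and its tls_model
attribute. Every name in it (struct gl_dispatch, gl_current_dispatch,
glFoo, make_current, driver_a/driver_b, call_on_thread) is made up for
illustration. It uses the initial-exec model instead of hand-written LE
assembly; with -fpic that costs one extra load (the TLS offset from the
GOT) compared with the LE stub, but there is still no __tls_get_addr()
call, just a thread-pointer-relative load plus an indirect call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed dispatch table; the position of Foo in it
   plays the role of offset_Foo, i.e. it is part of the ABI. */
struct gl_dispatch {
    int (*Foo)(int);
};

/* Per-thread pointer to the current context's dispatch table.
   initial-exec: the TLS offset is read from the GOT and applied to the
   thread pointer; no __tls_get_addr() call. Zero-initialized (NULL). */
static __thread const struct gl_dispatch *gl_current_dispatch
    __attribute__((tls_model("initial-exec")));

/* Public entry point: one TLS load, then an indirect call through a fixed
   table slot, the C counterpart of the movl/jmpl stub. */
int glFoo(int x)
{
    assert(gl_current_dispatch != NULL && "no current context on this thread");
    return gl_current_dispatch->Foo(x);
}

/* Two hypothetical "drivers" with different implementations of Foo. */
static int driver_a_Foo(int x) { return x + 1; }
static int driver_b_Foo(int x) { return x * 10; }

static const struct gl_dispatch driver_a = { driver_a_Foo };
static const struct gl_dispatch driver_b = { driver_b_Foo };

/* Hypothetical MakeCurrent analogue: installs a table for this thread only. */
void make_current(const struct gl_dispatch *table)
{
    gl_current_dispatch = table;
}

struct job {
    const struct gl_dispatch *table;
    int arg;
    int result;
};

static void *worker(void *p)
{
    struct job *j = p;
    make_current(j->table);
    j->result = glFoo(j->arg);
    return NULL;
}

/* Runs glFoo(arg) on a new thread whose current table is `table`;
   returns -1 if the thread could not be created. */
int call_on_thread(const struct gl_dispatch *table, int arg)
{
    pthread_t t;
    struct job j = { table, arg, 0 };

    if (pthread_create(&t, NULL, worker, &j) != 0)
        return -1;
    pthread_join(t, NULL);
    return j.result;
}
```

Making glFoo a static inline function in a header would be roughly the C
analogue of (c), and giving it hidden visibility in a static archive would
correspond to (b).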

In the May thread, I'm pretty sure you mentioned that the __indirect*
routines, which make up the biggest part of libGL.so, are rarely used,
which means they definitely should be compiled with -fpic; the rest, if it
is really performance critical, can be put into awx sections using


