
Re: [Fwd: Fast thread-local storage for OpenGL drivers]



> Humor me?  Reading the paper, it appears that for TLS to work in a
> shared library it is necessary to make the ___tls_get_addr call to
> get the module id.  It seems like the -ftls-model switch is directly
> relevant here but its documentation in the GCC manual is absolutely
> useless; it lists the alternatives but doesn't explain the implications
> of using the different models (they'd need initial-exec?) in a shared
> library instead of the default.

I think the TLS document intends to explain what the models mean in
practical terms on each architecture, but I can believe it's not all that
clear.  The GCC manual doesn't explain the access models and code
sequences; it just tells you how to tell the compiler what you want, in
the terms that the TLS document defines.

If you want maximal flexibility, i.e. to always work with dlopen, then
you must indeed use one of the "dynamic" TLS access models, General
Dynamic (GD) or Local Dynamic (LD).  You can use the Initial Exec (IE)
model if you want faster accesses at the cost of some flexibility.

When compiling PIC, IE-model accesses have one additional indirection:
the variable's offset from the thread pointer is loaded from the GOT, just
as the address of a global variable is loaded in PIC.  See the instruction
sequences in the TLS spec.

If you use static linking, the linker reduces these instruction sequences
to accesses at constant offsets (i.e. direct "%gs:NNN" accesses on x86).
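To make the difference between the two cases concrete, here is a toy model in plain C. It is not real TLS code: the static TLS block is an ordinary array, the GOT entry is an ordinary variable, and the names `tls_block`, `got_entry_x`, `X_OFFSET` and `dynamic_linker_relocate` are hypothetical.  (On x86 the real offsets are negative relative to the thread pointer; the toy uses a non-negative index into an array.)  The x86-64 instructions in the comments are the usual IE and LE sequences from the TLS spec.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for this thread's static TLS block (hypothetical). */
static unsigned char tls_block[64];

/* Offset of variable `x` inside the block, as the linker would compute it
   (hypothetical value). */
#define X_OFFSET 16

/* Toy stand-in for the GOT entry that the dynamic linker fills in when a
   PIC object using the initial-exec model is loaded. */
static ptrdiff_t got_entry_x;

void dynamic_linker_relocate(void) { got_entry_x = X_OFFSET; }

void store_x(int v) { memcpy(tls_block + X_OFFSET, &v, sizeof v); }

/* Initial-exec in PIC code: first load the offset from the GOT (the extra
   indirection), then access the variable.  Roughly, on x86-64:
       movq x@gottpoff(%rip), %rax
       movl %fs:(%rax), %eax                                          */
int load_x_initial_exec(void)
{
    int v;
    ptrdiff_t off = got_entry_x;             /* extra load from the GOT */
    memcpy(&v, tls_block + off, sizeof v);
    return v;
}

/* Static link: the offset is a link-time constant folded into the
   instruction.  Roughly, on x86-64:
       movl %fs:x@tpoff, %eax                                         */
int load_x_local_exec(void)
{
    int v;
    memcpy(&v, tls_block + X_OFFSET, sizeof v);  /* constant offset */
    return v;
}
```

Both loaders read the same value; the only difference is whether the offset comes from memory filled in at load time or is baked in by the linker.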

If you link a shared object containing IE-model access relocations, the
linker sets the DF_STATIC_TLS flag in the object's DT_FLAGS entry.  By the
spec, this means that dlopen might refuse to load it.
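One way to see which loaded objects carry that flag is to walk their dynamic sections at run time.  This is a minimal sketch assuming glibc on Linux: it uses `dl_iterate_phdr` from <link.h> and the `DT_FLAGS`/`DF_STATIC_TLS` constants from <elf.h>; the function name `count_static_tls_objects` is hypothetical.  For a single file on disk, `readelf -d` shows the same FLAGS entry.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdio.h>

/* dl_iterate_phdr callback: look for DF_STATIC_TLS in one loaded object. */
static int check_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";
    (void)size;

    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_DYNAMIC)
            continue;
        /* The dynamic section is at the load base plus p_vaddr. */
        const ElfW(Dyn) *dyn =
            (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        for (; dyn->d_tag != DT_NULL; dyn++) {
            if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_STATIC_TLS)) {
                printf("%s: DF_STATIC_TLS\n", name);
                (*count)++;
            }
        }
    }
    return 0; /* 0 = continue with the next object */
}

/* Number of currently loaded objects whose DT_FLAGS include DF_STATIC_TLS. */
int count_static_tls_objects(void)
{
    int count = 0;
    dl_iterate_phdr(check_object, &count);
    return count;
}
```

Calling it from main after a dlopen shows whether the newly loaded module was linked with IE-model relocations.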

In glibc, when we determine the layout of the thread-local storage area at
startup, we actually allocate some excess space.  This lets a dynamically
loaded module use static TLS if its PT_TLS segment fits in the available
surplus.  (In sysdeps/generic/dl-tls.c, see TLS_STATIC_SURPLUS.)  If there
is insufficient space preallocated, then loading the module will fail.  In
fact, we put this feature there with GL (i.e. OpenGL drivers) in mind, and
we can adjust the size of the preallocated surplus to whatever is most
useful in practice.
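Whether a module fits depends on the size of its PT_TLS segment.  The sketch below, assuming glibc on Linux, lists the PT_TLS size of every loaded object with `dl_iterate_phdr`.  As far as I know the remaining surplus is not exposed through a public interface, so it only reports sizes rather than checking them against TLS_STATIC_SURPLUS.  The names `example_tls_buffer` and `total_tls_bytes` are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical thread-local buffer; with external linkage it is kept,
   so this program's own PT_TLS segment is at least 256 bytes. */
__thread char example_tls_buffer[256];

/* dl_iterate_phdr callback: add up the PT_TLS size of each loaded object. */
static int add_tls_segment(struct dl_phdr_info *info, size_t size, void *data)
{
    size_t *total = data;
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";
    (void)size;

    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_TLS)
            continue;
        /* p_memsz covers .tdata (initialized) plus .tbss (zero-filled). */
        printf("%s: PT_TLS memsz=%zu align=%zu\n", name,
               (size_t)ph->p_memsz, (size_t)ph->p_align);
        *total += ph->p_memsz;
    }
    return 0; /* 0 = continue with the next object */
}

/* Sum of the PT_TLS sizes of all currently loaded objects, in bytes. */
size_t total_tls_bytes(void)
{
    size_t total = 0;
    example_tls_buffer[0] = 1; /* touch this thread's copy */
    dl_iterate_phdr(add_tls_segment, &total);
    return total;
}
```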

In GCC, you can select the access model of code generated for __thread
variable uses either per translation unit with -ftls-model=foo or per
individual variable with __attribute__ ((tls_model ("foo"))), where foo is
one of global-dynamic, local-dynamic, initial-exec or local-exec.
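A minimal sketch of both mechanisms, assuming GCC on an ELF target; the variable and function names are hypothetical.  Compiled with plain -fPIC, the first variable gets the default (global-dynamic) model; adding -ftls-model=initial-exec to the command line would change the default for every __thread variable in the file, while the attribute changes only the one variable it is attached to.  GCC may still pick a more efficient model than requested when it can prove that is safe.

```c
/* Build, e.g.:  gcc -O2 -fPIC -c tls_models.c */

/* Default model under -fPIC: global-dynamic.  Always works with dlopen,
   but an access may go through __tls_get_addr. */
__thread int tls_default_counter;

/* Per-variable override: initial-exec.  Faster, but a shared object using
   it gets DF_STATIC_TLS and needs static TLS space when it is dlopened. */
__thread int tls_fast_counter __attribute__ ((tls_model ("initial-exec")));

/* local-dynamic suits variables not visible outside this object, such as
   statics.  (local-exec is only valid for code linked into the
   executable itself.) */
static __thread int tls_local_counter __attribute__ ((tls_model ("local-dynamic")));

/* Bump each counter once in the calling thread and return their sum. */
int bump_all(void)
{
    tls_default_counter++;
    tls_fast_counter++;
    tls_local_counter++;
    return tls_default_counter + tls_fast_counter + tls_local_counter;
}
```

All three variables behave the same from C; only the generated access sequences and the load-time constraints on the containing object differ.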
