I was rummaging through some x86-64 assembly generated by Clang and Intel's C compiler (ICC), and I noticed something strange. In the Clang-generated code, there are a large number of loads from a _DYNAMIC section.
The assembly didn't bother me so much as the comment. What is the _DYNAMIC section doing? I had never seen anything like it before, and it made me wonder. It was especially confusing because these instructions didn't seem to occur in the code compiled by ICC.
At first, I thought it was simply a case of position-independent code (PIC) because the load was offset from the instruction pointer (%rip). I don't remember explicitly enabling position-independent code, but I thought maybe Clang compiles PIC by default. But something else struck me as odd. The ICC-generated code had loads relative to %rip as well, but no comment. Maybe I just forgot to build the binary with debug information, and that would add the _DYNAMIC comment?
I recompiled the binary with debug information using ICC, but that didn't do it either. I suspected the cause had something to do with Thread Local Storage because I had run into it before, so I set out to discover what it was. Let's write some simple C code with a thread-local variable and see what happens.
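The exact listing isn't reproduced here, so what follows is only a minimal sketch of the kind of test file I mean. The names tls_counter and print_tls_counter are made up for illustration, and __thread is the GCC/Clang keyword (C11 spells it _Thread_local).

```c
#include <stdio.h>

/* Hypothetical thread-local variable: every thread gets its own copy. */
__thread int tls_counter = 42;

/* Loads the thread-local variable and passes it to printf.
 * Returns whatever printf returns (the number of characters written). */
int print_tls_counter(void) {
    return printf("%d\n", tls_counter);
}
```

Compiling a file like this with something along the lines of `clang -O2 -S` writes the assembly to a .s file, so the thread-local access can be inspected directly.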
Clang generates the following assembly code:
Here, instruction #2 is the most important: it is the thread-local variable access. The generated assembly uses the initial-exec storage model, which deserves its own blog post. Instruction #5 is the call to printf. Here the assembly makes sense: we just load the thread-local variable and pass it as a parameter to printf (x86-64 passes parameters in registers). But there's still no load from _DYNAMIC! What's going on? I read more about the DYNAMIC section in ELF files, trying to figure out where the thread-local variable is allocated. Let's take a look at the DYNAMIC section in the object file:
The dynamic section lists the elements that are required during dynamic linking, so its entries describe what is needed at runtime. We need libc for printf. We also see a symbol table (Item 10), but still nothing here that tells us how the thread-local variable is accessed. In addition, the generated code doesn't have a comment indicating a load from a DYNAMIC section. But this gave me an idea: what if we made the thread-local variable extern so the linker would have to resolve the address later? And since it should then show up in the dynamic section, let's make the code a shared library. Let's write some code:
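Again, this is only a sketch of the idea rather than the exact file, and the names shared_tls_counter and read_shared_tls_counter are invented for illustration. The definition at the bottom is a stand-in so the snippet links on its own; in the actual experiment the variable would be defined in some other object, leaving only the extern declaration visible here.

```c
/* Declared extern: while compiling the accessor below, the compiler cannot
 * assume where the variable lives, so the address is resolved later. */
extern __thread int shared_tls_counter;

/* Exported function that reads the thread-local variable. */
int read_shared_tls_counter(void) {
    return shared_tls_counter;
}

/* Stand-in definition (assumption, for self-containment only). In a real
 * build this would live in a different object or shared library. */
__thread int shared_tls_counter = 7;
```

Building it as a shared library would typically look like `clang -O2 -fPIC -shared`, and adding `-S` instead of `-shared` shows the assembly for the accessor.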
Alright, finally! We have a reference to the DYNAMIC section again! Why did this happen? Compilers can optimize Thread Local Storage accesses by choosing among different thread-local storage models. One such optimization applies when the compiler can determine that the thread-local variable is only referenced within the executable: a level of indirection can be optimized away, hence the lack of a _DYNAMIC access. By making the binary a shared library, we prevented the compiler from performing this optimization. So now that we finally have a reference to _DYNAMIC, what is it actually referencing? We see something interesting about instruction #3, a call to __tls_get_addr@plt.
This is one method of accessing thread-local storage. The code calls a function, reached through the Procedure Linkage Table (PLT), that retrieves the address of the thread-local variable. How the PLT works is another long blog post. Essentially, thread-local access uses a table whose entries are patched up by relocations, and code accesses the variable through this table. Then I thought, let's take a look at the dynamic relocation entries in the object file:
Whew, at least now I know what the _DYNAMIC section is for. It refers to the dynamic relocation section in an ELF file, not the dynamic shared library section. Another blog post will have to be written to describe how relocation works and each of the thread-local variable access optimization strategies.
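In the meantime, here is a rough sketch for anyone who wants to poke at the different models themselves. GCC and Clang accept a tls_model attribute that requests a specific model per variable. The variable and function names below are made up, and I'm assuming the file is built as an ordinary executable (the local-exec model is not meant for variables accessed from shared libraries).

```c
/* Each variable requests a TLS access model via the GCC/Clang
 * tls_model attribute. All names here are hypothetical. */
__thread int gd_var __attribute__((tls_model("global-dynamic"))) = 1;
__thread int ie_var __attribute__((tls_model("initial-exec"))) = 2;
__thread int le_var __attribute__((tls_model("local-exec"))) = 3;

/* Touches all three variables so each access shows up in the assembly. */
int sum_tls_models(void) {
    return gd_var + ie_var + le_var;
}
```

Comparing the assembly for the three loads side by side is a quick way to see how much indirection each model involves.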