Relocations
===========

Relocations are only present in object files (i.e. not executables, dylibs,
etc). They tell the linker that certain parts of the binary are pointers that
the link step needs to handle. They are stored as per-section lists of the
following struct::

  struct relocation_info {
    int32_t r_address;        // offset in the section to what is being relocated
    uint32_t r_symbolnum:24,  // symbol index or section ordinal
             r_pcrel:1,       // whether the relocation is PC-relative
             r_length:2,      // log2 size of the item to be relocated
             r_extern:1,      // whether r_symbolnum refers to a symbol or a section
             r_type:4;        // target-specific relocation type
  };

``r_extern`` indicates how we should interpret ``r_symbolnum``. For extern
relocations, ``r_symbolnum`` is an index into the file's symbol table; for
local relocations, it instead stores the ordinal of a section in the object
file.

One "missing" piece of information is the addend, AKA the offset that should be
added to the target symbol or section's address. Mach-O encodes the addend
directly in the instruction stream instead of within the relocation info
struct. That is, we must read the bytes at ``r_address`` to find it.

If the linker can determine the final address of the target at link time, it
will update the bytes at ``r_address`` accordingly. If not, it will emit
instructions to tell ``dyld`` how to resolve these addresses at load time. See
:doc:`dynamic-binding` for details. Either way, the relocation entry itself is
elided from the final binary.

Relocations are inspectable via ``llvm-readobj --relocations --expand-relocs``.

Below are the semantics of the relocation ``r_type`` values for x86-64 and
ARM64, as well as assembly snippets that show what inputs would cause the
assembler to emit these relocations.

X86_64
------

X86_64_RELOC_UNSIGNED
^^^^^^^^^^^^^^^^^^^^^

Resolves to an absolute address. Naturally, ``r_pcrel`` is always false here.
The relocation is "unsigned" since addresses cannot be negative.

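
As a rough illustration of how the in-place addend combines with the target
address, the sketch below applies a 64-bit absolute relocation to a buffer
holding a section's contents. The name ``apply_unsigned`` and its parameters
are hypothetical and not taken from any linker's source. It assumes
``r_length == 3`` (an 8-byte item) and a little-endian host.

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical helper: patch an 8-byte absolute pointer in place.
   * section     - contents of the section being relocated
   * r_address   - offset taken from the relocation entry
   * target_addr - final address of the target symbol or section */
  uint64_t apply_unsigned(uint8_t *section, int32_t r_address,
                          uint64_t target_addr) {
    int64_t addend;
    /* The addend is stored in the bytes being relocated. */
    memcpy(&addend, section + r_address, sizeof addend);
    uint64_t value = target_addr + (uint64_t)addend;
    /* Overwrite the addend with the resolved absolute address. */
    memcpy(section + r_address, &value, sizeof value);
    return value;
  }

For instance, if the 8 bytes at ``r_address`` hold ``8`` and the target ends up
at ``0x1000``, the helper writes ``0x1008`` back to the same location.
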

Example::

  .data
  .quad _foo # Absolute address of _foo

X86_64_RELOC_SIGNED
^^^^^^^^^^^^^^^^^^^

Resolves to an address offset, relative to the current instruction pointer
(``%rip``).

Example::

  leaq _foo(%rip), %rax # Load address of _foo into %rax

X86_64_RELOC_BRANCH
^^^^^^^^^^^^^^^^^^^

References a function. For statically linked functions, this resolves to an
address offset relative to the current instruction pointer (``%rip``). For
dynamically linked functions, this resolves to an entry in the stubs section
(what ELF calls the Procedure Linkage Table, or PLT).

Example::

  callq _foo # Call function _foo

X86_64_RELOC_GOT_LOAD
^^^^^^^^^^^^^^^^^^^^^

Resolves to an address within the Global Offset Table. Only used with ``mov``
opcodes that reference symbols which may be dynamically loaded (i.e. live in a
dylib). If the symbol ends up being statically linked, we don't need to go
through the GOT, and can instead reference the symbol directly by turning the
``mov`` into a ``lea`` opcode.

Example::

  movq _foo@GOTPCREL(%rip), %rax # Load address of _foo from GOT

If ``_foo`` ends up being statically linked, the above can be optimized to::

  leaq _foo(%rip), %rax # Load address of _foo directly

X86_64_RELOC_GOT
^^^^^^^^^^^^^^^^

Resolves to an address within the Global Offset Table. Used for all
non-``mov`` opcodes. No optimization can be done even if the symbol ends up
being statically linked.

Example::

  pushq _foo@GOTPCREL(%rip) # Push address of _foo from GOT

X86_64_RELOC_TLV
^^^^^^^^^^^^^^^^

References a thread-local variable. Resolves to an address within the
``__thread_ptrs`` section, which, like the GOT, is an array of address values.

Example::

  movq _tlv_var@TLVP(%rip), %rdi # Load TLV descriptor address
  callq *(%rdi)                  # Call TLV getter function

X86_64_RELOC_SUBTRACTOR
^^^^^^^^^^^^^^^^^^^^^^^

Used to encode the difference between two symbol addresses.

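
A minimal sketch of that arithmetic, assuming an 8-byte item and a
little-endian host. ``apply_difference`` and its parameters are hypothetical
names chosen for illustration. Any constant already stored at the relocated
location is treated as the addend, as described in the introduction.

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical helper: store (minuend - subtrahend + addend) in place.
   * minuend_addr    - final address of the symbol being subtracted from
   * subtrahend_addr - final address of the symbol being subtracted */
  int64_t apply_difference(uint8_t *section, int32_t r_address,
                           uint64_t minuend_addr, uint64_t subtrahend_addr) {
    int64_t addend;
    /* The addend is stored in the bytes being relocated. */
    memcpy(&addend, section + r_address, sizeof addend);
    int64_t value = (int64_t)(minuend_addr - subtrahend_addr) + addend;
    /* Overwrite the addend with the resolved difference. */
    memcpy(section + r_address, &value, sizeof value);
    return value;
  }

Unlike an absolute pointer, the result does not change if both symbols move by
the same amount, so it can always be computed at link time.
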

Example::

  .quad _foo - _bar

In order to encode the two separate referents (``_foo`` and ``_bar``), the
assembler will emit a pair of relocations: an ``UNSIGNED`` one whose
``r_symbolnum`` points at ``_foo``, and a ``SUBTRACTOR`` one whose
``r_symbolnum`` points at ``_bar``. They will both point to the same
``r_address``.

ARM64
-----

???