Relocations
Relocations are only present in object files (not in linked outputs such as executables or dylibs). They tell the linker which parts of a section hold references, such as pointers or instruction operands, whose final values the link step must fill in. They are stored as per-section lists of the following struct:
struct relocation_info {
int32_t r_address; // offset in the section to what is being relocated
uint32_t r_symbolnum:24, // symbol index or section ordinal
r_pcrel:1, // whether the relocation is PC-relative
r_length:2, // log2 size of the item to be relocated
r_extern:1, // whether r_symbolnum refers to a symbol or a section
r_type:4; // target-specific relocation type
};
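Because the layout of C bit-fields is implementation-defined, tools often unpack the second 32-bit word by hand. The sketch below assumes a little-endian target (x86-64 and ARM64 Mach-O files are little-endian), where r_symbolnum occupies the low 24 bits. The names raw_reloc, decoded_reloc, and decode_reloc are hypothetical, and the r_type values are our understanding of <mach-o/x86_64/reloc.h>.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 r_type values (assumed to match <mach-o/x86_64/reloc.h>). */
enum x86_64_reloc_type {
    X86_64_RELOC_UNSIGNED = 0,
    X86_64_RELOC_SIGNED = 1,
    X86_64_RELOC_BRANCH = 2,
    X86_64_RELOC_GOT_LOAD = 3,
    X86_64_RELOC_GOT = 4,
    X86_64_RELOC_SUBTRACTOR = 5,
    X86_64_RELOC_SIGNED_1 = 6,
    X86_64_RELOC_SIGNED_2 = 7,
    X86_64_RELOC_SIGNED_4 = 8,
    X86_64_RELOC_TLV = 9,
};

/* Hypothetical raw view of one 8-byte entry: r_address, then the packed word. */
struct raw_reloc {
    int32_t r_address;
    uint32_t r_word1;
};

/* Hypothetical unpacked form, one member per relocation_info field. */
struct decoded_reloc {
    int32_t address;
    uint32_t symbolnum; /* 24 bits */
    unsigned pcrel;     /* 1 bit */
    unsigned length;    /* 2 bits: log2 of the fixup size in bytes */
    unsigned is_extern; /* 1 bit */
    unsigned type;      /* 4 bits */
};

/* Assumes little-endian bit-field allocation: the first-declared field
   (r_symbolnum) sits in the least significant bits of r_word1. */
static struct decoded_reloc decode_reloc(struct raw_reloc raw) {
    struct decoded_reloc d;
    d.address = raw.r_address;
    d.symbolnum = raw.r_word1 & 0xFFFFFFu;
    d.pcrel = (raw.r_word1 >> 24) & 0x1u;
    d.length = (raw.r_word1 >> 25) & 0x3u;
    d.is_extern = (raw.r_word1 >> 27) & 0x1u;
    d.type = (raw.r_word1 >> 28) & 0xFu;
    return d;
}
```

For instance, a packed word of 0x2D000005 decodes to an extern, PC-relative, 4-byte X86_64_RELOC_BRANCH against symbol 5.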
r_extern indicates how we should interpret r_symbolnum. For extern
relocations, r_symbolnum is an index into the file’s symbol table; for
local relocations, it instead stores the 1-based ordinal of a section in the
object file.
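To make the r_extern distinction concrete, here is a sketch that picks the base address a relocation refers to. The arrays sym_addrs and sect_addrs and the function target_base are hypothetical stand-ins for a linker's symbol and section tables; since section ordinals start at 1, ordinal n maps to sect_addrs[n - 1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: sym_addrs is indexed by symbol-table index and
   sect_addrs by 0-based section index. */
static uint64_t target_base(uint32_t r_symbolnum, unsigned r_extern,
                            const uint64_t *sym_addrs, size_t nsyms,
                            const uint64_t *sect_addrs, size_t nsects) {
    if (r_extern) {
        /* Extern: r_symbolnum indexes the symbol table. */
        assert(r_symbolnum < nsyms);
        return sym_addrs[r_symbolnum];
    }
    /* Local: r_symbolnum is a 1-based section ordinal. */
    assert(r_symbolnum >= 1 && r_symbolnum <= nsects);
    return sect_addrs[r_symbolnum - 1];
}
```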
One “missing” piece of information is the addend, i.e. the constant offset that
should be added to the target symbol’s or section’s address. Mach-O does not
store the addend in the relocation_info struct; instead, it encodes it directly
in the section contents (the instruction operand or data word being relocated).
That is, we must read the bytes at r_address to find it.
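Reading the addend therefore means decoding 1 << r_length bytes from the section contents. The sketch below, with read_addend as a hypothetical helper, assumes a little-endian target and treats fields narrower than 8 bytes as signed, which suits PC-relative fixups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads the implicit addend stored at r_address in a section's contents.
   The field is (1 << r_length) bytes, little-endian; fields narrower than
   8 bytes are sign-extended to 64 bits. */
static int64_t read_addend(const uint8_t *contents, size_t size,
                           uint32_t r_address, unsigned r_length) {
    assert(r_length <= 3);
    size_t width = (size_t)1 << r_length;
    assert((size_t)r_address + width <= size);
    uint64_t v = 0;
    for (size_t i = 0; i < width; i++)
        v |= (uint64_t)contents[r_address + i] << (8 * i);
    if (width < 8) {
        uint64_t sign = (uint64_t)1 << (8 * width - 1);
        v = (v ^ sign) - sign; /* sign-extend from the field's top bit */
    }
    return (int64_t)v;
}
```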
If the linker can determine the final address of the target at link time, it
writes the resolved value into the bytes at r_address. If not (for example,
because the target lives in a dylib), it emits binding information that tells
dyld how to resolve the address at load time. See Dynamic Binding for details.
Either way, the relocation entry itself is omitted from the final binary.
You can inspect an object file’s relocations with llvm-readobj --relocations --expand-relocs.
Below are the semantics of the relocation r_type values for x86-64 and
ARM64, as well as assembly snippets that show what inputs would cause the
assembler to emit these relocations.
X86_64
X86_64_RELOC_UNSIGNED
Resolves to an absolute address. Naturally, r_pcrel is always false here.
The relocation is “unsigned” since addresses cannot be negative.
Example:
.data
.quad _foo # Absolute address of _foo
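For an extern UNSIGNED relocation like the one above, the bytes at r_address initially hold the addend (0 for .quad _foo, 16 for .quad _foo + 16), and the linker overwrites them with the target address plus that addend. A minimal sketch, assuming r_extern is set and r_length is 3 (8 bytes); apply_unsigned64 is a hypothetical name, and rebasing for position-independent output is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Applies an extern, 8-byte X86_64_RELOC_UNSIGNED fixup in place: the
   bytes initially hold the addend A and are overwritten with S + A, where
   S is the target's final address. */
static void apply_unsigned64(uint8_t *fixup, uint64_t target_addr) {
    uint64_t addend = 0;
    for (int i = 0; i < 8; i++)
        addend |= (uint64_t)fixup[i] << (8 * i);
    uint64_t value = target_addr + addend; /* modular add handles negative A */
    for (int i = 0; i < 8; i++)
        fixup[i] = (uint8_t)(value >> (8 * i));
}
```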
X86_64_RELOC_SIGNED
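Resolves to a signed address offset, relative to the instruction pointer
(%rip). Since %rip holds the address of the next instruction, the offset is
measured from the end of the current one.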
Example:
leaq _foo(%rip), %rax # Load address of _foo into %rax
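The value written for SIGNED is the target plus the addend, minus the address %rip will hold; when nothing follows the 4-byte field (as in the leaq above), that is the end of the field. A sketch with a hypothetical signed_disp32 helper; the SIGNED_1, SIGNED_2, and SIGNED_4 variants, used when an immediate follows the field, are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement for an extern X86_64_RELOC_SIGNED fixup: S + A - (P + 4),
   where P is the address of the 4-byte field. Assumes no immediate
   follows the field, so %rip == P + 4 at execution time. */
static int32_t signed_disp32(uint64_t target_addr, int64_t addend,
                             uint64_t fixup_addr) {
    int64_t disp = (int64_t)(target_addr + (uint64_t)addend - (fixup_addr + 4));
    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* must fit in rel32 */
    return (int32_t)disp;
}
```

If the leaq above starts at 0x1000 (encoded 48 8D 05 followed by the displacement), the field starts at 0x1003, so a _foo at 0x2000 yields 0x2000 - 0x1007 = 0xFF9.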
X86_64_RELOC_BRANCH
References a function. For statically linked functions, this resolves to an
address offset relative to the instruction pointer (%rip).
For dynamically linked functions, this resolves to an offset to the function’s entry in the stubs section (what ELF calls the Procedure Linkage Table, or PLT).
Example:
callq _foo # Call function _foo
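A sketch of how the call destination is chosen, using a hypothetical branch_target struct that records whether the callee ended up in a dylib. It assumes a zero addend; either way, the encoded value is relative to the end of the 4-byte field that follows the E8 opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a callee after symbol resolution. */
struct branch_target {
    int in_dylib;       /* nonzero if the function lives in a dylib */
    uint64_t addr;      /* final address, used when !in_dylib */
    uint64_t stub_addr; /* its __stubs entry, used when in_dylib */
};

/* rel32 for `callq _foo`: jump straight to the function, or to its stub. */
static int32_t branch_disp32(struct branch_target t, uint64_t fixup_addr) {
    uint64_t dest = t.in_dylib ? t.stub_addr : t.addr;
    int64_t disp = (int64_t)(dest - (fixup_addr + 4));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}
```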
X86_64_RELOC_GOT_LOAD
Resolves to an offset, relative to %rip, of the symbol’s entry in the Global
Offset Table (GOT). Only used with mov instructions that reference symbols
which may be resolved at load time (i.e. that may live in a dylib).
If the symbol ends up being statically linked, we don’t need to go through the
GOT, and can instead reference the symbol directly by turning the mov into a
lea.
Example:
movq _foo@GOTPCREL(%rip), %rax # Load address of _foo from GOT
If _foo ends up being statically linked, the above can be optimized to:
leaq _foo(%rip), %rax # Load address of _foo directly
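The mov-to-lea rewrite only changes the opcode byte. In the encoding of movq _foo@GOTPCREL(%rip), %rax (48 8B 05 followed by the 32-bit displacement), the opcode sits two bytes before r_address, and LEA's opcode is 0x8D. A sketch with relax_got_load as a hypothetical helper; after relaxing, the displacement would be computed against _foo itself rather than its GOT slot.

```c
#include <assert.h>
#include <stdint.h>

/* Rewrites `movq sym@GOTPCREL(%rip), %reg` into `leaq sym(%rip), %reg`.
   fixup points at the 4-byte displacement (r_address); in the encoding
   REX 8B ModRM disp32 the opcode byte is fixup[-2], so the caller must
   pass a pointer at least 2 bytes into the instruction buffer.
   Returns 1 on success, 0 if the instruction is not that MOV. */
static int relax_got_load(uint8_t *fixup) {
    if (fixup[-2] != 0x8B) /* MOV r64, r/m64 */
        return 0;
    fixup[-2] = 0x8D;      /* LEA r64, m */
    return 1;
}
```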
X86_64_RELOC_GOT
Resolves to an offset, relative to %rip, of the symbol’s entry in the Global
Offset Table. Used for all non-mov instructions. No optimization can be done
even if the symbol ends up being statically linked.
Example:
pushq _foo@GOTPCREL(%rip) # Push address of _foo from GOT
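Without relaxation, the displacement always targets the symbol's GOT slot. A sketch assuming a hypothetical GOT layout of 8-byte slots starting at got_addr, a zero addend, and no immediate after the field (true for the pushq above, encoded FF 35 followed by the displacement).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: slot got_index lives at got_addr + 8 * got_index.
   Returns the rel32 from the end of the 4-byte field to that slot. */
static int32_t got_disp32(uint64_t got_addr, uint32_t got_index,
                          uint64_t fixup_addr) {
    uint64_t slot = got_addr + 8u * (uint64_t)got_index;
    int64_t disp = (int64_t)(slot - (fixup_addr + 4));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}
```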
X86_64_RELOC_TLV
References a thread-local variable. Resolves to an entry in the
__thread_ptrs section, which, like the GOT, is an array of pointers; each
entry holds the address of a thread-local variable’s descriptor.
Example:
movq _tlv_var@TLVP(%rip), %rdi # Load TLV descriptor address
callq *(%rdi) # Call TLV getter function; it returns the variable's address in %rax
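The two-instruction sequence can be modeled in C. This is only a simulation: the descriptor layout (a getter pointer followed by two words, key and offset) is our assumption about what a descriptor holds, and fake_tlv_get_addr with its single static buffer is a hypothetical stand-in for dyld's real per-thread lookup.

```c
#include <assert.h>

/* Assumed descriptor layout: a getter ("thunk") pointer, then two words. */
struct tlv_descriptor {
    void *(*thunk)(struct tlv_descriptor *);
    unsigned long key;    /* would identify the per-thread storage */
    unsigned long offset; /* variable's offset within that storage */
};

/* Hypothetical single-threaded stand-in for per-thread storage. */
static unsigned char fake_thread_storage[64];

/* Hypothetical getter: returns the variable's address for this "thread". */
static void *fake_tlv_get_addr(struct tlv_descriptor *d) {
    return fake_thread_storage + d->offset;
}

/* Models `movq _tlv_var@TLVP(%rip), %rdi; callq *(%rdi)`. ptr_slot plays
   the role of the __thread_ptrs entry holding the descriptor address. */
static void *access_tlv(struct tlv_descriptor **ptr_slot) {
    struct tlv_descriptor *rdi = *ptr_slot; /* movq: load descriptor address */
    return rdi->thunk(rdi);                 /* callq *(%rdi): first word is the getter */
}
```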
X86_64_RELOC_SUBTRACTOR
Used to encode the difference between two symbol addresses.
Example:
.quad _foo - _bar
In order to encode the two separate referents (_foo and _bar), the
assembler emits a pair of relocations that share the same r_address: a
SUBTRACTOR one whose r_symbolnum points at _bar, immediately followed by
an UNSIGNED one whose r_symbolnum points at _foo.
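Once the pair is matched, the linker writes _foo - _bar plus whatever addend the bytes at r_address already hold. A sketch for an 8-byte field, assuming both halves are extern; is_subtractor_pair and apply_subtractor_pair64 are hypothetical helpers, and the ordering check encodes the rule that a SUBTRACTOR entry is immediately followed by its UNSIGNED partner.

```c
#include <assert.h>
#include <stdint.h>

enum { X86_64_RELOC_UNSIGNED = 0, X86_64_RELOC_SUBTRACTOR = 5 };

/* True if two consecutive entries form a SUBTRACTOR/UNSIGNED pair
   on the same fixup. */
static int is_subtractor_pair(unsigned type0, int32_t addr0,
                              unsigned type1, int32_t addr1) {
    return type0 == X86_64_RELOC_SUBTRACTOR &&
           type1 == X86_64_RELOC_UNSIGNED && addr0 == addr1;
}

/* Writes minuend - subtrahend + A into an 8-byte little-endian field,
   where A is the addend already stored there. */
static void apply_subtractor_pair64(uint8_t *fixup, uint64_t minuend_addr,
                                    uint64_t subtrahend_addr) {
    uint64_t addend = 0;
    for (int i = 0; i < 8; i++)
        addend |= (uint64_t)fixup[i] << (8 * i);
    uint64_t value = minuend_addr - subtrahend_addr + addend;
    for (int i = 0; i < 8; i++)
        fixup[i] = (uint8_t)(value >> (8 * i));
}
```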
ARM64
???