Exports Trie

Overview

The exports trie is a compact data structure used to encode symbol export information in Mach-O binaries. It’s referenced by the LC_DYLD_EXPORTS_TRIE load command and provides an efficient way for dyld to look up exported symbols by name.

Since the exports trie uses a prefix tree (trie) structure, it allows for significant space savings when many symbols share common prefixes. Given that many languages create symbols with similar prefixes (e.g. due to C++ namespaces, class names etc), this format is often much smaller than traditional symbol tables.

Trie Structure

The exports trie is a serialized prefix trie where each node can represent:

  • An internal node (part of a symbol name prefix)

  • A terminal node (a complete exported symbol name with associated export data)

  • Both of the above!

Node Format

Each node in the trie has the following binary structure:

struct Node {
    uleb128 terminal_size;      // Size of export_data in bytes
    ExportData export_data;     // Present only if terminal_size > 0
    u8 children_count;          // Number of child edges
    Edge edges[children_count];
    Node children[children_count];
};

Edge Format

Each edge connects a parent node to a child node:

struct Edge {
    char* edge_str;         // The null-terminated string fragment for this edge
    child_offset: uleb128   // Offset to child node
};

Important: The child_offset is relative to the start of the entire trie data, not relative to the current node.

Export Data

Export data is only present for terminal nodes (i.e., nodes that represent complete symbol names). The terminal_size field indicates how many bytes of export data follow.

Every exported symbol has a flags field. Depending on the flags, different fields follow.

Regular Export

This is the most common type, and is the default if EXPORT_SYMBOL_FLAGS_REEXPORT and EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER are not set.

For standard exported symbols:

struct ExportData {
    uleb128 flags;
    uleb128 address;  // Symbol address (offset from mach_header)
};

Re-export

For symbols that are re-exported from another dylib:

struct ExportData {
    uleb128 flags;       // Includes EXPORT_SYMBOL_FLAGS_REEXPORT
    uleb128 ordinal;     // Index of the dylib being re-exported from
    string import_name;  // Original symbol name (if different), or empty
};

You can generate a re-export with a non-empty import_name via the -alias flag. E.g.:

ld -lfoo -alias foo_new foo …

will create a re-export of foo from libfoo.dylib under the name foo_new.

Stub and Resolver

For symbols with separate stub and resolver addresses:

struct ExportData {
    uleb128 flags;           // Includes EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER
    uleb128 stub_offset;     // Offset to stub
    uleb128 resolver_offset; // Offset to resolver
};

Other Flags

The remaining flags are fairly self-explanatory:

  • EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION

  • EXPORT_SYMBOL_FLAGS_THREAD_LOCAL

  • EXPORT_SYMBOL_FLAGS_KIND_ABSOLUTE

  • EXPORT_SYMBOL_FLAGS_REGULAR (this is just 0x0)

Example

For a dylib that exports _foo, _foobar, and _bar, the trie might look like this (conceptual structure):

root
├─ "_" → node1
   ├─ "foo" → node2 (terminal: export data for _foo)
   │  └─ "bar" → node3 (terminal: export data for _foobar)
   └─ "bar" → node4 (terminal: export data for _bar)

Notes on Creating the Trie

Reading the trie is pretty straightforward, but creating the trie from a flat list of symbols is a bit more interesting:

  1. Note that the size of the nodes depends on the child offsets they encode, but the size of these offsets depend on how far the nodes are spaced apart. In order to encode the smallest possible trie, we need to provisionally lay out the nodes first, then iterate on the offset encodings until we reach a fixpoint.

  2. A naive trie construction would walk down the partially-built trie for each new symbol, but that means we traverse the prefix nodes repeatedly. LLD instead uses a radix quicksort to avoid this.