Exports Trie ============ .. contents:: :local: Overview -------- The exports trie is a compact data structure used to encode symbol export information in Mach-O binaries. It's referenced by the ``LC_DYLD_EXPORTS_TRIE`` load command and provides an efficient way for ``dyld`` to look up exported symbols by name. Since the exports trie uses a prefix tree (trie) structure, it allows for significant space savings when many symbols share common prefixes. Given that many languages create symbols with similar prefixes (e.g. due to C++ namespaces, class names etc), this format is often much smaller than traditional symbol tables. Trie Structure -------------- The exports trie is a serialized prefix trie where each node can represent: * An internal node (part of a symbol name prefix) * A terminal node (a complete exported symbol name with associated export data) * Both of the above! Node Format ----------- Each node in the trie has the following binary structure:: struct Node { uleb128 terminal_size; // Size of export_data in bytes ExportData export_data; // Present only if terminal_size > 0 u8 children_count; // Number of child edges Edge edges[children_count]; Node children[children_count]; }; Edge Format ~~~~~~~~~~~ Each edge connects a parent node to a child node:: struct Edge { char* edge_str; // The null-terminated string fragment for this edge child_offset: uleb128 // Offset to child node }; **Important:** The ``child_offset`` is relative to the start of the entire trie data, not relative to the current node. Export Data ----------- Export data is only present for terminal nodes (i.e., nodes that represent complete symbol names). The ``terminal_size`` field indicates how many bytes of export data follow. Every exported symbol has a ``flags`` field. Depending on the flags, different fields follow. Regular Export ~~~~~~~~~~~~~~ This is the most common type, and is the default if ``EXPORT_SYMBOL_FLAGS_REEXPORT`` and ``EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER`` are not set. For standard exported symbols:: struct ExportData { uleb128 flags; uleb128 address; // Symbol address (offset from mach_header) }; Re-export ~~~~~~~~~ For symbols that are re-exported from another dylib:: struct ExportData { uleb128 flags; // Includes EXPORT_SYMBOL_FLAGS_REEXPORT uleb128 ordinal; // Index of the dylib being re-exported from string import_name; // Original symbol name (if different), or empty }; You can generate a re-export with a non-empty ``import_name`` via the ``-alias`` flag. E.g.: ld -lfoo -alias foo_new foo ... will create a re-export of ``foo`` from ``libfoo.dylib`` under the name ``foo_new``. Stub and Resolver ~~~~~~~~~~~~~~~~~ For symbols with separate stub and resolver addresses:: struct ExportData { uleb128 flags; // Includes EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER uleb128 stub_offset; // Offset to stub uleb128 resolver_offset; // Offset to resolver }; Other Flags ~~~~~~~~~~~ The remaining flags are fairly self-explanatory: * ``EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION`` * ``EXPORT_SYMBOL_FLAGS_THREAD_LOCAL`` * ``EXPORT_SYMBOL_FLAGS_KIND_ABSOLUTE`` * ``EXPORT_SYMBOL_FLAGS_REGULAR`` (this is just 0x0) Example ------- For a dylib that exports ``_foo``, ``_foobar``, and ``_bar``, the trie might look like this (conceptual structure):: root ├─ "_" → node1 ├─ "foo" → node2 (terminal: export data for _foo) │ └─ "bar" → node3 (terminal: export data for _foobar) └─ "bar" → node4 (terminal: export data for _bar) Notes on Creating the Trie -------------------------- Reading the trie is pretty straightforward, but creating the trie from a flat list of symbols is a bit more interesting: 1. Note that the size of the nodes depends on the child offsets they encode, but the size of these offsets depend on how far the nodes are spaced apart. In order to encode the smallest possible trie, we need to provisionally lay out the nodes first, then iterate on the offset encodings until we reach a fixpoint. 2. A naive trie construction would walk down the partially-built trie for each new symbol, but that means we traverse the prefix nodes repeatedly. LLD instead uses a `radix quicksort `_ to avoid this.