Header & Load Commands
======================

.. contents::
  :local:

Inspectable via ``llvm-objdump --macho --private-headers``.

The load commands below are grouped by their functionality. This is *not* a
comprehensive list of all possible load commands; the focus is on the ones that
are typically present in modern userspace Mach-O binaries.

Header
------

The most interesting field in the `header`_ is probably ``flags``. Possible flag
values can be found `here <flag_values_>`_. ``dyld`` will check these flags and
change its behavior; e.g. it won't process weak bindings unless
``MH_BINDS_TO_WEAK`` is set.

.. _header: https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/EXTERNAL_HEADERS/mach-o/loader.h#L72-L81
.. _flag_values: https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/EXTERNAL_HEADERS/mach-o/loader.h#L110-L228

The header is fixed-size. Its ``ncmds`` field gives the number of load commands
that follow it, and ``sizeofcmds`` gives their total size in bytes. The load
commands themselves have a variety of sizes, but they all begin with a 32-bit
tag (``cmd``) followed by a 32-bit size (``cmdsize``), so a reader can skip over
commands it does not recognize.

These load commands usually contain offsets to further variable-sized data. All
that data is stored in the ``__LINKEDIT`` segment, which is always the last
segment in the binary.

VM Layout
---------

LC_SEGMENT[_64]
~~~~~~~~~~~~~~~

Instructs `the kernel `_ to map regions of the file into memory at some given
address. For position-independent code, an offset (AKA "slide") can be added to
this address, for the purpose of `ASLR `_.

The first segment in an executable is ``__PAGEZERO``, which has all RWX
permissions turned off (``maxprot == initprot == 0``). This ensures that any
NULL pointer dereference will fault. On 64-bit platforms, this segment is 4GB in
size. This is likely done to catch bugs caused by 64-bit pointers getting
truncated to 32 bits.

The next segment is typically ``__TEXT``.
The header and load commands themselves are always mapped in at the beginning of
this segment. When the kernel hands off execution to userspace, ``dyld`` will
look for them there.

Segments may be further subdivided into sections.

Dylib Loading
-------------

LC_ID_DYLIB
~~~~~~~~~~~

Only present in dylibs. Set via ``ld -install_name`` or ``install_name_tool
-id``. This roughly corresponds to the file path of the dylib, but we have to
account for expansion of ``@``-prefixes, which we will discuss below.

The linker will use this path when generating ``LC_LOAD_DYLIB`` commands in the
binaries that want to load a given dylib.

LC_LOAD_DYLIB / LC_LOAD_WEAK_DYLIB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Looks for a dylib at a given path and loads it. The difference between the two
load commands is that ``LC_LOAD_WEAK_DYLIB`` will not cause a load failure if
the dylib is missing, and all its imported symbols are treated as weak imports,
i.e. they may be absent from the library and just point to NULL.

There are several special prefixes that can be used in the dylib path:

* ``@executable_path/`` - expands to the directory containing the main
  executable.
* ``@loader_path/`` - expands to the directory containing the binary doing the
  loading. For executables, this is identical to ``@executable_path/``. But for
  a ``LC_LOAD_DYLIB`` in some ``libfoo.dylib`` loaded by some executable
  ``bar``, this will expand to the directory containing ``libfoo.dylib`` and not
  to the directory containing ``bar``.
* ``@rpath/`` - expands to one of the paths specified by ``LC_RPATH``, in order.
  See below for details.

LC_RPATH
~~~~~~~~

Adds an entry to the runtime path list. Whenever ``dyld`` encounters
``@rpath/...`` in an install name, it goes through the list and substitutes each
path for the ``@rpath``. You can append to this list via ``ld -rpath``.

LC_REEXPORT_DYLIB
~~~~~~~~~~~~~~~~~

Re-exports all symbols from the specified dylib.
Note that it is possible to have ``LC_REEXPORT_DYLIB`` for some libfoo without a
matching ``LC_LOAD_DYLIB``; that just means the current dylib does not *use*
libfoo's symbols even though it re-exports libfoo.

Imports and Exports
-------------------

LC_DYLD_EXPORTS_TRIE
~~~~~~~~~~~~~~~~~~~~

Contains a pointer to symbol export information encoded as a compact trie data
structure. This is a more modern alternative to the export info in
``LC_DYLD_INFO_ONLY``. See :doc:`exports-trie` for details on the format.

Inspectable via ``llvm-objdump --macho --exports-trie``.

LC_DYLD_CHAINED_FIXUPS
~~~~~~~~~~~~~~~~~~~~~~

TODO

Debug Metadata
--------------

LC_SYMTAB
~~~~~~~~~

Symbol table. Most symbols refer to addresses within the binary, although there
are some special symbols:

* `STABS `_ symbols indicate where debug info can be located. (STABS can be used
  to encode more general debug info, but the Mach-O format uses DWARF for that.)
* ``$ld$`` symbols modify linker behavior. For example,
  ``$ld$hide$os10.10$_symbol`` indicates that ``_symbol`` should be hidden when
  targeting macOS 10.10 or earlier.

Inspectable via ``llvm-objdump --macho --syms`` and ``dsymutil -s``. The latter
is more useful for inspecting STABS.

Note that symbol entries here are for debugging purposes only and may be
stripped. Symbols that have to be discoverable will have another entry in the
exports trie (which is never stripped).

LC_DYSYMTAB
~~~~~~~~~~~

Despite what its name suggests, this is not another symbol table. It's more like
metadata for the symbols in the table defined by LC_SYMTAB. It indicates which
symbols are local / external / undefined. It also records which entries in the
GOT correspond to which symbols.

Inspectable via ``llvm-objdump --macho --indirect-symbols``.

LC_UUID
~~~~~~~

Contains a UUID generated by hashing the contents of the binary. ``dsymutil``
uses this to match debug-info-stripped binaries with their unstripped
counterparts.

Inspectable via ``llvm-dwarfdump --uuid``.
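
To make the header and load-command layout concrete, here is a minimal Python
sketch that walks the load commands of a binary and pulls out the ``LC_UUID``
payload. It assumes a thin (non-fat), little-endian, 64-bit image, and relies on
every load command starting with a 32-bit ``cmd`` tag followed by a 32-bit
``cmdsize``. The helper names ``iter_load_commands`` and ``find_uuid`` are made
up for this illustration; they are not part of any existing tool.

```python
import struct
import uuid

MH_MAGIC_64 = 0xFEEDFACF  # magic of a little-endian, 64-bit, thin Mach-O
LC_UUID = 0x1B

# mach_header_64: magic, cputype, cpusubtype, filetype,
#                 ncmds, sizeofcmds, flags, reserved (32 bytes total)
HEADER_64 = struct.Struct("<IiiIIIII")
# Common prefix of every load command: cmd, cmdsize
LOAD_COMMAND = struct.Struct("<II")


def iter_load_commands(data):
    """Yield (cmd, payload) for each load command in a thin 64-bit image.

    Assumption: `data` starts at the Mach-O header. Fat binaries would
    need the right slice extracted first, which this sketch does not do.
    """
    fields = HEADER_64.unpack_from(data, 0)
    magic, ncmds, sizeofcmds = fields[0], fields[4], fields[5]
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O image")

    offset = HEADER_64.size
    end = offset + sizeofcmds
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_COMMAND.unpack_from(data, offset)
        if cmdsize < LOAD_COMMAND.size or offset + cmdsize > end:
            raise ValueError("malformed load command")
        # The payload is everything after the cmd/cmdsize prefix.
        yield cmd, data[offset + LOAD_COMMAND.size:offset + cmdsize]
        offset += cmdsize


def find_uuid(data):
    """Return the LC_UUID value as a uuid.UUID, or None if absent."""
    for cmd, payload in iter_load_commands(data):
        if cmd == LC_UUID:
            return uuid.UUID(bytes=bytes(payload[:16]))
    return None
```

With a binary read into memory, ``find_uuid(open(path, "rb").read())`` should
give the same UUID that ``llvm-dwarfdump --uuid`` reports for a thin 64-bit
image. Because the loop advances by ``cmdsize`` rather than by a per-tag size
table, it steps over commands it knows nothing about.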

LC_FUNCTION_STARTS
~~~~~~~~~~~~~~~~~~

Contains the addresses of all function entry points in the binary. This is
useful for profiling and debugging tools working with stripped binaries.

The addresses are stored as a stream of ULEB128-encoded deltas, with the first
entry being the delta between the start of ``__TEXT`` and the first function
start, the second one being the delta between the first and second function
starts, and so on.

Inspectable via ``llvm-objdump --macho --function-starts``.

LC_DATA_IN_CODE
~~~~~~~~~~~~~~~

Points to ranges of code that contain non-instruction data, e.g. jump table
offsets. This tells disassemblers not to interpret those bytes as opcodes.

Inspectable via ``llvm-objdump --macho --data-in-code``.

Miscellaneous Load Commands
---------------------------

LC_MAIN
~~~~~~~

Entry point. Note that this is an offset from the header, not an absolute
address.

LC_LOAD_DYLINKER
~~~~~~~~~~~~~~~~

Path to the dynamic linker (dyld). Note that modern versions of macOS will only
execute binaries that use the system loader at ``/usr/lib/dyld``, so this load
command has only one valid value. Why they have not removed it entirely is a
mystery to me.

LC_BUILD_VERSION
~~~~~~~~~~~~~~~~

``LC_VERSION_MIN_*`` commands are used when targeting older platforms; newer
ones use ``LC_BUILD_VERSION``.

LC_ENCRYPTION_INFO[_64]
~~~~~~~~~~~~~~~~~~~~~~~

Offsets indicating the range of ``__TEXT`` that doesn't contain load commands
and can therefore be encrypted. Used by Apple's FairPlay App DRM to create
encrypted binaries that will get decrypted by the kernel on-the-fly.

LC_CODE_SIGNATURE
~~~~~~~~~~~~~~~~~

Data for code signing, typically located at the very end of the file. Others
have already covered the format in detail, so I won't go into it here.
You may wish to consult these links instead:

* https://alfiecg.uk/2024/01/06/Ad-hoc-signing.html
* https://www.mothersruin.com/software/Archaeology/reverse/codesign.html

LC_LINKER_OPTIMIZATION_HINT
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hints to allow the linker to rewrite certain instruction pairs with more
efficient alternatives if the symbol they reference is nearby, such that its
offset can be encoded in fewer bits. This is typically used when targeting ARM.
See :doc:`relocations` for more details.

See Also
--------

* :doc:`deprecated-load-commands` - Load commands no longer used on modern
  platforms