Header & Load Commands

Inspectable via llvm-objdump --macho --private-headers.

The load commands below are grouped by their functionality. This is not a comprehensive list of all possible load commands; the focus is on the ones that are typically present in modern userspace Mach-O binaries.

VM Layout

LC_SEGMENT[_64]

Instructs the kernel to map regions of the file into memory at some given address.

For position-independent code, an offset (AKA “slide”) can be added to this address, for the purpose of ASLR.

In an executable, the first segment is normally __PAGEZERO, which has all RWX permissions turned off (maxprot == initprot == 0) and occupies no space in the file. Because nothing can be read, written, or executed there, any NULL pointer dereference will fault. On 64-bit platforms, this segment is 4GB in size, likely to catch bugs caused by 64-bit pointers getting truncated to 32 bits. Dylibs do not have a __PAGEZERO segment.
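The truncation argument can be checked with a little arithmetic. The sketch below assumes an unslid __PAGEZERO mapped at address 0 and uses a made-up pointer value; in_pagezero is a hypothetical helper, not part of any Mach-O tooling.

```python
PAGEZERO_SIZE = 1 << 32  # 4GB, the size of __PAGEZERO on 64-bit platforms


def in_pagezero(addr: int) -> bool:
    """True if addr falls inside a __PAGEZERO mapped at address 0."""
    return 0 <= addr < PAGEZERO_SIZE


ptr = 0x0000_0001_0000_3F80    # hypothetical 64-bit pointer just above 4GB
truncated = ptr & 0xFFFF_FFFF  # what survives a cast to a 32-bit integer

print(in_pagezero(ptr))        # False
print(in_pagezero(truncated))  # True
```

Any value that fits in 32 bits lands inside the 4GB region, so dereferencing a truncated pointer faults instead of silently touching unrelated memory.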

The next segment is typically __TEXT. The header and load commands themselves are always mapped in at the beginning of this segment. When the kernel hands off execution to userspace, dyld will look for them there.

Segments may be further subdivided into sections.
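To make the layout concrete, here is a minimal sketch that walks the load commands of a thin, little-endian, 64-bit Mach-O image and collects its segments. The struct layouts follow mach_header_64 and segment_command_64 from <mach-o/loader.h>; list_segments is a name I made up for illustration, and fat binaries, 32-bit images and big-endian images are deliberately not handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_SEGMENT_64 = 0x19

# mach_header_64: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
HEADER_64 = struct.Struct("<IiiIIIII")
# segment_command_64: cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
#                     maxprot, initprot, nsects, flags
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")


def list_segments(data: bytes):
    """Return (name, vmaddr, vmsize, maxprot, initprot) for each LC_SEGMENT_64."""
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _res = HEADER_64.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")

    segments = []
    offset = HEADER_64.size  # load commands start right after the header
    for _ in range(ncmds):
        # Every load command begins with its type and its total size in bytes.
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("malformed load command")
        if cmd == LC_SEGMENT_64:
            (_cmd, _cmdsize, segname, vmaddr, vmsize, _fileoff, _filesize,
             maxprot, initprot, _nsects, _segflags) = SEGMENT_64.unpack_from(data, offset)
            name = segname.split(b"\0", 1)[0].decode("ascii")
            segments.append((name, vmaddr, vmsize, maxprot, initprot))
        offset += cmdsize  # cmdsize also covers any trailing section headers
    return segments
```

Calling list_segments on the bytes of a typical executable should yield __PAGEZERO first with both protection fields equal to 0, followed by __TEXT.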

Dylib Loading

LC_ID_DYLIB

Only present in dylibs. Set via ld -install_name or install_name_tool -id. This roughly corresponds to the file path of the dylib, but we have to account for expansion of @-prefixes, which we will discuss below.

The linker will use this path when generating LC_LOAD_DYLIB commands in the binaries that want to load a given dylib.

LC_LOAD_DYLIB / LC_LOAD_WEAK_DYLIB

Looks for a dylib at a given path and loads it. The two commands differ in how they handle a missing dylib: LC_LOAD_DYLIB fails the load, whereas LC_LOAD_WEAK_DYLIB does not. With LC_LOAD_WEAK_DYLIB, all symbols imported from the dylib are also treated as weak imports, i.e. they may be absent from the library, in which case they just point to NULL.

There are several special prefixes that can be used in the dylib path:

  • @executable_path/ - expands to the directory containing the main executable.

  • @loader_path/ - expands to the directory containing the binary doing the loading. For executables, this is identical to @executable_path/. But for an LC_LOAD_DYLIB in some libfoo.dylib loaded by some executable bar, this will expand to the directory containing libfoo.dylib and not to the directory containing bar.

  • @rpath/ - expands to one of the paths specified by LC_RPATH, in order. See below for details.

LC_RPATH

Adds an entry to the runtime path list. Whenever dyld encounters @rpath/... in an install name, it goes through the list in order and substitutes each path for the @rpath prefix until it finds the dylib. You can append to this list via ld -rpath.
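The prefix expansion can be sketched as a small function that returns the candidate paths in the order they would be tried. This is a simplified model, not dyld's actual algorithm: candidate_paths and its parameters are hypothetical, the directories are passed in by the caller, and an @loader_path/ inside an rpath entry is expanded against the same loader directory as the install name.

```python
import posixpath


def candidate_paths(install_name, executable_dir, loader_dir, rpaths):
    """Return the concrete paths to try for install_name, in order."""

    def expand_fixed(path):
        # @executable_path/ and @loader_path/ each have exactly one expansion.
        for prefix, directory in (("@executable_path/", executable_dir),
                                  ("@loader_path/", loader_dir)):
            if path.startswith(prefix):
                return posixpath.join(directory, path[len(prefix):])
        return path

    prefix = "@rpath/"
    if install_name.startswith(prefix):
        rest = install_name[len(prefix):]
        # One candidate per LC_RPATH entry, in the order the entries appear.
        # Entries may themselves start with one of the fixed prefixes.
        return [posixpath.join(expand_fixed(rpath), rest) for rpath in rpaths]
    return [expand_fixed(install_name)]


print(candidate_paths("@rpath/libfoo.dylib", "/opt/app/bin", "/opt/app/lib",
                      ["/usr/local/lib", "@loader_path/deps"]))
```

A loader following this model would take the first candidate that exists on disk.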

LC_REEXPORT_DYLIB

Re-exports all symbols from the specified dylib. Note that it is possible to have LC_REEXPORT_DYLIB for some libfoo without a matching LC_LOAD_DYLIB; that just means the current dylib does not use libfoo’s symbols even though it re-exports libfoo.

Imports and Exports

LC_DYLD_EXPORTS_TRIE

Contains a pointer to symbol export information encoded as a compact trie data structure. This is a more modern alternative to the export info in LC_DYLD_INFO_ONLY. See Exports Trie for details on the format.

Inspectable via llvm-objdump --macho --exports-trie.

LC_DYLD_CHAINED_FIXUPS

TODO

Debug Metadata

LC_SYMTAB

Symbol table. Most symbols refer to addresses within the binary, although there are some special symbols:

  • STABS symbols indicate where debug info can be located. (STABS can be used to encode more general debug info, but the Mach-O format uses DWARF for that.)

  • $ld$ symbols modify linker behavior. For example, $ld$hide$os10.10$_symbol indicates that _symbol should be hidden when targeting macOS 10.10 or earlier.

Inspectable via llvm-objdump --macho --syms and dsymutil -s. The latter is more useful for inspecting STABS.

Note that symbol entries here are for debugging purposes only and may be stripped. Symbols that have to be discoverable will have another entry in the exports trie (which is never stripped).
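As a rough illustration of what a symbol table entry looks like, the sketch below decodes 64-bit entries and sorts them into a few buckets. It assumes the little-endian nlist_64 layout from <mach-o/nlist.h> and that the caller has already sliced the symbol and string tables out of the file; read_symbols and the bucket names are my own and not terminology from any tool.

```python
import struct

# nlist_64: n_strx (uint32), n_type (uint8), n_sect (uint8), n_desc (uint16), n_value (uint64)
NLIST_64 = struct.Struct("<IBBHQ")

N_STAB = 0xE0  # if any of these bits are set, the entry is a STABS symbol
N_TYPE = 0x0E  # mask for the type bits
N_EXT = 0x01   # external symbol bit
N_UNDF = 0x00  # type value: undefined


def read_symbols(symtab: bytes, strtab: bytes):
    """Return (name, kind, value) for each nlist_64 entry in symtab."""
    symbols = []
    for n_strx, n_type, _n_sect, _n_desc, n_value in NLIST_64.iter_unpack(symtab):
        # Names are NUL-terminated strings at offset n_strx in the string table.
        end = strtab.index(b"\0", n_strx)
        name = strtab[n_strx:end].decode("utf-8", "replace")
        if n_type & N_STAB:
            kind = "stab"
        elif name.startswith("$ld$"):
            kind = "linker-directive"
        elif (n_type & N_TYPE) == N_UNDF:
            kind = "undefined"
        elif n_type & N_EXT:
            kind = "external"
        else:
            kind = "local"
        symbols.append((name, kind, n_value))
    return symbols
```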

LC_DYSYMTAB

Despite what its name suggests, this is not another symbol table. It’s more like metadata for the symbols in the table defined by LC_SYMTAB. It indicates which symbols are local / external / undefined. It also records which entries in the GOT correspond to which symbols.

Inspectable via llvm-objdump --macho --indirect-symbols.

LC_UUID

Contains a UUID generated by hashing the contents of the binary. dsymutil uses this to match debug-info-stripped binaries with their unstripped counterparts.

Inspectable via llvm-dwarfdump --uuid.

LC_FUNCTION_STARTS

Contains the addresses of all function entry points in the binary. This is useful for profiling and debugging tools working with stripped binaries.

The addresses are stored as a stream of ULEB128-encoded deltas, with the first entry being the delta between the start of __TEXT and the first function start, the second one being the delta between the first and second function starts, and so on.

Inspectable via llvm-objdump --macho --function-starts.
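The delta encoding described above is short enough to decode by hand. The following is a minimal sketch, assuming the payload has already been sliced out of the file and that a zero delta marks the end of the list (linkers commonly pad the payload with zero bytes, but treat that as an assumption); the function names are hypothetical.

```python
def read_uleb128(data: bytes, pos: int):
    """Decode one ULEB128 value starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if not byte & 0x80:               # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_function_starts(payload: bytes, text_start: int):
    """Turn the delta stream into absolute function start addresses."""
    addresses = []
    addr = text_start  # the first delta is relative to the start of __TEXT
    pos = 0
    while pos < len(payload):
        delta, pos = read_uleb128(payload, pos)
        if delta == 0:  # assumed terminator; anything after it is padding
            break
        addr += delta
        addresses.append(addr)
    return addresses
```

For example, the bytes 80 7F 10 decode to the deltas 0x3F80 and 0x10, so with __TEXT at 0x100000000 the first two functions start at 0x100003F80 and 0x100003F90.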

LC_DATA_IN_CODE

Points to ranges of code that contain non-instruction data, e.g. jump table offsets. This tells disassemblers not to interpret those bytes as opcodes.

Inspectable via llvm-objdump --macho --data-in-code.
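Each range is a small fixed-size record, so decoding the payload is straightforward. This sketch assumes the little-endian data_in_code_entry layout and the DICE_KIND_* values from <mach-o/loader.h>, with the payload already sliced out of the file; decode_data_in_code is a made-up name.

```python
import struct

# data_in_code_entry: offset (uint32, from the header), length (uint16), kind (uint16)
DATA_IN_CODE_ENTRY = struct.Struct("<IHH")

DICE_KINDS = {
    1: "DATA",
    2: "JUMP_TABLE8",
    3: "JUMP_TABLE16",
    4: "JUMP_TABLE32",
    5: "ABS_JUMP_TABLE32",
}


def decode_data_in_code(payload: bytes):
    """Return (offset, length, kind_name) for each entry in the payload."""
    return [
        (offset, length, DICE_KINDS.get(kind, f"UNKNOWN({kind})"))
        for offset, length, kind in DATA_IN_CODE_ENTRY.iter_unpack(payload)
    ]
```

A disassembler would skip instruction decoding for length bytes starting at each offset.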

Miscellaneous Load Commands

LC_MAIN

Entry point. Note that this is an offset from the header, not an absolute address.
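Since the entry point is stored as an offset, turning it into an address requires knowing where the header ended up in memory. A minimal sketch, assuming the little-endian entry_point_command layout from <mach-o/loader.h>; entry_address is a hypothetical helper and header_address is whatever address the header was mapped at, slide included.

```python
import struct

LC_MAIN = 0x80000028
# entry_point_command: cmd, cmdsize, entryoff (uint64), stacksize (uint64)
ENTRY_POINT = struct.Struct("<IIQQ")


def entry_address(command: bytes, header_address: int) -> int:
    """Compute the runtime entry address from a raw LC_MAIN command."""
    cmd, _cmdsize, entryoff, _stacksize = ENTRY_POINT.unpack_from(command)
    if cmd != LC_MAIN:
        raise ValueError("not an LC_MAIN command")
    # entryoff is relative to the header, so ASLR only changes header_address.
    return header_address + entryoff
```

With entryoff 0x3F80 and the header mapped at 0x100000000, this gives 0x100003F80; a slide of 0x4000 moves both the header and the entry point by the same amount.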

LC_LOAD_DYLINKER

Path to the dynamic linker (dyld). Note that modern versions of macOS will only execute binaries that use the system loader at /usr/lib/dyld, so this load command has only one valid value. Why they have not removed it entirely is a mystery to me.

LC_BUILD_VERSION

Records the target platform and the minimum OS version the binary was built for. LC_VERSION_MIN_* are used when targeting older platforms; newer ones use LC_BUILD_VERSION.

LC_ENCRYPTION_INFO[_64]

Offsets indicating the range of __TEXT that doesn’t contain load commands and can therefore be encrypted. Used by Apple’s FairPlay App DRM to create encrypted binaries that will get decrypted by the kernel on-the-fly.

LC_CODE_SIGNATURE

Data for code signing, typically located at the very end of the file. Others have already covered the format in detail, so I won’t go into it here.

LC_LINKER_OPTIMIZATION_HINT

Hints to allow the linker to rewrite certain instruction pairs with more efficient alternatives if the symbol they reference is nearby, such that its offset can be encoded in fewer bits. This is typically used when targeting ARM.

See Relocations for more details.

See Also