Header & Load Commands
Inspectable via llvm-objdump --macho --private-headers.
The load commands below are grouped by their functionality. This is not a comprehensive list of all possible load commands; the focus is on the ones that are typically present in modern userspace Mach-O binaries.
Header
The most interesting field in the header is probably flags. The possible
flag values (MH_*) are defined in <mach-o/loader.h>. dyld checks these
flags and changes its behavior accordingly; e.g. it won't process weak bindings unless
MH_BINDS_TO_WEAK is set.
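The header is fixed-size. Its ncmds and sizeofcmds fields give the number and
the total size in bytes of the load commands that follow it. The load commands
themselves have a variety of sizes, but they all begin with the same two 32-bit
fields: cmd, a tag identifying the command type, and cmdsize, the size of the
whole command in bytes. The tag determines the command's layout, but many
commands also carry variable-length data inline (e.g. the section headers of
LC_SEGMENT_64 or the path string of LC_LOAD_DYLIB), so cmdsize is what tells
you where the next command starts. Load commands usually contain offsets to
further variable-sized data. Most of that data (symbol tables, export tries,
code signatures, etc.) is stored in the __LINKEDIT segment, which is always
the last segment in the binary.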
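As a rough illustration, the following Python sketch walks the load commands of a thin (non-fat), little-endian, 64-bit Mach-O image held in memory. The struct layouts mirror mach_header_64 and load_command from <mach-o/loader.h>; the function name iter_load_commands is hypothetical, and fat binaries, 32-bit images and big-endian files are deliberately not handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # value from <mach-o/loader.h>

# mach_header_64: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
MACH_HEADER_64 = struct.Struct("<IiiIIIII")
# Every load command begins with the same two fields: cmd (the tag) and cmdsize.
LOAD_COMMAND = struct.Struct("<II")


def iter_load_commands(data):
    """Yield (cmd, offset, cmdsize) for each load command in `data`.

    Assumes a thin, little-endian, 64-bit Mach-O image.
    """
    magic, _cputype, _subtype, _filetype, ncmds, sizeofcmds, _flags, _reserved = (
        MACH_HEADER_64.unpack_from(data, 0)
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    offset = MACH_HEADER_64.size  # load commands start right after the header
    end = offset + sizeofcmds
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_COMMAND.unpack_from(data, offset)
        if cmdsize < LOAD_COMMAND.size or offset + cmdsize > end:
            raise ValueError(f"malformed load command at offset {offset:#x}")
        yield cmd, offset, cmdsize
        # cmdsize, not the tag alone, says where the next command begins.
        offset += cmdsize
```

To use it on a universal binary you would first extract one architecture (e.g. with lipo -thin) and pass the resulting file's bytes in.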
VM Layout
LC_SEGMENT[_64]
Instructs the kernel to map regions of the file into memory at some given address.
For position-independent code, an offset (AKA “slide”) can be added to this address, for the purpose of ASLR.
The first segment in an executable is normally __PAGEZERO (dylibs don't have
one), which has all RWX permissions turned off (maxprot == initprot == 0). This
ensures that any NULL pointer dereference will fault. On 64-bit platforms, this
segment is 4GB in size. This is likely done to catch bugs caused by 64-bit
pointers getting truncated to 32 bits.
The next segment is typically __TEXT. The header and load commands
themselves are always mapped in at the beginning of this segment. When the
kernel hands off execution to userspace, dyld will look for them there.
Segments may be further subdivided into sections.
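A minimal Python sketch of reading one segment_command_64 and applying a slide follows. The field layout mirrors <mach-o/loader.h>; describe_segment and prot_string are hypothetical helpers, and the image is assumed to be little-endian. The section_64 headers that follow each segment command (nsects of them) are not parsed here.

```python
import struct

LC_SEGMENT_64 = 0x19
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4

# segment_command_64: cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff,
# filesize, maxprot, initprot, nsects, flags (72 bytes in total)
SEGMENT_COMMAND_64 = struct.Struct("<II16sQQQQiiII")


def prot_string(prot):
    """Render a vm_prot_t bitmask as an rwx string, e.g. 5 -> 'r-x'."""
    return "".join(
        letter if prot & bit else "-"
        for letter, bit in (("r", VM_PROT_READ), ("w", VM_PROT_WRITE), ("x", VM_PROT_EXECUTE))
    )


def describe_segment(data, offset, slide=0):
    (cmd, _cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, _flags) = SEGMENT_COMMAND_64.unpack_from(data, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError(f"expected LC_SEGMENT_64, got {cmd:#x}")
    return {
        "name": segname.rstrip(b"\0").decode(),
        "runtime_addr": vmaddr + slide,  # where the segment lands after ASLR
        "vmsize": vmsize,
        "file_range": (fileoff, fileoff + filesize),  # bytes mapped from the file
        "maxprot": prot_string(maxprot),
        "initprot": prot_string(initprot),
        "nsects": nsects,  # section_64 headers follow this command
    }
```

For a __PAGEZERO command this reports initprot and maxprot as "---" and a vmsize of 0x100000000 on 64-bit platforms.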
Dylib Loading
LC_ID_DYLIB
Only present in dylibs. Set via ld -install_name or install_name_tool
-id. This roughly corresponds to the file path of the dylib, but we have to
account for expansion of @-prefixes, which we will discuss below.
The linker will use this path when generating LC_LOAD_DYLIB commands in the
binaries that want to load a given dylib.
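The install name is stored inline in a dylib_command, the layout shared by LC_ID_DYLIB, LC_LOAD_DYLIB and related commands: the command holds an offset (measured from the start of the command) to a NUL-terminated path, followed by a timestamp and two version fields. The sketch below, built around a hypothetical parse_dylib_command helper, assumes a little-endian image and decodes the versions using the usual xxxx.yy.zz packing (16 bits major, 8 bits minor, 8 bits patch).

```python
import struct

LC_LOAD_DYLIB = 0xC
LC_ID_DYLIB = 0xD
# dylib_command: cmd, cmdsize, then struct dylib:
# name offset (from the start of the command), timestamp,
# current_version, compatibility_version
DYLIB_COMMAND = struct.Struct("<IIIIII")


def decode_version(packed):
    """Unpack an xxxx.yy.zz version: 16 bits major, 8 bits minor, 8 bits patch."""
    return f"{packed >> 16}.{(packed >> 8) & 0xFF}.{packed & 0xFF}"


def parse_dylib_command(data, offset):
    cmd, cmdsize, name_offset, _timestamp, current, compat = DYLIB_COMMAND.unpack_from(data, offset)
    # The path is NUL-terminated and padded out to cmdsize.
    raw = data[offset + name_offset:offset + cmdsize]
    name = raw.split(b"\0", 1)[0].decode()
    return cmd, name, decode_version(current), decode_version(compat)
```

The same helper should work on LC_LOAD_DYLIB commands, since they share this layout and carry the install name the linker copied from the dylib's LC_ID_DYLIB.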
LC_LOAD_DYLIB / LC_LOAD_WEAK_DYLIB
Looks for a dylib at a given path and loads it. The difference between the two
load commands is that LC_LOAD_WEAK_DYLIB will not cause a load failure if
the dylib is missing, and all its imported symbols are treated as weak imports,
i.e. they may be absent from the library and just point to NULL.
There are several special prefixes that can be used in the dylib path:
@executable_path/ - expands to the directory containing the main executable.
@loader_path/ - expands to the directory containing the binary doing the loading. For executables, this is identical to @executable_path/. But for an LC_LOAD_DYLIB in some libfoo.dylib loaded by some executable bar, this will expand to the directory containing libfoo.dylib and not to the directory containing bar.
@rpath/ - expands to one of the paths specified by LC_RPATH, in order. See below for details.
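Roughly how the first two prefixes could be expanded, as a sketch: expand_install_name is a hypothetical helper, paths are treated as plain POSIX strings, and the result is neither normalized nor checked for existence. @rpath/ is left untouched, since it needs the LC_RPATH search described below.

```python
import posixpath


def expand_install_name(path, executable_path, loader_path):
    """Expand @executable_path/ and @loader_path/ in a dylib path.

    executable_path: path of the main executable
    loader_path: path of the binary containing the LC_LOAD_DYLIB
    """
    for prefix, binary in (("@executable_path/", executable_path),
                           ("@loader_path/", loader_path)):
        if path.startswith(prefix):
            return posixpath.join(posixpath.dirname(binary), path[len(prefix):])
    return path  # absolute paths and @rpath/ are returned unchanged
```

With executable /opt/bar/bin/bar loading /opt/foo/lib/libfoo.dylib, an @loader_path/libdep.dylib reference inside libfoo.dylib becomes /opt/foo/lib/libdep.dylib, not a path under /opt/bar/bin.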
LC_RPATH
Adds an entry to the runtime path list. Whenever dyld encounters
@rpath/... in an install name, it goes through the list in order, substituting
each path for the @rpath, and uses the first resulting path at which a dylib
exists. You can append to this list via ld -rpath.
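A sketch of that search in Python, under simplifying assumptions: resolve_rpath is a hypothetical helper, the rpath entries are taken to be already expanded (real entries may themselves start with @loader_path/ or @executable_path/), and only a single list of entries is considered. The exists parameter is injectable so the logic can be tried without touching the filesystem.

```python
import os.path
import posixpath


def resolve_rpath(install_name, rpaths, exists=os.path.exists):
    """Substitute each LC_RPATH entry for @rpath in order and return the first
    candidate that exists, or None if none does."""
    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        return install_name  # nothing to search for
    suffix = install_name[len(prefix):]
    for rpath in rpaths:
        candidate = posixpath.join(rpath, suffix)
        if exists(candidate):
            return candidate
    return None  # for a non-weak dylib, dyld would treat this as a load failure
```

Because the first match wins, the order of the ld -rpath flags matters when the same dylib name exists in several directories.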
LC_REEXPORT_DYLIB
Re-exports all symbols from the specified dylib. Note that it is possible to
have LC_REEXPORT_DYLIB for some libfoo without a matching
LC_LOAD_DYLIB; that just means the current dylib does not use libfoo’s
symbols even though it re-exports libfoo.
Imports and Exports
LC_DYLD_EXPORTS_TRIE
Contains a pointer to symbol export information encoded as a compact trie data
structure. This is a more modern alternative to the export info in
LC_DYLD_INFO_ONLY. See Exports Trie for details on the format.
Inspectable via llvm-objdump --macho --exports-trie.
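To get a feel for how lookups work, here is a Python sketch that searches the trie for one symbol. It assumes the layout used by dyld and ld64: each node starts with the ULEB128 size of its terminal (export) info, then that info (flags, then for regular exports the symbol's offset from the Mach-O header), then a one-byte child count, then for each child a NUL-terminated edge label and the ULEB128 offset of the child node within the trie. lookup_export is a hypothetical helper; re-exports are reported without an address, and the extra resolver field of stub-and-resolver exports is ignored.

```python
EXPORT_SYMBOL_FLAGS_REEXPORT = 0x08


def read_uleb128(data, pos):
    """Decode one ULEB128 value starting at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def lookup_export(trie, symbol):
    """Return (flags, address) for an exported symbol, or None if absent."""
    node, remaining = 0, symbol.encode()
    while True:
        terminal_size, pos = read_uleb128(trie, node)
        if not remaining:
            if terminal_size == 0:
                return None  # this node is only a shared prefix, not an export
            flags, pos = read_uleb128(trie, pos)
            if flags & EXPORT_SYMBOL_FLAGS_REEXPORT:
                return flags, None  # re-exports carry an ordinal and name instead
            address, _ = read_uleb128(trie, pos)
            return flags, address  # offset from the Mach-O header
        pos += terminal_size  # skip terminal info to reach the children
        child_count = trie[pos]
        pos += 1
        for _ in range(child_count):
            end = trie.index(b"\0", pos)
            label = trie[pos:end]
            child_offset, pos = read_uleb128(trie, end + 1)
            if remaining.startswith(label):
                node, remaining = child_offset, remaining[len(label):]
                break
        else:
            return None  # no edge matches the rest of the name
```

Since sibling edges never share a first character, at most one edge can match at each step, so the search never has to backtrack.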
LC_DYLD_CHAINED_FIXUPS
TODO
Debug Metadata
LC_SYMTAB
Symbol table. Most symbols refer to addresses within the binary, although there are some special symbols:
STABS symbols indicate where debug info can be located. (STABS can be used to encode more general debug info, but the Mach-O format uses DWARF for that.)
$ld$ symbols modify linker behavior. For example, $ld$hide$os10.10$_symbol indicates that _symbol should be hidden when the deployment target is macOS 10.10.
Inspectable via llvm-objdump --macho --syms and dsymutil -s. The latter
is more useful for inspecting STABS.
Note that symbol entries here are for debugging purposes only and may be stripped. Symbols that have to be discoverable will have another entry in the exports trie (which is never stripped).
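The sketch below decodes nlist_64 entries and sorts them into the rough categories used above. It assumes you have already sliced out the symbol table (at symoff) and the string table (at stroff) using the LC_SYMTAB fields, and a little-endian image; read_symbols is a hypothetical helper, and the classification ignores finer distinctions such as private externs (N_PEXT).

```python
import struct

# struct nlist_64: n_strx (u32), n_type (u8), n_sect (u8), n_desc (u16), n_value (u64)
NLIST_64 = struct.Struct("<IBBHQ")
N_STAB = 0xE0  # any of these bits set means a STABS (debugging) entry
N_TYPE = 0x0E
N_EXT = 0x01
N_UNDF, N_SECT = 0x0, 0xE


def read_symbols(symtab, nsyms, strtab):
    """Yield (name, kind, n_value) for each of the nsyms entries."""
    for i in range(nsyms):
        n_strx, n_type, _n_sect, _n_desc, n_value = NLIST_64.unpack_from(symtab, i * NLIST_64.size)
        # Names are NUL-terminated strings at n_strx within the string table.
        name = strtab[n_strx:strtab.index(b"\0", n_strx)].decode()
        if n_type & N_STAB:
            kind = "stab"
        elif (n_type & N_TYPE) == N_UNDF:
            kind = "undefined"
        elif n_type & N_EXT:
            kind = "external"
        else:
            kind = "local"
        yield name, kind, n_value
```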
LC_DYSYMTAB
Despite what its name suggests, this is not another symbol table. It's more like metadata for the symbols in the table defined by LC_SYMTAB. It indicates which contiguous ranges of that table hold local, externally defined, and undefined symbols. It also points to the indirect symbol table, which records which entries in the GOT (and in the other pointer and stub sections) correspond to which symbols.
Inspectable via llvm-objdump --macho --indirect-symbols.
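To see how the GOT mapping works: each pointer section (such as __got) stores in its reserved1 field an index into the indirect symbol table at LC_DYSYMTAB's indirectsymoff, and slot i of the section corresponds to entry reserved1 + i. The hypothetical helper below assumes 8-byte slots (stub sections instead use reserved2 as the stub size) and reports local or absolute entries, which have no symbol to bind to, as None.

```python
INDIRECT_SYMBOL_LOCAL = 0x80000000
INDIRECT_SYMBOL_ABS = 0x40000000


def map_indirect_slots(indirect_symbols, reserved1, section_size, stride=8):
    """Map each slot of a pointer section (e.g. __got) to a symbol-table index.

    indirect_symbols: list of u32 entries from LC_DYSYMTAB's indirect table
    reserved1: the section's starting index into that table
    stride: bytes per slot (8 for 64-bit pointer sections)
    """
    slots = []
    for i in range(section_size // stride):
        entry = indirect_symbols[reserved1 + i]
        if entry & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS):
            slots.append(None)  # local or absolute: no symbol to bind
        else:
            slots.append(entry)  # index into the LC_SYMTAB nlist array
    return slots
```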
LC_UUID
Contains a UUID generated by hashing the contents of the binary. dsymutil
uses this to match debug-info-stripped binaries with their unstripped
counterparts.
Inspectable via llvm-dwarfdump --uuid.
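Reading the UUID is straightforward. This sketch (read_uuid is a hypothetical helper; a little-endian image is assumed for the cmd and cmdsize fields) formats the 16 raw bytes as an uppercase, hyphenated string, which is how tools such as llvm-dwarfdump typically display it.

```python
import struct
import uuid

LC_UUID = 0x1B
# uuid_command: cmd, cmdsize, uuid[16]
UUID_COMMAND = struct.Struct("<II16s")


def read_uuid(data, offset):
    cmd, cmdsize, raw = UUID_COMMAND.unpack_from(data, offset)
    if cmd != LC_UUID or cmdsize != UUID_COMMAND.size:
        raise ValueError("not a well-formed LC_UUID command")
    # The 16 bytes are stored in display order, so no byte swapping is needed.
    return str(uuid.UUID(bytes=raw)).upper()
```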
LC_FUNCTION_STARTS
Contains the addresses of all function entry points in the binary. This is useful for profiling and debugging tools working with stripped binaries.
The addresses are stored as a stream of ULEB128-encoded deltas, with the first
entry being the delta between the start of __TEXT and the first function
start, the second one being the delta between the first and second function
starts, and so on.
Inspectable via llvm-objdump --macho --function-starts.
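Decoding that stream is short enough to sketch in full. decode_function_starts is a hypothetical helper that takes the raw bytes referenced by LC_FUNCTION_STARTS plus the vmaddr of __TEXT; it stops at the first zero delta, on the assumption that trailing zero bytes are padding. That is safe because a real delta can never be zero: no two functions start at the same address, and the first one can't sit at the very start of __TEXT, where the header lives.

```python
def read_uleb128(data, pos):
    """Decode one ULEB128 value starting at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def decode_function_starts(data, text_vmaddr):
    """Turn the LC_FUNCTION_STARTS byte stream into absolute (unslid) addresses."""
    addresses = []
    address, pos = text_vmaddr, 0
    while pos < len(data):
        delta, pos = read_uleb128(data, pos)
        if delta == 0:
            break  # trailing zero bytes are padding; real deltas are never 0
        address += delta
        addresses.append(address)
    return addresses
```

For example, the bytes D0 1E 30 00 00 with __TEXT at 0x100000000 decode to functions at 0x100000F50 (delta 0xF50) and 0x100000F80 (delta 0x30).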
LC_DATA_IN_CODE
Points to ranges of code that contain non-instruction data, e.g. jump table offsets. This tells disassemblers not to interpret those bytes as opcodes.
Inspectable via llvm-objdump --macho --data-in-code.
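Each entry in the data-in-code table is a small fixed-size record (data_in_code_entry: a 32-bit offset from the start of the Mach-O header, a 16-bit length and a 16-bit kind). The sketch assumes a little-endian image and that blob holds the bytes referenced by LC_DATA_IN_CODE; the kind names mirror the DICE_KIND_* constants, and read_data_in_code and is_data are hypothetical helpers a disassembler might use to skip data bytes.

```python
import struct

# data_in_code_entry: offset (u32, from the start of the Mach-O header),
# length (u16), kind (u16)
DATA_IN_CODE_ENTRY = struct.Struct("<IHH")
DICE_KINDS = {
    1: "data",
    2: "jump table (8-bit entries)",
    3: "jump table (16-bit entries)",
    4: "jump table (32-bit entries)",
    5: "absolute jump table (32-bit entries)",
}


def read_data_in_code(blob):
    """Split the LC_DATA_IN_CODE payload into (offset, length, kind) tuples."""
    return [DATA_IN_CODE_ENTRY.unpack_from(blob, off)
            for off in range(0, len(blob), DATA_IN_CODE_ENTRY.size)]


def is_data(entries, file_offset):
    """True if the byte at file_offset lies inside a data-in-code range."""
    return any(start <= file_offset < start + length for start, length, _kind in entries)
```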
Miscellaneous Load Commands
LC_MAIN
Entry point. Note that this is an offset from the header, not an absolute address.
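As a sketch (entry_address is a hypothetical helper; a little-endian image is assumed): LC_MAIN's entry_point_command holds entryoff and stacksize, and since the header sits at the start of __TEXT, adding entryoff to the header's runtime address (the __TEXT vmaddr plus the slide) gives the address of main.

```python
import struct

LC_MAIN = 0x80000028
# entry_point_command: cmd, cmdsize, entryoff, stacksize
ENTRY_POINT_COMMAND = struct.Struct("<IIQQ")


def entry_address(data, offset, header_address):
    """Return (address of main, requested stack size) for an LC_MAIN command.

    header_address is the runtime address of the Mach-O header,
    i.e. the __TEXT vmaddr plus the slide.
    """
    cmd, _cmdsize, entryoff, stacksize = ENTRY_POINT_COMMAND.unpack_from(data, offset)
    if cmd != LC_MAIN:
        raise ValueError(f"expected LC_MAIN, got {cmd:#x}")
    return header_address + entryoff, stacksize
```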
LC_LOAD_DYLINKER
Path to the dynamic linker (dyld). Note that modern versions of macOS will only
execute binaries that use the system loader at /usr/lib/dyld, so this load
command has only one valid value. Why they have not removed it entirely is a
mystery to me.
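LC_BUILD_VERSION
LC_VERSION_MIN_* are used when targeting older platforms; newer ones use
LC_BUILD_VERSION. Both record the minimum OS version and the SDK version;
LC_BUILD_VERSION additionally records the platform explicitly, along with the
versions of the tools used to build the binary.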
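A sketch of decoding LC_BUILD_VERSION in Python, assuming a little-endian image. The platform numbers mirror a subset of the PLATFORM_* constants in <mach-o/loader.h>, versions use the xxxx.yy.zz packing, and the ntools build_tool_version entries that follow the fixed part of the command are only counted, not parsed. read_build_version is a hypothetical helper.

```python
import struct

LC_BUILD_VERSION = 0x32
# build_version_command: cmd, cmdsize, platform, minos, sdk, ntools
BUILD_VERSION_COMMAND = struct.Struct("<IIIIII")
# Subset of the PLATFORM_* values from <mach-o/loader.h>.
PLATFORMS = {1: "macos", 2: "ios", 3: "tvos", 4: "watchos", 6: "maccatalyst", 7: "iossimulator"}


def decode_version(packed):
    """Unpack an xxxx.yy.zz version: 16 bits major, 8 bits minor, 8 bits patch."""
    return f"{packed >> 16}.{(packed >> 8) & 0xFF}.{packed & 0xFF}"


def read_build_version(data, offset):
    cmd, _cmdsize, platform, minos, sdk, ntools = BUILD_VERSION_COMMAND.unpack_from(data, offset)
    if cmd != LC_BUILD_VERSION:
        raise ValueError(f"expected LC_BUILD_VERSION, got {cmd:#x}")
    # ntools build_tool_version records (tool, version) follow at offset + 24.
    name = PLATFORMS.get(platform, f"unknown({platform})")
    return name, decode_version(minos), decode_version(sdk), ntools
```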
LC_ENCRYPTION_INFO[_64]
Offsets indicating the range of __TEXT that doesn’t contain load commands
and can therefore be encrypted. Used by Apple’s FairPlay App DRM to create
encrypted binaries that will get decrypted by the kernel on-the-fly.
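A minimal check for whether a binary is encrypted, sketched for the 64-bit variant of the command (encryption_info_command_64: cryptoff, cryptsize, cryptid and padding). A cryptid of 0 means the range is not encrypted. encrypted_range is a hypothetical helper and a little-endian image is assumed.

```python
import struct

LC_ENCRYPTION_INFO_64 = 0x2C
# encryption_info_command_64: cmd, cmdsize, cryptoff, cryptsize, cryptid, pad
ENCRYPTION_INFO_COMMAND_64 = struct.Struct("<IIIIII")


def encrypted_range(data, offset):
    """Return the (start, end) file offsets of the encrypted range, or None."""
    cmd, _cmdsize, cryptoff, cryptsize, cryptid, _pad = (
        ENCRYPTION_INFO_COMMAND_64.unpack_from(data, offset)
    )
    if cmd != LC_ENCRYPTION_INFO_64:
        raise ValueError(f"expected LC_ENCRYPTION_INFO_64, got {cmd:#x}")
    if cryptid == 0:
        return None  # range not encrypted
    return cryptoff, cryptoff + cryptsize
```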
LC_CODE_SIGNATURE
Data for code signing, typically located at the very end of the file. Others have already covered the format in detail, so I won’t go into it here. You may wish to consult these links instead:
LC_LINKER_OPTIMIZATION_HINT
Hints to allow the linker to rewrite certain instruction pairs with more efficient alternatives if the symbol they reference is nearby, such that its offset can be encoded in fewer bits. This is typically used when targeting ARM.
See Relocations for more details.
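To make the "fewer bits" point concrete with one common arm64 case: an ADRP+ADD pair can reach about ±4 GiB, while a single ADR encodes a signed 21-bit byte offset (about ±1 MiB). The sketch below only checks whether such a pair could in principle be rewritten as ADR plus a NOP; can_relax_adrp_add is a hypothetical helper, it assumes the ADR would be placed at the ADRP's address, and it does not parse the hint data itself.

```python
# ADR encodes a signed 21-bit byte offset relative to its own address.
ADR_MIN = -(1 << 20)
ADR_MAX = (1 << 20) - 1


def can_relax_adrp_add(adrp_address, target):
    """Could `adrp xN, target@PAGE; add xN, xN, target@PAGEOFF` become
    `adr xN, target; nop`, assuming the ADR replaces the ADRP in place?"""
    return ADR_MIN <= target - adrp_address <= ADR_MAX
```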
See Also
Deprecated Load Commands - Load commands no longer used on modern platforms