An iOS port of MicroArchBench, measuring microarchitectural characteristics of Apple Silicon directly on-device. All benchmarks use hand-written ARM64 assembly to avoid compiler interference, and results are displayed in a native SwiftUI interface.
- Device: iPhone or iPad with Apple Silicon, or an Apple Silicon Mac running the iOS Simulator
- OS: iOS 26.3 or later
- Xcode: 16 or later
Intel Mac simulators are not supported — the benchmark kernels are ARM64-only.
| Benchmark | What it measures |
|---|---|
| Frequency | CPU clock speed (GHz), estimated by timing a chain of dependent integer adds |
| Pipeline Width | Peak instruction throughput (ops/cycle) for integer, FP, load, store, and branch execution units separately |
| Instruction Latency | Latency (cycles) and throughput (ops/cycle) for integer multiply/divide and floating-point add, multiply, FMA, divide, and square root |
| Move Elimination | Throughput of various MOV idioms to detect which are eliminated at the rename stage |
| Pointer Chasing | L1 data cache load-to-use latency, measured via a dependent pointer chain |
| Cache Latency | Random-access latency from 4 KB to 512 MB, revealing the L1 / L2 / DRAM hierarchy |
| Read Bandwidth | Sequential read bandwidth (GB/s) from 4 KB to 512 MB |
| Copy 128B / Copy 256B | Memory copy bandwidth from 4 KB to 512 MB, with one and two cache lines per iteration respectively |
| Update Bandwidth | In-place read-modify-write bandwidth from 4 KB to 512 MB |
| LD/ST Forwarding | Store-to-load forwarding latency across all 128x128 byte-offset combinations within a cache line, displayed as a colour heatmap |
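
The Frequency row above reduces to simple arithmetic, which can be sketched as follows. This is a minimal Python illustration, not the app's code (the app runs the chain in ARM64 assembly). It assumes each dependent integer add has a latency of exactly one cycle, so a chain of N adds takes N cycles; `estimate_frequency_ghz`, `num_adds`, and `elapsed_ns` are hypothetical names.

```python
def estimate_frequency_ghz(num_adds: int, elapsed_ns: float) -> float:
    """Estimate clock speed from a timed chain of dependent integer adds.

    Assumption: every add depends on the previous one and has a latency of
    one cycle, so the whole chain takes num_adds cycles.
    """
    if num_adds <= 0 or elapsed_ns <= 0:
        raise ValueError("num_adds and elapsed_ns must be positive")
    # Cycles per nanosecond is numerically equal to GHz.
    return num_adds / elapsed_ns


if __name__ == "__main__":
    # Illustrative numbers only: 3.2 billion adds timed at one second.
    print(f"{estimate_frequency_ghz(3_200_000_000, 1_000_000_000):.2f} GHz")
```

The dependency chain matters: independent adds would issue several per cycle and inflate the estimate.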

Build for a physical device:

    xcodebuild -project MicroArchBench-iOS.xcodeproj \
      -scheme MicroArchBench-iOS -configuration Release \
      -destination 'platform=iOS,id=<DEVICE_UDID>' \
      -allowProvisioningUpdates build

Then install:

    xcrun devicectl device install app \
      --device <DEVICE_UDID> \
      --path "$(find ~/Library/Developer/Xcode/DerivedData \
        -name 'MicroArchBench-iOS.app' \
        -path '*/Release-iphoneos/*' | head -1)"

Build for the iOS Simulator:

    xcodebuild -scheme MicroArchBench-iOS -configuration Debug \
      -destination 'platform=iOS Simulator,name=iPhone 16' build

Replace `<DEVICE_UDID>` with your device's UDID, visible in Xcode's Devices window or via `xcrun devicectl list devices`.

Select any benchmark from the sidebar, or tap Run All to execute every test in sequence. For benchmarks that report results in cycles (Pipeline Width, Instruction Latency, Move Elimination), run Frequency first, because these benchmarks use the measured frequency to convert nanoseconds to cycles. If you skip it, they fall back to a 3.0 GHz default.
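
The nanoseconds-to-cycles conversion described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's Swift code: the function names and the `measured_ghz` parameter are hypothetical, and only the 3.0 GHz fallback comes from the text.

```python
from typing import Optional

# Fallback used when the Frequency benchmark has not been run.
DEFAULT_FREQUENCY_GHZ = 3.0


def ns_to_cycles(elapsed_ns: float, measured_ghz: Optional[float] = None) -> float:
    """Convert a duration in nanoseconds to CPU cycles."""
    ghz = measured_ghz if measured_ghz is not None else DEFAULT_FREQUENCY_GHZ
    if ghz <= 0:
        raise ValueError("frequency must be positive")
    # GHz is cycles per nanosecond, so cycles = ns * GHz.
    return elapsed_ns * ghz


def ops_per_cycle(num_ops: int, elapsed_ns: float,
                  measured_ghz: Optional[float] = None) -> float:
    """Throughput in operations per cycle for num_ops executed in elapsed_ns."""
    return num_ops / ns_to_cycles(elapsed_ns, measured_ghz)
```

If the core really runs at 4.0 GHz but the 3.0 GHz default is applied, every cycle count comes out 25% too low and every ops/cycle figure correspondingly too high, which is why running Frequency first matters.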
The gear icon opens Thread Scheduling settings, which determine the QoS class (and therefore the CPU cluster) the benchmark runs on.
- All timing uses `CLOCK_MONOTONIC_RAW` (not subject to NTP slew).
- Each benchmark calls a warmup routine before any measurement to drive the CPU to its maximum sustained frequency.
- Bandwidth benchmarks transfer 16 GB of total traffic per working-set size to ensure stable readings even at DRAM speeds.
- Copy benchmarks use independently randomised buffer offsets to avoid cache-set aliasing artefacts.
- Cache latency uses CRC32C-based pseudo-random address generation to defeat Apple Silicon's data-memory-dependent prefetcher (DMP) and history-based prefetchers.
- Each measurement takes the minimum of multiple runs to filter out OS jitter (interrupts can only add time, never subtract it).
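
Two of the points above, minimum-of-runs timing on `CLOCK_MONOTONIC_RAW` and a fixed traffic budget per working-set size, can be sketched as follows. This is a minimal Python illustration, not the app's implementation. All function names are hypothetical, "16 GB" is read as 16 GiB, the run count of 5 is arbitrary, and the warmup here simply calls the kernel rather than driving the CPU to its maximum frequency.

```python
import time
from typing import Callable, Optional

TOTAL_TRAFFIC_BYTES = 16 * 2**30  # assumption: "16 GB" interpreted as 16 GiB


def now_ns() -> int:
    """Read CLOCK_MONOTONIC_RAW in nanoseconds (Unix-like systems only)."""
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC_RAW)


def passes_for_working_set(working_set_bytes: int,
                           total_bytes: int = TOTAL_TRAFFIC_BYTES) -> int:
    """Passes over the buffer needed to move at least total_bytes."""
    if working_set_bytes <= 0:
        raise ValueError("working_set_bytes must be positive")
    return -(-total_bytes // working_set_bytes)  # ceiling division


def min_elapsed_ns(kernel: Callable[[], None], runs: int = 5, warmup: int = 1,
                   clock: Callable[[], int] = now_ns) -> int:
    """Time kernel() several times and keep the fastest run.

    Interrupts and other OS jitter can only lengthen a run, so the minimum
    is the best available estimate of the undisturbed time.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    for _ in range(warmup):
        kernel()  # stand-in for the app's warmup routine
    best: Optional[int] = None
    for _ in range(runs):
        start = clock()
        kernel()
        elapsed = clock() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    fastest = min_elapsed_ns(lambda: sum(range(100_000)))
    print(f"fastest run: {fastest} ns")
    print(f"passes at 4 KiB: {passes_for_working_set(4 * 1024)}")
```

The `clock` parameter exists only so the timing logic can be exercised with a fake clock; a real harness would always use the raw monotonic clock.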
Based on MicroArchBench by JamesAslan. Assembly kernels adapted from the original; file I/O and command-line entry points replaced with a Swift/SwiftUI host layer.