coreml-cli

A command-line tool to profile CoreML models — showing per-operation compute device assignments (CPU/GPU/ANE), compilation time, and prediction latency across all MLComputeUnits configurations.

Replicates Xcode's CoreML Performance Report, but runs from the terminal and is designed for programmatic use by coding agents.

Example

$ coreml-cli test_models/160ms/

Device:  Apple M4 Pro (arm64)
OS:      macOS 26.3.1

── decoder ────────────────────────────────────────────────────────────────────
  Parakeet EOU decoder (RNNT prediction network) (Fluid Inference)
  Mixed (Float16, Float32, Int16, Int32) | torch==2.4.0 | coremltools 8.3.0
  inputs:  targets(Int32 1×1), target_length(Int32 1), h_in(Float32 1×1×640),
           c_in(Float32 1×1×640)
  outputs: decoder(Float32 1×640×1), h_out(Float32 1×1×640),
           c_out(Float32 1×1×640)

  Compute Unit                 CPU    GPU    ANE   Cold Compile   Warm Compile   Predict
  ────────────────────────────────────────────────────────────────────────────────────
  all                       100.0%   0.0%   0.0%           28ms            6ms    0.22ms
  cpu_only                  100.0%   0.0%   0.0%           29ms            6ms    0.22ms
  cpu_and_gpu               100.0%   0.0%   0.0%           31ms            5ms    0.22ms
  cpu_and_neural_engine     100.0%   0.0%   0.0%           29ms            5ms    0.23ms

── streaming_encoder ──────────────────────────────────────────────────────────
  Mixed (Float16, Float32, Int32) | torch==2.4.0 | coremltools 8.3.0
  ...

  Compute Unit                 CPU    GPU    ANE   Cold Compile   Warm Compile   Predict
  ────────────────────────────────────────────────────────────────────────────────────
  all                         0.0% 100.0%   0.0%          874ms           42ms    6.79ms
  cpu_only                  100.0%   0.0%   0.0%          381ms           43ms    4.83ms
  cpu_and_gpu                 0.0% 100.0%   0.0%          466ms           42ms    6.71ms
  cpu_and_neural_engine       1.2%   0.0%  98.8%         7249ms           46ms    2.81ms

Install

Requires macOS 14+ and uv.

git clone https://github.com/yourusername/coreml-cli
cd coreml-cli
uv sync

Usage

# Profile a single model (all compute unit configs)
uv run coreml-cli model.mlmodelc

# Profile all models in a directory
uv run coreml-cli path/to/models/

# Specific compute unit config
uv run coreml-cli model.mlmodelc --units cpu_and_neural_engine

# JSON output (for programmatic use)
uv run coreml-cli model.mlmodelc --json

# Include per-operation breakdown
uv run coreml-cli model.mlmodelc --ops

# Per-op data with private API details (backend support, estimated runtimes)
uv run coreml-cli model.mlmodelc --detailed

# ANE fallback analysis — show CPU ops grouped by rejection reason
uv run coreml-cli model.mlmodelc --fallback

# Fallback analysis as JSON (for agent consumption)
uv run coreml-cli model.mlmodelc --fallback --json

# Control benchmark iterations
uv run coreml-cli model.mlmodelc --iterations 50

# Debug logging to stderr
uv run coreml-cli model.mlmodelc --debug
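
For agent or script use, the --json mode can be driven from Python. The sketch below is a minimal wrapper, not part of the tool: build_command and run_profile are hypothetical helper names, only flags listed above are used, and it assumes the JSON document is written to stdout (the schema is not described here, so the result is returned as parsed but uninterpreted data).

```python
import json
import subprocess
from typing import Optional


def build_command(model_path: str, units: Optional[str] = None,
                  fallback: bool = False,
                  iterations: Optional[int] = None) -> list:
    """Build the argv for a JSON-mode run (hypothetical helper)."""
    cmd = ["uv", "run", "coreml-cli", model_path, "--json"]
    if units is not None:
        cmd += ["--units", units]
    if fallback:
        cmd.append("--fallback")
    if iterations is not None:
        cmd += ["--iterations", str(iterations)]
    return cmd


def run_profile(model_path: str, **options):
    """Run the CLI and parse its stdout as JSON.

    Assumption: JSON is printed to stdout; a non-zero exit raises
    subprocess.CalledProcessError because check=True.
    """
    result = subprocess.run(build_command(model_path, **options),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, run_profile("model.mlmodelc", units="cpu_and_neural_engine") would execute the same command as the "Specific compute unit config" line above with --json appended.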

What it reports

Benchmark mode (default)

For each model and compute unit configuration (all, cpu_only, cpu_and_gpu, cpu_and_neural_engine):

Device assignment — % of operations on CPU, GPU, and ANE (Neural Engine)
Cold compile time — first-ever load with no cached compilation (CoreML cache cleared). Reflects what the user experiences the first time the model runs on their device — if this is too high, the model may not be usable.
Warm compile time — load time with cached compilation. This is the cost paid on every app launch after the first.
Predict latency — median prediction time (5 warmup + 10 timed iterations)
Model metadata — precision, I/O shapes, author, description, coremltools version
Per-op breakdown (--ops) — each operation's name, type, assigned device, and cost weight
Private API data (--detailed) — selected backend, all supported backends, estimated runtime per backend, validation messages explaining why backends were rejected
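
The predict latency figure follows a common warmup-then-median pattern. The sketch below shows that pattern in isolation; median_latency_ms is a hypothetical name, predict stands in for whatever callable runs one prediction, and the tool's actual timing code may differ.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(predict: Callable[[], object],
                      warmup: int = 5, iterations: int = 10) -> float:
    """Return the median wall-clock time of predict() in milliseconds."""
    # Warmup calls are executed but not timed, so one-off setup costs
    # do not skew the measurement.
    for _ in range(warmup):
        predict()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)

    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)
```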

Fallback analysis mode (`--fallback`)

Shows only ops that are not on ANE, grouped by rejection reason. Designed for the ANE optimization loop: change conversion → reconvert → --fallback → identify blockers → fix → repeat.

For the CPU-fallback ops, it reports:

Why ANE rejected it — e.g., "Unsupported tensor data type: int32", "Unsupported MIL operation"
How many ops — grouped by rejection reason with op type counts
Estimated CPU cost — how much latency the fallback adds
Which ops — names for tracing back to the conversion script
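
The grouping described above can be sketched in a few lines. This is an illustration only: group_fallback_ops is a hypothetical name, and the record keys (name, type, reason, cost_ms) are assumed for the example rather than taken from the tool's output format.

```python
from collections import Counter, defaultdict


def group_fallback_ops(ops):
    """Group CPU-fallback op records by rejection reason.

    Each record is a dict with assumed keys: name, type, reason, cost_ms.
    Returns (reason, summary) pairs, most expensive group first.
    """
    groups = defaultdict(
        lambda: {"count": 0, "types": Counter(), "cost_ms": 0.0, "names": []}
    )
    for op in ops:
        group = groups[op["reason"]]
        group["count"] += 1
        group["types"][op["type"]] += 1    # op type counts per reason
        group["cost_ms"] += op["cost_ms"]  # summed estimated CPU cost
        group["names"].append(op["name"])  # for tracing back to the conversion script
    return sorted(groups.items(),
                  key=lambda item: item[1]["cost_ms"], reverse=True)
```

Sorting by summed cost puts the blocker with the largest estimated latency impact at the top, which is usually the one to fix first.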

Common ANE rejection reasons and fixes:

Unsupported tensor data type: int32 — cast to float16 before these operations
Unsupported MIL operation "lstm" — decompose into supported ops (matmul, sigmoid, tanh)
Unsupported MIL operation "logical_and" — replace with float multiply workaround
Unable to resolve operation input — cascading from another CPU op; fix the upstream op first
ANE supported but scheduler chose CPU — data transfer overhead; often not worth fixing
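
An agent consuming the fallback report could turn the list above into a lookup. The sketch below is one possible shape, not something the tool ships: FIX_HINTS and suggest_fix are hypothetical names, the hints are copied from the list above, and case-insensitive substring matching is an assumption about how the validation messages are worded.

```python
# (message fragment, suggested fix) pairs taken from the list above.
FIX_HINTS = [
    ("Unsupported tensor data type: int32",
     "cast to float16 before these operations"),
    ('Unsupported MIL operation "lstm"',
     "decompose into supported ops (matmul, sigmoid, tanh)"),
    ('Unsupported MIL operation "logical_and"',
     "replace with float multiply workaround"),
    ("Unable to resolve operation input",
     "cascading from another CPU op; fix the upstream op first"),
    ("ANE supported but scheduler chose CPU",
     "data transfer overhead; often not worth fixing"),
]


def suggest_fix(message: str):
    """Return the first matching hint for a rejection message, or None."""
    lowered = message.lower()
    for fragment, hint in FIX_HINTS:
        if fragment.lower() in lowered:
            return hint
    return None
```

Unknown messages return None rather than a guess, so new rejection reasons surface as gaps in the table.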

How it works

Uses PyObjC to call macOS CoreML framework APIs directly from Python:

Public API: MLComputePlan (macOS 14.4+) for per-operation device assignment and cost weights
Private API: MLE5Engine.segmentationAnalyticsAndReturnError: for richer data, including backend support matrices and estimated runtimes per backend
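
Once per-operation device assignments are available, the CPU/GPU/ANE percentages in the summary table are a simple tally. The sketch below assumes the assignments have already been reduced to plain strings ("CPU", "GPU", "ANE"); device_percentages is a hypothetical name and the PyObjC calls that produce the assignments are left out.

```python
from collections import Counter

DEVICES = ("CPU", "GPU", "ANE")


def device_percentages(assignments):
    """Return the share of operations per device, as percentages.

    assignments: one device label per operation (assumed to be plain strings).
    """
    total = len(assignments)
    if total == 0:
        # Avoid division by zero for an empty compute plan.
        return {device: 0.0 for device in DEVICES}
    counts = Counter(assignments)
    return {device: round(100.0 * counts[device] / total, 1)
            for device in DEVICES}
```

The percentages here count operations, matching the "% of operations" wording above; they are not weighted by per-op cost.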

Heavily inspired by:

maderix/ANE — reverse-engineered private _ANEClient/_ANECompiler APIs for direct Neural Engine access. Their runtime introspection approach (objc_msgSend, NSClassFromString) informed how we navigate CoreML's internal object graph.
freedomtan/coreml_modelc_profling — per-operation profiling using both public MLComputePlan and undocumented MLE5Engine APIs. Their Objective-C implementation was the direct reference for our private profiler.

Caveats

Note that this was a weekend project, built with Claude Code.

Hardware-specific — compute plans and compilation are tied to the local chip. Results on an M4 Pro will differ from an M1 or A17 Pro.
Private APIs may break — the MLE5Engine path (--detailed) uses undocumented APIs that may change across macOS versions.
macOS 26 tested — CoreML enum values changed in macOS 26 (Tahoe). The tool uses framework constants to stay portable, but has only been tested on macOS 26.

License

MIT
