dart_mlx_ffi is a Dart and Flutter FFI package for Apple's
MLX C API.
The package vendors mlx, mlx-c, and the native build pieces needed to
compile a local MLX dynamic library for the current Apple target.
- Stable high-level Dart API for arrays, tensor ops, scans, linalg, FFT, quantization, convolutions, streams, runtime helpers, export/import, and custom fast-kernel wrappers
- Full raw binding surface through `package:dart_mlx_ffi/raw.dart`
- Native build hooks for Apple MLX on `iOS` and `macOS`
- Canonical MLX snapshot preparation through the repository's Unsloth MLX wrapper
- Verified parity against Python MLX on deterministic operator suites
- Publish-time parity coverage for text, VLM, TTS, and ASR checkpoints
Supported platforms: `iOS` and `macOS`.
This package targets Apple platforms only.
MLX is most useful on Apple Silicon with Metal available. If the local Xcode
installation does not contain the MetalToolchain component, the build hook
falls back to CPU-only MLX so the package still compiles.
To install the Metal shader toolchain on the build machine:

    xcodebuild -downloadComponent MetalToolchain

Add the package to a project:

    dart pub add dart_mlx_ffi

The package exposes three libraries:

- `package:dart_mlx_ffi/dart_mlx_ffi.dart`: stable MLX tensor/runtime API
- `package:dart_mlx_ffi/models.dart`: stable Dart model runners shipped by this repository
- `package:dart_mlx_ffi/raw.dart`: generated low-level `mlx-c` bindings

    import 'package:dart_mlx_ffi/dart_mlx_ffi.dart';

    void main() {
      // Two 2x2 float32 matrices built from flat lists.
      final a = MlxArray.fromFloat32List([1, 2, 3, 4], shape: [2, 2]);
      final b = MlxArray.fromFloat32List([5, 6, 7, 8], shape: [2, 2]);

      // Matrix product (elements 19, 22, 43, 50) and its total (134).
      final c = mx.matmul(a, b);
      final s = c.sum();

      print(MlxVersion.current());
      print(MlxDevice.defaultDevice());
      print(c.toList());
      print(s.toList());

      // Release native memory in reverse order of creation.
      s.close();
      c.close();
      b.close();
      a.close();
    }

This repository uses a canonical MLX conversion wrapper:
Use it when you want to:
- prepare a local MLX snapshot from a Hugging Face checkpoint
- standardize publish-time benchmark inputs
- keep local evaluation reproducible across machines
That wrapper produces MLX snapshots that can be used directly by:
- Dart model runners under `lib/src/models/`
- export/import tooling under `models/text_lm/`
- publish-time parity scripts under `benchmark/`

For Gemma 4, the current publish-time text coverage uses the official
Unsloth MLX snapshot unsloth/gemma-4-E2B-it-UD-MLX-4bit directly instead of
re-quantizing locally, because Unsloth currently ships gemma4 model patches
for mlx-lm as a separate install step.
The repository includes a Python helper for turning an mlx-lm snapshot into a
shapeless .mlxfn artifact plus matching sample inputs:
Example:

    uv sync
    uv run python models/text_lm/export_bundle.py \
      --snapshot-dir /path/to/mlx-snapshot \
      --output-dir /path/to/out-bundle

Outputs:

- `/path/to/out-bundle/function.mlxfn`
- `/path/to/out-bundle/inputs.safetensors`

The export is shapeless, so the imported function accepts variable-length
input_ids tensors.
The generic Dart runner for exported artifacts is:

    dart run models/common/import_run.dart \
      /path/to/out-bundle/function.mlxfn \
      /path/to/out-bundle/inputs.safetensors

There are three main model-workflow areas in this repository:

- `lib/src/models/` contains the main stable Dart model implementations
- `models/` contains reusable non-runtime export and artifact tooling
- `benchmark/` contains publish-time parity runners and report generation

Current stable Dart model implementations under lib/src/models/
include:

- `fsmn_vad`
- `parakeet_tdt`
- `qwen2_5`
- `qwen3_5`
- `kitten_tts`
- `shared` helpers

Current publish-time validation under benchmark/ is organized
as a release matrix instead of a grab bag of local experiments.
Recommended prepublish text coverage:

- `unsloth/gemma-4-E2B-it-UD-MLX-4bit`
- `mlx-community/Qwen3.5-27B-4bit-DWQ`
- `mlx-community/translategemma-27b-it-4bit`
- `mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-4bit`
- `mlx-community/GLM-4.7-Flash-4bit`

Recommended prepublish multimodal / speech coverage:

- `funasr/fsmn-vad`
- `mlx-community/MiniCPM-o-4_5-4bit`
- `mlx-community/Gemma-SEA-LION-v4-4B-VL-mlx-3bit`
- `mlx-community/Ming-omni-tts-0.5B-4bit`
- `mlx-community/kitten-tts-nano-0.8-6bit`
- `mlx-community/parakeet-tdt-0.6b-v3`

Deterministic operator parity currently covers 114 checks across arithmetic,
tensor ops, scans, convolutions, linalg, fast ops, quantization, and random
APIs, with 0 failures on the benchmark machine.

- Date: `2026-04-04`
- Machine: `MacBook Pro (Mac16,5)`
- Chip: `Apple M4 Max`
- CPU cores: `16` (`12` performance + `4` efficiency)
- Memory: `128 GB`
- OS: `macOS 26.4 (25E5223i)`
- Kernel: `Darwin 25.4.0`
- Dart SDK: `3.11.1`
- Python: `3.12` via `uv`
- MLX runtime: `0.31.1`

Latest measured runtime snapshot on the benchmark machine, refreshed on
2026-04-04:
Text models:

| Model | Python MLX ms | Dart MLX ms | Max abs diff |
|---|---|---|---|
| `gemma-4-E2B-it-UD-MLX-4bit` | 30.47 | 34.30 | 0 |
| `Qwen3.5-27B-4bit-DWQ` | 172.81 | 170.25 | 0 |
| `translategemma-27b-it-4bit` | 166.52 | 170.46 | 0 |
| `NVIDIA-Nemotron-3-Nano-30B-A3B-4bit` | 36.62 | 35.67 | 0 |
| `GLM-4.7-Flash-4bit` | 46.61 | 45.81 | 0 |

Non-text models:

| Model | Kind | Python MLX ms | Dart MLX ms | Max abs diff | Notes |
|---|---|---|---|---|---|
| `MiniCPM-o-4_5-4bit` | vlm | 130.82 | 131.58 | 0 | synthetic image + prompt |
| `Gemma-SEA-LION-v4-4B-VL-mlx-3bit` | vlm | 718.60 | 756.92 | 0 | synthetic image + prompt |
| `Ming-omni-tts-0.5B-4bit` | tts | 4.59 | 4.85 | 0 | deterministic `forward_with_cfg` |
| `kitten-tts-nano-0.8-6bit` | tts | 66.25 | 69.20 | 1.19e-07 | full waveform |
| `parakeet-tdt-0.6b-v3` | asr | 30.95 | 29.72 | 5.72e-06 | transcript matched |

Max abs diff is the maximum absolute difference between the Python MLX output
and the Dart MLX output for the compared tensor.
Examples:

- `0` means the compared tensor matched exactly at the chosen dtype
- `7.62939453125e-06` means the worst element differed by about `0.00000763`
- for text and VLM rows, the compared tensor is the final-token `logits[:16]`
- for `parakeet-tdt-0.6b-v3`, the compared tensor is the first-step `token_logits[:16] + duration_logits`

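
The metric defined above can be sketched in a few lines of plain Python. This is an illustration only, not the code used by the scripts under `benchmark/`: the function name `max_abs_diff` and the sample values are hypothetical, and the real comparison operates on MLX tensors rather than Python lists.

```python
def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference of two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Hypothetical logits slices standing in for the Python MLX and Dart MLX outputs.
python_logits = [0.5, -1.25, 3.0, 2.0]
dart_logits = [0.5, -1.25, 3.0, 2.0]
# Same values, except the last element is off by 2**-17 (about 7.63e-06).
shifted_logits = [0.5, -1.25, 3.0, 2.0 + 2**-17]

print(max_abs_diff(python_logits, dart_logits))     # exact match
print(max_abs_diff(python_logits, shifted_logits))  # worst element is off by 2**-17
```

A single scalar like this is easy to tabulate per model, at the cost of hiding where in the tensor the worst element sits.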
Generate the publish-time report with warmup=3 and iters=10:

    uv sync
    HF_HUB_DISABLE_XET=1 uv run --no-project --with mlx-lm --with pillow --with mlx-vlm --with parakeet-mlx python benchmark/publish_report.py

The aggregated results are written to:

- `benchmark/out/publish_report.json`

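
For readers unfamiliar with the `warmup` and `iters` settings, the following is a minimal sketch of a warmup-then-measure loop. It is an assumption about the general technique, not a copy of `benchmark/publish_report.py`: the helper name `time_ms` is hypothetical, and whether the report aggregates by mean, median, or something else is not stated here.

```python
import statistics
import time


def time_ms(fn, warmup=3, iters=10):
    """Call fn `warmup` times untimed, then return latencies in ms for `iters` timed calls."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization; results are discarded
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Stand-in workload; a real run would invoke a model forward pass instead.
calls = []
samples = time_ms(lambda: calls.append(sum(range(10_000))))
print(f"calls={len(calls)} timed={len(samples)} median_ms={statistics.median(samples):.3f}")
```

Because MLX evaluates lazily, a timed function for a real model would also need to force evaluation of its outputs (for example with `mx.eval` in Python MLX); otherwise the loop measures graph construction rather than compute.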
Useful focused runs:

    # fixed-mel Parakeet TDT comparison
    uv run --no-project --with parakeet-mlx --with numpy python - <<'PY'
    from benchmark.parakeet_tdt_sweep import asr_bench
    import json
    print(json.dumps(asr_bench('mlx-community/parakeet-tdt-0.6b-v3', warmup=1, iters=1), indent=2))
    PY

    # FSMN-VAD Dart/Python comparison on fixed features
    uv run --no-project --with torch --with safetensors python benchmark/fsmn_vad_sweep.py

    # FSMN-VAD Dart/Python comparison on real audio
    uv run --no-project --with torch --with safetensors --with numpy python benchmark/fsmn_vad_audio_sweep.py

For FSMN-VAD, the current Python reference backend is the upstream PyTorch
implementation, not a Python MLX runner. Treat those rows as correctness and
relative reference checks, not as Python-MLX speed comparisons.
Regenerate the raw bindings:

    dart run ffigen --config ffigen.yaml

Typical local verification:

    dart analyze
    dart test
    dart pub publish --dry-run

Benchmark tooling uses `uv`:

    uv sync

- This package targets Apple platforms only.
- The raw layer remains the escape hatch for the full MLX C surface.