Skip to content

Fix Metal standard version and MPS encoder lifecycle#1

Open
robtaylor wants to merge 8 commits intomainfrom
fix-metal-mps-lifecycle
Open

Fix Metal standard version and MPS encoder lifecycle#1
robtaylor wants to merge 8 commits intomainfrom
fix-metal-mps-lifecycle

Conversation

@robtaylor
Copy link
Owner

Summary

  • Fix Metal standard version: metal4.0metal3.2 to target macOS 15+ instead of unreleased macOS 26
  • Fix MPS encoder lifecycle: Use stream->commandEncoder() instead of [commandBuffer computeCommandEncoder] to integrate with PyTorch's kernel coalescing. The old pattern crashes on sequential kernel calls.

Files changed

  • build2cmake/src/templates/metal/compile-metal.cmake — metal3.2 instead of metal4.0
  • template/__KERNEL_NAME_NORMALIZED___metal/__KERNEL_NAME_NORMALIZED__.mm — use MPSStream encoder API
  • builder/examples/relu/relu_metal/relu.mm — same fix
  • builder/examples/extra-data/relu_metal/relu.mm — same fix

Test plan

  • metal3.2 metallibs load correctly on macOS 15 (Apple Silicon)
  • fused-rms-norm 74/74 tests pass (was crashing on sequential calls)
  • rotary-embedding 217/217 tests pass

* start terraform scripts.

* up

* up

* up

* bit more details.

* nix fmt.

* fmt

* Apply suggestions from code review

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* address review feedback.

* nix fmt.

* reduce startup time.

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Use stream->commandEncoder() instead of creating encoders directly via
[cmdBuf computeCommandEncoder] to properly integrate with PyTorch's MPS
stream encoder lifecycle management (kernel coalescing). Direct encoder
creation bypasses the stream's internal _commandEncoder state and crashes
on sequential kernel dispatches.

Lower the default Metal standard from metal3.2 (macOS 15+) to metal3.1
(macOS 14+) since all current kernel features (bfloat16_t, simd_sum,
simd_shuffle, threadgroup_barrier) are available in Metal 3.1.

Add multi-strategy Metal toolchain detection for macOS 14+:
- Separate Metal toolchain component (macOS 26+ cryptex mount)
- xcrun/xcode-select based detection
- Direct /Applications/Xcode*.app filesystem scan fallback

Also clear SDKROOT in xcrunHost to prevent Nix-set SDK paths from
interfering with system xcrun.

Fixes: huggingface#307

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
Test Metal kernel builds across multiple macOS versions to verify
compatibility with the metal3.1 standard (macOS 14+). Use sandbox=relaxed
for Nix to support __noChroot builds that access the host Metal toolchain.
The separate Metal toolchain download is only needed on macOS 26+.

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
macOS 14 builds succeed but MPS tests may OOM on runners with
limited unified memory. Use continue-on-error so macos-14 failures
don't block the workflow. Update Metal docs to reflect macOS 15+
as the supported baseline with macOS 14 best-effort.

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants