Fix Metal standard version and MPS encoder lifecycle #1
Open
* start terraform scripts.
* up
* up
* up
* bit more details.
* nix fmt.
* fmt
* Apply suggestions from code review

  Co-authored-by: Daniël de Kok <me@danieldk.eu>
* address review feedback.
* nix fmt.
* reduce startup time.

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Use stream->commandEncoder() instead of creating encoders directly via [cmdBuf computeCommandEncoder] to properly integrate with PyTorch's MPS stream encoder lifecycle management (kernel coalescing). Direct encoder creation bypasses the stream's internal _commandEncoder state and crashes on sequential kernel dispatches.

Lower the default Metal standard from metal3.2 (macOS 15+) to metal3.1 (macOS 14+), since all current kernel features (bfloat16_t, simd_sum, simd_shuffle, threadgroup_barrier) are available in Metal 3.1.

Add multi-strategy Metal toolchain detection for macOS 14+:
- Separate Metal toolchain component (macOS 26+ cryptex mount)
- xcrun/xcode-select based detection
- Direct /Applications/Xcode*.app filesystem scan fallback

Also clear SDKROOT in xcrunHost to prevent Nix-set SDK paths from interfering with the system xcrun.

Fixes: huggingface#307
Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
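The detection order above can be sketched in Python. This is a minimal, hypothetical sketch, not the code in this PR: all function names are invented, the location of the metal binary inside an Xcode bundle (Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/metal) is an assumption about the pre-Xcode 26 layout, and the macOS 26+ cryptex strategy is omitted because its mount path is not given here. It shows two ideas that carry over: try each strategy in order and take the first hit, and drop SDKROOT from the environment before calling xcrun.

```python
import glob
import os
import shutil
import subprocess
from typing import Callable, Iterable, Optional

# Assumption: where the metal compiler lives inside an Xcode bundle
# (pre-Xcode 26 layout). Hypothetical constant for this sketch.
METAL_IN_XCODE = "Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/metal"


def xcrun_env(base: Optional[dict] = None) -> dict:
    """Copy of the environment with SDKROOT removed, so a Nix-set SDK
    path cannot redirect the system xcrun."""
    env = dict(os.environ if base is None else base)
    env.pop("SDKROOT", None)
    return env


def find_via_xcrun() -> Optional[str]:
    """Ask xcrun for the metal compiler; None if xcrun is absent or fails."""
    if shutil.which("xcrun") is None:
        return None
    result = subprocess.run(
        ["xcrun", "-f", "metal"],
        env=xcrun_env(),
        capture_output=True,
        text=True,
    )
    path = result.stdout.strip()
    return path if result.returncode == 0 and path else None


def find_via_xcode_scan(apps_dir: str = "/Applications") -> Optional[str]:
    """Fallback: scan <apps_dir>/Xcode*.app bundles for an executable metal."""
    for app in sorted(glob.glob(os.path.join(apps_dir, "Xcode*.app"))):
        candidate = os.path.join(app, METAL_IN_XCODE)
        if os.access(candidate, os.X_OK):
            return candidate
    return None


def find_metal(strategies: Iterable[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each strategy in order and return the first path it reports."""
    for strategy in strategies:
        path = strategy()
        if path:
            return path
    return None
```

A caller would run find_metal([find_via_xcrun, find_via_xcode_scan]); a strategy for the separate Metal toolchain component would be prepended to that list so it wins on macOS 26+.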
Test Metal kernel builds across multiple macOS versions to verify compatibility with the metal3.1 standard (macOS 14+). Use sandbox=relaxed for Nix to support __noChroot builds that access the host Metal toolchain. The separate Metal toolchain download is only needed on macOS 26+.

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
macOS 14 builds succeed, but MPS tests may run out of memory (OOM) on runners with limited unified memory. Use continue-on-error so that macos-14 failures don't block the workflow. Update the Metal docs to reflect macOS 15+ as the supported baseline, with macOS 14 supported on a best-effort basis.

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
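The version pairs stated in these commits (metal3.1 → macOS 14, metal3.2 → macOS 15, metal4.0 → macOS 26) explain the choice of default: the lowest standard that still offers every feature the kernels use widens the set of macOS versions they can target. A small Python sketch of that lookup follows; the table and helper names are illustrative only and are not part of this repository.

```python
from typing import List, Tuple

# (Metal standard, minimum macOS major version), as stated in this PR.
# Ordered from oldest to newest standard.
METAL_STANDARDS: List[Tuple[str, int]] = [
    ("metal3.1", 14),
    ("metal3.2", 15),
    ("metal4.0", 26),
]


def min_macos(standard: str) -> int:
    """Minimum macOS major version for a given Metal standard."""
    for name, macos in METAL_STANDARDS:
        if name == standard:
            return macos
    raise ValueError(f"unknown Metal standard: {standard}")


def newest_standard_for(macos_major: int) -> str:
    """Newest standard whose minimum macOS is not above the given version."""
    supported = [name for name, macos in METAL_STANDARDS if macos <= macos_major]
    if not supported:
        raise ValueError(f"no known Metal standard supports macOS {macos_major}")
    return supported[-1]
```

With this table, newest_standard_for(14) is metal3.1, consistent with the macos-14 CI job building under the new default, whereas metal3.2 would require macOS 15.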
Summary
- Change the Metal standard from metal4.0 to metal3.2 to target macOS 15+ instead of the unreleased macOS 26.
- Use stream->commandEncoder() instead of [commandBuffer computeCommandEncoder] to integrate with PyTorch's kernel coalescing. The old pattern crashes on sequential kernel calls.

Files changed
- build2cmake/src/templates/metal/compile-metal.cmake: metal3.2 instead of metal4.0
- template/__KERNEL_NAME_NORMALIZED___metal/__KERNEL_NAME_NORMALIZED__.mm: use the MPSStream encoder API
- builder/examples/relu/relu_metal/relu.mm: same fix
- builder/examples/extra-data/relu_metal/relu.mm: same fix

Test plan
- fused-rms-norm: 74/74 tests pass (previously crashed on sequential calls)
- rotary-embedding: 217/217 tests pass
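The encoder crash can be illustrated with a toy Python model. Everything in it is hypothetical: the classes only mimic the Metal rule that a command buffer can have one active encoder at a time, the stream's cached encoder stands in for the _commandEncoder state mentioned in the commit message, and end_kernel_coalescing is an invented name. It is a sketch of one failure mode consistent with the description, not PyTorch's MPSStream implementation.

```python
from typing import Optional


class CommandBuffer:
    """Toy command buffer: only one encoder may be active at a time."""

    def __init__(self) -> None:
        self.active: Optional["Encoder"] = None
        self.encoders_created = 0

    def compute_command_encoder(self) -> "Encoder":
        if self.active is not None:
            raise RuntimeError("another encoder is still encoding on this command buffer")
        self.active = Encoder(self)
        self.encoders_created += 1
        return self.active


class Encoder:
    def __init__(self, buf: CommandBuffer) -> None:
        self.buf = buf
        self.dispatches = 0

    def dispatch(self) -> None:
        self.dispatches += 1

    def end_encoding(self) -> None:
        self.buf.active = None


class Stream:
    """Toy stream that owns one reusable encoder (kernel coalescing)."""

    def __init__(self) -> None:
        self.command_buffer = CommandBuffer()
        self._encoder: Optional[Encoder] = None

    def command_encoder(self) -> Encoder:
        # Reuse the open encoder; create one only when none is active.
        if self._encoder is None:
            self._encoder = self.command_buffer.compute_command_encoder()
        return self._encoder

    def end_kernel_coalescing(self) -> None:
        # Close the shared encoder so later code may open a new one.
        if self._encoder is not None:
            self._encoder.end_encoding()
            self._encoder = None


def kernel_via_stream(stream: Stream) -> None:
    # Fixed pattern: go through the stream and leave the encoder open.
    stream.command_encoder().dispatch()


def kernel_direct(stream: Stream) -> None:
    # Old pattern: open an encoder on the command buffer behind the stream's back.
    enc = stream.command_buffer.compute_command_encoder()
    enc.dispatch()
    enc.end_encoding()
```

Routing every kernel through the stream keeps one encoder open across consecutive dispatches, which is the point of kernel coalescing; in this model, a kernel that opens its own encoder only succeeds when the stream happens to have no encoder open.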