Commit 4497104

update: bump MLX upstream pin to 84961223 (PRs #3443 / #3463 / #3475)
Picks up upstream #3443 (CUDA qmm_naive / qmm_sm80 kernel bodies extracted into new qmm_naive.cuh / qmm_sm80.cuh headers; the public ABI of the symbols declared in mlxcel's patches/.../qmm.h is unchanged), #3463 (CPU JIT preamble routed through JitCompiler::get_preamble, with the prebuilt symbol renamed from get_kernel_preamble to get_prebuilt_preamble; mlxcel does not call either directly), and #3475 (AsStrided contiguity-flag accuracy fix in mlx/backend/common, computing data_size from the actually-occupied stride range).

Three-location pin update applied per CLAUDE.md:
- src/lib/mlx-cpp/CMakeLists.txt (GIT_TAG)
- src/lib/mlxcel-core/build.rs (MLX_EXPECTED_COMMIT)
- .github/workflows/release.yml (MLX_EXPECTED_COMMIT env)

Patch headers retargeted to the new commit:
- patches/mlx/backend/cuda/quantized/qmm/qmm.h
- patches/mlx/backend/cuda/quantized/quantized.cpp

Fused Metal kernel launchers in src/lib/mlx-cpp/turbo/ revalidated on Apple Silicon. The relevant symbols (mlx::core::fast::metal_kernel, mlx::core::full, mlx::core::Shape, mlx::core::float32, mlx::core::int32, metal::fast::exp) are unchanged across the bump; the three required correctness tests pass with significant headroom on the RMS<5e-3 gate:

sparse_v_kernel_threshold_zero_matches_graph OK
delegated_fused_kernel_matches_reference_over_200_steps RMS = 1.7263e-4
delegated_steel_envelope_matches_cold_only_fused_over_200_steps RMS = 1.5259e-4
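For readers unfamiliar with the RMS<5e-3 gate mentioned in the commit message, the check can be sketched roughly as follows. This is an illustration only: it assumes the reported RMS is the root-mean-square of the element-wise difference between the fused-kernel output and a reference output, and the function names `rms_error` and `passes_rms_gate` are hypothetical, not taken from the mlxcel test suite.

```rust
/// Root-mean-square of the element-wise difference between two slices.
/// Returns None when the lengths differ or the slices are empty.
/// (Assumption: this is how the gate's RMS figure is defined.)
fn rms_error(actual: &[f32], reference: &[f32]) -> Option<f64> {
    if actual.len() != reference.len() || actual.is_empty() {
        return None;
    }
    let sum_sq: f64 = actual
        .iter()
        .zip(reference)
        .map(|(a, r)| {
            // Accumulate in f64 to limit rounding error over long sequences.
            let d = f64::from(*a) - f64::from(*r);
            d * d
        })
        .sum();
    Some((sum_sq / actual.len() as f64).sqrt())
}

/// Hypothetical gate: passes when the RMS error is strictly below `threshold`.
fn passes_rms_gate(actual: &[f32], reference: &[f32], threshold: f64) -> bool {
    matches!(rms_error(actual, reference), Some(rms) if rms < threshold)
}
```

Under this definition, the reported values of 1.7263e-4 and 1.5259e-4 sit more than an order of magnitude below the 5e-3 threshold.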
1 parent d97593a commit 4497104

6 files changed

Lines changed: 16 additions & 9 deletions

.github/workflows/release.yml

Lines changed: 1 addition & 1 deletion
@@ -236,7 +236,7 @@ jobs:
 env:
   # Must match GIT_TAG in src/lib/mlx-cpp/CMakeLists.txt and
   # MLX_EXPECTED_COMMIT in src/lib/mlxcel-core/build.rs
-  MLX_EXPECTED_COMMIT: "c9aa560577d4f41677bc5830a8b7e806a07d4c6f"
+  MLX_EXPECTED_COMMIT: "84961223c02925bef6bef95d3a0a046779bde935"
 run: |
   # Check every _deps directory for a valid .mlx-build-commit marker.
   # If the marker is missing or doesn't match, purge that _deps/ entirely.
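The marker check described in the workflow comments above can be sketched in Rust as follows. This is a minimal illustration, not the project's script: it assumes the `.mlx-build-commit` marker holds the commit hash as plain text, and the helper names `marker_matches` and `purge_if_stale` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Marker file name taken from the workflow comments.
const MARKER_FILE: &str = ".mlx-build-commit";

/// True when `deps_dir` holds a marker whose trimmed contents equal `expected`.
/// (Assumption: the marker stores the bare commit hash.)
fn marker_matches(deps_dir: &Path, expected: &str) -> bool {
    match fs::read_to_string(deps_dir.join(MARKER_FILE)) {
        Ok(contents) => contents.trim() == expected,
        Err(_) => false, // a missing or unreadable marker counts as stale
    }
}

/// Removes `deps_dir` entirely when its marker is missing or stale.
/// Returns Ok(true) if the directory was purged, Ok(false) if it was kept.
fn purge_if_stale(deps_dir: &Path, expected: &str) -> io::Result<bool> {
    if !deps_dir.is_dir() || marker_matches(deps_dir, expected) {
        return Ok(false);
    }
    fs::remove_dir_all(deps_dir)?;
    Ok(true)
}
```

Purging the whole directory rather than patching it in place keeps the cache from mixing objects built against two different MLX commits.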

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ### Fixed
 - `StreamFilter` extended to cover Hermes-style `<tool_call>` / `</tool_call>` and Mistral Nemo `[TOOL_CALLS]` markers, which previously leaked raw markup into `delta.content` during streaming. Partial-marker buffering at token boundaries correctly holds back prefixes (e.g. `<tool_`) until the full tag can be confirmed, then releases them to `delta.content` if they turn out not to be a boundary. Gemma 4 `<|tool_call>` suppression is unaffected; the delimiter table ordering ensures the Gemma 4 pipe-delimited form wins the tiebreak over the Hermes plain form (#551).

+### Changed
+- `MLX` upstream pin bumped from `c9aa5605` to `84961223` (3 commits, PRs #3443 / #3463 / #3475). PR #3443 splits the CUDA `qmm_naive` / `qmm_sm80` kernel bodies into new `qmm_naive.cuh` / `qmm_sm80.cuh` headers without changing the public ABI consumed by mlxcel's `patches/mlx/backend/cuda/quantized/qmm/qmm.h`; PR #3463 routes the CPU JIT preamble through `JitCompiler::get_preamble()` and renames the prebuilt symbol from `get_kernel_preamble` to `get_prebuilt_preamble` (mlxcel does not call either directly); PR #3475 fixes contiguity-flag accuracy in `AsStrided` by computing `data_size` from the actually-occupied stride range. Three-location pin update applied to `src/lib/mlx-cpp/CMakeLists.txt`, `src/lib/mlxcel-core/build.rs`, and `.github/workflows/release.yml` per `CLAUDE.md`. Fused Metal kernel launchers in `src/lib/mlx-cpp/turbo/` revalidated against the new pin: `mlx::core::fast::metal_kernel`, `mlx::core::full`, `mlx::core::Shape`, `mlx::core::float32`, `mlx::core::int32`, and `metal::fast::exp` symbols are unchanged across the bump.
+
 ### Security
 - Path-traversal defense in the downloader: `is_safe_relative_path` pre-filters each sibling filename returned by the HuggingFace API (rejects absolute paths, `..` components, backslash separators, and empty components). A secondary canonicalized `starts_with` guard on the resolved destination path is applied before writing each file. Download target files are written to a temporary path and atomically renamed into place, preventing partial writes from leaving corrupt files in the output directory (fixes C1 and H1 from security review of #457).
 - Structured-output schema limits (64 KiB serialized size, 32 nesting depth, 64 `$ref` count) and tightened `llguidance` parser caps (`max_grammar_size: 100 000`, `max_lexer_states: 50 000`) applied before grammar compilation so an adversarial client cannot use the schema endpoint as a CPU/memory exhaustion vector. Schema content is never echoed in public error messages (#550).
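The path pre-filter described in the Security entry above could look roughly like the sketch below. The function name `is_safe_relative_path` comes from the changelog, but the body is an assumption built only from the four listed rejection rules; the actual mlxcel implementation is not shown in this commit and may differ (for example in how it treats Windows drive prefixes).

```rust
/// Sketch of a pre-filter for sibling filenames; not the project's actual code.
/// Rejects absolute paths, `..` components, backslash separators,
/// and empty components, as listed in the changelog entry.
fn is_safe_relative_path(name: &str) -> bool {
    // Reject empty names, POSIX-absolute paths, and backslash separators outright.
    if name.is_empty() || name.starts_with('/') || name.contains('\\') {
        return false;
    }
    // Every '/'-separated component must be non-empty and must not be "..".
    name.split('/').all(|part| !part.is_empty() && part != "..")
}
```

A string filter like this is only the first layer; the changelog also describes a canonicalized `starts_with` guard on the resolved destination, which catches cases a purely lexical check cannot, such as symlinks inside the output directory.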

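The partial-marker buffering described in the `StreamFilter` entry can be illustrated with a small holdback helper. This is a sketch under stated assumptions: the marker list below contains only the three markers named in the changelog, the real delimiter table (including the Gemma 4 form and its ordering) is not reproduced, and `held_back_len` and `split_emit` are hypothetical names.

```rust
/// Markers named in the changelog entry; the real delimiter table may differ.
const MARKERS: [&str; 3] = ["<tool_call>", "</tool_call>", "[TOOL_CALLS]"];

/// Length in bytes of the longest suffix of `buf` that is a proper prefix
/// of some marker, i.e. text that might still grow into a full marker.
fn held_back_len(buf: &str) -> usize {
    let mut best = 0;
    for marker in MARKERS {
        // Only proper prefixes: a complete marker is a confirmed boundary,
        // not a partial one. Markers are ASCII, so byte slicing is safe.
        for len in 1..marker.len() {
            if len > best && buf.ends_with(&marker[..len]) {
                best = len;
            }
        }
    }
    best
}

/// Splits `buf` into (text safe to emit now, tail to keep buffered).
fn split_emit(buf: &str) -> (&str, &str) {
    buf.split_at(buf.len() - held_back_len(buf))
}
```

With this helper, `split_emit("Hello <tool_")` emits `"Hello "` and keeps `"<tool_"` buffered. If the next token does not complete a marker, the buffered tail no longer matches any prefix and is released on the following call.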
src/lib/mlx-cpp/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ else()
 FetchContent_Declare(
   mlx
   GIT_REPOSITORY "https://github.com/ml-explore/mlx.git"
-  GIT_TAG c9aa560577d4f41677bc5830a8b7e806a07d4c6f)
+  GIT_TAG 84961223c02925bef6bef95d3a0a046779bde935)

 # Use FetchContent_Populate + add_subdirectory so we can apply source
 # overlays before the MLX build system processes the files.

src/lib/mlx-cpp/patches/mlx/backend/cuda/quantized/qmm/qmm.h

Lines changed: 4 additions & 2 deletions
@@ -1,7 +1,9 @@
 // Copyright © 2026 Apple Inc.
-// Patched by mlxcel: matches upstream c9aa5605. Declarations carry the
+// Patched by mlxcel: matches upstream 84961223. Declarations carry the
 // optional<array> lhs_indices / rhs_indices parameters on qmm_sm80 and
-// qmm_naive; no functional change between 68cf2fdd and c9aa5605.
+// qmm_naive; upstream #3443 (c9aa5605..84961223) split the kernel body
+// into qmm_naive.cuh / qmm_sm80.cuh while preserving the public ABI of
+// the symbols declared here.

 #pragma once

src/lib/mlx-cpp/patches/mlx/backend/cuda/quantized/quantized.cpp

Lines changed: 6 additions & 4 deletions
@@ -1,10 +1,12 @@
 // Copyright © 2025 Apple Inc.
 // Patched by mlxcel: ensure input contiguity in QuantizedMatmul for
 // non-contiguous 3D batched weights (e.g. GLM-4 MLA embed_q with
-// transpose=false). Synced to upstream c9aa5605, which folds in the
-// #3469 cutlass-half-type fix (ensure_row_contiguous on x and indices)
-// and continues to accept the optional<array> lhs_indices / rhs_indices
-// parameters on qmm_sm80 / qmm_naive.
+// transpose=false). Synced to upstream 84961223 (post-c9aa5605: PR #3443
+// extracted qmm_naive / qmm_sm80 kernel bodies into .cuh headers but
+// preserved the public function signatures consumed here, including the
+// #3469 cutlass-half-type fix that landed at c9aa5605: ensure_row_contiguous
+// on x and indices, plus the optional<array> lhs_indices / rhs_indices
+// parameters on qmm_sm80 / qmm_naive).

 #include "mlx/backend/cuda/quantized/quantized.h"
 #include "mlx/backend/cuda/device.h"

src/lib/mlxcel-core/build.rs

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ fn main() {
 }

 /// Expected MLX git commit — must match GIT_TAG in mlx-cpp/CMakeLists.txt.
-const MLX_EXPECTED_COMMIT: &str = "c9aa560577d4f41677bc5830a8b7e806a07d4c6f";
+const MLX_EXPECTED_COMMIT: &str = "84961223c02925bef6bef95d3a0a046779bde935";

 /// Purge stale cached MLX build artifacts before CMake runs.
 ///
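Because the same commit hash has to be kept in sync across three files, a small consistency check can catch a missed location. The sketch below is one possible approach, not something present in the repository: `extract_pin` and `pins_agree` are hypothetical helpers, and the scan simply looks for the first run of 40 hex digits after a given key such as `GIT_TAG` or `MLX_EXPECTED_COMMIT`.

```rust
/// Returns the first 40-character hex commit hash that follows `key` in `text`.
/// (Hypothetical helper; assumes full 40-digit SHA-1 pins.)
fn extract_pin<'a>(text: &'a str, key: &str) -> Option<&'a str> {
    let rest = &text[text.find(key)? + key.len()..];
    let mut run_start = 0;
    for (i, b) in rest.as_bytes().iter().enumerate() {
        if !b.is_ascii_hexdigit() {
            // Any non-hex byte (space, quote, colon, ...) restarts the run.
            run_start = i + 1;
        } else if i + 1 - run_start == 40 {
            return Some(&rest[run_start..=i]);
        }
    }
    None
}

/// True when every (file text, key) pair yields a pin and all pins are equal.
fn pins_agree(sources: &[(&str, &str)]) -> bool {
    let mut pins = sources.iter().map(|&(text, key)| extract_pin(text, key));
    match pins.next() {
        Some(Some(first)) => pins.all(|p| p == Some(first)),
        _ => false,
    }
}
```

In practice the three texts would be read from `src/lib/mlx-cpp/CMakeLists.txt`, `src/lib/mlxcel-core/build.rs`, and `.github/workflows/release.yml`, with `GIT_TAG` as the key for the first and `MLX_EXPECTED_COMMIT` for the other two.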
