Commit 4c7301d

feat(moe): add stacked-buffer fast path for SSD streaming (#35)
Introduces the `MLX_MOE_STACKED` and `MLX_MOE_FUSE_GATEUP` fast paths in SwitchGLU for SSD-streamed MoE inference. Instead of issuing one kernel dispatch per expert, each projection is computed with a single batched gatherQuantizedMM call, which drastically reduces CPU→GPU enqueue overhead on Apple Silicon.

- Defaults to the legacy behavior unless the environment flags are set
- Automatically and safely falls back to the legacy path if the layer is ineligible (e.g. non-quantized weights, or a batch size > 32)
- Adds unit tests to verify that the fallback is safe
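The flag gating and fallback described in this message can be sketched as follows. This is a hedged illustration, not the code from this commit. The two environment-variable names and the "quantized weights, batch size ≤ 32" eligibility rule come from the message. Everything else is hypothetical:

- the `MoEDispatchPath` enum
- the `flagEnabled` and `selectPath` functions
- the convention that `"1"` or `"true"` enables a flag
- the assumption that `MLX_MOE_FUSE_GATEUP` only takes effect on top of `MLX_MOE_STACKED`

```swift
import Foundation

// Hypothetical sketch: flag names are taken from the commit message; the
// types, functions, and flag-parsing convention are illustrative assumptions.
enum MoEDispatchPath: Equatable {
    case legacyPerExpert     // one kernel dispatch per expert (default)
    case stacked             // one batched quantized gather-matmul per projection
    case stackedFusedGateUp  // stacked path with gate and up projections fused
}

/// Treats a flag as enabled when it is set to "1" or "true" (assumed convention).
func flagEnabled(_ name: String, in env: [String: String]) -> Bool {
    guard let value = env[name]?.lowercased() else { return false }
    return value == "1" || value == "true"
}

/// Chooses a dispatch path. Without MLX_MOE_STACKED the legacy path is used;
/// ineligible layers (non-quantized weights, batch size > 32) fall back to it.
func selectPath(isQuantized: Bool,
                batchSize: Int,
                env: [String: String] = ProcessInfo.processInfo.environment) -> MoEDispatchPath {
    guard flagEnabled("MLX_MOE_STACKED", in: env) else { return .legacyPerExpert }
    guard isQuantized, batchSize <= 32 else { return .legacyPerExpert }
    // Assumption: gate/up fusion is only honored when the stacked path is active.
    return flagEnabled("MLX_MOE_FUSE_GATEUP", in: env) ? .stackedFusedGateUp : .stacked
}

// Usage with an explicit environment instead of the process environment.
let flags = ["MLX_MOE_STACKED": "1"]
print(selectPath(isQuantized: true, batchSize: 8, env: flags))   // prints "stacked"
print(selectPath(isQuantized: true, batchSize: 64, env: flags))  // prints "legacyPerExpert"
```

Checking the opt-in flag before eligibility keeps the default behavior identical to the legacy path whenever the flags are unset.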
1 parent 40d6b67 commit 4c7301d

4 files changed

Lines changed: 553 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 6 additions & 0 deletions
```diff
@@ -123,6 +123,12 @@ jobs:
           # Replace the local submodule with the actively checked out code
           rm -rf mlx-swift-lm
           cp -r ../mlx-swift-lm .
+
+      - name: Update mlx-swift submodule to latest main
+        run: |
+          cd SwiftLM/mlx-swift
+          git fetch origin main
+          git checkout origin/main
 
       - name: Build SwiftLM
         run: |
```

.github/workflows/downstream_integration.yml

Lines changed: 8 additions & 0 deletions
```diff
@@ -33,6 +33,14 @@ jobs:
           mkdir -p SwiftLM/mlx-swift-lm
           find . -mindepth 1 -maxdepth 1 ! -name "SwiftLM" -exec cp -r {} SwiftLM/mlx-swift-lm/ \;
 
+      - name: Update mlx-swift submodule to latest main
+        # SwiftLM's mlx-swift submodule pin may lag behind main.
+        # Pull latest so new APIs are available for the build.
+        run: |
+          cd SwiftLM/mlx-swift
+          git fetch origin main
+          git checkout origin/main
+
       - name: Install Metal Toolchain
         run: xcodebuild -downloadComponent MetalToolchain || true
 
```
