Skip to content

fix(perf): skip solar-open layer eval during decode#22

Merged
inureyes merged 1 commit into
mainfrom
fix/perf-skip-solar-open-decode
May 18, 2026
Merged

fix(perf): skip solar-open layer eval during decode#22
inureyes merged 1 commit into
mainfrom
fix/perf-skip-solar-open-decode

Conversation

@inureyes
Copy link
Copy Markdown
Member

Summary

SolarOpenModel::forward was calling mlxcel_core::eval_all on the hidden state after every transformer layer to keep the MoE compute graph bounded (128 experts × ~50 ops/layer). That guard is needed for multi-token prefill — without it, the graph grows quadratically with seq_len × n_layers — but it's pure overhead on single-token decode, where the final-logits evaluation flushes the graph once anyway. The per-layer flush was costing ~48 GPU synchronizations per generated token.

Cherry-picked from mlxcel-internal commit c5e88612 (the internal docs/model_tests_m5max.md update was intentionally excluded, same as #20 — it references an internal-only baseline doc that was never in the public repo).

What changed

src/models/solar_open.rs — gates the per-layer eval_all on prefill only:

let eval_layer_outputs = should_eval_layer_outputs(input_ids);  // false when seq_len == 1
for (i, layer) in self.layers.iter().enumerate() {
    h = layer.forward(&h, &mut caches[i], mask);
    if eval_layer_outputs {
        let ptrs = [h.as_ref().unwrap() as *const MlxArray];
        unsafe { mlxcel_core::eval_all(&ptrs) };
    }
}

should_eval_layer_outputs is a 3-line helper that checks input_ids.shape.last() != Some(1), plus a unit test that pins the decode-vs-prefill behaviour.

Verification

  • make verify-fmt — clean
  • make verify-clippy (CI-faithful: --all-targets --features metal,accelerate -- -D warnings) — clean in 22s (warm cache)
  • make verify-test skipped (15–30 min release-mode run); the upstream commit is already validated in mlxcel-internal against the M5 Max sweep.

@inureyes inureyes added status:review Under review type:bug Bug fixes, error corrections, or issue resolutions type:performance Performance improvements priority:high High priority area:models Model architectures, weights, loading, metadata labels May 18, 2026
@inureyes inureyes merged commit c7f40e1 into main May 18, 2026
1 check passed
@inureyes inureyes deleted the fix/perf-skip-solar-open-decode branch May 18, 2026 12:00
@inureyes inureyes self-assigned this May 18, 2026
@inureyes inureyes added status:done Completed and removed status:review Under review labels May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:models Model architectures, weights, loading, metadata priority:high High priority status:done Completed type:bug Bug fixes, error corrections, or issue resolutions type:performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant