fix(perf): skip solar-open layer eval during decode by inureyes · Pull Request #22 · lablup/mlxcel

inureyes · 2026-05-18T12:00:15Z

Summary

SolarOpenModel::forward was calling mlxcel_core::eval_all on the hidden state after every transformer layer to keep the MoE compute graph bounded (128 experts × ~50 ops/layer). That guard is needed for multi-token prefill — without it, the graph grows quadratically with seq_len × n_layers — but it's pure overhead on single-token decode, where the final-logits evaluation flushes the graph once anyway. The per-layer flush was costing ~48 GPU synchronizations per generated token.
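The trade-off can be sketched with a toy lazy-graph counter. This is a hypothetical model, not the mlxcel or MLX API, and `LazyGraph` and `forward_step` are illustrative names. `ops_per_layer = 128 × 50` comes from the estimate above. `n_layers = 48` is an assumption inferred from the ~48 synchronizations per token. The toy only counts pending ops and flushes. It does not model how graph size scales with seq_len during prefill.

```rust
/// Toy model of a lazily evaluated compute graph (hypothetical, not the mlxcel API).
/// Recorded ops stay pending until `eval` flushes them; each flush counts as one
/// GPU synchronization.
#[derive(Default)]
struct LazyGraph {
    pending_ops: usize,
    peak_pending_ops: usize,
    syncs: usize,
}

impl LazyGraph {
    /// Records `ops` new graph nodes without executing them.
    fn record(&mut self, ops: usize) {
        self.pending_ops += ops;
        self.peak_pending_ops = self.peak_pending_ops.max(self.pending_ops);
    }

    /// Executes everything pending; costs one synchronization per call.
    fn eval(&mut self) {
        self.pending_ops = 0;
        self.syncs += 1;
    }
}

/// Runs one forward step through `n_layers` layers of `ops_per_layer` ops each,
/// optionally flushing after every layer, then evaluates the final logits once.
/// (The final head's own ops are omitted from this toy.)
fn forward_step(n_layers: usize, ops_per_layer: usize, eval_each_layer: bool) -> LazyGraph {
    let mut graph = LazyGraph::default();
    for _ in 0..n_layers {
        graph.record(ops_per_layer);
        if eval_each_layer {
            graph.eval(); // per-layer flush: bounded graph, one sync per layer
        }
    }
    graph.eval(); // final logits evaluation
    graph
}
```

With these inputs, flushing after every layer caps the pending graph at 6,400 ops but costs 49 synchronizations per step. Skipping the flush costs 1 synchronization and lets the graph reach 307,200 ops before the final evaluation. That is the 48-sync saving per decode token, and it is also why the per-layer flush is kept for prefill.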

Cherry-picked from mlxcel-internal commit c5e88612 (the internal docs/model_tests_m5max.md update was intentionally excluded, same as #20 — it references an internal-only baseline doc that was never in the public repo).

What changed

src/models/solar_open.rs — gates the per-layer eval_all on prefill only:

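```rust
// Per-layer eval only during multi-token prefill; decode relies on the final logits eval.
let eval_layer_outputs = should_eval_layer_outputs(input_ids);  // false when seq_len == 1
for (i, layer) in self.layers.iter().enumerate() {
    h = layer.forward(&h, &mut caches[i], mask);
    if eval_layer_outputs {
        // Flush the lazy graph after each layer to keep it bounded during prefill.
        let ptrs = [h.as_ref().unwrap() as *const MlxArray];
        // Raw pointers are handed to the FFI eval; `h` outlives the call.
        unsafe { mlxcel_core::eval_all(&ptrs) };
    }
}
```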

should_eval_layer_outputs is a 3-line helper that checks input_ids.shape.last() != Some(1). The change also adds a unit test that pins the decode-vs-prefill behaviour.
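A minimal sketch of the helper and its test, under stated assumptions. `InputIds` is a hypothetical stand-in for the real input tensor type, with the shape stored as a `Vec<i32>`. Because `Vec::last` yields `Option<&i32>`, the sketch compares against `Some(&1)`. The PR's `Some(1)` suggests the real shape accessor may return owned values. An empty shape falls through to `true`, so the sketch evaluates per layer when in doubt.

```rust
/// Hypothetical stand-in for the model's token-id tensor; only the shape matters here.
/// The real mlxcel type and its `shape` accessor may differ.
struct InputIds {
    shape: Vec<i32>,
}

/// Returns true when per-layer evaluation should run (multi-token prefill).
/// A trailing dimension of 1 means a single-token decode step, where the
/// final logits evaluation already flushes the graph once.
fn should_eval_layer_outputs(input_ids: &InputIds) -> bool {
    input_ids.shape.last() != Some(&1)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn decode_skips_prefill_evaluates() {
        // [batch, seq_len] = [1, 1]: decode step, no per-layer eval.
        assert!(!should_eval_layer_outputs(&InputIds { shape: vec![1, 1] }));
        // [batch, seq_len] = [1, 128]: prefill, per-layer eval stays on.
        assert!(should_eval_layer_outputs(&InputIds { shape: vec![1, 128] }));
    }
}
```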

Verification

make verify-fmt — clean
make verify-clippy (CI-faithful: --all-targets --features metal,accelerate -- -D warnings) — clean in 22s (warm cache)
make verify-test skipped (15–30 min release-mode run); the upstream commit is already validated in mlxcel-internal against the M5 Max sweep.

fix(perf): skip solar-open layer eval during decode

6d7abe1

inureyes added status:review, type:bug, type:performance, priority:high, and area:models labels May 18, 2026

inureyes merged commit c7f40e1 into main May 18, 2026
1 check passed

inureyes deleted the fix/perf-skip-solar-open-decode branch May 18, 2026 12:00

inureyes self-assigned this May 18, 2026

inureyes added status:done and removed status:review labels May 18, 2026

inureyes merged 1 commit into main from fix/perf-skip-solar-open-decode
