[levanter] Share Pallas autotune helpers and restore compile offload #4130
dlwh merged 8 commits into marin-community:main
…n35-a3b-direct-compare
# Conflicts:
#	lib/levanter/src/levanter/kernels/pallas/fused_cross_entropy_loss/api.py
#	lib/levanter/tests/kernels/test_pallas_fused_cross_entropy_loss.py
@claude review
I'll analyze this and get back to you.
@dlwh-golem will you review please
No findings.
yonromai left a comment
Approved. I did not find a clear regression in the helper extraction or in the restored compile-offload path. The shared helper preserves the prior fused CE behavior around manual sharding, shard-map wrapping, tracer lowering, and fallback handling, and the targeted tests cover both the new helper and the fused CE sweep path.
Generated with Codex

Move shard-aware autotune benchmarking out of fused cross-entropy into a shared Pallas helper and restore compile offload for shard-mapped autotune sweeps. This keeps fused CE behavior intact while making the benchmark path reusable from other kernels.
Fixes #4129
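
For readers unfamiliar with the pattern, here is a minimal sketch of what a shared autotune benchmark helper with offloaded compilation and fallback handling might look like. It is not the code in this PR: `autotune_sweep`, `build`, and `run` are hypothetical names, and "compile offload" is assumed here to mean building each candidate on a worker thread rather than on the caller's thread. The real helper also deals with manual sharding, shard-map wrapping, and tracer lowering, none of which is modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def autotune_sweep(candidates, build, run, repeats=3):
    """Return (best_config, mean_seconds), or None if every candidate failed.

    Hypothetical helper, not Levanter's API.
    build(config) -> callable kernel; may raise for unsupported configs.
    run(kernel)   -> executes the kernel once and blocks until it finishes.
    """
    candidates = list(candidates)

    def try_build(config):
        # Fallback handling: a config that fails to build is skipped, not fatal.
        try:
            return build(config)
        except Exception:
            return None

    # Assumed meaning of "compile offload": build on a worker thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        built = list(pool.map(try_build, candidates))

    best = None
    for config, kernel in zip(candidates, built):
        if kernel is None:
            continue
        try:
            run(kernel)  # warm-up call, excluded from the timing
            start = time.perf_counter()
            for _ in range(repeats):
                run(kernel)
            elapsed = (time.perf_counter() - start) / repeats
        except Exception:
            continue  # a config that fails at run time is also skipped
        if best is None or elapsed < best[1]:
            best = (config, elapsed)
    return best
```

Keeping the sweep independent of any one kernel is what would let other kernels reuse it: each caller supplies only its own `build` and `run` callables.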