FP8 MoE requant: shard_map + lax.scan on TPU#3

Closed
rohan-reddy wants to merge 3 commits into main from model-loading-shard
Conversation


@rohan-reddy rohan-reddy commented Mar 10, 2026

Summary

Draft PR for reviewing the diff of the updated model-loading branch (with shard_map) against main.

This PR is NOT meant to be merged; it exists only so the diff can be reviewed.

Changes

  • Shard FP8 MoE weights onto TPU devices before requantization
  • Batch the requantization with lax.scan so peak memory stays bounded
  • Wrap requant + process_moe_weights in shard_map, cutting the XLA memory reservation by 48%
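To make the shape of the change concrete, here is a minimal sketch of the pattern the bullets describe: a per-expert requantization step driven by lax.scan (one expert in flight at a time), wrapped in shard_map so each TPU device only materializes its own shard of the expert weights. The function names (`requantize_expert`, `requant_all`) and the scale recipe are illustrative assumptions, not the PR's actual code; 448 is the largest finite value of FP8 E4M3.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# Hypothetical per-expert requantization: dequantize with the old scale,
# recompute a max-abs scale, and rescale. Names are illustrative only.
def requantize_expert(w, scale):
    deq = w.astype(jnp.float32) * scale
    new_scale = jnp.max(jnp.abs(deq)) / 448.0  # FP8 E4M3 max finite value
    return deq / new_scale, new_scale

def requant_all(weights, scales):
    # lax.scan walks the local expert axis one expert at a time,
    # bounding peak memory to a single expert's working set.
    def step(carry, xs):
        w, s = xs
        return carry, requantize_expert(w, s)
    _, out = jax.lax.scan(step, None, (weights, scales))
    return out

# One-axis mesh over the available devices (a single device also works,
# e.g. on CPU); shard_map hands each device its slice of the expert axis.
mesh = Mesh(np.array(jax.devices()[:1]), ("x",))
sharded_requant = shard_map(
    requant_all,
    mesh=mesh,
    in_specs=(P("x"), P("x")),
    out_specs=(P("x"), P("x")),
)

num_experts = 4
weights = jnp.ones((num_experts, 8, 8), dtype=jnp.float32)
scales = jnp.full((num_experts,), 0.5, dtype=jnp.float32)
new_w, new_s = sharded_requant(weights, scales)
```

Because shard_map traces the function per-shard, XLA only reserves buffers for the local slice of the expert axis rather than the full weight tensor, which is where the reported reduction in reservation comes from.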

Signed-off-by: Rohan Reddy <rreddy.nyc@gmail.com>
rohan-reddy force-pushed the model-loading-shard branch from a5ec0a9 to b94d9a7 on March 10, 2026 at 02:36
