[Jax][Deepseek] Updated sharding scales to align with GMM_TP settings. #1865

Open
gpolovets1 wants to merge 1 commit into main from gpolovets/jax_moe_scale_sharding
Conversation

@gpolovets1
Collaborator

Description

Fixed the scale shardings so that they work correctly with gmm_TP and with per-channel quantization (which gmm_v2.py requires as of 3/4/26).

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: George Polovets <gpolovets@gmail.com>
@github-actions
github-actions bot commented Mar 5, 2026

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a GitHub issue, please include a link, e.g.:
FIXES: #123456

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@gpolovets1 gpolovets1 requested a review from lk-chen March 5, 2026 03:59
@gpolovets1 gpolovets1 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Mar 5, 2026
@lk-chen lk-chen changed the title from "Updated sharding scales to align with GMM_TP settings." to "[Jax][Deepseek] Updated sharding scales to align with GMM_TP settings." Mar 5, 2026

Labels

ready (ONLY add when PR is ready to merge/full CI is needed)

2 participants