Conversation

@shuningjin
Collaborator

@shuningjin shuningjin commented Nov 4, 2025

Description

Onboard deepseek3-671b to the checkpoint conversion utility, covering the Orbax scan -> HF direction.

Other changes:

  • utils.py - refactor the process_leaf_param function for clarity
  • to_huggingface.py - add timing and logging
  • generate_hf_golden_logits.py - add options to facilitate loading deepseek

Future work

  • Conversion only works when saving locally; saving to GCS fails with a max-retries-exceeded error: b/457821616
  • hf -> orbax scan: the mapping is in place, but it still needs changes to stacking, optimization for to_maxtext, and testing: b/457820372
  • hf <-> orbax unscan: b/457820735
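
For readers unfamiliar with the scan/unscan distinction in the items above, here is a minimal sketch of the idea, not the code in this PR. It assumes a scanned weight keeps all layers in one array with the layer axis in the middle (`stacked[i][l]` is row `i` of layer `l`), and that the HF side wants one entry per layer. The function names and the key template are hypothetical and chosen only for illustration.

```python
def unstack_layers(stacked):
    """Split a layer-stacked weight into a list of per-layer matrices.

    Assumed layout (illustrative): stacked[i][l] is row i of layer l,
    i.e. shape [in_dim, num_layers, out_dim] as nested lists.
    """
    num_layers = len(stacked[0])
    return [[row[l] for row in stacked] for l in range(num_layers)]


def stack_layers(per_layer):
    """Inverse of unstack_layers: rebuild the [in_dim, num_layers, out_dim] layout."""
    in_dim = len(per_layer[0])
    return [[layer[i] for layer in per_layer] for i in range(in_dim)]


def to_per_layer_keys(stacked, key_template):
    """Map one scanned weight to {per-layer key: matrix}.

    key_template is a hypothetical naming pattern containing "{layer}".
    """
    return {
        key_template.format(layer=l): matrix
        for l, matrix in enumerate(unstack_layers(stacked))
    }


# Toy weight: in_dim=2, num_layers=2, out_dim=2.
stacked = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
per_layer = to_per_layer_keys(stacked, "model.layers.{layer}.example.weight")
```

Going the other way (hf -> orbax scan) is the `stack_layers` direction; the real utility also has to handle transposes, reshapes, and sharding, which this sketch leaves out.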

Tests

Test details: b/450671690#comment12

conversion

ID=$(date +%Y-%m-%d-%H-%M-%S)
RUN_NAME=ds3-hf-$ID

python3 -m MaxText.utils.ckpt_conversion.to_huggingface src/MaxText/configs/base.yml \
model_name=deepseek3-671b \
load_parameters_path=gs://ranran-multipod-dev/deepseek3/conversion/bf16/1/0/items \
base_output_directory=/home/shuningjin/deepseek3-671b/deepseek3-671b-hf-$ID \
scan_layers=true use_multimodal=false \
skip_jax_distributed_system=True attention=dot_product mla_naive_kvcache=false \
checkpoint_storage_concurrent_gb=1024 \
dtype=bfloat16 weight_dtype=bfloat16

forward logit check

Generate logits from the newly generated HF checkpoint and compare them with the logits from the Orbax checkpoint: max KL = 0.14.
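
As a rough illustration of what a "max KL" check measures, the sketch below computes the KL divergence between the softmax distributions of two logit vectors at each position and takes the maximum. This is an assumption about the metric, not the actual test script: the direction of the KL (golden vs. candidate), the per-position reduction, and all function names here are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) where P and Q are the softmax distributions of the logits."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def max_kl(golden, candidate):
    """Largest per-position KL between two sequences of logit vectors."""
    return max(kl_divergence(g, c) for g, c in zip(golden, candidate))


# Placeholder logits for two positions over a 3-token vocabulary.
golden = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
candidate = [[1.9, 0.6, -1.0], [0.1, 0.2, 0.3]]
worst = max_kl(golden, candidate)
```

A value of 0 means the two checkpoints produce identical distributions at every position; small positive values are expected from numerical differences between the two forward passes.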

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@github-actions

github-actions bot commented Nov 4, 2025

🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions github-actions bot left a comment

📋 Review Summary

This pull request introduces a conversion utility for the deepseek3-671b model, enabling the transformation of Orbax scan checkpoints to the Hugging Face format. The changes are well-structured, adding new configurations, shape mappings, and parameter mappings for the new model. The code also includes some welcome refactoring for clarity and adds useful logging and timing information to the conversion process.

🔍 General Feedback

  • The addition of the new model is comprehensive, covering all necessary parts of the conversion utility.
  • The refactoring in process_leaf_param significantly improves readability.
  • The inclusion of timing and progress bars (tqdm) is a great enhancement for user experience during a long-running process.

Overall, this is a solid contribution that extends the model support of the conversion tool. Just one minor cleanup item noted in the inline comment.

Collaborator

@RissyRan RissyRan left a comment

Thank you! Could you help create 2 bugs to track the future work for DeepSeek you mentioned, and also add a TODO in the source code?

Collaborator

@hengtaoguo hengtaoguo left a comment

Thanks Shuning for adapting a huge model to this tool! Approve to unblock.

I know a single conversion takes a huge amount of time. Could you also follow up with a conversion test in this tool after your change?

@shuningjin shuningjin changed the title Add deepseek3 conversion utility: orbax scan to hf Add deepseek3 conversion utility Nov 4, 2025
@shuningjin shuningjin force-pushed the shuningjin-ckpt-ds3 branch 2 times, most recently from 819891d to 59a4591 Compare November 5, 2025 00:16
@shuningjin shuningjin changed the title Add deepseek3 conversion utility Add deepseek3 conversion utility: orbax scan to hf Nov 5, 2025
Collaborator

@hengtaoguo hengtaoguo left a comment

LGTM, thanks for the awesome work!

@copybara-service copybara-service bot merged commit 9e32bf4 into main Nov 5, 2025
46 checks passed
@copybara-service copybara-service bot deleted the shuningjin-ckpt-ds3 branch November 5, 2025 01:52

5 participants