Add deepseek3 conversion utility: orbax scan to hf #2596
🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
📋 Review Summary
This pull request introduces a conversion utility for the deepseek3-671b model, enabling the transformation of Orbax scan checkpoints to the Hugging Face format. The changes are well-structured, adding new configurations, shape mappings, and parameter mappings for the new model. The code also includes some welcome refactoring for clarity and adds useful logging and timing information to the conversion process.
🔍 General Feedback
- The addition of the new model is comprehensive, covering all necessary parts of the conversion utility.
- The refactoring in process_leaf_param significantly improves readability.
- The inclusion of timing and progress bars (tqdm) is a great enhancement for user experience during a long-running process.
Overall, this is a solid contribution that extends the model support of the conversion tool. Just one minor cleanup item noted in the inline comment.
Thank you! Could you help create 2 bugs to track the future work for DeepSeek you mentioned, and also add a TODO in the source code?
Thanks Shuning for adapting a huge model to this tool! Approving to unblock.
I know a single conversion takes a huge amount of time. Could you also follow up with a conversion test in this tool after your change?
LGTM, thanks for the awesome work!
Description
Onboard deepseek3-671b to checkpoint conversion utility, orbax scan -> hf
Other changes:
- utils.py - refactor of process_leaf_param function for clarity
- to_huggingface.py - add timing and log.
- generate_hf_golden_logits.py - add options to facilitate deepseek load
future work
Tests
Test details: b/450671690#comment12
conversion
forward logit check
generate logits from the newly generated hf checkpoint and compare them with the logits from the orbax checkpoint: max KL=0.14
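For readers unfamiliar with this kind of check, here is a minimal sketch of how a max-KL comparison between two sets of logits could be computed. It is not the code used in this PR: the function names (`log_softmax`, `kl_divergence`, `max_kl`), the toy logits, and the choice of the orbax logits as the reference distribution are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) for a single token position.

    Assumption: the reference distribution comes from the orbax checkpoint
    and the test distribution from the converted hf checkpoint.
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def max_kl(ref_batch, test_batch):
    """Largest per-position KL divergence across all token positions."""
    return max(kl_divergence(r, t) for r, t in zip(ref_batch, test_batch))


# Toy logits (hypothetical values): 2 token positions, vocabulary of 3.
orbax_logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
hf_logits = [[2.1, 0.9, 0.1], [0.4, 0.6, 2.9]]
print(max_kl(orbax_logits, hf_logits))
```

Taking the maximum over positions rather than the mean makes the check sensitive to a single badly converted position, which is usually what a conversion test should catch.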