[NPU] [BUGFIX] Test ascend memory consumption.py fix#17995
OrangeRedeng wants to merge 68 commits into sgl-project:main
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary of Changes
This pull request focuses on optimizing memory consumption for Mixture-of-Experts (MoE) models running on Ascend NPUs. It achieves this by refining how MoE layer weights are processed and loaded, introducing lazy initialization for expert weights, and cleaning up outdated environment variable references in documentation. A new comprehensive unit test has been added to validate these memory improvements, ensuring the server's NPU memory footprint remains efficient.
Code Review
This pull request introduces several optimizations to reduce memory consumption on Ascend NPU devices, particularly for Mixture-of-Experts (MoE) models. The key changes include refactoring weight processing to avoid unnecessary tensor copies by removing .contiguous() calls, and using lazy initialization for expert weights to speed up model loading. Additionally, a new memory consumption test has been added to verify these improvements. The changes are well-implemented and contribute to better performance and efficiency. I have one minor suggestion for improving the clarity of a comment in the new test file.
test/registered/ascend/test_ascend_memory_consumption.py (64)
This comment appears to be a copy-paste from above. To improve clarity, it should be updated to reflect that this block of code calculates the memory used by the server after startup.
### Calculate memory used by the server
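The measurement this review comment refers to can be sketched roughly as follows. This is a minimal illustration, not the code in test_ascend_memory_consumption.py: it assumes the test compares free device memory before and after server startup, and `server_memory_usage_mib`, `FakeDevice`, and the 512 MiB figure are all hypothetical stand-ins for a real NPU memory query and a real server launch.

```python
from typing import Callable

MIB = 1024 * 1024


def server_memory_usage_mib(
    read_free_bytes: Callable[[], int],
    start_server: Callable[[], None],
) -> float:
    """Return the device memory (in MiB) consumed by starting the server."""
    # Record free device memory before the server starts
    free_before = read_free_bytes()
    start_server()
    # Calculate memory used by the server
    free_after = read_free_bytes()
    return (free_before - free_after) / MIB


class FakeDevice:
    """Hypothetical stand-in for an NPU; a real test would query the device."""

    def __init__(self, free_bytes: int) -> None:
        self.free_bytes = free_bytes

    def read_free(self) -> int:
        return self.free_bytes

    def launch(self) -> None:
        # Pretend the server allocates 512 MiB at startup
        self.free_bytes -= 512 * MIB


device = FakeDevice(free_bytes=4096 * MIB)
used = server_memory_usage_mib(device.read_free, device.launch)
print(f"server uses {used:.0f} MiB")
```

Passing the memory reader and the launcher in as callables keeps the arithmetic testable without hardware; a test could then assert that `used` stays below an agreed threshold.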
/tag-and-rerun-ci
/rerun-failed-ci
/rerun-failed-ci
Motivation
Fixes #17994.
The test added in PR #15904 does not work because the model cannot be loaded from Hugging Face.

Modifications
Change the Hugging Face model path to a local path on the CI server.
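For illustration only, one way to make a test independent of Hub availability is to prefer a model directory that already exists on disk. The PR itself just swaps the path; the fallback logic, the `resolve_model_path` helper, and the model ID below are assumptions and do not come from the changed test.

```python
import os
import tempfile


def resolve_model_path(local_dir: str, hub_id: str) -> str:
    """Prefer a model directory already on disk; otherwise return the Hub ID."""
    if os.path.isdir(local_dir):
        return local_dir
    return hub_id


# Hypothetical model identifier, used only for this sketch
HUB_ID = "example-org/example-model"

with tempfile.TemporaryDirectory() as cache_dir:
    # The directory exists here, so the local path wins
    present = resolve_model_path(cache_dir, HUB_ID)

# The temporary directory is gone now, so the Hub ID is returned
missing = resolve_model_path(os.path.join(cache_dir, "missing"), HUB_ID)
print(present == cache_dir, missing == HUB_ID)
```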
Accuracy Tests
To be covered by CI.
Benchmarking and Profiling
To be covered by CI.
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci