[docker] fix: new images for sgl056 and vllm012 have compatibility issues #4714
Code Review
This pull request updates the TransformerEngine version from v2.8 to v2.10 in the sglang and vllm Dockerfiles to resolve compatibility issues. The change is correct and addresses the stated problem. My review includes a suggestion to pin the dependency to a specific commit hash instead of a tag to improve build reproducibility and security.
  RUN MAX_JOBS=128 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

- RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.8
+ RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.10
For better reproducibility and security, it's recommended to pin dependencies to a specific commit hash instead of a tag. The tag release_v2.10 can be moved, which could lead to different build results in the future. The commit hash corresponding to this tag is 06082989335780a5f7808246a30146313175883a.
RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@06082989335780a5f7808246a30146313175883a
  RUN MAX_JOBS=128 pip install -v --disable-pip-version-check --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

- RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.8
+ RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.10
For better reproducibility and security, it's recommended to pin dependencies to a specific commit hash instead of a tag. The tag release_v2.10 can be moved, which could lead to different build results in the future. The commit hash corresponding to this tag is 06082989335780a5f7808246a30146313175883a.
RUN export NVTE_FRAMEWORK=pytorch && MAX_JOBS=128 NVTE_BUILD_THREADS_PER_JOB=4 pip3 install --resume-retries 999 --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@06082989335780a5f7808246a30146313175883a
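The pinning suggestion in the two review comments can be checked mechanically. The following is a minimal sketch, not part of this PR: `is_pinned_to_commit` is a hypothetical helper name, and it assumes a requirement counts as pinned only when the text after the last `@` is a full 40-character lowercase hex commit hash.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): succeed only when a pip VCS
# requirement ends in "@<40 lowercase hex characters>", i.e. a full commit hash.
is_pinned_to_commit() {
  # Everything after the last "@" is the git ref pip will check out.
  # If there is no "@", the whole string is kept and the check below fails.
  ref="${1##*@}"
  printf '%s\n' "$ref" | grep -Eq '^[0-9a-f]{40}$'
}

# Classify the two forms of the TransformerEngine requirement discussed above.
for req in \
  "git+https://github.com/NVIDIA/TransformerEngine.git@release_v2.10" \
  "git+https://github.com/NVIDIA/TransformerEngine.git@06082989335780a5f7808246a30146313175883a"
do
  if is_pinned_to_commit "$req"; then
    echo "pinned:   $req"
  else
    echo "floating: $req"
  fi
done
```

A check like this only inspects the shape of the ref; it does not confirm that the hash exists upstream or matches `release_v2.10`. To see which commit a remote ref currently points at, `git ls-remote https://github.com/NVIDIA/TransformerEngine.git release_v2.10` can be run before pinning.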
…sues (volcengine#4714)
### What does this PR do?
> TransformerEngine-v2.8 leads to unexpected crashes. Try to update it to v2.10.
> Fix other resultant compatibility issues.
---------
Co-authored-by: Begunner <[email protected]>
What does this PR do?