Skip to content

feat: add uccl-ep to docker built and add USING_UEP flag to run_pretrain and runner hook#540

Open
zhenhuang12 wants to merge 10 commits intomainfrom
dev/zhuang12/uep
Open

feat: add uccl-ep to docker built and add USING_UEP flag to run_pretrain and runner hook#540
zhenhuang12 wants to merge 10 commits intomainfrom
dev/zhuang12/uep

Conversation

@zhenhuang12
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings February 5, 2026 09:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for UCCL-EP (Unified Collective Communication Library - Expert Parallelism) to the project, enabling an alternative backend for MoE dispatch/combine operations. The changes include Docker build integration, runtime configuration through environment variables, and conditional logic to select between TURBO and DEEP_EP backends.

Changes:

  • Adds UCCL-EP installation to Docker build process with configurable commit hash
  • Implements USING_UEP flag to enable UCCL-EP functionality in runner hooks and pretrain scripts
  • Updates validation logic to conditionally apply turbo_deepep_num_cu constraint based on backend selection

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
.github/workflows/ci.yaml Adds UCCL_COMMIT environment variable and passes it to Docker builds
.github/workflows/docker/Dockerfile Adds UCCL-EP installation steps during Docker image build
runner/helpers/hooks/05_using_uep.sh New hook script to validate and configure UCCL-EP environment
examples/run_pretrain.sh Adds UCCL rebuild option and UEP configuration logic
examples/run_local_pretrain.sh Passes REBUILD_UCCL and USING_UEP environment variables to Docker container
examples/moe_package/run_deepseek_v2_pretrain_mi355x.sh Enables UCCL-EP feature (option 8) in MoE features array
examples/moe_package/run_deepseek_v2_lite_pretrain_mi355x.sh Enables UCCL-EP feature (option 8) in MoE features array
primus/modules/trainer/megatron/utils.py Updates validation to only enforce CU constraint when using TURBO backend

Copilot AI review requested due to automatic review settings February 5, 2026 09:12
@AMD-AGI AMD-AGI deleted a comment from Copilot AI Feb 5, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


git clone https://github.com/uccl-project/uccl.git
cd uccl || exit
cd ep && PYTORCH_ROCM_ARCH="gfx942;gfx950" python3 setup.py build && cd ..
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded GPU architectures 'gfx942;gfx950' should be configurable via an environment variable to support different hardware configurations.

Suggested change
cd ep && PYTORCH_ROCM_ARCH="gfx942;gfx950" python3 setup.py build && cd ..
cd ep && PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH:-gfx942;gfx950}" python3 setup.py build && cd ..

Copilot uses AI. Check for mistakes.
Comment on lines 404 to 424
# ----------------- Using UCCL-EP -----------------
if [ "$USING_UEP" == "1" ]; then
LOG_INFO "USING_UEP is enabled, checking required packages..."

if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"

if [ "$ENABLE_NUMA_BINDING" != "1" ]; then
LOG_INFO "ENABLE_NUMA_BINDING is not enabled! Please set ENABLE_NUMA_BINDING=1 to avoid dataloader workers exiting unexpectedly."
fi

export PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND=DEEP_EP
LOG_INFO "PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND set to DEEP_EP"
else
export PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND=TURBO
LOG_INFO "USING_UEP is disabled. PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND set to TURBO"
fi
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic block is duplicated between run_pretrain.sh and runner/helpers/hooks/05_using_uep.sh. Consider consolidating this logic into a single location to avoid maintenance issues.

Suggested change
# ----------------- Using UCCL-EP -----------------
if [ "$USING_UEP" == "1" ]; then
LOG_INFO "USING_UEP is enabled, checking required packages..."
if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"
if [ "$ENABLE_NUMA_BINDING" != "1" ]; then
LOG_INFO "ENABLE_NUMA_BINDING is not enabled! Please set ENABLE_NUMA_BINDING=1 to avoid dataloader workers exiting unexpectedly."
fi
export PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND=DEEP_EP
LOG_INFO "PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND set to DEEP_EP"
else
export PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND=TURBO
LOG_INFO "USING_UEP is disabled. PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND set to TURBO"
fi
configure_using_uep() {
# ----------------- Using UCCL-EP -----------------
if [ "$USING_UEP" == "1" ]; then
LOG_INFO "USING_UEP is enabled, checking required packages..."
if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"
if [ "$ENABLE_NUMA_BINDING" != "1" ]; then
LOG_INFO "ENABLE_NUMA_BINDING is not enabled! Please set ENABLE_NUMA_BINDING=1 to avoid dataloader worker exited unexpectedly."
fi
export PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND=DEEP_EP
LOG_INFO "PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND set to DEEP_EP"
else
export PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND=TURBO
LOG_INFO "USING_UEP is disabled. PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND set to TURBO"
fi
}
# Configure UCCL-EP / TURBO backend selection
configure_using_uep

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 5, 2026 09:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"

if [ "$ENABLE_NUMA_BINDING" != "1" ]; then
LOG_INFO "ENABLE_NUMA_BINDING is not enabled! Please set ENABLE_NUMA_BINDING=1 to avoid dataloader workers exiting unexpectedly."
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates the warning logic from the hook file (runner/helpers/hooks/05_using_uep.sh line 27). Consider using LOG_WARN for consistency with the hook implementation, as both locations serve the same warning purpose.

Suggested change
LOG_INFO "ENABLE_NUMA_BINDING is not enabled! Please set ENABLE_NUMA_BINDING=1 to avoid dataloader workers exiting unexpectedly."
LOG_WARN "ENABLE_NUMA_BINDING is not enabled! Please set ENABLE_NUMA_BINDING=1 to avoid dataloader workers exiting unexpectedly."

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 6, 2026 02:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

LOG_INFO "USING_UEP is enabled, checking required packages..."

if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message only mentions 'uccl' but the check on line 19 validates both uccl and deep_ep packages. The error message should mention both packages to accurately reflect what is being validated.

Suggested change
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
LOG_ERROR "uccl and/or deep_ep are not installed! Please use pre-installed primus image or set REBUILD_UCCL=1 and ensure deep_ep is installed."

Copilot uses AI. Check for mistakes.
args.expert_model_parallel_size >= 16
and os.getenv("PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND") == "TURBO"
):
# Turbo DeepEP is not supported CUs > 32 when using internode dispatch/combine.
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar error in comment: 'is not supported CUs' should be 'does not support CUs' or 'is not supported for CUs'.

Suggested change
# Turbo DeepEP is not supported CUs > 32 when using internode dispatch/combine.
# Turbo DeepEP is not supported for CUs > 32 when using internode dispatch/combine.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 6, 2026 02:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

LOG_INFO "USING_UEP is enabled, checking required packages..."

if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message on line 409 only mentions 'uccl is not installed' but the condition on line 408 checks for both uccl and deep_ep packages. The error message should clarify that either or both packages may be missing.

Suggested change
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
LOG_ERROR "uccl and/or deep_ep are not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."

Copilot uses AI. Check for mistakes.
Comment on lines 378 to 402
# ----------------- Rebuild UCCL -----------------
export REBUILD_UCCL=${REBUILD_UCCL:-0}
if [ "$REBUILD_UCCL" == "1" ]; then
LOG_INFO "Rebuilding UCCL from source..."
apt update && apt install -y rdma-core libibverbs-dev libnuma-dev libgoogle-glog-dev
mkdir -p "/workspace/"
cd "/workspace" || exit

# Clean up old directory if exists to avoid git clone conflicts
if [ -d "uccl" ]; then
LOG_INFO "Removing existing uccl directory..."
rm -rf uccl
fi

git clone https://github.com/uccl-project/uccl.git
cd uccl || exit
cd ep && PYTORCH_ROCM_ARCH="gfx942;gfx950" python3 setup.py build && cd ..
cp ep/build/**/*.so uccl
pip3 install --no-build-isolation .
cd ep/deep_ep_wrapper && pip3 install --no-build-isolation . -v
cd "${PRIMUS_PATH}" || exit
LOG_INFO "Rebuilding UCCL from source done."
else
LOG_INFO "Skip UCCL rebuild. REBUILD_UCCL=$REBUILD_UCCL"
fi
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This UCCL rebuild logic (lines 378-402) is duplicated between examples/run_pretrain.sh and the hook runner/helpers/hooks/05_using_uep.sh validation logic (lines 405-424). The validation logic in lines 405-424 is also duplicated in the hook file. Consider extracting this into a shared function or sourcing a common script to reduce duplication and ensure consistency.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 7, 2026 08:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment on lines +69 to +70
cd ep && TORCH_CUDA_ARCH_LIST="gfx942,gfx950" python3 setup.py build && cd .. && \
cp ep/build/**/*.so uccl && \
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TORCH_CUDA_ARCH_LIST is a CUDA-oriented env var and is unlikely to affect ROCm builds; additionally, cp ep/build/**/*.so relies on ** glob expansion, which typically won’t work in Docker RUN (default /bin/sh) and may fail to copy any .so files. Use the ROCm-appropriate arch env var (consistent with the runtime script) and replace the ** copy with a find ... -name '*.so' -exec ... (or similar) approach that works under /bin/sh.

Suggested change
cd ep && TORCH_CUDA_ARCH_LIST="gfx942,gfx950" python3 setup.py build && cd .. && \
cp ep/build/**/*.so uccl && \
cd ep && PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH}" python3 setup.py build && cd .. && \
find ep/build -name '*.so' -exec cp {} uccl \; && \

Copilot uses AI. Check for mistakes.
git clone https://github.com/uccl-project/uccl.git
cd uccl || exit
cd ep && PYTORCH_ROCM_ARCH="gfx942;gfx950" python3 setup.py build && cd ..
cp ep/build/**/*.so uccl
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bash, ** recursive globbing requires shopt -s globstar; without it, this copy may fail silently or not match any files (leaving uccl without the needed .so). Replace this with a find ep/build -name '*.so' -exec cp ... style copy, or enable globstar explicitly before using **.

Suggested change
cp ep/build/**/*.so uccl
find ep/build -name '*.so' -exec cp {} uccl \;

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 7, 2026 11:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

UCCL_BUILD_DIR="${UCCL_BUILD_DIR:-/tmp/uccl_${HOSTNAME:-$(hostname)}}"
UCCL_REF="${UCCL_REF:-}"
GPU_ARCHS="${GPU_ARCHS:-gfx942;gfx950}"
GPU_ARCHS="${GPU_ARCHS:-gfx942,gfx950}"
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes both the arch-list separator and the environment variable used for ROCm builds. For PyTorch ROCm extensions, PYTORCH_ROCM_ARCH is typically expected (and list separators are commonly ;), while TORCH_CUDA_ARCH_LIST is CUDA-oriented and may be ignored on ROCm—potentially producing a build without the intended gfx targets. Consider reverting to PYTORCH_ROCM_ARCH and using the separator format expected by the build toolchain.

Copilot uses AI. Check for mistakes.

LOG_INFO_RANK0 "[hook system] Building uccl ep"
cd ep && PYTORCH_ROCM_ARCH="${GPU_ARCHS}" python3 setup.py build && cd ..
cd ep && TORCH_CUDA_ARCH_LIST="${GPU_ARCHS}" python3 setup.py build && cd ..
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes both the arch-list separator and the environment variable used for ROCm builds. For PyTorch ROCm extensions, PYTORCH_ROCM_ARCH is typically expected (and list separators are commonly ;), while TORCH_CUDA_ARCH_LIST is CUDA-oriented and may be ignored on ROCm—potentially producing a build without the intended gfx targets. Consider reverting to PYTORCH_ROCM_ARCH and using the separator format expected by the build toolchain.

Copilot uses AI. Check for mistakes.
if args.expert_model_parallel_size >= 16:
if (
args.expert_model_parallel_size >= 16
and os.getenv("PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND") == "TURBO"
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND is unset, this condition becomes false and the turbo_deepep_num_cu <= 32 constraint is skipped even though the effective default backend elsewhere appears to be TURBO. To avoid silently bypassing validation, default the getenv to \"TURBO\" (or otherwise ensure the env var is always set before validation runs).

Suggested change
and os.getenv("PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND") == "TURBO"
and os.getenv("PRIMUS_TURBO_MOE_DISPATCH_COMBINE_BACKEND", "TURBO") == "TURBO"

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +75
RUN if [ "$PRIMUS_TURBO_FRAMEWORK" != "JAX" ]; then \
cd /opt && \
git clone https://github.com/uccl-project/uccl.git && \
cd uccl && \
git checkout ${UCCL_COMMIT} && \
cd ep && TORCH_CUDA_ARCH_LIST="gfx942,gfx950" python3 setup.py build && cd .. && \
cp ep/build/**/*.so uccl && \
pip3 install --no-build-isolation . -v && \
cd ep/deep_ep_wrapper && pip3 install --no-build-isolation . -v && \
rm -rf /opt/uccl; \
fi
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concrete issues here: (1) cp ep/build/**/*.so uccl relies on ** globstar expansion, which is not enabled by default in bash and may fail to match/copy anything in Docker builds. Prefer an explicit find ... -name '*.so' -exec cp ... pattern or a known build output path. (2) git checkout ${UCCL_COMMIT} will fail if UCCL_COMMIT is unset/empty in non-CI builds; guard with a non-empty check or provide a default to keep local builds working.

Copilot uses AI. Check for mistakes.
Comment on lines +394 to +395
cd ep && TORCH_CUDA_ARCH_LIST="gfx942,gfx950" python3 setup.py build && cd ..
cp ep/build/**/*.so uccl
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cp ep/build/**/*.so uccl depends on globstar (**) expansion, which is typically disabled by default and can lead to no .so files being copied (and subsequent install/runtime failures). Use a glob pattern that doesn't require globstar or switch to a find-based copy. Also consider aligning the arch env var/list format with whatever the ROCm build expects (see UCCL rebuild hook).

Suggested change
cd ep && TORCH_CUDA_ARCH_LIST="gfx942,gfx950" python3 setup.py build && cd ..
cp ep/build/**/*.so uccl
cd ep && TORCH_CUDA_ARCH_LIST="gfx942;gfx950" python3 setup.py build && cd ..
find ep/build -name '*.so' -exec cp -t uccl {} +

Copilot uses AI. Check for mistakes.
fi

git clone https://github.com/uccl-project/uccl.git
cd uccl || exit
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rebuild path clones UCCL from main without pinning to a commit/tag, which makes runs non-reproducible and can break unexpectedly over time. Consider supporting a UCCL_COMMIT/UCCL_REF env var (similar to CI) and checking out that ref when provided.

Suggested change
cd uccl || exit
cd uccl || exit
# Optionally pin UCCL to a specific ref/commit for reproducible rebuilds.
# If UCCL_REF is not set, UCCL_COMMIT can be used as a fallback.
UCCL_CHECKOUT_REF="${UCCL_REF:-$UCCL_COMMIT}"
if [ -n "${UCCL_CHECKOUT_REF}" ]; then
LOG_INFO "Checking out UCCL ref: ${UCCL_CHECKOUT_REF}"
git checkout "${UCCL_CHECKOUT_REF}" || exit
fi

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +24
if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using pip directly can point at a different Python environment than python3/pip3 (especially in containers/venvs), leading to false negatives/positives when checking installation. Prefer python3 -m pip show ... (or consistently pip3) to ensure the check targets the same interpreter used elsewhere in the scripts.

Suggested change
if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"
if ! python3 -m pip show uccl &>/dev/null || ! python3 -m pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(python3 -m pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(python3 -m pip show deep_ep | grep Version)"

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +24
if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using pip directly can point at a different Python environment than python3/pip3 (especially in containers/venvs), leading to false negatives/positives when checking installation. Prefer python3 -m pip show ... (or consistently pip3) to ensure the check targets the same interpreter used elsewhere in the scripts.

Suggested change
if ! pip show uccl &>/dev/null || ! pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(pip show deep_ep | grep Version)"
if ! python3 -m pip show uccl &>/dev/null || ! python3 -m pip show deep_ep &>/dev/null; then
LOG_ERROR "uccl is not installed! Please use pre-installed primus image or set REBUILD_UCCL=1."
exit 1
fi
LOG_INFO "uccl package is installed: $(python3 -m pip show uccl | grep Version)"
LOG_INFO "deep_ep package is installed: $(python3 -m pip show deep_ep | grep Version)"

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant