Commit 185d63a

ci: Specify MPI implementation to mpich (flashinfer-ai#2182)
<!-- .github/pull_request_template.md -->

## 📌 Description

Unit tests are currently failing with:

```
==========================================
Running: pytest --continue-on-collection-errors -s --junitxml=/junit/tests/comm/test_trtllm_mnnvl_allreduce.py.xml "tests/comm/test_trtllm_mnnvl_allreduce.py"
==========================================
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1586):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090447
:
system msg for write_line failure : Bad file descriptor
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1586):
...
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, cuda.bindings._bindings.cydriver, cuda.bindings.cydriver, cuda.bindings.driver, tvm_ffi.core, markupsafe._speedups, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, mpi4py.MPI (total: 22)
!!!!!!! Segfault encountered !!!!!!!
...
❌ FAILED: tests/comm/test_trtllm_mnnvl_allreduce.py
```

These tests should be skipped in a single-GPU environment. That they fail instead of skipping indicates the failure occurs when the MPI module is loaded.

The current `Dockerfile.cuXXX` files install MPI via `RUN conda install -n py312 -y mpi4py`. The Docker build logs show that [a month ago (Nov. 4)](https://github.com/flashinfer-ai/flashinfer/actions/runs/19084098717/job/54520197904#step:6:802) the following packages were installed:

```
#17 13.68     mpi-1.0.1          | mpich                6 KB  conda-forge
#17 13.68     mpi4py-4.1.1       | py312hd0af0b3_100  866 KB  conda-forge
#17 13.68     mpich-4.3.2        | h79b1c89_100       5.4 MB  conda-forge
```

[but yesterday](https://github.com/flashinfer-ai/flashinfer/actions/runs/19960576464/job/57239792717#step:6:673) these were installed instead:

```
#17 13.59     impi_rt-2021.13.1  | ha770c72_769      41.7 MB  conda-forge
#17 13.59     mpi-1.0            | impi                 6 KB  conda-forge
#17 13.59     mpi4py-4.1.1       | py312h18f78f0_102  864 KB  conda-forge
```

`mpich` and `impi` are two different implementations of MPI: MPICH and Intel MPI. The switch from one to the other is the suspected cause of the MPI load failures.

This PR specifies the MPI implementation explicitly via `RUN conda install -n py312 -y mpi4py mpich`. The resulting build ([build log](https://github.com/flashinfer-ai/flashinfer/actions/runs/19976372640/job/57293423165?pr=2182#step:6:436)) installs:

```
#15 14.63     mpi-1.0.1          | mpich                6 KB  conda-forge
#15 14.63     mpi4py-4.1.1       | py312hd0af0b3_102  865 KB  conda-forge
#15 14.63     mpich-4.3.2        | h79b1c89_100       5.4 MB  conda-forge
```

which matches what was installed before.

<!-- What does this PR do? Briefly describe the changes and why they're needed. -->

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc. -->
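The comparison of build logs in the description can be automated. The following is a minimal sketch, assuming the conda package rows look like the ones quoted in the description, where the build string of the `mpi` metapackage names the selected implementation. The function name `detect_mpi_variant` is hypothetical and not part of this repository.

```python
import re
from typing import Optional

# Matches the row for the `mpi` metapackage, e.g.
#   "mpi-1.0.1 | mpich 6 KB conda-forge"
# Rows for `mpi4py-...`, `mpich-...` and `impi_rt-...` do not match,
# because the pattern requires the literal name "mpi" followed by "-".
_MPI_ROW = re.compile(r"\bmpi-(?P<version>[\d.]+)\s*\|\s*(?P<variant>\w+)")


def detect_mpi_variant(log_text: str) -> Optional[str]:
    """Return the build string of the `mpi` metapackage, or None if absent."""
    for line in log_text.splitlines():
        match = _MPI_ROW.search(line)
        if match:
            return match.group("variant")
    return None
```

Run against the two logs quoted in the description, this sketch would report `mpich` for the Nov. 4 build and `impi` for the later one, which is the difference the PR is reacting to.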
1 parent 6dfc1ba commit 185d63a

8 files changed

Lines changed: 8 additions & 8 deletions

docker/Dockerfile.cu126

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu126
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich
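
Once an image is built from a Dockerfile like this one, the MPI library that `mpi4py` actually links against can be checked from Python. This is a sketch under stated assumptions: `MPI.Get_library_version()` and `mpi4py.rc.initialize` are mpi4py features, but the substrings matched below are assumptions about typical vendor version strings, and both function names are hypothetical. Setting `mpi4py.rc.initialize = False` keeps the import from calling MPI initialization, which is where the reported failure (`PMPI_Init_thread`) occurs.

```python
def classify_mpi_library(version_string: str) -> str:
    """Map an MPI library version string to a short implementation name.

    The substrings checked here are assumptions about vendor strings.
    """
    text = version_string.lower()
    if "intel" in text:
        return "impi"
    if "mpich" in text:
        return "mpich"
    if "open mpi" in text:
        return "openmpi"
    return "unknown"


def installed_mpi_library() -> str:
    """Report which MPI implementation mpi4py is linked against."""
    import mpi4py

    # Skip automatic MPI initialization on import; the version query
    # does not require an initialized MPI environment.
    mpi4py.rc.initialize = False
    from mpi4py import MPI

    return classify_mpi_library(MPI.Get_library_version())
```

Inside the `py312` conda environment, calling `installed_mpi_library()` would be expected to return `mpich` after this change, assuming the version string is recognized.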

docker/Dockerfile.cu126.dev

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu126 && pip3 install pre-commit
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich
 
 # Install oh-my-zsh
 RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended

docker/Dockerfile.cu128

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu128
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich

docker/Dockerfile.cu128.dev

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu128 && pip3 install pre-commit
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich
 
 # Install oh-my-zsh
 RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended

docker/Dockerfile.cu129

Lines changed: 1 addition & 1 deletion
@@ -28,4 +28,4 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu129
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich

docker/Dockerfile.cu129.dev

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu129 && pip3 install pre-commit
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich
 
 # Install oh-my-zsh
 RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended

docker/Dockerfile.cu130

Lines changed: 1 addition & 1 deletion
@@ -28,4 +28,4 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu130
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich

docker/Dockerfile.cu130.dev

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ COPY docker/install/install_python_packages.sh /install/install_python_packages.
 RUN bash /install/install_python_packages.sh cu130 && pip3 install pre-commit
 
 # Install mpi4py in the conda environment
-RUN conda install -n py312 -y mpi4py
+RUN conda install -n py312 -y mpi4py mpich
 
 # Install oh-my-zsh
 RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended
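
Because the same one-line change is repeated across all eight Dockerfiles, a small check could help keep them consistent. The sketch below is illustrative only: the function name `unpinned_mpi_installs` is hypothetical, and it assumes each install is written as a single-line `RUN conda install ...` instruction, as in the diffs above.

```python
import re
from typing import List

# Single-line `RUN conda install ...` instructions; the captured group
# holds everything after "install" (flags and package names).
_CONDA_INSTALL = re.compile(r"^\s*RUN\s+conda\s+install\b(?P<args>.*)$")


def unpinned_mpi_installs(dockerfile_text: str) -> List[str]:
    """Return conda install lines that request mpi4py without naming mpich."""
    offenders = []
    for line in dockerfile_text.splitlines():
        match = _CONDA_INSTALL.match(line)
        if not match:
            continue
        packages = match.group("args").split()
        if "mpi4py" in packages and "mpich" not in packages:
            offenders.append(line.strip())
    return offenders
```

One possible use is to read each file matching `docker/Dockerfile.*` with `pathlib.Path.read_text()` and fail CI if the returned list is non-empty for any of them.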
