
Commit 22a43c3

Modify README
1 parent e4bd999 commit 22a43c3

3 files changed, +82 -7 lines changed

AE.md

Lines changed: 0 additions & 4 deletions
@@ -2,10 +2,6 @@
 
 This document provides instructions for obtaining the artifact, performing necessary preprocessing, and executing experiments using the provided scripts.
 
-```python
-git clone --recurse-submodules https://github.com/mcrl/spipe.git
-```
-
 ## Prerequisites
 
 - CUDA 12.4
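You can confirm the CUDA 12.4 prerequisite before building by parsing the toolkit's version banner. The following is a minimal bash sketch, not part of the SPipe repository. `cuda_release` and `check_cuda_version` are hypothetical helper names. The parsing assumes `nvcc --version` prints a line such as `Cuda compilation tools, release 12.4, V12.4.131`. The check looks for `nvcc` under `$CUDA_ROOT/bin` when `CUDA_ROOT` is set, and on `PATH` otherwise.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of SPipe) for checking the CUDA 12.4 prerequisite.

# Read `nvcc --version` output on stdin and print the "major.minor" release,
# e.g. "12.4" from "Cuda compilation tools, release 12.4, V12.4.131".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Return 0 if the nvcc under $CUDA_ROOT/bin (or on PATH when CUDA_ROOT is unset)
# reports the required release; otherwise print a message and return 1.
check_cuda_version() {
  local required="${1:-12.4}"
  local nvcc="${CUDA_ROOT:+${CUDA_ROOT}/bin/}nvcc"
  local found
  if ! command -v "$nvcc" >/dev/null 2>&1; then
    echo "nvcc not found (looked for: $nvcc)" >&2
    return 1
  fi
  found="$("$nvcc" --version | cuda_release)"
  if [ "$found" != "$required" ]; then
    echo "found CUDA ${found:-unknown}, but ${required} is required" >&2
    return 1
  fi
  echo "CUDA ${found} OK"
}
```

If a different toolkit is picked up, `check_cuda_version` (or `check_cuda_version 12.4`) prints the mismatch and returns a non-zero status.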

PREREQUISITES.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+### UCX (ddd634)
+
+```bash
+git clone https://github.com/openucx/ucx && cd ucx && git checkout ddd634
+./autogen.sh
+mkdir build && cd build
+../contrib/configure-release --prefix=${UCX_ROOT} --with-cuda=${CUDA_ROOT} --enable-mt && make -j`nproc` install
+# Add ${UCX_ROOT}/bin to PATH and ${UCX_ROOT}/lib to LD_LIBRARY_PATH
+```
+
+### ompi (424151)
+```bash
+git clone --recursive https://github.com/open-mpi/ompi.git && cd ompi && git checkout 424151
+./autogen.pl
+mkdir build && cd build
+../configure --prefix=${MPI_ROOT} --with-ucx=${UCX_ROOT} --with-cuda=${CUDA_ROOT} && make -j`nproc` install
+# Add ${MPI_ROOT}/bin to PATH and ${MPI_ROOT}/lib to LD_LIBRARY_PATH
+```
+
+### nv_peer_mem
+
+Run the following command if you want to use GPUDirect RDMA and have GPUs that support it (e.g., V100). Otherwise (e.g., RTX 3090), this step is unnecessary.
+```bash
+insmod /lib/modules/5.4.0-100-generic/updates/dkms/nvidia-peermem.ko
+```
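You can script the PATH and LD_LIBRARY_PATH update that follows each install above. The following is a minimal bash sketch that assumes the `UCX_ROOT` and `MPI_ROOT` install prefixes used in the build commands. `prepend_path` is a hypothetical helper and does not ship with SPipe, UCX, or ompi.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SPipe, UCX, or ompi).
# Prepend DIR to the colon-separated variable named VAR (e.g. PATH) and export it.
# A directory that is already present is not added again, and no empty entry
# is created when VAR starts out unset or empty.
prepend_path() {
  local var="$1" dir="$2"
  local current="${!var:-}"
  case ":${current}:" in
    *":${dir}:"*) ;;  # already listed: leave the variable unchanged
    *) printf -v "$var" '%s' "${dir}${current:+:${current}}" ;;
  esac
  export "$var"
}

# Expose the UCX and ompi installations; roots that are unset are skipped.
for root in "${UCX_ROOT:-}" "${MPI_ROOT:-}"; do
  [ -n "$root" ] || continue
  prepend_path PATH "${root}/bin"
  prepend_path LD_LIBRARY_PATH "${root}/lib"
done
```

Because duplicates are skipped, this can be sourced from a bash profile more than once. It also avoids the trailing empty entry that `export LD_LIBRARY_PATH="...:$LD_LIBRARY_PATH"` leaves when the variable was previously unset.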

README.md

Lines changed: 57 additions & 3 deletions
@@ -27,16 +27,70 @@ spipe/
 
 ├── examples/ # SPipe usage examples
 ├── data/ # Data vocab, merge file
-└── scripts/ # Scripts to reproduce
+├── scripts/ # Scripts to reproduce AE
+└── results/ # Execution log files
 ```
 
 ## Build
 
-TBD
+### Dependencies
+- Python 3.8
+- UCX
+- ompi
+- CUDA 12.4
+- cuDNN 8.5.0
+- nv_peer_mem
+
+Setting these up is detailed in [PREREQUISITES](./PREREQUISITES.md).
+
+### Clone project
+```bash
+git clone --recurse-submodules https://github.com/mcrl/spipe.git
+```
+
+### Environment variables
+Set the following environment variables in your bash profile (e.g., `~/.bashrc`):
+```bash
+export SPIPE_ROOT=/path/to/spipe
+export CUDA_ROOT=/path/to/cuda
+export UCX_ROOT=/path/to/ucx/installation
+export MPI_ROOT=/path/to/ompi/installation
+export SPIPE_CONDA=<conda_env>
+
+export PATH="$CUDA_ROOT/bin:$MPI_ROOT/bin:$UCX_ROOT/bin:$PATH"
+export LD_LIBRARY_PATH="$CUDA_ROOT/lib64:$MPI_ROOT/lib:$UCX_ROOT/lib:$LD_LIBRARY_PATH"
+```
+
+### Essential installation
+```bash
+# conda
+conda create -n $SPIPE_CONDA python=3.8
+conda activate $SPIPE_CONDA
+
+# PyTorch (CUDA 12.4)
+conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
+
+# Apex 741bdf5
+git clone https://github.com/NVIDIA/apex && cd apex && git checkout 741bdf5
+CUDA_HOME=$CUDA_ROOT TORCH_CUDA_ARCH_LIST="<cuda;arch;list>" pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+pip install -r requirements.txt
+
+# mpi4py
+MPICC=/path/to/mpicc python -m pip install mpi4py --no-cache
+
+# submodule DeepSpeed (checkout 5f631ab & cherry-pick a4cd550)
+cd $SPIPE_ROOT/csrc/external/DeepSpeed && TORCH_CUDA_ARCH_LIST="<cuda;arch;list>" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j16" --no-cache -v --disable-pip-version-check
+
+# misc
+pip install cmake ninja regex pillow pybind11 pyyaml typing-extensions six psutil nvtx py-cpuinfo einops transformers
+
+# spipe_helper
+cd $SPIPE_ROOT/csrc/spipe_helper && CUDA_BUILD_DIR=$CUDA_ROOT MPI_BUILD_DIR=$MPI_ROOT pip install .
+```
 
 ## Examples
 
-We provide working examples for running SPipe in the [`examples/`](/examples) directory.
+TBD
 
 ## Artifact Evaluation
 
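Before running the installation commands, you can confirm that the environment variables from the README's export block are in place. The following is a minimal bash sketch. `check_spipe_env` is a hypothetical name, and the variable list is copied from that export block.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of SPipe): verify that the variables exported in
# the README are set and, where they name an installation, point to a directory.
check_spipe_env() {
  local status=0 var
  for var in SPIPE_ROOT CUDA_ROOT UCX_ROOT MPI_ROOT; do
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      status=1
    elif [ ! -d "${!var}" ]; then
      echo "error: $var=${!var} is not a directory" >&2
      status=1
    fi
  done
  # SPIPE_CONDA names a conda environment rather than a path, so only check that it is set.
  if [ -z "${SPIPE_CONDA:-}" ]; then
    echo "error: SPIPE_CONDA is not set" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_spipe_env && echo "environment OK"` reports every missing or mistyped variable in a single pass, so you do not have to fix them one build failure at a time.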