[中文版|English]
💫 Moore Threads(摩尔线程)mthreads
- Based on Triton 3.1, x64/aarch64
- Available for S4000/S5000
If you use this preinstalled image, you do not need to perform the later step 1.x. If your network connection is available, you also do not need to perform the later step 1.x, because dependencies will be fetched automatically during the build.
IMAGE=flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04:202603
# Plan A: docker pull (55.3GB)
docker pull harbor.baai.ac.cn/flagtree/${IMAGE}
# Plan B: docker load (18GB)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04.202603.tar.gz
docker load -i flagtree-mthreads-py310-torch2.7.1-musa4.3.5-ubuntu22.04.202603.tar.gzCONTAINER=flagtree-dev-xxx
docker run -dit \
--network=host --pid=host --privileged \
--cap-add=SYS_PTRACE \
--shm-size 16gb \
--security-opt seccomp=unconfined \
-e MTHREADS_VISIBLE_DEVICES=all -e MTHREADS_DRIVER_CAPABILITIES=all \
-v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu \
-v /lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu \
-v /etc/alternatives:/etc/alternatives \
-v /etc/localtime:/etc/localtime:ro \
-v /data:/data -v /home:/home -v /tmp:/tmp \
-w /root --name ${CONTAINER} ${IMAGE} bash
docker exec -it ${CONTAINER} /bin/bashmkdir -p ~/.flagtree/mthreads; cd ~/.flagtree/mthreads
# x64
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm19-glibc2.35-glibcxx3.4.30-x64_v0.4.0.tar.gz
tar zxvf mthreads-llvm19-glibc2.35-glibcxx3.4.30-x64_v0.4.0.tar.gz \
-C ./mthreads-llvm19-glibc2.35-glibcxx3.4.30 --strip-components=1
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-x64_v0.4.1.tar.gz
tar zxvf mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-x64_v0.4.1.tar.gz
# aarch64
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreads-llvm19-glibc2.35-glibcxx3.4.30-aarch64_v0.4.0.tar.gz
tar zxvf mthreads-llvm19-glibc2.35-glibcxx3.4.30-aarch64_v0.4.0.tar.gz \
-C ./mthreads-llvm19-glibc2.35-glibcxx3.4.30 --strip-components=1
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-aarch64_v0.4.0.tar.gz
tar zxvf mthreadsTritonPlugin-cpython3.10-glibc2.35-glibcxx3.4.30-cxxabi1.3.13-ubuntu-aarch64_v0.4.0.tar.gzcd ${YOUR_CODE_DIR}/FlagTree
# For Triton 3.1 (x64)
wget https://baai-cp-web.ks3-cn-beijing.ksyuncs.com/trans/build-deps-triton_3.1.x-linux-x64.tar.gz
sh python/scripts/unpack_triton_build_deps.sh ./build-deps-triton_3.1.x-linux-x64.tar.gzAfter executing the above script, the original ~/.triton directory will be renamed, and a new ~/.triton directory will be created to store the pre-downloaded packages. Note that the script will prompt for manual confirmation during execution.
# Note: First install PyTorch, then execute the following commands
python3 -m pip uninstall -y triton # Repeat the cmd until fully uninstalled
RES="--index-url=https://resource.flagos.net/repository/flagos-pypi-hosted/simple"
python3 -m pip install flagtree===0.5.1+mthreads3.1 $RESflagtree is already installed in the preinstalled image. You can check it with:
python3 -m pip show flagtreeapt update; apt install zlib1g zlib1g-dev libxml2 libxml2-dev
cd ${YOUR_CODE_DIR}/FlagTree/python
python3 -m pip install -r requirements.txt
export FLAGTREE_BACKEND=mthreads
MAX_JOBS=32 python3 -m pip install . --no-build-isolation -vRefer to Tests of mthreads backend
For triton kernels that use tl.dot, setting the environment variable export MUSA_ENABLE_SQMMA=1 can improve performance.