add xpu parameters to install.sh #1307

Status: Open — wants to merge 4 commits into base branch main
2 changes: 0 additions & 2 deletions — doc/en/balance-serve.md
@@ -100,10 +100,8 @@ git submodule update --init --recursive

 # Install single NUMA dependencies
 USE_BALANCE_SERVE=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 # For machines with two CPUs and 1 TB of RAM (dual NUMA):
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 ```

 ## Running DeepSeek-R1-Q4KM Models
2 changes: 0 additions & 2 deletions — doc/en/install.md
@@ -117,13 +117,11 @@ Download source code and compile:

 ```shell
 USE_BALANCE_SERVE=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 ```
 - For multi-concurrency with two CPUs and 1 TB of RAM:

 ```shell
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 ```
 - For Windows (native Windows support is temporarily deprecated; please use WSL)
2 changes: 0 additions & 2 deletions — doc/en/llama4.md
@@ -68,10 +68,8 @@ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.o

 ```bash
 # Install single NUMA dependencies
 USE_BALANCE_SERVE=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 # For machines with two CPUs and 1 TB of RAM (dual NUMA):
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 ```

 ### 4. Use our custom config.json
4 changes: 1 addition & 3 deletions — doc/en/xpu.md
@@ -62,9 +62,7 @@ cd ktransformers
 git submodule update --init

 # Install dependencies
-bash install.sh
-pip uninstall triton pytorch-triton-xpu
-pip install pytorch-triton-xpu==3.3.0 --extra-index-url https://download.pytorch.org/whl/xpu # to avoid potential triton import error
+bash install.sh --dev xpu
 ```

 ## Running DeepSeek-R1 Models
2 changes: 0 additions & 2 deletions — doc/zh/DeepseekR1_V3_tutorial_zh.md
@@ -127,10 +127,8 @@ cd ktransformers
 git submodule update --init --recursive
 # If using the dual-NUMA version
 USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 # If using the single-NUMA version
 USE_BALANCE_SERVE=1 bash ./install.sh
-pip install third_party/custom_flashinfer/
 # Launch command
 python ktransformers/server/main.py --model_path <your model path> --gguf_path <your gguf path> --cpu_infer 62 --optimize_config_path <inject rule path> --port 10002 --chunk_size 256 --max_new_tokens 1024 --max_batch_size 4 --cache_lens 32768 --backend_type balance_serve
 ```
22 changes: 20 additions & 2 deletions — install.sh
@@ -1,6 +1,20 @@
 #!/bin/bash
 set -e

+# Default backend
+DEV="cuda"
+
+# Parse the --dev argument (e.g. --dev xpu)
+while [[ "$#" -gt 0 ]]; do
+    case $1 in
+        --dev) DEV="$2"; shift ;;
+        *) echo "Unknown parameter passed: $1"; exit 1 ;;
+    esac
+    shift
+done
+export DEV_BACKEND="$DEV"
+echo "Selected backend: $DEV_BACKEND"
+
 # clear build dirs
 rm -rf build
 rm -rf *.egg-info
@@ -13,13 +27,17 @@ rm -rf ~/.ktransformers
 echo "Installing python dependencies from requirements.txt"
 pip install -r requirements-local_chat.txt
 pip install -r ktransformers/server/requirements.txt

 echo "Installing ktransformers"
 KTRANSFORMERS_FORCE_BUILD=TRUE pip install -v . --no-build-isolation

+if [[ "$DEV_BACKEND" == "cuda" ]]; then
+    echo "Installing custom_flashinfer for CUDA backend"
+    pip install third_party/custom_flashinfer/
+fi
 # SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
 # echo "Copying thirdparty libs to $SITE_PACKAGES"
 # cp -a csrc/balance_serve/build/third_party/prometheus-cpp/lib/libprometheus-cpp-*.so* $SITE_PACKAGES/
 # patchelf --set-rpath '$ORIGIN' $SITE_PACKAGES/sched_ext.cpython*

-echo "Installation completed successfully"
+echo "Installation completed successfully"
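The argument loop above can be exercised on its own. The sketch below wraps the PR's exact `while`/`case` logic in a function (`parse_dev` is a name introduced here for illustration, not part of the PR) so the default and the `--dev xpu` override can both be checked:

```shell
#!/bin/bash
# Minimal reproduction of the install.sh argument parser from this PR.
parse_dev() {
    DEV="cuda"                 # default backend, as in install.sh
    while [[ "$#" -gt 0 ]]; do
        case $1 in
            --dev) DEV="$2"; shift ;;
            *) echo "Unknown parameter passed: $1"; return 1 ;;
        esac
        shift
    done
    export DEV_BACKEND="$DEV"
}

parse_dev --dev xpu
echo "$DEV_BACKEND"   # xpu
parse_dev
echo "$DEV_BACKEND"   # cuda
```

Note the double `shift`: the first consumes the value of `--dev`, the second (after `esac`) consumes the flag itself, so each loop iteration advances past one complete option.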
1 change: 0 additions & 1 deletion — requirements-local_chat.txt
@@ -7,4 +7,3 @@ cpufeature; sys_platform == 'win32' or sys_platform == 'Windows'
 protobuf
 tiktoken
 blobfile
-triton>=3.2
10 changes: 10 additions & 0 deletions — setup.py
@@ -41,6 +41,15 @@
 MUSA_HOME=None
 KTRANSFORMERS_BUILD_XPU = torch.xpu.is_available()

+# Detect the DEV_BACKEND environment variable exported by install.sh
+dev_backend = os.environ.get("DEV_BACKEND", "").lower()
+if dev_backend == "xpu":
+    triton_dep = [
+        "pytorch-triton-xpu==3.3.0"
+    ]
+else:
+    triton_dep = ["triton>=3.2"]

 with_balance = os.environ.get("USE_BALANCE_SERVE", "0") == "1"

 class CpuInstructInfo:
@@ -659,6 +668,7 @@ def build_extension(self, ext) -> None:
 setup(
     name=VersionInfo.PACKAGE_NAME,
     version=VersionInfo().get_package_version(),
+    install_requires=triton_dep,
     cmdclass={"bdist_wheel": BuildWheelsCommand, "build_ext": CMakeBuild},
     ext_modules=ext_modules
 )
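The setup.py change moves the Triton dependency out of requirements-local_chat.txt and into `install_requires`, chosen at build time from `DEV_BACKEND`. A minimal sketch of that selection logic, factored into a pure function for testability (`select_triton_dep` is a name invented here; the PR does this inline at module level):

```python
import os


def select_triton_dep(env: dict) -> list:
    """Pick the Triton requirement matching the backend exported by
    install.sh, mirroring the module-level logic added in this PR."""
    dev_backend = env.get("DEV_BACKEND", "").lower()
    if dev_backend == "xpu":
        # Intel XPU builds need the XPU-specific Triton wheel
        return ["pytorch-triton-xpu==3.3.0"]
    # CUDA (and any other backend) keeps the stock Triton requirement
    return ["triton>=3.2"]


print(select_triton_dep({"DEV_BACKEND": "xpu"}))  # ['pytorch-triton-xpu==3.3.0']
print(select_triton_dep(dict(os.environ)))        # backend-dependent
```

Because the check lowercases the value first, `DEV_BACKEND=XPU` and `DEV_BACKEND=xpu` behave identically; any unrecognized or unset value falls back to the CUDA default, matching `DEV="cuda"` in install.sh.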