
Add Intel nightly tests for XPU and CPU platforms#22677

Open
MingxuZh wants to merge 15 commits into sgl-project:main from MingxuZh:main

Conversation


@MingxuZh MingxuZh commented Apr 13, 2026

Summary

This PR adds comprehensive nightly testing infrastructure for Intel platforms (XPU and CPU) and improves the existing XPU CI configuration.

Changes

1. Nightly Test Workflow (.github/workflows/nightly-test-intel.yml)

  • Schedule: Runs daily at 23:00 Beijing time (15:00 UTC)
  • Two parallel jobs:
    • nightly-test-xpu: Runs on sglang-bmg runner with Intel Arc B580 GPUs
    • nightly-test-cpu: Runs on xeon-gnr runner with Intel Xeon CPU
  • Docker configuration: Fixed multi-GPU support with proper /dev/dri and /dev/dri/by-path volume mounts for oneCCL communication
  • Permissions: Added render group (GID 992) for GPU device access
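The schedule and container settings above can be sketched as a workflow fragment. The job names, runner labels, cron time, device paths, and render-group GID follow the PR description; the image name and test command are illustrative assumptions, not the actual file contents.

```yaml
# Illustrative sketch only -- not the actual nightly-test-intel.yml.
name: Nightly Test (Intel)

on:
  schedule:
    - cron: "0 15 * * *"   # 15:00 UTC == 23:00 Beijing time

jobs:
  nightly-test-xpu:
    runs-on: sglang-bmg     # runner with Intel Arc B580 GPUs
    steps:
      - uses: actions/checkout@v4
      - name: Run XPU tests in container
        run: |
          docker run --rm \
            --device /dev/dri \
            -v /dev/dri/by-path:/dev/dri/by-path \
            --group-add 992 \
            sglang-xpu-ci:latest python3 -m pytest test/
  nightly-test-cpu:
    runs-on: xeon-gnr       # runner with Intel Xeon CPU
    steps:
      - uses: actions/checkout@v4
```

The `--group-add 992` flag adds the container user to the render group so the test process can open the `/dev/dri` render nodes, and mounting `/dev/dri/by-path` exposes stable per-GPU device paths for oneCCL's multi-GPU communication.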

2. PR Test Workflow Updates (.github/workflows/pr-test-xpu.yml)

  • Applied the same Docker mount fixes for consistent behavior between PR and nightly tests

3. New XPU Test Files

  • test_llama_tp.py: Llama 3.2 3B model with TP=2 for multi-GPU testing (nightly only)
  • test_deepseek_ocr.py: Added --mem-fraction-static 0.7 to prevent OOM
  • test_deepseek_ocr_triton.py: Adjusted est_time=400 for proper test ordering
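The endpoint tests listed above presumably follow the common pattern of launching a local server (with `--tp 2` for the multi-GPU case) and posting JSON to its HTTP endpoints. A minimal, self-contained sketch of that request pattern, using only the standard library; the base URL and payload field names are assumptions, not the PR's actual test code:

```python
# Hypothetical sketch of the request pattern used by the XPU endpoint tests.
# The payload shape and server URL are assumptions, not the PR's actual code.
import json
import urllib.request


def build_generate_payload(prompt: str, max_new_tokens: int = 32) -> dict:
    """Build a /generate-style request body with greedy sampling."""
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": 0.0,
        },
    }


def post_json(url: str, payload: dict) -> dict:
    """POST JSON and parse the response; urllib raises HTTPError on non-2xx."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage would look like `post_json("http://127.0.0.1:30000/generate", build_generate_payload("The capital of France is"))` against a server launched with `--tp 2`, which exercises the tensor-parallel path across both GPUs.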

Test Configuration

| Test | Model | TP | Suite | Notes |
|---|---|---|---|---|
| DeepSeek-OCR | deepseek-ai/DeepSeek-OCR | 1 | per-commit, nightly | With triton backend |
| Llama TP=2 | meta-llama/Llama-3.2-3B-Instruct | 2 | nightly | Multi-GPU validation |

Hardware Requirements

  • XPU: Intel Arc B580 (4x 12GB) on sglang-bmg runner
  • CPU: Intel Xeon on xeon-gnr runner


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces XPU (Intel GPU) support to the CI registration system and adds several XPU-specific tests, including a new multi-GPU tensor parallelism test for Llama 3.2. Key changes include the definition of register_xpu_ci, updates to the test suite runner to recognize XPU backends, and memory limit adjustments for DeepSeek OCR tests. Feedback focuses on improving test robustness by checking HTTP response statuses in the new Llama test and enabling strict suite validation for the XPU platform in the test runner.

Comment on lines +66 to +68

```python
HWBackend.XPU: [
    "per-commit-xpu",
],
```


medium

While adding HWBackend.XPU to the suite mappings is correct, it should also be added to the _SUITE_CHECKED_BACKENDS set (around line 136) to enable strict suite validation for the XPU platform. Currently, validation is skipped for XPU tests, which could lead to incorrectly registered tests going unnoticed during CI runs.
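The bot's point can be illustrated with a minimal model of the registration check. The names `HWBackend`, `_SUITE_CHECKED_BACKENDS`, and `"per-commit-xpu"` mirror those quoted in the comment; the validation logic itself is an illustrative assumption, not the repo's actual implementation.

```python
# Minimal model of the strict-suite-validation idea from the review comment.
# Names mirror the comment; the logic is an illustrative assumption.
from enum import Enum, auto


class HWBackend(Enum):
    CUDA = auto()
    XPU = auto()


# Valid suite names per backend.
SUITES = {
    HWBackend.XPU: ["per-commit-xpu"],
}

# Backends in this set get strict validation. Including XPU here makes a
# mistyped suite name fail loudly instead of being silently accepted.
_SUITE_CHECKED_BACKENDS = {HWBackend.CUDA, HWBackend.XPU}


def validate_suite(backend: HWBackend, suite: str) -> bool:
    """Return True if the suite name is valid (or validation is skipped)."""
    if backend not in _SUITE_CHECKED_BACKENDS:
        return True  # validation skipped for unchecked backends
    return suite in SUITES.get(backend, [])
```

With XPU left out of `_SUITE_CHECKED_BACKENDS`, `validate_suite(HWBackend.XPU, "per-commit-typo")` would return `True` and the misregistered test would go unnoticed; with it included, the check fails as intended.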

Comment on lines +83 to +84

```python
)
ret = response.json()
```


medium

It is recommended to call response.raise_for_status() before attempting to parse the JSON response. This ensures that if the server returns an error (e.g., 500 Internal Server Error), the test fails with a clear HTTP error message rather than a potentially confusing KeyError or JSONDecodeError later.

Suggested change

```diff
 )
+response.raise_for_status()
 ret = response.json()
```

Comment on lines +102 to +103

```python
)
ret = response.json()
```


medium

Similar to the /generate endpoint, adding response.raise_for_status() here will improve the robustness of the test by providing immediate feedback if the chat completion request fails.

Suggested change

```diff
 )
+response.raise_for_status()
 ret = response.json()
```
