vLLM CI tests #251

Merged
sreeram-11 merged 4 commits into main from sreeram/vLLM-tests
May 4, 2026
Conversation

Collaborator

sreeram-11 commented May 4, 2026

What the CI tests validate

  1. Required asset scripts exist:
    • curl_script.sh
    • chat_with_model.py
  2. chat_with_model.py has valid Python syntax.
  3. Python virtual environment can be created and activated (the overall flow of steps 3-15 is sketched after this list).
  4. PyTorch 2.9.1 built for ROCm 7.12.0 can be installed.
  5. vLLM can be installed from AMD’s prebuilt ROCm wheel index.
  6. Required ROCm-related environment variables can be set:
    • PYTHONPATH
    • FLASH_ATTENTION_TRITON_AMD_ENABLE
  7. vLLM, PyTorch, and flash-attn can be imported successfully.
  8. PyTorch detects HIP/GPU availability.
  9. vLLM server can start on 127.0.0.1:8000 with Qwen/Qwen3-1.7B.
  10. vLLM server responds to the /health endpoint.
  11. vLLM server responds to the OpenAI-compatible /v1/models endpoint.
  12. Chat completion works through the README-style curl -X POST command.
  13. Chat completion works through curl_script.sh.
  14. Chat completion works through the OpenAI Python API using the local vLLM server.
  15. vLLM server is stopped/cleaned up after the smoke test.
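
For reference, this is the rough shape of steps 3-15 as a sketch, not the playbook's actual contents: the wheel index URLs are placeholders that stand in for AMD's real ROCm 7.12 indexes, the PYTHONPATH value and package set are assumptions, and step 13 (curl_script.sh) is omitted for brevity.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Steps 3-5: venv plus ROCm wheels. Both index URLs are placeholders.
TORCH_INDEX="https://example.com/rocm-7.12/torch"   # hypothetical
VLLM_INDEX="https://example.com/rocm-7.12/vllm"     # hypothetical
python3.12 -m venv .venv
source .venv/bin/activate
pip install --index-url "$TORCH_INDEX" torch==2.9.1
pip install --extra-index-url "$VLLM_INDEX" vllm flash-attn  # package set is an assumption

# Step 6: ROCm-related environment variables.
export PYTHONPATH="$PWD"  # actual value is playbook-specific
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE

# Steps 7-8: imports succeed and PyTorch sees a HIP device.
python -c "import torch, vllm, flash_attn; print(torch.version.hip)"
python -c "import torch; assert torch.cuda.is_available()"

# Steps 9-10: start the server in the background, then poll /health.
vllm serve Qwen/Qwen3-1.7B --host 127.0.0.1 --port 8000 &
SERVER_PID=$!
timeout 600 bash -c 'until curl -sf http://127.0.0.1:8000/health; do sleep 5; done'

# Step 11: OpenAI-compatible model listing.
curl -sf http://127.0.0.1:8000/v1/models

# Step 12: README-style chat completion.
curl -sf -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-1.7B", "messages": [{"role": "user", "content": "Say hello."}]}'

# Step 14: chat completion through the OpenAI Python client.
python - <<'EOF'
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen3-1.7B",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
EOF

# Step 15: stop the server.
kill "$SERVER_PID"
```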

User Actions Required

  1. Make sure Python 3.12 is available on the Linux runner.
  2. Make sure the runner has internet access to:
    • install Python packages from the AMD ROCm wheel indexes.
    • download Qwen/Qwen3-1.7B from Hugging Face, or confirm that the model is already cached (a preflight check for both is sketched below).
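
A preflight check along these lines covers both requirements; the cache path below is the Hugging Face default and an assumption about the runner's layout:

```bash
# Fail fast if the required interpreter is missing.
command -v python3.12 >/dev/null || { echo "python3.12 not found" >&2; exit 1; }

# The model is pulled on demand (see note 1 below), so only warn when it is
# neither cached under the default Hugging Face path nor reachable online.
if [ ! -d "$HOME/.cache/huggingface/hub/models--Qwen--Qwen3-1.7B" ]; then
  curl -sfI https://huggingface.co >/dev/null \
    || echo "warning: no network and no cached Qwen/Qwen3-1.7B" >&2
fi
```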

Info worth noting

  1. The Qwen/Qwen3-1.7B model does not need to be manually downloaded before running the tests. vllm serve Qwen/Qwen3-1.7B downloads the model automatically on first run if it is not already cached.

  2. Updated test-playbooks.yml to use Python 3.12 for the vLLM playbook.

  3. System/global ROCm on runner: 7.2.0
    Python venv ROCm/PyTorch/vLLM packages: ROCm 7.12.0-based wheels

    • So the runner can keep the global ROCm stack at 7.2.0 while .venv contains ROCm 7.12 Python packages (a quick check for this split is sketched after this list).
    • The important condition is that the system driver must be compatible enough with the user-space ROCm packages we are using. In this case, that condition is satisfied.
    • Installing PyTorch/vLLM from the ROCm 7.12 wheel indexes installs ROCm 7.12-compatible Python/user-space packages into the virtual environment, but it does not install or upgrade the global system ROCm driver stack.
    • No reboot is required because pip is only changing the venv, not the system driver.
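
A quick way to see this split on a runner (the version file path is the conventional ROCm install location and may vary):

```bash
# Global/system ROCm stack (7.2.0 on this runner).
cat /opt/rocm/.info/version

# ROCm version the venv's PyTorch wheel was built against (7.12 here).
.venv/bin/python -c "import torch; print(torch.version.hip)"
```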

sreeram-11 requested a review from danielholanda May 4, 2026 16:30
sreeram-11 marked this pull request as draft May 4, 2026 16:37
sreeram-11 marked this pull request as ready for review May 4, 2026 18:09
Collaborator

danielholanda left a comment

This is one of the most thorough tests we've added so far. Thank you for the great work here!

sreeram-11 merged commit ddf25eb into main May 4, 2026
6 checks passed