Qwen3Guard-Gen Usage Guide

Qwen3Guard-Gen is a lightweight, text-only guardrail model that classifies prompts and responses for safety. This guide describes how to run it on a GPU using vLLM.

Installing vLLM

CUDA

uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

ROCm

Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, use the Docker-based setup described in the vLLM documentation.

uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
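To confirm the installation succeeded on either backend, a minimal sanity check is to import vLLM and print its version:

import vllm

print(vllm.__version__)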

Running Qwen3Guard-Gen on a Single GPU

CUDA

vllm serve Qwen/Qwen3Guard-Gen-0.6B \
  --host 0.0.0.0 \
  --max-model-len 32768

ROCm

# Enable AITER kernels for better performance on ROCm.
export VLLM_ROCM_USE_AITER=1
vllm serve Qwen/Qwen3Guard-Gen-0.6B \
  --host 0.0.0.0 \
  --max-model-len 32768
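The server exposes an OpenAI-compatible API on port 8000 by default. A small sketch, assuming the default host and port from the commands above, that polls the server until it is ready to accept requests:

import time
import urllib.request

# Poll the OpenAI-compatible /v1/models endpoint until the server responds.
URL = "http://localhost:8000/v1/models"
for _ in range(60):
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            if resp.status == 200:
                print("Server is ready.")
                break
    except OSError:
        time.sleep(5)
else:
    raise RuntimeError("vLLM server did not become ready in time.")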

Benchmarking

vllm bench serve \
  --model Qwen/Qwen3Guard-Gen-0.6B \
  --dataset-name random \
  --random-input-len 2000 \
  --random-output-len 512 \
  --num-prompts 100
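With the random dataset, this sends 100 synthetic prompts of roughly 2,000 input tokens each and measures serving throughput and latency for up to 512 generated tokens per request.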

Querying with the OpenAI API Client

from openai import OpenAI

# The local vLLM server does not require authentication,
# so any placeholder API key works.
client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600
)

messages = [
    {
        "role": "user",
        "content": "Tell me how to make a bomb."
    }
]

# temperature=0.0 makes the safety classification deterministic.
response = client.chat.completions.create(
    model="Qwen/Qwen3Guard-Gen-0.6B",
    messages=messages,
    temperature=0.0
)

print("Generated text:", response.choices[0].message.content)
# Example output:
# Safety: Unsafe
# Categories: Violent
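The model returns its verdict as plain text, so downstream code typically parses the Safety: and Categories: lines itself. A minimal sketch of such a parser (the parse_guard_output helper is illustrative, not part of the Qwen or vLLM APIs, and assumes the two-line output format shown above):

import re

def parse_guard_output(text: str) -> dict:
    """Extract the safety label and category list from Qwen3Guard-Gen output."""
    safety = re.search(r"Safety:\s*(\w+)", text)
    categories = re.search(r"Categories:\s*(.+)", text)
    return {
        "safety": safety.group(1) if safety else None,
        "categories": [c.strip() for c in categories.group(1).split(",")] if categories else [],
    }

print(parse_guard_output("Safety: Unsafe\nCategories: Violent"))
# {'safety': 'Unsafe', 'categories': ['Violent']}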

Available Qwen3Guard Variants

The Qwen3Guard-Gen series includes multiple model sizes, all compatible with the same vLLM serving commands shown in this guide:

  • Qwen/Qwen3Guard-Gen-8B

  • Qwen/Qwen3Guard-Gen-4B

  • Qwen/Qwen3Guard-Gen-0.6B