This guide explains how to run live tests and generate recordings for supported providers.
- `vllm` - vLLM inference server (default, requires a local container)
- `vertex` - Google Cloud Vertex AI
The script auto-detects the provider from environment variables. If no provider is specified, it defaults to `vllm`.
Prerequisites:
- Podman - for building and running containers
- gcloud CLI - for GCP authentication (Vertex AI only)
To run against the default local vLLM provider:

```bash
# Start the vllm container
podman run -d \
  --name vllm \
  --privileged=true \
  --net=host \
  quay.io/higginsd/vllm-cpu:65393ee064-qwen3 \
  --host 0.0.0.0 \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --model /root/.cache/Qwen3-0.6B \
  --served-model-name Qwen/Qwen3-0.6B \
  --max-model-len 8192

# Wait for vllm to be ready
timeout 900 bash -c 'until curl -fsS http://localhost:8000/health >/dev/null; do
  echo "Waiting for vllm..."
  sleep 5
done'
```
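Once the health check passes, you can optionally confirm that the model was loaded before kicking off the tests. This check is not part of the script; it assumes vLLM's standard OpenAI-compatible `/v1/models` endpoint:

```bash
# Optional: confirm the served model name is registered
curl -fsS http://localhost:8000/v1/models | grep 'Qwen/Qwen3-0.6B'
```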
```bash
# Run the script
./scripts/run-live-tests-local.sh

# Clean up
podman rm -f vllm
```

For Vertex AI, authenticate to GCP instead of running a local container:

```bash
# Set GCP project and authenticate
export VERTEX_AI_PROJECT=your-gcp-project-id
gcloud auth application-default login
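# Optional sanity check (an addition, not in the original steps): verify that
# Application Default Credentials are available before running the tests
gcloud auth application-default print-access-token >/dev/null && echo "ADC OK"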
# Run the script
./scripts/run-live-tests-local.sh
```

The script will:
- Build the container image
- Start the Llama Stack container with the selected provider
- Run integration tests in `live` mode
- Extract recordings to `tests/integration/recordings/`
- Clean up the container
Note: The script automatically creates a podman secret from your GCP credentials if needed.
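If you need to create the secret manually, a minimal sketch follows; the secret name `gcp-credentials` and the default ADC path are assumptions, so check the script for the actual values it expects:

```bash
# Create a podman secret from the gcloud Application Default Credentials file
# (name and path are illustrative, not taken from the script)
podman secret create gcp-credentials \
  "$HOME/.config/gcloud/application_default_credentials.json"
```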
The workflow `.github/workflows/live-tests.yml` automatically runs live tests and updates recordings. It is triggered in two ways:
- Scheduled: Runs every Monday at 2 AM UTC
- Manual: `gh workflow run live-tests.yml` or via the GitHub UI
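To trigger and follow a run from the terminal, the standard `gh` commands work (nothing here is specific to this workflow):

```bash
# Trigger the workflow, list the newest run, then follow it interactively
gh workflow run live-tests.yml
gh run list --workflow=live-tests.yml --limit 1
gh run watch
```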
The workflow requires the following configuration:
- `VERTEX_AI_PROJECT` - GCP project ID
- `GCP_WORKLOAD_IDENTITY_PROVIDER` - for OIDC authentication

The workflow uses `us-central1` as the GCP region. On each run, it:
- Authenticates to GCP using OIDC (Workload Identity Federation)
- Builds and starts the Llama Stack container with Vertex AI provider
- Runs integration tests in `live` mode to generate recordings
- Creates a PR to `llamastack/llama-stack` with updated recordings (if tests pass)
Recordings are generated during live tests and saved to `tests/integration/recordings/`. Look for files with `vertex`/`vertexai` (Vertex AI) or `vllm`/`vllm-inference` (vLLM) in their names or paths.
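To list the recordings for a given provider, a simple `find` over that directory works (assuming the layout described above):

```bash
# List recording files whose path mentions either provider
find tests/integration/recordings -type f \
  \( -ipath '*vertex*' -o -ipath '*vllm*' \)
```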
- Local: recordings are automatically extracted to `tests/integration/recordings/` after running the script
- CI: recordings are pushed to `llamastack/llama-stack` via PR
To include recordings in your PR, commit and push the changes from `tests/integration/recordings/`.
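A typical sequence looks like this; the branch name and commit message are placeholders:

```bash
# Commit updated recordings on a feature branch and push it
git checkout -b update-recordings
git add tests/integration/recordings/
git commit -m "Update integration test recordings"
git push origin update-recordings
```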
To add a new provider:
- Update `scripts/run-live-tests-local.sh` - add provider detection (see the sketch after this list), environment variables, and recording patterns
- Update `tests/run_integration_tests.sh` - add provider detection and model selection
- Update `.github/workflows/live-tests.yml` - add authentication and container setup if needed
- Update `distribution/run.yaml` - ensure provider configuration is present
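As an illustration of the first item, provider auto-detection might look like the sketch below. This is hypothetical: only `VERTEX_AI_PROJECT` and the `vllm` default come from this guide, and the real logic lives in `scripts/run-live-tests-local.sh`:

```bash
# Hypothetical provider detection; see scripts/run-live-tests-local.sh for
# the actual implementation.
detect_provider() {
  if [ -n "${VERTEX_AI_PROJECT:-}" ]; then
    echo "vertex"   # Vertex AI credentials are configured
  else
    echo "vllm"     # default provider per this guide
  fi
}

PROVIDER="$(detect_provider)"
echo "Using provider: ${PROVIDER}"
```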