docs: Add README.md guide with examples for doc test CI test scripts. #458
Open · ganeshku1 wants to merge 2 commits into main from docs/add-testing-guide
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Adding New End-to-End Tests for Documentation Examples

This guide explains how to add new end-to-end tests for server examples in the AIPerf documentation.

## Overview

The end-to-end test framework automatically discovers and tests server examples from markdown documentation files. It:

1. Parses markdown files for specially tagged bash commands
2. Builds an AIPerf Docker container
3. For each discovered server:
   - Runs the server setup command
   - Waits for the server to become healthy
   - Executes AIPerf benchmark commands
   - Validates results and cleans up
## How Tests Are Discovered

The test parser (`parser.py`) scans all markdown files (`*.md`) in the repository and looks for HTML comment tags with specific patterns:

- **Setup commands**: `<!-- setup-{server-name}-endpoint-server -->`
- **Health checks**: `<!-- health-check-{server-name}-endpoint-server -->`
- **AIPerf commands**: `<!-- aiperf-run-{server-name}-endpoint-server -->`

Each tag must be followed by a bash code block (` ```bash ... ``` `) containing the actual command.
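For orientation, here is a minimal sketch of how such discovery could work, assuming a regex-based scan; the function and pattern names are illustrative, and the actual `parser.py` may be structured differently.

```python
# Illustrative sketch only -- the real parser.py may differ.
import re
from pathlib import Path

# Matches an opening tag such as <!-- setup-myserver-endpoint-server -->
# followed by a ```bash ... ``` code block. Closing tags (<!-- /setup-... -->)
# do not match because of the leading "<!--\s*" before the tag kind.
TAG_BLOCK = re.compile(
    r"<!--\s*(?P<kind>setup|health-check|aiperf-run)-(?P<name>[\w-]+)-endpoint-server\s*-->"
    r"\s*```bash\n(?P<command>.*?)```",
    re.DOTALL,
)

def discover_commands(repo_root: str) -> list[tuple[str, str, str]]:
    """Return (kind, server_name, command) triples from every *.md file."""
    found = []
    for md_file in Path(repo_root).rglob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        for m in TAG_BLOCK.finditer(text):
            found.append((m["kind"], m["name"], m["command"].strip()))
    return found
```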
## Adding a New Server Test

To add tests for a new server, you need to add three types of tagged commands to your documentation:

### 1. Server Setup Command

Tag the bash command that starts your server:

<!-- setup-myserver-endpoint-server -->
```bash
# Start your server
docker run --gpus all -p 8000:8000 myserver/image:latest \
  --model my-model \
  --host 0.0.0.0 --port 8000
```
<!-- /setup-myserver-endpoint-server -->

**Important notes:**
- The server name (`myserver` in this example) must be consistent across all three tag types
- The setup command runs in the background
- The command should start a long-running server process
- Use port 8000 or ensure your health check targets the correct port
### 2. Health Check Command

Tag a bash command that waits for your server to be ready:

<!-- health-check-myserver-endpoint-server -->
```bash
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/health -H "Content-Type: application/json")" != "200" ]; do sleep 2; done' || { echo "Server not ready after 15min"; exit 1; }
```
<!-- /health-check-myserver-endpoint-server -->

**Important notes:**
- The health check should poll the server until it responds successfully
- Use a reasonable timeout (e.g., 900 seconds = 15 minutes)
- The command must exit with code 0 when the server is healthy
- The command must exit with a non-zero code if the server fails to start
### 3. AIPerf Run Commands

Tag one or more AIPerf benchmark commands:

<!-- aiperf-run-myserver-endpoint-server -->
```bash
aiperf profile \
  --model my-model \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --streaming \
  --num-prompts 10 \
  --max-tokens 100
```
<!-- /aiperf-run-myserver-endpoint-server -->

You can have multiple `aiperf-run` commands for the same server. Each is executed sequentially against the same running server instance (the server is NOT restarted between commands):

<!-- aiperf-run-myserver-endpoint-server -->
```bash
# First test: streaming mode
aiperf profile \
  --model my-model \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --streaming \
  --num-prompts 10
```
<!-- /aiperf-run-myserver-endpoint-server -->

<!-- aiperf-run-myserver-endpoint-server -->
```bash
# Second test: non-streaming mode
aiperf profile \
  --model my-model \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --num-prompts 10
```
<!-- /aiperf-run-myserver-endpoint-server -->

**Important notes:**
- Do NOT include the `--ui-type` flag - the test framework adds `--ui-type simple` automatically
- Each command is executed inside the AIPerf Docker container
- Commands should complete in a reasonable time (default timeout: 300 seconds)
- Use small values for `--num-prompts` and `--max-tokens` to keep tests fast
- The server is NOT restarted between multiple aiperf commands - all commands run against the same server instance
## Complete Example

Here's a complete example for a new server called "fastapi":

### Running FastAPI Server

Start the FastAPI server:

<!-- setup-fastapi-endpoint-server -->
```bash
docker run --gpus all -p 8000:8000 mycompany/fastapi-llm:latest \
  --model-name meta-llama/Llama-3.2-1B \
  --host 0.0.0.0 \
  --port 8000
```
<!-- /setup-fastapi-endpoint-server -->

Wait for the server to be ready:

<!-- health-check-fastapi-endpoint-server -->
```bash
timeout 600 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "FastAPI server not ready after 10min"; exit 1; }
```
<!-- /health-check-fastapi-endpoint-server -->

Profile the model:

<!-- aiperf-run-fastapi-endpoint-server -->
```bash
aiperf profile \
  --model meta-llama/Llama-3.2-1B \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --streaming \
  --num-prompts 20 \
  --max-tokens 50
```
<!-- /aiperf-run-fastapi-endpoint-server -->
## Running the Tests

### Run all discovered tests:

```bash
cd tests/ci/test_docs_end_to_end
python main.py
```

### Dry run to see what would be tested:

```bash
python main.py --dry-run
```

### Test specific servers:

Currently, the framework tests the first discovered server by default. Use `--all-servers` to test all:

```bash
python main.py --all-servers
```
## Validation Rules

The test framework validates that each server has:

- Exactly ONE setup command (duplicates cause test failure)
- Exactly ONE health check command (duplicates cause test failure)
- At least ONE aiperf command

If any of these requirements are not met, the tests will fail with a clear error message.
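A minimal sketch of these rules, assuming the parsed commands have already been grouped per server (the grouping and error text here are illustrative, not the framework's actual implementation):

```python
# Illustrative sketch of the validation rules; not the actual framework code.
def validate_server(name: str, setups: list[str],
                    health_checks: list[str], aiperf_runs: list[str]) -> None:
    if len(setups) != 1:
        raise ValueError(f"{name}: expected exactly 1 setup command, found {len(setups)}")
    if len(health_checks) != 1:
        raise ValueError(f"{name}: expected exactly 1 health check, found {len(health_checks)}")
    if not aiperf_runs:
        raise ValueError(f"{name}: at least 1 aiperf command is required")
```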
## Test Execution Flow

For each server, the test runner:

1. **Build Phase**: Builds the AIPerf Docker container (once for all tests)
2. **Setup Phase**: Starts the server in the background
3. **Health Check Phase**: Waits for the server to be ready (runs in parallel with setup)
4. **Test Phase**: Executes all AIPerf commands sequentially against the same running server instance
5. **Cleanup Phase**: Gracefully shuts down the server and cleans up Docker resources

**Note**: The server remains running throughout all AIPerf commands. It is only shut down once, during the cleanup phase, after all tests complete.
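As a rough sketch of that lifecycle, using plain `subprocess` calls for illustration (the real `test_runner.py` additionally handles Docker, build steps, timeouts, and log capture; the `server` object with `setup_command`, `health_check_command`, and `aiperf_commands` attributes is assumed):

```python
# Illustrative lifecycle sketch; not the actual test_runner.py implementation.
import subprocess

def run_server_tests(server) -> None:
    # Setup phase: start the server command in the background.
    proc = subprocess.Popen(server.setup_command, shell=True)
    try:
        # Health check phase: the tagged command blocks until the server is ready.
        subprocess.run(server.health_check_command, shell=True, check=True)
        # Test phase: every aiperf command runs against the same server instance.
        for cmd in server.aiperf_commands:
            subprocess.run(cmd, shell=True, check=True, timeout=300)
    finally:
        # Cleanup phase: the server is shut down only after all commands finish.
        proc.terminate()
        proc.wait()
```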
## Common Patterns

### Pattern: OpenAI-compatible API

<!-- setup-myserver-endpoint-server -->
```bash
docker run --gpus all -p 8000:8000 myserver:latest \
  --model model-name \
  --host 0.0.0.0 --port 8000
```
<!-- /setup-myserver-endpoint-server -->

<!-- health-check-myserver-endpoint-server -->
```bash
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"model-name\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "Server not ready"; exit 1; }
```
<!-- /health-check-myserver-endpoint-server -->

<!-- aiperf-run-myserver-endpoint-server -->
```bash
aiperf profile \
  --model model-name \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --service-kind openai \
  --streaming \
  --num-prompts 10 \
  --max-tokens 100
```
<!-- /aiperf-run-myserver-endpoint-server -->
## Troubleshooting

### Tests not discovered

- Verify tag format: `setup-{name}-endpoint-server`, `health-check-{name}-endpoint-server`, `aiperf-run-{name}-endpoint-server`
- Ensure the bash code block immediately follows the tag
- Check that the server name is consistent across all three tag types
- Run `python main.py --dry-run` to see what's discovered

### Health check timeout

- Increase the timeout value in your health check command
- Verify the health check endpoint is correct
- Check server logs: the test runner shows setup output for 30 seconds
- Ensure your server starts on the expected port

### AIPerf command fails

- Test your AIPerf command manually first
- Use small values for `--num-prompts` and `--max-tokens`
- Verify the model name matches what the server expects
- Check that the endpoint URL is correct

### Duplicate command errors

If you see errors like "DUPLICATE SETUP COMMAND", you have multiple commands with the same server name:

- Search your docs for all instances of that tag
- Ensure each server has a unique name
- Or remove duplicate tags if they're truly duplicates
## Best Practices

1. **Keep tests fast**: Use minimal `--num-prompts` (10-20) and small `--max-tokens` values
2. **Use standard ports**: Default to 8000 for consistency
3. **Add timeouts**: Always include timeouts in health checks
4. **Test locally first**: Run commands manually before adding tags
5. **One server per doc section**: Avoid mixing multiple servers in the same doc section
6. **Clear error messages**: Include helpful error messages in health checks
7. **Document requirements**: Note any GPU, memory, or dependency requirements in surrounding text
## Architecture Reference

Key files in the test framework:

- `main.py`: Entry point, orchestrates parsing and testing
- `parser.py`: Markdown parser that discovers tagged commands
- `test_runner.py`: Executes tests for each server
- `constants.py`: Configuration constants (timeouts, tag patterns)
- `data_types.py`: Data models for commands and servers
- `utils.py`: Utility functions for Docker operations
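For orientation, the parsed data might take shapes like the following; these definitions are hypothetical, and the real ones live in `data_types.py`:

```python
# Hypothetical data shapes; see data_types.py for the actual definitions.
from dataclasses import dataclass, field

@dataclass
class TaggedCommand:
    kind: str          # "setup", "health-check", or "aiperf-run"
    server_name: str   # e.g. "myserver"
    command: str       # contents of the tagged bash code block
    source_file: str   # markdown file the command was discovered in

@dataclass
class ServerTestCase:
    name: str
    setup: TaggedCommand
    health_check: TaggedCommand
    aiperf_runs: list[TaggedCommand] = field(default_factory=list)
```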
## Constants and Configuration

Key constants in `constants.py`:

- `SETUP_MONITOR_TIMEOUT`: 30 seconds (how long to monitor setup output)
- `CONTAINER_BUILD_TIMEOUT`: 600 seconds (Docker build timeout)
- `AIPERF_COMMAND_TIMEOUT`: 300 seconds (per-command timeout)
- `AIPERF_UI_TYPE`: "simple" (auto-added to all aiperf commands)

To modify these, edit `constants.py`.
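Assembled from the values listed above, the definitions presumably look roughly like this (file layout assumed, values as documented here):

```python
# Values as documented above; the exact layout of constants.py may differ.
SETUP_MONITOR_TIMEOUT = 30      # seconds to monitor server setup output
CONTAINER_BUILD_TIMEOUT = 600   # seconds allowed for the Docker build
AIPERF_COMMAND_TIMEOUT = 300    # seconds allowed per aiperf command
AIPERF_UI_TYPE = "simple"       # appended to every aiperf command
```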
---

**Review comment:** Can you add a note that the server is not taken down between `aiperf profile` commands?

**Reply:** Yes, that is a good note; I missed that and will add it to the README.