diff --git a/tests/ci/test_docs_end_to_end/DOC_CI_TEST_README.md b/tests/ci/test_docs_end_to_end/DOC_CI_TEST_README.md
new file mode 100644
index 000000000..73bf166f4
--- /dev/null
+++ b/tests/ci/test_docs_end_to_end/DOC_CI_TEST_README.md
@@ -0,0 +1,296 @@

# Adding New End-to-End Tests for Documentation Examples

This guide explains how to add new end-to-end tests for server examples in the AIPerf documentation.

## Overview

The end-to-end test framework automatically discovers and tests server examples from markdown documentation files. It:
1. Parses markdown files for specially tagged bash commands
2. Builds an AIPerf Docker container
3. For each discovered server:
   - Runs the server setup command
   - Waits for the server to become healthy
   - Executes AIPerf benchmark commands
   - Validates results and cleans up

## How Tests Are Discovered

The test parser (`parser.py`) scans all markdown files (`*.md`) in the repository and looks for HTML comment tags with specific patterns:

- **Setup commands**: `<!-- setup-{name}-endpoint-server -->`
- **Health checks**: `<!-- health-check-{name}-endpoint-server -->`
- **AIPerf commands**: `<!-- aiperf-run-{name}-endpoint-server -->`

Each tag must be followed by a bash code block (` ```bash ... ``` `) containing the actual command.

## Adding a New Server Test

To add tests for a new server, you need to add three types of tagged commands to your documentation:

### 1. Server Setup Command

Tag the bash command that starts your server:

<!-- setup-myserver-endpoint-server -->
```bash
# Start your server
docker run --gpus all -p 8000:8000 myserver/image:latest \
    --model my-model \
    --host 0.0.0.0 --port 8000
```

**Important notes:**
- The server name (`myserver` in this example) must be consistent across all three tag types
- The setup command runs in the background
- The command should start a long-running server process
- Use port 8000, or ensure your health check targets the correct port

### 2. Health Check Command

Tag a bash command that waits for your server to be ready:

<!-- health-check-myserver-endpoint-server -->
```bash
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/health -H "Content-Type: application/json")" != "200" ]; do sleep 2; done' || { echo "Server not ready after 15min"; exit 1; }
```

**Important notes:**
- The health check should poll the server until it responds successfully
- Use a reasonable timeout (e.g., 900 seconds = 15 minutes)
- The command must exit with code 0 when the server is healthy
- The command must exit with a non-zero code if the server fails to start

### 3. AIPerf Run Commands

Tag one or more AIPerf benchmark commands:

<!-- aiperf-run-myserver-endpoint-server -->
```bash
aiperf profile \
    --model my-model \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 10 \
    --max-tokens 100
```
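Together, the three tagged commands describe one server under test. Conceptually, the parser collects them into a per-server record along the lines of the sketch below; the real model lives in `data_types.py`, and the class and field names here are illustrative assumptions rather than the framework's actual API.

```python
# Illustrative sketch only: the real data model lives in data_types.py and
# may use different names and structure.
from dataclasses import dataclass, field


@dataclass
class ServerSketch:
    name: str                  # server name taken from the tag, e.g. "myserver"
    setup_command: str         # exactly one setup command per server
    health_check_command: str  # exactly one health check per server
    aiperf_commands: list[str] = field(default_factory=list)  # one or more benchmark runs
```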
You can have multiple `aiperf-run` commands for the same server. Each will be executed sequentially against the same running server instance (the server is NOT restarted between commands):

<!-- aiperf-run-myserver-endpoint-server -->
```bash
# First test: streaming mode
aiperf profile \
    --model my-model \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 10
```

<!-- aiperf-run-myserver-endpoint-server -->
```bash
# Second test: non-streaming mode
aiperf profile \
    --model my-model \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --num-prompts 10
```

**Important notes:**
- Do NOT include the `--ui-type` flag; the test framework adds `--ui-type simple` automatically
- Each command is executed inside the AIPerf Docker container
- Commands should complete in a reasonable time (default timeout: 300 seconds)
- Use small values for `--num-prompts` and `--max-tokens` to keep tests fast
- The server is NOT restarted between multiple aiperf commands; all commands run against the same server instance

## Complete Example

Here's a complete example for a new server called "fastapi":

### Running FastAPI Server

Start the FastAPI server:

<!-- setup-fastapi-endpoint-server -->
```bash
docker run --gpus all -p 8000:8000 mycompany/fastapi-llm:latest \
    --model-name meta-llama/Llama-3.2-1B \
    --host 0.0.0.0 \
    --port 8000
```

Wait for the server to be ready:

<!-- health-check-fastapi-endpoint-server -->
```bash
timeout 600 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "FastAPI server not ready after 10min"; exit 1; }
```

Profile the model:

<!-- aiperf-run-fastapi-endpoint-server -->
```bash
aiperf profile \
    --model meta-llama/Llama-3.2-1B \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 20 \
    --max-tokens 50
```

## Running the Tests

### Run all discovered tests:

```bash
cd tests/ci/test_docs_end_to_end
python main.py
```

### Dry run to see what would be tested:

```bash
python main.py --dry-run
```

### Test specific servers:

Currently, the framework tests the first discovered server by default. Use `--all-servers` to test all of them:

```bash
python main.py --all-servers
```

## Validation Rules

The test framework validates that each server has:
- Exactly ONE setup command (duplicates cause test failure)
- Exactly ONE health check command (duplicates cause test failure)
- At least ONE aiperf command

If any of these requirements are not met, the tests will fail with a clear error message.

## Test Execution Flow

For each server, the test runner:

1. **Build Phase**: Builds the AIPerf Docker container (once for all tests)
2. **Setup Phase**: Starts the server in the background
3. **Health Check Phase**: Waits for the server to be ready (runs in parallel with setup)
4. **Test Phase**: Executes all AIPerf commands sequentially against the same running server instance
5. **Cleanup Phase**: Gracefully shuts down the server and cleans up Docker resources

**Note**: The server remains running throughout all AIPerf commands. It is only shut down once, during the cleanup phase, after all tests complete.
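To make the flow above concrete, here is a heavily simplified sketch of the per-server loop. It is not the actual `test_runner.py` implementation: it omits the Docker build and container wiring, runs the health check after setup rather than in parallel with it, and the function and variable names are assumptions for illustration only.

```python
# Simplified sketch of the per-server flow; test_runner.py is the source of
# truth. Docker build/container handling and the parallel health check are
# omitted for brevity.
import subprocess


def run_server_tests(setup_cmd: str, health_cmd: str, aiperf_cmds: list[str]) -> None:
    # Setup phase: start the documented server command in the background.
    server = subprocess.Popen(["bash", "-c", setup_cmd])
    try:
        # Health check phase: the tagged command blocks until the server is
        # healthy (it carries its own timeout) and fails otherwise.
        subprocess.run(["bash", "-c", health_cmd], check=True)

        # Test phase: every aiperf command runs against the same server
        # instance, with --ui-type simple appended and a per-command timeout.
        for cmd in aiperf_cmds:
            subprocess.run(["bash", "-c", f"{cmd} --ui-type simple"], check=True, timeout=300)
    finally:
        # Cleanup phase: shut the server down once, after all commands finish.
        server.terminate()
        server.wait()
```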
## Common Patterns

### Pattern: OpenAI-compatible API

<!-- setup-{name}-endpoint-server -->
```bash
docker run --gpus all -p 8000:8000 myserver:latest \
    --model model-name \
    --host 0.0.0.0 --port 8000
```

<!-- health-check-{name}-endpoint-server -->
```bash
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"model-name\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "Server not ready"; exit 1; }
```

<!-- aiperf-run-{name}-endpoint-server -->
```bash
aiperf profile \
    --model model-name \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --service-kind openai \
    --streaming \
    --num-prompts 10 \
    --max-tokens 100
```

## Troubleshooting

### Tests not discovered

- Verify the tag format: `setup-{name}-endpoint-server`, `health-check-{name}-endpoint-server`, `aiperf-run-{name}-endpoint-server`
- Ensure a bash code block immediately follows the tag
- Check that the server name is consistent across all three tag types
- Run `python main.py --dry-run` to see what's discovered

### Health check timeout

- Increase the timeout value in your health check command
- Verify the health check endpoint is correct
- Check server logs: the test runner shows setup output for 30 seconds
- Ensure your server starts on the expected port

### AIPerf command fails

- Test your AIPerf command manually first
- Use small values for `--num-prompts` and `--max-tokens`
- Verify the model name matches what the server expects
- Check that the endpoint URL is correct

### Duplicate command errors

If you see errors like "DUPLICATE SETUP COMMAND", you have multiple commands with the same server name:
- Search your docs for all instances of that tag
- Ensure each server has a unique name
- Or remove duplicate tags if they're truly duplicates

## Best Practices

1. **Keep tests fast**: Use minimal `--num-prompts` (10-20) and small `--max-tokens` values
2. **Use standard ports**: Default to 8000 for consistency
3. **Add timeouts**: Always include timeouts in health checks
4. **Test locally first**: Run commands manually before adding tags
5. **One server per doc section**: Avoid mixing multiple servers in the same doc section
6. **Clear error messages**: Include helpful error messages in health checks
7. **Document requirements**: Note any GPU, memory, or dependency requirements in the surrounding text

## Architecture Reference

Key files in the test framework:

- `main.py`: Entry point, orchestrates parsing and testing
- `parser.py`: Markdown parser that discovers tagged commands
- `test_runner.py`: Executes tests for each server
- `constants.py`: Configuration constants (timeouts, tag patterns)
- `data_types.py`: Data models for commands and servers
- `utils.py`: Utility functions for Docker operations

## Constants and Configuration

Key constants in `constants.py`:

- `SETUP_MONITOR_TIMEOUT`: 30 seconds (how long to monitor setup output)
- `CONTAINER_BUILD_TIMEOUT`: 600 seconds (Docker build timeout)
- `AIPERF_COMMAND_TIMEOUT`: 300 seconds (per-command timeout)
- `AIPERF_UI_TYPE`: "simple" (auto-added to all aiperf commands)

To modify these, edit `constants.py`.
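For reference, the documented defaults above would look roughly like this inside `constants.py`; treat this as a paraphrase of the values listed in this README, not a copy of the actual file.

```python
# Documented defaults from this README, written as they might appear in
# constants.py; the real file may define additional settings.
SETUP_MONITOR_TIMEOUT = 30     # seconds of server setup output to monitor
CONTAINER_BUILD_TIMEOUT = 600  # seconds allowed for the Docker build
AIPERF_COMMAND_TIMEOUT = 300   # seconds allowed per aiperf command
AIPERF_UI_TYPE = "simple"      # appended to every aiperf command as --ui-type
```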
diff --git a/tests/ci/test_docs_end_to_end/parser.py b/tests/ci/test_docs_end_to_end/parser.py index e63d4933d..1df539140 100644 --- a/tests/ci/test_docs_end_to_end/parser.py +++ b/tests/ci/test_docs_end_to_end/parser.py @@ -35,6 +35,10 @@ def parse_directory(self, directory: str) -> dict[str, Server]: logger.info(f"Parsing markdown files in {directory}") for file_path in Path(directory).rglob("*.md"): + # Skip the documentation file for this test framework + if file_path.name == "DOC_CI_TEST_README.md": + logger.info(f"Skipping documentation file: {file_path}") + continue logger.info(f"Parsing file: {file_path}") self._parse_file(str(file_path))