-
Notifications
You must be signed in to change notification settings - Fork 25
ci: Add CI testing for markdown-based end-to-end tests #235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4783cf1 to
4a0ce73
Compare
4a0ce73 to
936cd6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a comprehensive CI testing framework that extracts and executes markdown-tagged bash commands for end-to-end testing of documentation examples. The framework provides sequential server testing with complete lifecycle management, health checks, and detailed logging.
- Introduces markdown parser to extract tagged bash commands from documentation files
- Implements server manager with background process handling and port cleanup capabilities
- Creates test orchestrator for coordinating complete test workflows with configurable timeouts
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test_orchestrator.py | Main orchestrator coordinating test execution workflow with sequential server testing |
| test.sh | Updates test script to use new Python framework instead of direct aiperf call |
| server_manager.py | Manages server lifecycle including background processes, health checks, and cleanup |
| parse_and_execute_md_tags.py | Legacy markdown parser with direct command execution capabilities |
| markdown_parser.py | Core markdown parser for extracting tagged bash commands from documentation |
| main.py | Entry point providing CLI interface for the test framework |
| aiperf_manager.py | Handles AIPerf setup and test execution with result tracking |
| init.py | Package initialization file |
| tutorial.md | Updates documentation with proper markdown tags for CI testing |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
cd43e7e to
3e4cd36
Compare
bb9074d to
e558eab
Compare
|
I think the solution is pretty solid. Just two things
|
0c2162f to
31b2824
Compare
9ce6061 to
460f344
Compare
debermudez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly stuff that got lost from a couple PRs in the recent history.
06aecc3 to
d0cb76d
Compare
|
@matthewkotila and @lkomali is it okay to remove basic_end_to_end CI test after merging this PR? |
Will let @matthewkotila decide as he's the one who added the test. |
d0cb76d to
c5b65a5
Compare
9dfb7b4 to
b21b636
Compare
- Create utils.py with centralized utilities - Add repo root path calculation to utils.py (single source of truth) - Include Docker management utilities for container lifecycle - Add command execution utilities with timeout handling - Modular structure with reusable functions across all Python files - Support --all-servers CLI option for CI integration - Graceful server shutdown with Docker-based management
- Create utils.py with centralized utilities (repo root, Docker management, command execution) - Enhanced docs/tutorial.md with HTML comment tags for both Dynamo and vLLM servers - Support --all-servers CLI option for CI integration - Graceful Docker-based server shutdown (no process killing) - Fixed health check commands with proper 15-minute timeouts - Clean AIPerf commands with proper parameter formatting - Ready for CI pipeline execution with python3 main.py --all-servers
b21b636 to
7bbd8c0
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughDocs tutorial updated with new setup/health-check tags and reduced-load examples for Dynamo (port 8080) and vLLM (port 8000). Legacy CI E2E shell scripts removed. New docs-driven E2E testing framework added: parser for tagged markdown, test runner orchestrating servers and an AIPerf container, utilities, constants, and entrypoint script. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CI as CI Job
participant Main as main.py
participant Parser as MarkdownParser
participant Runner as TestRunner
participant Docker as Docker Engine
participant Server as Target Server
participant AIP as AIPerf Container
CI->>Main: Run tests/ci/.../main.py --all-servers
Main->>Parser: parse_directory(docs/)
Parser-->>Main: servers: {name -> setup/health/aiperf cmds}
Main->>Runner: run_tests(servers)
Runner->>Docker: Build AIPerf image
Docker-->>Runner: Build OK / version checked
Runner->>Docker: Start AIPerf container
Docker-->>Runner: Container ID
loop For each server
Runner->>Server: setup_command (background)
Note over Runner,Server: Monitor setup output (timeout window)
Runner->>Server: health_check_command (foreground/loop)
Server-->>Runner: Ready (HTTP 200)
loop AIPerf commands
Runner->>AIP: exec aiperf ... --ui simple
AIP-->>Runner: Exit code/result
end
Runner->>Server: Graceful shutdown/cleanup
end
Runner-->>Main: Success/Failure
Main-->>CI: Exit code
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (28)
tests/ci/test_docs_end_to_end/constants.py (2)
11-11: Include the hyphen in TAG_SUFFIX to avoid trailing '-' in server names.Current slicing leaves a trailing '-' (e.g., "dynamo-default-openai-"). Make the suffix "-endpoint-server" so names are clean and consistent.
-TAG_SUFFIX = "endpoint-server" +TAG_SUFFIX = "-endpoint-server"
22-26: Wire timeouts through the runner.CONTAINER_BUILD_TIMEOUT, CONTAINER_START_TIMEOUT, and AIPERF_COMMAND_TIMEOUT should be enforced in the subprocess calls (build/run/exec) to prevent hangs.
Would you like a follow-up patch for test_runner.py to honor these constants?
tests/ci/test_docs_end_to_end/data_types.py (1)
26-28: Confirm Python >= 3.10 for PEP 604 unions.
Command | Nonerequires Python 3.10+. If CI may run older, switch toOptional[Command]+from typing import Optional.tests/ci/test_docs_end_to_end/main.py (2)
35-39: --all-servers flag is unused.Either remove it or wire it to filter/select servers. Keeping unused flags increases confusion.
49-51: Returning success when no servers are found—confirm CI intent.Exiting 0 on “No servers found” may mask discovery issues. Consider failing the job if discovery unexpectedly returns empty.
tests/ci/test_docs_end_to_end/parser.py (8)
45-51: Narrow the exception and log with traceback.Catching
Exceptionhides issues; useOSErrorandlogger.exceptionto include stack traces.- except Exception as e: - logger.error(f"Failed to read file {file_path}: {e}") + except OSError as e: + logger.exception(f"Failed to read file {file_path}: {e}")
58-61: Simplify tag extraction regex.Loosen to capture the whole tag content; prefix/suffix checks already validate it.
- tag_match = re.match(r"<!--\s*([^-\s]+.*?)\s*-->", line) + tag_match = re.match(r"<!--\s*(.*?)\s*-->", line)
62-64: Fix comment to reflect all supported tags.Includes health-check as well.
- # Check for setup or aiperf-run tags ending with endpoint-server + # Check for setup/health-check/aiperf-run tags ending with endpoint-server
99-106: Support more fenced code block languages (bash/sh/shell/console).Improves robustness to common doc styles.
- if line == "```bash": + if re.fullmatch(r"```(bash|sh|shell|console)?", line): break elif line and not line.startswith("#"): return None @@ - if line.strip() == "```": + if line.strip() == "```": return "".join(bash_lines).strip()Also applies to: 115-116
126-131: Strip trailing '-' from derived server names.Slicing with current suffix leaves a trailing separator.
- server_name = tag_name[ - SETUP_TAG_PREFIX_LEN:-TAG_SUFFIX_LEN - ] # Remove prefix and suffix + server_name = tag_name[SETUP_TAG_PREFIX_LEN:-TAG_SUFFIX_LEN].rstrip("-") + # Remove prefix/suffix and any trailing separator
150-155: Apply the same trailing '-' fix for health-check tags.- server_name = tag_name[ - HEALTH_CHECK_TAG_PREFIX_LEN:-TAG_SUFFIX_LEN - ] # Remove prefix and suffix + server_name = tag_name[HEALTH_CHECK_TAG_PREFIX_LEN:-TAG_SUFFIX_LEN].rstrip("-")
176-181: Apply the same trailing '-' fix for aiperf-run tags.- server_name = tag_name[ - AIPERF_RUN_TAG_PREFIX_LEN:-TAG_SUFFIX_LEN - ] # Remove prefix and suffix + server_name = tag_name[AIPERF_RUN_TAG_PREFIX_LEN:-TAG_SUFFIX_LEN].rstrip("-")
140-147: Prefer raising exceptions over sys.exit for duplicates.Let the caller decide how to handle errors; improves testability and reuse.
- sys.exit(1) + raise ValueError(f"Duplicate setup tag for server '{server_name}'") @@ - sys.exit(1) + raise ValueError(f"Duplicate health-check tag for server '{server_name}'")Also applies to: 164-172
docs/tutorial.md (4)
27-28: Typo: “Dyanmo” → “Dynamo”.Fix the misspelling in the comment to avoid confusion.
Apply:
-# Download the Dyanmo container +# Download the Dynamo container
102-103: Consistent capitalization: “vLLM”.Use “vLLM” consistently in headings and text for polish.
-## Profile Qwen3-0.6B using vllm <a id="vllm-qwen3-0.6B"> +## Profile Qwen3-0.6B using vLLM <a id="vllm-qwen3-0.6B">
106-110: Avoid mutable “latest” image tag in CI docs. Pin a known-good version.“latest” is non-reproducible and can break later. Pin the vLLM image or refer to an env var.
-docker pull vllm/vllm-openai:latest -docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \ +export VLLM_IMAGE="vllm/vllm-openai:0.6.3" # example: pin to a tested version +docker pull "${VLLM_IMAGE}" +docker run --gpus all -p 8000:8000 "${VLLM_IMAGE}" \
20-21: No duplicate docs-testing tags found — add fail-fast detectionScanned all .md files for tags — NO_DUPLICATES_FOUND.
- Parser: add duplicate-detection that errors with file:line on collision (recommended).
- Alternative: require stable per-file prefix for tags (e.g., tutorial-*) to avoid future collisions.
tests/ci/test_docs_end_to_end/test_runner.py (5)
124-140: Hard-coded venv path may be wrong. Verify aiperf without sourcing.Relying on
/opt/aiperf/venvassumes a specific Dockerfile. Safer to invokeaiperf --versiondirectly and log stderr on failure.- verify_result = subprocess.run( - f"docker exec {container_name} bash -c 'source /opt/aiperf/venv/bin/activate && aiperf --version'", + verify_result = subprocess.run( + f"docker exec {container_name} bash -lc 'command -v aiperf >/dev/null 2>&1 && aiperf --version'", shell=True, capture_output=True, text=True, timeout=30, ) if verify_result.returncode != 0: logger.error("AIPerf verification failed") - logger.error(f"Stdout: {verify_result.stdout}") - logger.error(f"Stderr: {verify_result.stderr}") + logger.error(f"Stdout: {verify_result.stdout.strip()}") + logger.error(f"Stderr: {verify_result.stderr.strip()}") return False - logger.info(f"AIPerf version: {verify_result.stdout.strip()}") + logger.info(f"AIPerf version: {verify_result.stdout.strip() or verify_result.stderr.strip()}")Optionally, add a fallback to print PATH if detection fails.
262-266: Avoid duplicating--ui-typeif the doc command already includes it.Appending unconditionally can produce duplicated flags; safer to add only when missing.
- aiperf_command_with_ui = f"{aiperf_cmd.command} --ui-type {AIPERF_UI_TYPE}" + aiperf_command_with_ui = ( + aiperf_cmd.command + if "--ui-type" in aiperf_cmd.command + else f"{aiperf_cmd.command} --ui-type {AIPERF_UI_TYPE}" + )
77-85: Minor:text=Truealready implies universal newlines.Drop
universal_newlines=Truefor brevity; same below for other Popen calls.- text=True, - bufsize=1, - universal_newlines=True, + text=True, + bufsize=1,Apply similarly at Lines 178-186, 224-232, 274-282.
88-96: Use logger instead of print for streaming output.Keeps logs unified and captures in files; add a prefix via logger if needed.
- print(f"BUILD: {line.rstrip()}") + logger.info(f"BUILD: {line.rstrip()}")Repeat for SETUP/HEALTH/AIPERF prints.
Also applies to: 191-201, 236-243, 286-293
42-46: Server duplicate handling is upstream; add checks or enforce via parser.Because servers arrive as a dict, duplicates may be lost. Consider receiving a list and validating uniqueness or augmenting the parser to surface duplicates explicitly.
Would you like a small addition in parser to collect duplicates and fail?
tests/ci/test_docs_end_to_end/utils.py (6)
16-17: Update stale comment for repo root.Directory is test_docs_end_to_end now.
- # We're in tests/ci/basic_end_to_end, so go up 4 levels to reach repo root + # We're in tests/ci/test_docs_end_to_end, so go up 4 levels to reach repo root
45-49: Use logging.exception and re-raise bare to preserve stack.Improves diagnostics and traceback fidelity.
- except subprocess.TimeoutExpired as e: - logger.warning(f"Command timed out after {timeout}s: {command}") - raise e - except Exception as e: - logger.error(f"Command failed: {command} - {e}") - raise e + except subprocess.TimeoutExpired: + logger.warning(f"Command timed out after {timeout}s: {command}") + raise + except Exception: + logger.exception(f"Command failed: {command}") + raise
74-75: Standardize on logger for real-time output, not print.Unifies console/file logs.
- print(f"{prefix}: {line.rstrip()}") + logger.info(f"{prefix}: {line.rstrip()}")
168-172: Avoid shell for killing processes; use os.kill.Reduces S602 and avoids shell dependency.
-def safe_kill_process(pid: int, signal: int = 9) -> bool: +def safe_kill_process(pid: int, signal: int = 9) -> bool: """Safely kill a process by PID with error handling""" try: - subprocess.run(f"kill -{signal} {pid}", shell=True, timeout=5) + import os, signal as sigmod + os.kill(pid, getattr(sigmod, f"SIG{signal}") if isinstance(signal, str) else signal) return True except Exception as e: logger.debug(f"Could not kill process {pid}: {e}") return False
175-185: Quote/validatefilter_criteriaor constrain to known filters.Current shell interpolation risks breakage with spaces/special chars. At minimum, quote the arg.
- result = subprocess.run( - f"docker ps --filter {filter_criteria} --format '{{{{.ID}}}}'", + import shlex + result = subprocess.run( + f"docker ps --filter {shlex.quote(filter_criteria)} --format '{{{{.ID}}}}'", shell=True, capture_output=True, text=True, timeout=5, )Optionally, accept two params (filter key, value) and build safely.
193-207: Be careful with global Docker prunes in CI.Even without
force, prune can remove resources other jobs might rely on. Keep as-is but ensure tests run in isolated runners or namespaces.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
docs/tutorial.md(4 hunks)tests/ci/basic_end_to_end/start_server.sh(0 hunks)tests/ci/basic_end_to_end/test.sh(0 hunks)tests/ci/common/setup_test.sh(0 hunks)tests/ci/test_docs_end_to_end/constants.py(1 hunks)tests/ci/test_docs_end_to_end/data_types.py(1 hunks)tests/ci/test_docs_end_to_end/main.py(1 hunks)tests/ci/test_docs_end_to_end/parser.py(1 hunks)tests/ci/test_docs_end_to_end/setup_test.sh(1 hunks)tests/ci/test_docs_end_to_end/test_runner.py(1 hunks)tests/ci/test_docs_end_to_end/utils.py(1 hunks)
💤 Files with no reviewable changes (3)
- tests/ci/common/setup_test.sh
- tests/ci/basic_end_to_end/start_server.sh
- tests/ci/basic_end_to_end/test.sh
🧰 Additional context used
🧬 Code graph analysis (3)
tests/ci/test_docs_end_to_end/test_runner.py (2)
tests/ci/test_docs_end_to_end/data_types.py (1)
Server(22-28)tests/ci/test_docs_end_to_end/utils.py (3)
cleanup_docker_resources(193-208)docker_stop_and_remove(129-153)get_repo_root(14-17)
tests/ci/test_docs_end_to_end/parser.py (1)
tests/ci/test_docs_end_to_end/data_types.py (2)
Command(11-18)Server(22-28)
tests/ci/test_docs_end_to_end/main.py (3)
tests/ci/test_docs_end_to_end/parser.py (2)
MarkdownParser(27-190)parse_directory(33-41)tests/ci/test_docs_end_to_end/test_runner.py (2)
TestRunner(25-411)run_tests(32-61)tests/ci/test_docs_end_to_end/utils.py (2)
get_repo_root(14-17)setup_logging(20-28)
🪛 Ruff (0.13.1)
tests/ci/test_docs_end_to_end/test_runner.py
77-77: subprocess call with shell=True identified, security issue
(S602)
112-112: subprocess call with shell=True identified, security issue
(S602)
125-125: subprocess call with shell=True identified, security issue
(S602)
178-178: subprocess call with shell=True identified, security issue
(S602)
224-224: subprocess call with shell=True identified, security issue
(S602)
274-274: subprocess call with shell=True identified, security issue
(S602)
371-371: subprocess call with shell=True identified, security issue
(S602)
387-387: Do not catch blind exception: Exception
(BLE001)
tests/ci/test_docs_end_to_end/parser.py
48-48: Do not catch blind exception: Exception
(BLE001)
49-49: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/ci/test_docs_end_to_end/utils.py
36-36: subprocess call with shell=True identified, security issue
(S602)
43-43: Consider moving this statement to an else block
(TRY300)
46-46: Use raise without specifying exception name
Remove exception name
(TRY201)
48-48: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
49-49: Use raise without specifying exception name
Remove exception name
(TRY201)
57-57: subprocess call with shell=True identified, security issue
(S602)
117-117: subprocess call with shell=True identified, security issue
(S602)
125-125: Do not catch blind exception: Exception
(BLE001)
133-133: subprocess call with shell=True identified, security issue
(S602)
142-142: subprocess call with shell=True identified, security issue
(S602)
150-150: Consider moving this statement to an else block
(TRY300)
151-151: Do not catch blind exception: Exception
(BLE001)
161-161: Do not catch blind exception: Exception
(BLE001)
168-168: subprocess call with shell=True identified, security issue
(S602)
169-169: Consider moving this statement to an else block
(TRY300)
170-170: Do not catch blind exception: Exception
(BLE001)
178-178: subprocess call with shell=True identified, security issue
(S602)
188-188: Consider moving this statement to an else block
(TRY300)
189-189: Do not catch blind exception: Exception
(BLE001)
205-205: subprocess call with shell=True identified, security issue
(S602)
207-207: Do not catch blind exception: Exception
(BLE001)
🪛 Shellcheck (0.11.0)
tests/ci/test_docs_end_to_end/setup_test.sh
[warning] 6-6: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build (ubuntu-latest, 3.10)
🔇 Additional comments (2)
docs/tutorial.md (2)
86-87: Double‑check Dynamo endpoint port alignment with the running server.The aiperf command targets localhost:8080. Ensure this matches the server’s effective bind/forwarding (especially if the CI image moves to 8000).
75-75: Confirm Dynamo health-check should remain on port 8080 (update if CI flips to 8000).
docs/tutorial.md uses localhost:8080 for Dynamo health-check and --url (lines 75, 86); vLLM uses localhost:8000 (lines 115, 130).
If a user want to test all the docs with e2e example execute this command.
Ensure the machine where you are running these commands have a GPU on it.
Summary by CodeRabbit
Documentation
Tests
Chores