[None][chore] Add multinode e2e and accuracy cases on DGX-Spark #12110
JennyLiu-nv wants to merge 1 commit into NVIDIA:main from
Conversation
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
/bot run
📝 Walkthrough

The changes add three new model entries to the MMLU accuracy references, introduce a helper function for multi-node accuracy testing along with several new test classes and methods, create and modify end-to-end multi-node tests, and reorganize test manifests by converting a text file to YAML with GPU-count-based conditions.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Case
    participant Func as _run_multinode_accuracy
    participant LLM as LLM Engine
    participant Spec as Speculative Decoder
    participant Bench as Benchmark (MMLU/GSM8K)
    Test->>Func: invoke with model_path, benchmarks, tp_size, draft_model_path
    Func->>LLM: construct with tensor/pipeline parallelism
    activate LLM
    LLM-->>Func: LLM initialized
    deactivate LLM
    alt draft_model_path provided
        Func->>Spec: configure with draft model
        activate Spec
        Spec-->>Func: speculative decoding ready
        deactivate Spec
    end
    loop for each benchmark
        Func->>Bench: evaluate benchmark
        activate Bench
        Bench->>LLM: query with kv_cache_config
        LLM-->>Bench: results
        Bench-->>Func: accuracy metrics
        deactivate Bench
    end
    Func-->>Test: accuracy results
```
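The flow in the sequence diagram can be sketched in plain Python. This is a hypothetical stand-in, not the tensorrt_llm API: `FakeLLM`, `FakeBenchmark`, `open_llm`, and the helper signature are all illustrative, mirroring only the construct → optional speculative config → per-benchmark evaluate shape described above.

```python
# Hypothetical sketch of the diagrammed control flow; all names here are
# stand-ins, not the real tensorrt_llm classes.
from contextlib import contextmanager


class FakeLLM:
    """Stand-in for the LLM engine; records how it was configured."""

    def __init__(self, model_path, tp_size, pp_size, spec_config=None):
        self.model_path = model_path
        self.tp_size = tp_size
        self.pp_size = pp_size
        self.spec_config = spec_config

    def generate(self, prompt):
        return f"answer for {prompt!r}"


class FakeBenchmark:
    """Stand-in for an accuracy task such as MMLU or GSM8K."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, llm):
        llm.generate(f"{self.name} question")
        return {"benchmark": self.name, "accuracy": 1.0}


@contextmanager
def open_llm(model_path, tp_size, pp_size, spec_config):
    # "LLM initialized" in the diagram
    llm = FakeLLM(model_path, tp_size, pp_size, spec_config)
    try:
        yield llm
    finally:
        pass  # engine shutdown would go here


def run_multinode_accuracy(model_path, benchmarks, tp_size=2, pp_size=1,
                           draft_model_path=None):
    # "alt draft_model_path provided": configure speculative decoding
    spec_config = {"draft_model": draft_model_path} if draft_model_path else None
    results = []
    with open_llm(model_path, tp_size, pp_size, spec_config) as llm:
        # "loop for each benchmark"
        for benchmark in benchmarks:
            results.append(FakeBenchmark(benchmark).evaluate(llm))
    return results
```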
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)
94-120: Validate `benchmarks` before opening the multinode `LLM`.

An unsupported entry currently fails only after the model is fully constructed, which turns a simple typo into a very expensive two-node failure. Precompute the task list before entering `with LLM(...)`.

♻️ Proposed fix
```diff
 def _run_multinode_accuracy(model_path,
                             model_name,
                             *,
                             benchmarks,
                             tp_size=2,
                             pp_size=1,
                             ep_size=1,
                             draft_model_path=None,
                             max_draft_len=2,
                             kv_cache_config=None,
                             max_num_tokens=4096,
                             max_batch_size=1,
                             **llm_kwargs):
     benchmark_task_map = {
         "mmlu": MMLU,
         "gsm8k": GSM8K,
     }
+    invalid_benchmarks = [
+        benchmark for benchmark in benchmarks
+        if benchmark not in benchmark_task_map
+    ]
+    if invalid_benchmarks:
+        raise ValueError(f"Unsupported benchmarks: {invalid_benchmarks}")
+    tasks = [benchmark_task_map[benchmark](model_name)
+             for benchmark in benchmarks]
     if kv_cache_config is None:
         kv_cache_config = KvCacheConfig(
             free_gpu_memory_fraction=0.5,
             enable_block_reuse=draft_model_path is None)
@@
     with LLM(model_path,
              tensor_parallel_size=tp_size,
              pipeline_parallel_size=pp_size,
              moe_expert_parallel_size=ep_size,
              max_num_tokens=max_num_tokens,
              max_batch_size=max_batch_size,
              kv_cache_config=kv_cache_config,
              speculative_config=spec_config,
              **llm_kwargs) as llm:
-        for benchmark in benchmarks:
-            task_cls = benchmark_task_map.get(benchmark)
-            if task_cls is None:
-                raise ValueError(f"Unsupported benchmark: {benchmark}")
-            task = task_cls(model_name)
+        for task in tasks:
             task.evaluate(llm)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py` around lines 94 - 120, Pre-validate the requested benchmarks before constructing the distributed LLM: iterate over the incoming benchmarks list and map each entry using benchmark_task_map to produce a task list (or raise ValueError for unknown names) prior to entering the with LLM(...) context; update code around benchmark_task_map, benchmarks, and task_cls so the unsupported-benchmark check happens before LLM(...) is called, and then use the precomputed task list inside the with-block.
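The fail-fast pattern this comment asks for can be shown in isolation. A minimal, self-contained sketch; `ExpensiveEngine` and the task callables are hypothetical stand-ins for the multinode `LLM` and the benchmark classes:

```python
# Resolve every benchmark name BEFORE paying for the expensive resource,
# so a typo raises immediately instead of after a two-node model load.
# ExpensiveEngine and the task callables are illustrative stand-ins.

class ExpensiveEngine:
    constructed = 0  # counts how many engines were ever built

    def __init__(self):
        ExpensiveEngine.constructed += 1


def run(benchmarks, task_map):
    # Validate first: an unknown name raises before any engine exists.
    invalid = [b for b in benchmarks if b not in task_map]
    if invalid:
        raise ValueError(f"Unsupported benchmarks: {invalid}")
    tasks = [task_map[b] for b in benchmarks]

    engine = ExpensiveEngine()  # only reached with a fully valid task list
    return [task(engine) for task in tasks]
```

The design point is simply ordering: the cheap dictionary lookups move ahead of the costly constructor, so the failure mode of a misspelled benchmark changes from "expensive distributed startup, then crash" to an immediate `ValueError`.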
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Line 81: SPDX header year is stale after modifying this file; update the
top-of-file copyright year to the latest meaningful modification year. Open the
file and edit the SPDX header on line 1 to replace 2025 with 2026 so it matches
the recent changes (this file contains the function _run_multinode_accuracy
which was modified), ensure only the year is changed and keep the rest of the
header intact.
- Around line 81-123: Add an optional expected_quant_algo parameter to
_run_multinode_accuracy (default None) and after the LLM is constructed (inside
the with LLM(...) as llm: block) if expected_quant_algo is not None assert that
llm.args.quant_config.quant_algo == expected_quant_algo (or raise ValueError
with a clear message) so callers can verify post-load quantization; update any
callers to pass expected_quant_algo=QuantAlgo.NVFP4/FP8 as needed.
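The suggested post-load check is a small guard. A hedged sketch: the attribute path `llm.args.quant_config.quant_algo` follows the wording of the comment above, and the algorithm names are modeled as plain strings here rather than the real `QuantAlgo` enum.

```python
# Hypothetical helper implementing the review suggestion: after the LLM is
# constructed, optionally assert the checkpoint's quantization algorithm.
# The llm.args.quant_config.quant_algo path and string algo names are
# assumptions for illustration.

def check_quant_algo(llm, expected_quant_algo=None):
    """Raise ValueError if the loaded model's quant_algo differs from the
    caller's expectation; no-op when expected_quant_algo is None."""
    if expected_quant_algo is None:
        return
    actual = llm.args.quant_config.quant_algo
    if actual != expected_quant_algo:
        raise ValueError(
            f"Expected quant_algo {expected_quant_algo!r}, got {actual!r}")
```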
---
Nitpick comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around line 94-120: Pre-validate the requested benchmarks before constructing
the distributed LLM: iterate over the incoming benchmarks list and map each
entry using benchmark_task_map to produce a task list (or raise ValueError for
unknown names) prior to entering the with LLM(...) context; update code around
benchmark_task_map, benchmarks, and task_cls so the unsupported-benchmark check
happens before LLM(...) is called, and then use the precomputed task list inside
the with-block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f04ce185-8cd4-4b89-96d9-865833df93f9
📒 Files selected for processing (5)
- tests/integration/defs/accuracy/references/mmlu.yaml
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tests/integration/defs/test_e2e.py
- tests/integration/test_lists/qa/llm_spark_func.txt
- tests/integration/test_lists/qa/llm_spark_func.yml
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/qa/llm_spark_func.txt
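The txt-to-yml manifest conversion with GPU-count conditions is not shown in this conversation; a hypothetical sketch of what `llm_spark_func.yml` might look like, with the list key, test names, and condition fields all assumed rather than taken from the diff:

```yaml
# Illustrative shape only: test names are invented, and the condition
# schema mirrors the common system_gpu_count range style.
llm_spark_func:
- condition:
    ranges:
      system_gpu_count:
        gte: 1
        lte: 1
  tests:
  - test_e2e.py::test_single_node_case
- condition:
    ranges:
      system_gpu_count:
        gte: 2
  tests:
  - accuracy/test_llm_api_pytorch.py::TestMultinode::test_accuracy
```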
PR_Github #38581 [ run ] triggered by Bot. Commit:
PR_Github #38581 [ run ] completed with state
/bot run
Summary by CodeRabbit
Tests
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.