test(lmeval): add GPU integration tests with vLLM runtime and fix accelerator typo #1275
ssaleem-rh wants to merge 9 commits into opendatahub-io:main from
Conversation
…CELERATOR_TYPE rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
Add test_lmeval_gpu to verify LMEval works with GPU-backed model deployments via vLLM runtime. Includes:
- New test for GPU model evaluation with SmolLM-1.7B
- wait_for_vllm_model_ready utility for model readiness checks
- GPU-specific fixtures: ServingRuntime, InferenceService, LMEvalJob, and pod; skip when no supported accelerator

Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
…TED_ACCLERATOR_TYPE → SUPPORTED_ACCELERATOR_TYPE in runtime option.
Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
- Relocate skip_if_no_supported_accelerator_type fixture to tests/conftest.py for reuse across test modules
- Introduce ACCELERATOR_IDENTIFIER constant and update imports
- Relax type annotation in lmeval_vllm_inference_service to str | None

Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
Previously removed as unnecessary, but required to suppress the BLE001 warning.
Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Corrects a misspelled pytest env-var default, centralizes a session skip fixture, adds GPU LMEval support (accelerator identifiers, vLLM ServingRuntime/InferenceService/LMEvalJob fixtures, pod readiness waiter, GPU test), and removes a duplicate skip fixture. Attention: HF-related env vars and PVC/service provisioning may expose secrets or data (CWE-200); verify secret handling.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/conftest.py`:
- Around line 990-994: The fixture skip_if_no_supported_accelerator_type
currently only skips when supported_accelerator_type is missing; update it to
also skip when the provided supported_accelerator_type is not a GPU-type (e.g.,
values like "spyre" or "cpu_x86"). In the skip_if_no_supported_accelerator_type
fixture, change the condition to check that supported_accelerator_type is truthy
AND matches an expected GPU indicator (for example contains "gpu" or "cuda" or
is in a set of known GPU identifiers) and call pytest.skip with a clear message
when it does not; keep the parameter name supported_accelerator_type and the
pytest.skip call but make the value check stricter so non-GPU strings are
skipped.
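The stricter check this prompt describes could look like the following sketch. The fixture and option names come from this PR; the GPU identifier set and the `is_supported_gpu` helper are illustrative assumptions, not the repository's actual implementation:

```python
from __future__ import annotations

import pytest

# Illustrative set of GPU identifiers; the real values should match what the
# SUPPORTED_ACCELERATOR_TYPE option actually provides on these clusters.
SUPPORTED_GPU_TYPES = {"nvidia", "amd", "gaudi"}


def is_supported_gpu(accelerator_type: str | None) -> bool:
    """Return True only when the value names a known GPU accelerator."""
    return bool(accelerator_type) and accelerator_type.lower() in SUPPORTED_GPU_TYPES


@pytest.fixture(scope="session")
def skip_if_no_supported_accelerator_type(supported_accelerator_type: str | None) -> None:
    # Skip both when the option is unset and when it names a non-GPU
    # accelerator such as "spyre" or "cpu_x86".
    if not is_supported_gpu(supported_accelerator_type):
        pytest.skip(f"Unsupported or missing accelerator type: {supported_accelerator_type!r}")
```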
In `@tests/model_explainability/lm_eval/conftest.py`:
- Around line 582-597: The fixture lmeval_vllm_serving_runtime currently
hardcodes RuntimeTemplates.VLLM_CUDA; change it to choose the runtime template
based on the cluster/accelerator type (e.g., detect NVIDIA vs AMD/ROCm vs Gaudi
from whatever cluster/fixture value is available) by creating a small mapping
(accelerator_type -> runtime_template) and use that variable instead of
RuntimeTemplates.VLLM_CUDA when calling ServingRuntimeFromTemplate; ensure the
chosen template covers ROCm and Gaudi variants (e.g., VLLM_ROCM, VLLM_GAUDI or
their project equivalents) and keep other args (runtime_image,
support_tgis_open_ai_endpoints, deployment_type) unchanged so the correct
ServingRuntime backend is provisioned for each accelerator.
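A sketch of the mapping-based selection described above. The dictionary keys follow the accelerator names in this PR; the template-name strings are placeholders standing in for the project's `RuntimeTemplates` members (`VLLM_CUDA`, `VLLM_ROCM`, `VLLM_GAUDI` or their equivalents):

```python
import pytest

# Placeholder strings standing in for the project's RuntimeTemplates members.
ACCELERATOR_TO_TEMPLATE = {
    "nvidia": "VLLM_CUDA",
    "amd": "VLLM_ROCM",
    "gaudi": "VLLM_GAUDI",
}


def select_runtime_template(accelerator_type: str) -> str:
    """Pick the vLLM ServingRuntime template for the detected accelerator."""
    template = ACCELERATOR_TO_TEMPLATE.get(accelerator_type.lower())
    if template is None:
        # Skip rather than fail so unsupported clusters do not error out.
        pytest.skip(f"Unsupported accelerator type for vLLM: {accelerator_type}")
    return template
```

The selected name would then replace the hardcoded `RuntimeTemplates.VLLM_CUDA` argument when calling `ServingRuntimeFromTemplate`, leaving the other arguments unchanged.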
In `@tests/model_explainability/lm_eval/test_lm_eval.py`:
- Around line 203-214: The GPU test test_lmeval_gpu is executing online model
downloads and must be skipped on disconnected clusters; add the pytest marker
`@pytest.mark.skip_on_disconnected` above the test_lmeval_gpu definition
(alongside the existing `@pytest.mark.gpu` and `@pytest.mark.parametrize`
decorators) to guard this GPU path so it will be skipped when Hub egress is not
available.
In `@tests/model_explainability/lm_eval/utils.py`:
- Around line 160-182: The generic except Exception around predictor_pod.log()
in the waiting loop and the final logs retrieval should be narrowed to only
transient/k8s-unavailable errors; change both handlers to catch
kubernetes.client.exceptions.ApiException and the project ResourceNotFoundError
(or whatever local pod-missing exception is used) and let any other exception
propagate so permanent errors fail fast; also add the necessary imports for
ApiException and ResourceNotFoundError and keep the same logging behavior for
the caught exceptions while re-raising unexpected exceptions instead of
swallowing them and retrying until max_wait_time; target the
predictor_pod.log(...) calls and the UnexpectedFailureError raise for locating
the changes.
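A runnable sketch of the narrowed handler. The exception classes are the real kubernetes ones when the client library is installed, with labeled stand-ins otherwise; the `fetch_pod_logs_once` helper name is hypothetical:

```python
try:
    from kubernetes.client.exceptions import ApiException
    from kubernetes.dynamic.exceptions import ResourceNotFoundError
except ImportError:  # stand-ins so the sketch runs without the kubernetes client
    class ApiException(Exception):
        pass

    class ResourceNotFoundError(Exception):
        pass


def fetch_pod_logs_once(pod, logger=None):
    """Fetch logs, tolerating only transient API errors or a missing pod."""
    try:
        return pod.log()
    except (ApiException, ResourceNotFoundError) as exc:
        # Transient condition: keep the original logging behavior and let the
        # caller retry on the next polling iteration.
        if logger is not None:
            logger.warning(f"Could not fetch pod logs yet: {exc}")
        return None
    # Any other exception propagates, so permanent errors fail fast.
```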
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 2db22a44-3620-4fa0-8bb3-715d65beb111
📒 Files selected for processing (7)
- conftest.py
- tests/conftest.py
- tests/model_explainability/lm_eval/conftest.py
- tests/model_explainability/lm_eval/constants.py
- tests/model_explainability/lm_eval/test_lm_eval.py
- tests/model_explainability/lm_eval/utils.py
- tests/model_serving/model_runtime/conftest.py
💤 Files with no reviewable changes (1)
- tests/model_serving/model_runtime/conftest.py
Add support for NVIDIA, AMD, and Gaudi in LMEval GPU tests. Update the skip_if_no_supported_accelerator_type fixture to validate supported GPU types. Improve exception handling in the vLLM readiness check and mark tests with skip_on_disconnected.
Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/model_explainability/lm_eval/conftest.py (2)
675-678: Instantiating `Service` only to retrieve its name is unnecessary. The service name follows a predictable pattern (`{isvc_name}-predictor`). Creating a `Service` object solely to access `.name` adds overhead and obscures intent.

Simplify to direct string construction:

```diff
- model_service = Service(
-     name=f"{lmeval_vllm_inference_service.name}-predictor",
-     namespace=lmeval_vllm_inference_service.namespace,
- )
+ model_service_name = f"{lmeval_vllm_inference_service.name}-predictor"
```

Then at line 696:

```diff
- "value": f"http://{model_service.name}.{model_namespace.name}.svc.cluster.local:80/v1/completions",
+ "value": f"http://{model_service_name}.{model_namespace.name}.svc.cluster.local:80/v1/completions",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/lm_eval/conftest.py` around lines 675 - 678, Replace the unnecessary Service instantiation used only to obtain a name: instead of creating model_service = Service(...) just build the predictor name string directly from lmeval_vllm_inference_service.name (e.g. f"{lmeval_vllm_inference_service.name}-predictor") and use that string where model_service.name was referenced (see model_service variable and its later usage around the predictor creation at line ~696); remove the unused Service object to simplify intent and eliminate overhead.
621-626: Extract `model_path` to a module-level constant to avoid duplication.
`"HuggingFaceTB/SmolLM-1.7B"` is defined here and again at line 674 in `lmevaljob_gpu`. If the model changes, both locations must be updated.

Suggested refactor — add near the top of the file with other constants:

```python
SMOLLM_MODEL_PATH: str = "HuggingFaceTB/SmolLM-1.7B"
```

Then reference `SMOLLM_MODEL_PATH` in both fixtures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_explainability/lm_eval/conftest.py` around lines 621 - 626, Extract the literal "HuggingFaceTB/SmolLM-1.7B" into a module-level constant (e.g., SMOLLM_MODEL_PATH) near the top with other constants, then replace uses of the literal in the fixtures (referenced by model_path in current fixture and in lmevaljob_gpu) to reference SMOLLM_MODEL_PATH instead so both fixtures share the single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/model_explainability/lm_eval/conftest.py`:
- Around line 595-599: The code silently defaults supported_accelerator_type to
"nvidia" which can provision CUDA incorrectly; change the logic so it no longer
defaults — if supported_accelerator_type is None either (a) declare this fixture
dependent on skip_if_no_supported_accelerator_type to guarantee a value, or (b)
immediately call pytest.skip (or raise) when supported_accelerator_type is None
before computing accelerator_type/template_name; update the block that currently
sets accelerator_type, template_name and the subsequent skip to first check
supported_accelerator_type and skip with a clear message rather than falling
back to "nvidia".
---
Nitpick comments:
In `@tests/model_explainability/lm_eval/conftest.py`:
- Around line 675-678: Replace the unnecessary Service instantiation used only
to obtain a name: instead of creating model_service = Service(...) just build
the predictor name string directly from lmeval_vllm_inference_service.name (e.g.
f"{lmeval_vllm_inference_service.name}-predictor") and use that string where
model_service.name was referenced (see model_service variable and its later
usage around the predictor creation at line ~696); remove the unused Service
object to simplify intent and eliminate overhead.
- Around line 621-626: Extract the literal "HuggingFaceTB/SmolLM-1.7B" into a
module-level constant (e.g., SMOLLM_MODEL_PATH) near the top with other
constants, then replace uses of the literal in the fixtures (referenced by
model_path in current fixture and in lmevaljob_gpu) to reference
SMOLLM_MODEL_PATH instead so both fixtures share the single source of truth.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: aa15b6ad-be6f-4a1f-a355-46d489d4cfc3
📒 Files selected for processing (4)
- tests/conftest.py
- tests/model_explainability/lm_eval/conftest.py
- tests/model_explainability/lm_eval/test_lm_eval.py
- tests/model_explainability/lm_eval/utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/conftest.py
- tests/model_explainability/lm_eval/test_lm_eval.py
- tests/model_explainability/lm_eval/utils.py
```python
accelerator_type = supported_accelerator_type.lower() if supported_accelerator_type else "nvidia"
template_name = accelerator_to_template.get(accelerator_type)

if not template_name:
    pytest.skip(f"Unsupported accelerator type for vLLM: {supported_accelerator_type}")
```
Silent fallback to "nvidia" when supported_accelerator_type is None can cause confusing failures.
Line 595 defaults to "nvidia" when supported_accelerator_type is None, but the CLI option (per root conftest.py) returns None when the environment variable is unset. If a test runs on a non-NVIDIA cluster without the accelerator type configured, this fixture will provision a CUDA runtime and fail with a misleading error instead of skipping gracefully.
Consider either:
- Requiring the value explicitly (skip/error when `None`)
- Relying on the `skip_if_no_supported_accelerator_type` fixture as a dependency to guarantee the value is never `None` here
Option 1: Fail fast when accelerator type is missing
```diff
- accelerator_type = supported_accelerator_type.lower() if supported_accelerator_type else "nvidia"
- template_name = accelerator_to_template.get(accelerator_type)
-
- if not template_name:
-     pytest.skip(f"Unsupported accelerator type for vLLM: {supported_accelerator_type}")
+ if not supported_accelerator_type:
+     pytest.skip("supported_accelerator_type is required for GPU-backed vLLM tests")
+
+ accelerator_type = supported_accelerator_type.lower()
+ template_name = accelerator_to_template.get(accelerator_type)
+
+ if not template_name:
+     pytest.skip(f"Unsupported accelerator type for vLLM: {accelerator_type}")
```
+ pytest.skip(f"Unsupported accelerator type for vLLM: {accelerator_type}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| accelerator_type = supported_accelerator_type.lower() if supported_accelerator_type else "nvidia" | |
| template_name = accelerator_to_template.get(accelerator_type) | |
| if not template_name: | |
| pytest.skip(f"Unsupported accelerator type for vLLM: {supported_accelerator_type}") | |
| if not supported_accelerator_type: | |
| pytest.skip("supported_accelerator_type is required for GPU-backed vLLM tests") | |
| accelerator_type = supported_accelerator_type.lower() | |
| template_name = accelerator_to_template.get(accelerator_type) | |
| if not template_name: | |
| pytest.skip(f"Unsupported accelerator type for vLLM: {accelerator_type}") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/model_explainability/lm_eval/conftest.py` around lines 595 - 599, The
code silently defaults supported_accelerator_type to "nvidia" which can
provision CUDA incorrectly; change the logic so it no longer defaults — if
supported_accelerator_type is None either (a) declare this fixture dependent on
skip_if_no_supported_accelerator_type to guarantee a value, or (b) immediately
call pytest.skip (or raise) when supported_accelerator_type is None before
computing accelerator_type/template_name; update the block that currently sets
accelerator_type, template_name and the subsequent skip to first check
supported_accelerator_type and skip with a clear message rather than falling
back to "nvidia".
@ssaleem-rh this seems like a legit comment. Can you please address it?
I still see no changes here and find this comment resolved again. @kpunwatk can you please work with @ssaleem-rh here?
Replace manual timeout and stabilization loop with TimeoutSampler. Add specific exceptions (ResourceNotFoundError, UnexpectedResourceCountError). Use component=predictor label selector for pod filtering. Use collect_pod_information for better logging.
Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
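The TimeoutSampler pattern from this commit can be sketched as follows. The readiness marker string and the fallback sampler are assumptions; the real `TimeoutSampler` comes from the `timeout_sampler` package when available:

```python
import time

try:
    from timeout_sampler import TimeoutExpiredError, TimeoutSampler
except ImportError:  # minimal stand-ins so the sketch is self-contained
    class TimeoutExpiredError(Exception):
        pass

    class TimeoutSampler:
        def __init__(self, wait_timeout, sleep, func, **func_kwargs):
            self.wait_timeout, self.sleep = wait_timeout, sleep
            self.func, self.func_kwargs = func, func_kwargs

        def __iter__(self):
            deadline = time.monotonic() + self.wait_timeout
            while time.monotonic() <= deadline:
                yield self.func(**self.func_kwargs)
                time.sleep(self.sleep)
            raise TimeoutExpiredError(f"Timed out after {self.wait_timeout}s")


def wait_for_vllm_model_ready(pod, wait_timeout=300, sleep=5):
    """Poll pod logs until the vLLM server reports startup is complete."""
    # "Application startup complete" is an assumed marker from uvicorn/vLLM logs.
    for logs in TimeoutSampler(wait_timeout=wait_timeout, sleep=sleep, func=pod.log):
        if logs and "Application startup complete" in logs:
            return
```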
Addressed all review comments. Please take another look.
```python
except TimeoutExpiredError as e:
    LOGGER.error(f"vLLM pod failed to start within {max_wait_time} seconds")
    collect_pod_information(pod=predictor_pod)
    raise UnexpectedFailureError(f"vLLM model failed to load within {max_wait_time} seconds") from e
```
Please re-raise TimeoutExpiredError
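A minimal sketch of the change the reviewer asks for: collect diagnostics in the handler, then re-raise the original `TimeoutExpiredError` unchanged rather than wrapping it. The exception class is a local stand-in and the waiter is simplified:

```python
class TimeoutExpiredError(Exception):  # stand-in for the project's exception
    pass


def wait_until_ready(sampler, on_timeout):
    """Iterate a sampler until it yields a truthy value; re-raise on timeout."""
    try:
        for ready in sampler:
            if ready:
                return
    except TimeoutExpiredError:
        on_timeout()  # e.g. collect_pod_information(pod=predictor_pod)
        raise  # preserve the original exception type and traceback
```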
```python
    """Unexpected value found"""


class ResourceNotFoundError(Exception):
```
Please use `from kubernetes.dynamic.exceptions import ResourceNotFoundError`
Pull Request
Summary
Introduces GPU-based LMEval integration testing and fixes a typo in the accelerator environment variable.
- Adds `test_lmeval_gpu` to validate LMEval functionality on GPU-backed model deployments using vLLM
- Adds GPU-specific fixtures: `ServingRuntime`, `InferenceService`, `LMEvalJob`
- Adds the `wait_for_vllm_model_ready` utility to ensure model readiness
- Relocates the `skip_if_no_supported_accelerator_type` fixture to tests/conftest.py for reuse across test modules
- Fixes typo: `SUPPORTED_ACCLERATOR_TYPE` → `SUPPORTED_ACCELERATOR_TYPE`

Related Issues
How it has been tested
Test Results:
Successfully executed test_lmeval_gpu on a GPU-enabled cluster
Additional Requirements
Summary by CodeRabbit
New Features
Bug Fixes
Tests