Model Validation automation - v2 #464
Conversation
Summary by CodeRabbit

📝 Walkthrough

This change updates the model serving test infrastructure to support per-model and default serving arguments and GPU count, as specified in a YAML configuration. The fixtures and test parameter builders are refactored to extract and propagate these values. Outdated test snapshot files are deleted, and the YAML config is expanded for more granular control.
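The shape of the expanded configuration described above can be sketched in YAML. This is a hypothetical illustration only: the key layout (`serving_arguments` with `args` and `gpu_count`, plus a `default` block) follows the review discussion, and the model name and argument values are made up, not the actual file contents.

```yaml
# Hypothetical sketch, not the real sample_modelcar_config.yaml:
# each model may carry its own serving_arguments; the default block
# supplies args/gpu_count for models that omit them.
models:
  - name: granite-3.1-8b-instruct-quantized.w8a8
    serving_arguments:
      args:
        - --max-model-len=4096   # illustrative value
      gpu_count: 1
default:
  serving_arguments:
    args:
      - --dtype=auto             # illustrative value
    gpu_count: 1
```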
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15–20 minutes
Actionable comments posted: 2
♻️ Duplicate comments (3)
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].1.json (1)

1-50: Same constant `index` issue as flagged in the previous snapshot.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].1.json (1)

1-50: Same constant `index` issue as flagged in the previous snapshot.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].1.json (1)

1-50: Same constant `index` issue as flagged in the previous snapshot.
🧹 Nitpick comments (5)
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].1.json (1)
1-50: Snapshot size is large; consider trimming or compressing.

This snapshot stores ~50 lines of generated text. Repositories can bloat quickly when we track full-length generation outputs for every prompt/version.
Options:
• Assert only key fields (e.g., `finish_reason`, non-empty `text`) via a custom assertion helper.
• Store a gzipped fixture and decompress in-memory during tests.
Keeps diffs readable and repo thin.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].1.json (1)

1-50: Same maintenance concern about verbose inference snapshots.

The file embeds multi-paragraph generations (renewable energy, Great Wall, Apollo 11, etc.). Unless the exact wording is contractually required, snapshotting such long texts may create noisy diffs on every minor model drift. Recommend asserting shorter, deterministic substrings or using golden-hash comparisons instead.

tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml (2)
81-88: Indentation inconsistency is harmless but noisy.

The `default:` children use 4-space indentation whereas the rest of the file uses 2 spaces. Mixing widths is legal YAML, yet standardising makes diffs cleaner.

```diff
-default:
-    serving_arguments:
+default:
+  serving_arguments:
```
1-80: None of the listed models exercise the new ≥2-GPU path.

The PR description emphasises support for models needing ≥2 GPUs, but every entry specifies `gpu_count: 1` (or falls back to the default). Add at least one representative with `gpu_count: 2` to verify the new code path during CI.

tests/model_serving/model_runtime/model_validation/conftest.py (1)
211-240: LGTM! Good fallback mechanism for serving configuration.

The implementation correctly extracts per-model serving arguments with proper fallback to the default configuration. The parameter passing to helper functions is consistent.

Consider adding validation for the expected YAML structure to ensure `serving_arguments` contains the expected `args` and `gpu_count` keys, which would provide clearer error messages if the YAML format is incorrect.
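A minimal sketch of such a validation, assuming the `args`/`gpu_count` key names discussed above (the helper name and error wording are hypothetical, not the project's actual code):

```python
# Hypothetical validation helper: fails fast with a clear message when a
# model's serving_arguments (or the default fallback) is malformed.
def validate_serving_arguments(model_config: dict, default_config: dict) -> tuple:
    serving = model_config.get("serving_arguments") or default_config.get("serving_arguments")
    if not isinstance(serving, dict):
        raise ValueError(f"serving_arguments missing or not a mapping: {serving!r}")
    missing = {"args", "gpu_count"} - serving.keys()
    if missing:
        raise ValueError(f"serving_arguments missing keys: {sorted(missing)}")
    return serving["args"], serving["gpu_count"]
```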
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- tests/conftest.py (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].1.json (1 hunks)
- tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].json (1 hunks)
- tests/model_serving/model_runtime/model_validation/conftest.py (6 hunks)
- tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml (1 hunks)
🔇 Additional comments (18)
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].json (1)
3-3: Confirm model-ID canonicalisation rules.

`"id": "phi-4-quantized-w8a8"` replaces the original dot-separated name (`phi-4-quantized.w8a8`) with dashes. Before these snapshots become a contract, double-check that the test harness / runtime always performs exactly the same transform; otherwise look-ups by model-car code or external tooling may break once the real image reports the original identifier.

If the dash-conversion is intentional, consider documenting the rule in the helper that builds the snapshot to avoid future confusion.
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].json (1)
3-3: Truncated model ID – verify against upstream value.

`"id": "granite-3-1-8b-start"` looks truncated (`…starter-v2` missing). If the snapshot generator deliberately limits IDs to 20 chars (or similar), all good; but please confirm that:
- The runtime really returns this shortened ID, not the full name.
- Any downstream logic (e.g., metrics, clean-up) uses the same shortened key.
A quick grep across the YAML config & fixtures for the full vs truncated string will reveal mismatches.
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].json (1)
3-3: Potential inconsistency: "instruc" suffix clipped.

The snapshot stores `"id": "llama-3-1-8b-instruc"` (missing the trailing "t"). Please verify that the live vLLM endpoint really returns this exact string; otherwise the snapshot will cause false-negative regressions. If the truncation is automatic, documenting the cut-off logic would help future reviewers.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].json (1)
3-3: Decimal replaced by dash – ensure uniform mapping
`"id": "qwen2-5-7b-instruct"` converts "2.5" → "2-5". Validate that every consumer (YAML config, deployment label selectors, etc.) applies the same mapping; otherwise reconciliation steps may silently miss the model.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].json (1)
3-3: Snapshot ID deviates from file name.

The file name suggests `granite-3.1-8b-base-quantized.w4a16`, while the snapshot records `"granite-3-1-8b-base"`. Confirm this is the exact string returned by the runtime; otherwise the snapshot may mask an upstream change or a parsing bug in the test harness.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].json (1)
3-4: Model identifier looks truncated; confirm alignment with the runtime-reported name.

The `id` field is `deepseek-r1-distill`, whereas the model under test (per the filename/YAML) appears to be `DeepSeek-R1-Distill-Llama-8B-FP8-dynamic`. If vLLM intentionally normalises names, ignore; otherwise update the snapshot to avoid false positives when the runtime reports the full identifier.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].json (1)
3-4: Inconsistent model id vs test parameter
`id` is `granite-3-1-8b-instr` (hyphens, shortened "instr"), while the test param uses `granite-3.1-8b-instruct-quantized.w8a8`. Double-check the expected identifier coming from the runtime to prevent brittle snapshot failures.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].json (1)
3-4: Verify model id normalisation
`id` is `mistral-7b-instruct` (no version/quantisation), differing from the filename and YAML entry. Ensure this is the canonical identifier emitted by vLLM; if not, update to keep snapshots stable.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].json (1)
1-25: LGTM! Well-structured model metadata snapshot.

This JSON snapshot properly captures the phi-4-quantized model's metadata including ID, token limits, permissions, and ownership information. The structure is consistent with OpenAI API model responses and serves as appropriate regression test data.
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].1.json (1)
1-51: LGTM! Valid inference output snapshot.

This JSON snapshot captures diverse inference outputs from the phi-4-quantized model, including both empty responses with "stop" finish reasons and substantive text generation with "length" finish reason. The varied response patterns provide good coverage for regression testing of model inference behavior.
tests/conftest.py (2)
206-210: LGTM! Enhanced configuration flexibility.

The change to extract both `args` and `gpu_count` from the `serving_arguments` dictionary provides better support for per-model configuration, allowing different models to specify both their serving arguments and GPU requirements.
206-210: No consumers found for the updated `serving_argument` fixture; no action needed.

A search across the entire `tests/` tree shows:

- No test functions outside `tests/conftest.py` inject `serving_argument`.
- The nested `tests/model_serving/model_runtime/model_validation/conftest.py` declares its own `serving_argument` fixture, so it isn't affected by the top-level change.

You can safely keep the fixture returning `(args, gpu_count)` without updating any consumers.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].1.json (1)
1-51: LGTM! Comprehensive inference output snapshot.

This JSON snapshot captures diverse inference outputs from the Mistral-7B-Instruct model across multiple topics (sustainability, translations, poetry, history, AI ethics, space exploration). All responses show "length" finish reasons, indicating consistent token limit behavior. This provides excellent coverage for regression testing of model inference consistency.
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].1.json (1)
1-51: LGTM! Thorough model inference validation snapshot.

This JSON snapshot effectively captures the Llama-3.1-8B-Instruct model's inference behavior across diverse topics (renewable energy, multilingual translations, poetry, historical facts, AI ethics, space exploration). All responses demonstrate consistent "length" finish reasons and high-quality content generation, providing robust regression test coverage for model inference consistency.
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].1.json (1)
1-50: Repeated `index` values may break downstream snapshot comparisons.

Every element in the array reports `"index": 0`. Most OpenAI-style streaming APIs emit a monotonically increasing index so that consumers can reconstruct the full response deterministically. Using a constant value loses ordering information and can cause flaky diffing when the snapshot is regenerated. If the test harness does not rely on the field, consider omitting it; otherwise emit the correct incremental value.
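If the harness were ever to enforce that ordering, a minimal illustrative check (not part of the existing test code) might look like:

```python
# Illustrative only: asserts that OpenAI-style "choices" carry incremental
# indices, so a snapshot whose entries all report "index": 0 fails fast.
def assert_monotonic_indices(choices: list) -> None:
    indices = [choice["index"] for choice in choices]
    assert indices == list(range(len(indices))), f"non-incremental indices: {indices}"
```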
tests/model_serving/model_runtime/model_validation/conftest.py (3)
140-157: LGTM! Improved parameter extraction pattern.

The refactoring to extract `runtime_argument` and `gpu_count` directly from `request.param` simplifies the fixture dependency chain and aligns with the dynamic parameter handling approach described in the PR objectives.
160-177: LGTM! Consistent parameter handling.

The addition of `args` and `gpu_count` parameters properly enables dynamic configuration per model. The parameter structure correctly matches what the `deployment_config` fixture expects.
180-197: LGTM! Maintains consistency with raw deployment parameters.

The function updates mirror those in `build_raw_params`, ensuring both deployment types handle serving arguments and GPU count consistently.
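The per-model/default fallback these fixtures implement can be sketched in isolation. `extract_serving_params` below is a hypothetical stand-in, not the actual helper; the key names follow the YAML discussed in this review, and the built-in defaults (empty args, one GPU) are assumptions:

```python
# Hypothetical stand-in for the extraction pattern described above:
# per-model serving_arguments win, then the default block, then safe defaults.
def extract_serving_params(model_entry: dict, defaults: dict) -> tuple:
    serving = model_entry.get("serving_arguments") or defaults.get("serving_arguments") or {}
    return serving.get("args", []), serving.get("gpu_count", 1)
```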
@edwardquarm Can you remove the JSON outputs in the snapshot folder? We don't need them since we're going to change the models anyway. Let's keep the snapshot folder empty. |
Status of building tag latest: success. |
Jira: RHOAIENG-29982
Description
This PR is the second version of a test suite for automating the validation of modelcar OCI images using only vLLM runtimes.
Key features:
How Has This Been Tested?
Tested on an OpenShift cluster
Merge criteria: