
Model Validation automation - v2#464

Merged
Raghul-M merged 9 commits into opendatahub-io:main from edwardquarm:model-validation-v2
Aug 7, 2025

Conversation

@edwardquarm
Contributor

@edwardquarm edwardquarm commented Jul 25, 2025

Jira: RHOAIENG-29982

Description

This PR is the second version of a test suite that automates validation of modelcar OCI images, using vLLM runtimes only.

Key features:

  • Custom serving arguments per model
  • Support for validating larger models that require a GPU count >= 2 and a tensor parallel size >= 2

How Has This Been Tested?

Tested on an OpenShift cluster:

export OCI_REGISTRY_PULL_SECRET=
uv run pytest -vv tests/model_serving/model_runtime/model_validation/test_modelvalidation.py \
  --model_car_yaml_path=./tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml \
  --vllm-runtime-image=quay.io/modh/vllm@sha256:47b7e95b51d8d6a82de9d6bbc5eb61a794dc8d2998e8186bd6bd713517c97f08 \
  --supported-accelerator-type=Nvidia \
  --registry-host=registry.redhat.io \
  --snapshot-update \
  --log-file=pytest-logs.txt

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

@coderabbitai
Contributor

coderabbitai bot commented Jul 25, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enhanced test configuration to support dynamic runtime arguments and GPU count for model serving tests.
    • Expanded sample configuration with additional models and customizable serving parameters.
  • Tests

    • Updated test fixtures and parameterization to utilize new configuration options for runtime arguments and GPU allocation.
    • Removed outdated test snapshot files to reflect changes in test data and configuration structure.

Walkthrough

This change updates the model serving test infrastructure to support per-model and default serving arguments and GPU count, as specified in a YAML configuration. The fixtures and test parameter builders are refactored to extract and propagate these values. Outdated test snapshot files are deleted, and the YAML config is expanded for more granular control.

Changes

Cohort / File(s) Change Summary
Serving Argument Fixture Refactor
tests/conftest.py
The serving_argument fixture now returns a tuple of argument list and GPU count, extracting args and gpu_count from the serving_arguments dictionary in the YAML config.
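The extraction logic described above can be sketched as a small helper (a hypothetical illustration based on this summary; the actual fixture in tests/conftest.py may be structured differently):

```python
# Hypothetical sketch of the refactored extraction logic; the real fixture
# in tests/conftest.py loads the YAML config and may differ in structure.
def extract_serving_arguments(model_config: dict) -> tuple[list[str], int]:
    """Return (args, gpu_count) from a model entry's serving_arguments mapping.

    Falls back to an empty argument list and a single GPU when the entry
    omits serving_arguments or either key.
    """
    serving = model_config.get("serving_arguments") or {}
    args: list[str] = serving.get("args", [])
    gpu_count: int = serving.get("gpu_count", 1)
    return args, gpu_count
```

Returning a tuple lets one fixture feed both the runtime argument list and the GPU allocation to downstream deployment fixtures.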
Model Validation Test Parameter Propagation
tests/model_serving/model_runtime/model_validation/conftest.py
Refactored fixtures and test parameter builders to accept and propagate args and gpu_count from the YAML config, removing the previous serving_argument parameter and dynamically setting GPU count. Default values are now sourced from the config's default section if not specified per-model.
Expanded and Restructured Modelcar YAML Config
tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml
The YAML config is expanded with multiple model entries, each optionally specifying serving_arguments (args and gpu_count). The global serving arguments are replaced by a default section for fallback.
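Based on this description, the restructured config might look roughly like the following (model names and argument values here are illustrative, not copied from the actual file):

```yaml
models:
  - name: example-model-small            # illustrative entry
    serving_arguments:
      args: ["--max-model-len=4096"]
      gpu_count: 1
  - name: example-model-large            # needs tensor parallelism
    serving_arguments:
      args: ["--tensor-parallel-size=2"]
      gpu_count: 2
  - name: example-model-no-overrides     # falls back to the default block
default:
  serving_arguments:
    args: []
    gpu_count: 1
```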
Snapshot Deletions: Model Validation Outputs
tests/model_serving/model_runtime/model_validation/__snapshots__/*
All JSON snapshot files for model validation tests are deleted. These files contained reference outputs for various models and scenarios, including text completions and model metadata, and are now removed from the repository.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

  • The code changes involve moderate complexity due to refactoring of fixtures, changes to parameter propagation, and YAML restructuring. The bulk of the deletions are test snapshots, which are low-effort to review.

Possibly related PRs

  • model validation automation v1 #340: Introduced the original serving_argument fixture and model validation automation framework. This PR directly builds upon and modifies those components to support tuple return and GPU count propagation.

Suggested labels

lgtm-by-dbasunag

Suggested reviewers

  • Raghul-M

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae2bb14 and f0710ba.

📒 Files selected for processing (12)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-raw].1.json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-raw].json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-llama-3-1-8b-instruct1.5-raw].1.json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-llama-3-1-8b-instruct1.5-raw].json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-raw].1.json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-raw].json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-serverless].1.json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-serverless].json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-llama-3-1-8b-instruct1.5-serverless].1.json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-llama-3-1-8b-instruct1.5-serverless].json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-serverless].1.json (0 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-serverless].json (0 hunks)
💤 Files with no reviewable changes (12)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-llama-3-1-8b-instruct1.5-serverless].1.json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-serverless].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-raw].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-raw].1.json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-llama-3-1-8b-instruct1.5-serverless].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-raw].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-llama-3-1-8b-instruct1.5-raw].1.json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-raw].1.json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarRaw.test_oci_modelcar_raw_openai_inference[modelcar-llama-3-1-8b-instruct1.5-raw].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-serverless].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-granite-3-1-8b-base-quantized-w4a161.5-serverless].1.json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelcarServerless.test_oci_modelcar_serverless_openai_inference[modelcar-mistral-7b-instruct-v0-3-quantized-w4a161.5-serverless].1.json
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove the label, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image with tag
    pr-<pr_number> in the quay repository; the tag is deleted when the PR is merged or closed.
Supported labels

{'/wip', '/cherry-pick', '/verified', '/lgtm', '/hold', '/build-push-pr-image'}

@github-actions github-actions bot added size/xxl and removed size/m labels Jul 29, 2025
@edwardquarm edwardquarm marked this pull request as ready for review August 5, 2025 18:04
@edwardquarm edwardquarm requested a review from a team as a code owner August 5, 2025 18:04
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].1.json (1)

1-50: Same constant index issue as flagged in the previous snapshot

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].1.json (1)

1-50: Same constant index issue as flagged in the previous snapshot

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].1.json (1)

1-50: Same constant index issue as flagged in the previous snapshot

🧹 Nitpick comments (5)
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].1.json (1)

1-50: Snapshot size is large—consider trimming or compressing

This snapshot stores ~50 lines of generated text. Repositories can bloat quickly when we track full-length generation outputs for every prompt/version.
Options:
• Assert only key fields (e.g., finish_reason, non-empty text) via custom assertion helper.
• Store a gzipped fixture and decompress in-memory during tests.
Keeps diffs readable and repo thin.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].1.json (1)

1-50: Same maintenance concern about verbose inference snapshots

The file embeds multi-paragraph generations (renewable energy, Great Wall, Apollo 11, etc.).
Unless the exact wording is contractually required, snapshotting such long texts may create noisy diffs on every minor model drift. Recommend asserting shorter, deterministic substrings or using golden-hash comparisons instead.

tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml (2)

81-88: Indentation inconsistency is harmless but noisy

default: children use 4-space indentation whereas the rest of the file uses 2 spaces. Mixing widths is legal YAML, yet standardising makes diffs cleaner.

-default:
-    serving_arguments:
+default:
+  serving_arguments:

1-80: None of the listed models exercise the new ≥2-GPU path

The PR description emphasises support for models needing ≥2 GPUs, but every entry specifies gpu_count: 1 (or falls back to the default). Add at least one representative with gpu_count: 2 to verify the new code path during CI.

tests/model_serving/model_runtime/model_validation/conftest.py (1)

211-240: LGTM! Good fallback mechanism for serving configuration.

The implementation correctly extracts per-model serving arguments with proper fallback to default configuration. The parameter passing to helper functions is consistent.

Consider adding validation for the expected YAML structure to ensure serving_arguments contains the expected args and gpu_count keys, which would provide clearer error messages if the YAML format is incorrect.
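The suggested validation could be a small guard that fails fast with a descriptive message (a sketch only: the key names args and gpu_count follow the summary above, while the function name, exception type, and call site are assumptions):

```python
# Hypothetical validation helper for the YAML structure described above;
# where and how this would be invoked in conftest.py is an assumption.
def validate_serving_arguments(model_name: str, serving: dict) -> None:
    """Raise a clear error if a serving_arguments block is malformed."""
    args = serving.get("args")
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        raise ValueError(
            f"{model_name}: serving_arguments.args must be a list of strings, got {args!r}"
        )
    gpu_count = serving.get("gpu_count")
    if not isinstance(gpu_count, int) or gpu_count < 1:
        raise ValueError(
            f"{model_name}: serving_arguments.gpu_count must be a positive integer, got {gpu_count!r}"
        )
```

Running this once per model entry (after applying the default fallback) would surface config mistakes at collection time instead of as obscure deployment failures.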

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4afb16 and c4584a9.

📒 Files selected for processing (21)
  • tests/conftest.py (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].1.json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].json (1 hunks)
  • tests/model_serving/model_runtime/model_validation/conftest.py (6 hunks)
  • tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (14)
📓 Common learnings
Learnt from: Snomaan6846
PR: opendatahub-io/opendatahub-tests#444
File: tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py:48-714
Timestamp: 2025-07-16T12:20:29.672Z
Learning: In tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py, the same get_deployment_config_dict() function is called twice in each pytest.param because different fixtures (mlserver_inference_service and mlserver_serving_runtime) need the same deployment configuration data. This duplication is intentional to provide identical configuration to multiple fixtures.
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:733-770
Timestamp: 2025-07-17T15:42:23.880Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fixtures do not need explicit database dependency fixtures (like db_deployment_1, db_secret_1, etc.) in their function signatures. Pytest's dependency injection automatically handles the fixture dependencies when they reference db_name_1 and db_name_2 parameters. This is the correct pattern for these Model Registry instance fixtures.
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:595-612
Timestamp: 2025-07-17T15:42:04.167Z
Learning: In tests/model_registry/conftest.py, the db_deployment_1 fixture (and similar duplicated resource fixtures) do not require the admin_client parameter or explicit dependencies on related fixtures like db_secret_1, db_pvc_1, and db_service_1, even though the original model_registry_db_deployment fixture includes these parameters.
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:579-591
Timestamp: 2025-07-17T15:43:04.876Z
Learning: In tests/model_registry/conftest.py, the db_service_1 and db_service_2 fixtures do not require the admin_client parameter for Service resource creation, despite the existing model_registry_db_service fixture using client=admin_client. This inconsistency was confirmed as intentional by user lugi0.
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#401
File: tests/model_registry/rest_api/mariadb/conftest.py:89-110
Timestamp: 2025-07-04T00:17:47.799Z
Learning: In tests/model_registry/rest_api/mariadb/conftest.py, the model_registry_with_mariadb fixture should always use OAUTH_PROXY_CONFIG_DICT for the oauth_proxy parameter regardless of the is_model_registry_oauth parameter value, based on expected product behavior for MariaDB-backed ModelRegistry instances.
📚 Learning: for trustyai image validation tests: operator image tests require admin_client, related_images_refs,...
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.

Applied to files:

  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].json
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: the grpc_predict_v2.proto file in utilities/manifests/common/ is from the official kserve v2 inferen...
Learnt from: Snomaan6846
PR: opendatahub-io/opendatahub-tests#325
File: utilities/manifests/common/grpc_predict_v2.proto:16-16
Timestamp: 2025-06-03T06:31:46.614Z
Learning: The grpc_predict_v2.proto file in utilities/manifests/common/ is from the official KServe V2 inference protocol specification and should not be modified. The package declaration "inference" is standardized and should remain unchanged even if it triggers Buf linting errors about package directory mismatch.

Applied to files:

  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].json
  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].1.json
📚 Learning: the grpc_predict_v2.proto file in utilities/manifests/common/ is from the official kserve v2 inferen...
Learnt from: Snomaan6846
PR: opendatahub-io/opendatahub-tests#325
File: utilities/manifests/common/grpc_predict_v2.proto:16-16
Timestamp: 2025-06-03T06:31:46.614Z
Learning: The grpc_predict_v2.proto file in utilities/manifests/common/ is from the official KServe V2 inference protocol specification and should not be modified. The package declaration "inference" is standardized and should remain unchanged even if it triggers Buf linting errors about package directory mismatch. This file should be excluded from Buf linting as it's a reference implementation.

Applied to files:

  • tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].1.json
📚 Learning: in tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deplo...
Learnt from: Snomaan6846
PR: opendatahub-io/opendatahub-tests#444
File: tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py:48-714
Timestamp: 2025-07-16T12:20:29.672Z
Learning: In tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py, the same get_deployment_config_dict() function is called twice in each pytest.param because different fixtures (mlserver_inference_service and mlserver_serving_runtime) need the same deployment configuration data. This duplication is intentional to provide identical configuration to multiple fixtures.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fix...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:733-770
Timestamp: 2025-07-17T15:42:23.880Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fixtures do not need explicit database dependency fixtures (like db_deployment_1, db_secret_1, etc.) in their function signatures. Pytest's dependency injection automatically handles the fixture dependencies when they reference db_name_1 and db_name_2 parameters. This is the correct pattern for these Model Registry instance fixtures.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the db_service_1 and db_service_2 fixtures do not require the a...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:579-591
Timestamp: 2025-07-17T15:43:04.876Z
Learning: In tests/model_registry/conftest.py, the db_service_1 and db_service_2 fixtures do not require the admin_client parameter for Service resource creation, despite the existing model_registry_db_service fixture using client=admin_client. This inconsistency was confirmed as intentional by user lugi0.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the db_deployment_1 fixture (and similar duplicated resource fi...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:595-612
Timestamp: 2025-07-17T15:42:04.167Z
Learning: In tests/model_registry/conftest.py, the db_deployment_1 fixture (and similar duplicated resource fixtures) do not require the admin_client parameter or explicit dependencies on related fixtures like db_secret_1, db_pvc_1, and db_service_1, even though the original model_registry_db_deployment fixture includes these parameters.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the db_secret_1 and db_secret_2 fixtures do not require the adm...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:666-676
Timestamp: 2025-07-17T15:41:54.284Z
Learning: In tests/model_registry/conftest.py, the db_secret_1 and db_secret_2 fixtures do not require the admin_client parameter in their signatures, unlike some other Secret fixtures in the codebase. The user lugi0 confirmed this is the correct pattern for these specific fixtures.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in tests/model_registry/rest_api/mariadb/conftest.py, the model_registry_with_mariadb fixture should...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#401
File: tests/model_registry/rest_api/mariadb/conftest.py:89-110
Timestamp: 2025-07-04T00:17:47.799Z
Learning: In tests/model_registry/rest_api/mariadb/conftest.py, the model_registry_with_mariadb fixture should always use OAUTH_PROXY_CONFIG_DICT for the oauth_proxy parameter regardless of the is_model_registry_oauth parameter value, based on expected product behavior for MariaDB-backed ModelRegistry instances.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the model_registry_instance_rest_endpoint fixture contains a bu...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#429
File: tests/model_registry/rbac/test_mr_rbac_sa.py:45-45
Timestamp: 2025-07-30T14:15:25.605Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_rest_endpoint fixture contains a built-in assertion `assert len(mr_instances) >= 1` that ensures at least one model registry instance exists before returning the endpoint list. This validation makes bounds checking redundant when accessing the first element of the returned list in test methods.

Applied to files:

  • tests/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the model_registry_instance_1 fixture (and similar duplicated m...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:0-0
Timestamp: 2025-07-17T15:42:26.275Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 fixture (and similar duplicated Model Registry instance fixtures) do not require admin_client, db_deployment_1, or db_secret_1 parameters as explicit dependencies, even though these dependencies exist implicitly through the fixture dependency chain.

Applied to files:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: in the opendatahub-tests repository, prefer keeping test parameterization configurations inline rath...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.

Applied to files:

  • tests/model_serving/model_runtime/model_validation/conftest.py
📚 Learning: the helper `create_isvc` (used in tests/model_serving utilities) already waits until the created inf...
Learnt from: israel-hdez
PR: opendatahub-io/opendatahub-tests#346
File: tests/model_serving/model_server/inference_graph/conftest.py:85-92
Timestamp: 2025-06-11T16:40:11.593Z
Learning: The helper `create_isvc` (used in tests/model_serving utilities) already waits until the created InferenceService reports Condition READY=True before returning, so additional readiness waits in fixtures are unnecessary.

Applied to files:

  • tests/model_serving/model_runtime/model_validation/conftest.py
🔇 Additional comments (18)
tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w8a8-raw].json (1)

3-3: Confirm model-ID canonicalisation rules

"id": "phi-4-quantized-w8a8" replaces the original dot-separated name (phi-4-quantized.w8a8) with dashes.
Before these snapshots become a contract, double-check that the test harness / runtime always performs exactly the same transform; otherwise look-ups by model-car code or external tooling may break once the real image reports the original identifier.

If the dash-conversion is intentional, consider documenting the rule in the helper that builds the snapshot to avoid future confusion.
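For reference, a single hypothetical transform — lowercase, dots replaced with dashes, truncated to 20 characters, trailing dash stripped — happens to reproduce every snapshot ID flagged in this review. This is a reconstruction from the snapshots alone, not the harness's actual code, so it should be confirmed against the real normalization logic:

```python
def normalize_model_id(name: str, max_len: int = 20) -> str:
    """Hypothetical reconstruction of the ID transform implied by the snapshots.

    Assumed rule: lowercase, "." -> "-", truncate to max_len, strip trailing "-".
    """
    return name.lower().replace(".", "-")[:max_len].rstrip("-")


# Inputs taken from the test parameter names, outputs from the stored snapshots:
print(normalize_model_id("phi-4-quantized.w8a8"))                # phi-4-quantized-w8a8
print(normalize_model_id("granite-3.1-8b-starter-v2"))           # granite-3-1-8b-start
print(normalize_model_id("Llama-3.1-8B-Instruct"))               # llama-3-1-8b-instruc
print(normalize_model_id("Qwen2.5-7B-Instruct-quantized.w8a8"))  # qwen2-5-7b-instruct
```

If this is indeed the rule, documenting it (and the 20-character cap) next to the snapshot helper would answer the questions raised on the other snapshot files as well.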

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-starter-v2-raw].json (1)

3-3: Truncated model ID – verify against upstream value

"id": "granite-3-1-8b-start" looks truncated (…starter-v2 missing). If the snapshot generator deliberately limits IDs to 20 chars (or similar), all good—but please confirm that:

  1. The runtime really returns this shortened ID, not the full name.
  2. Any downstream logic (e.g., metrics, clean-up) uses the same shortened key.

A quick grep across the YAML config & fixtures for the full vs truncated string will reveal mismatches.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].json (1)

3-3: Potential inconsistency: “instruc” suffix clipped

The snapshot stores "id": "llama-3-1-8b-instruc" (missing trailing “t”). Please verify that the live VLLM endpoint really returns this exact string; otherwise the snapshot will cause false-negative regressions.
If the truncation is automatic, documenting the cut-off logic would help future reviewers.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].json (1)

3-3: Decimal replaced by dash – ensure uniform mapping

"id": "qwen2-5-7b-instruct" converts “2.5” → “2-5”. Validate that every consumer (YAML config, deployment label selectors, etc.) applies the same mapping; otherwise reconciliation steps may silently miss the model.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-base-quantized.w4a16-raw].json (1)

3-3: Snapshot ID deviates from file name

File name suggests granite-3.1-8b-base-quantized.w4a16, while snapshot records "granite-3-1-8b-base".
Confirm this is the exact string returned by the runtime; otherwise the snapshot may mask an upstream change or a parsing bug in the test harness.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[DeepSeek-R1-Distill-Llama-8B-FP8-dynamic-raw].json (1)

3-4: Model identifier looks truncated—confirm alignment with runtime-reported name

The id field is deepseek-r1-distill, whereas the model under test (per the filename/YAML) appears to be DeepSeek-R1-Distill-Llama-8B-FP8-dynamic.
If vLLM intentionally normalises names, ignore; otherwise update the snapshot to avoid false positives when the runtime reports the full identifier.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[granite-3.1-8b-instruct-quantized.w8a8-raw].json (1)

3-4: Inconsistent model id vs test parameter

id is granite-3-1-8b-instr (hyphens, shortened “instr”), while the test param uses granite-3.1-8b-instruct-quantized.w8a8.
Double-check the expected identifier coming from the runtime to prevent brittle snapshot failures.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].json (1)

3-4: Verify model id normalisation

id is mistral-7b-instruct (no version/quantisation), differing from the filename and YAML entry. Ensure this is the canonical identifier emitted by vLLM; if not, update to keep snapshots stable.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].json (1)

1-25: LGTM! Well-structured model metadata snapshot.

This JSON snapshot properly captures the phi-4-quantized model's metadata including ID, token limits, permissions, and ownership information. The structure is consistent with OpenAI API model responses and serves as appropriate regression test data.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[phi-4-quantized.w4a16-raw].1.json (1)

1-51: LGTM! Valid inference output snapshot.

This JSON snapshot captures diverse inference outputs from the phi-4-quantized model, including both empty responses with "stop" finish reasons and substantive text generation with "length" finish reason. The varied response patterns provide good coverage for regression testing of model inference behavior.

tests/conftest.py (2)

206-210: LGTM! Enhanced configuration flexibility.

The change to extract both args and gpu_count from the serving_arguments dictionary provides better support for per-model configuration, allowing different models to specify both their serving arguments and GPU requirements.


206-210: No consumers found for the updated serving_argument fixture—no action needed
A search across the entire tests/ tree shows:

  • No test functions outside tests/conftest.py inject serving_argument.
  • The nested tests/model_serving/model_runtime/model_validation/conftest.py declares its own serving_argument fixture, so it isn’t affected by the top‐level change.

You can safely keep the fixture returning (args, gpu_count) without updating any consumers.
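As a rough sketch of the extraction pattern described above — the real fixture lives in tests/conftest.py, and the key names ("args", "gpu_count") and defaults here are assumptions based on the review comments, not verified against the repository:

```python
from typing import Any


def extract_serving_argument(serving_arguments: dict[str, Any]) -> tuple[list[str], int]:
    """Pull per-model serving args and GPU count from a serving_arguments dict.

    Key names and the gpu_count default of 1 are assumptions for illustration.
    """
    args: list[str] = serving_arguments.get("args", [])
    gpu_count: int = serving_arguments.get("gpu_count", 1)
    return args, gpu_count


# A model needing tensor parallelism across 2 GPUs might look like:
args, gpus = extract_serving_argument(
    {"args": ["--tensor-parallel-size=2"], "gpu_count": 2}
)
```

Returning the pair keeps a single fixture as the source of both values, which matches the per-model configuration goal stated in the PR description.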

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Mistral-7B-Instruct-v0.3-quantized.w4a16-raw].1.json (1)

1-51: LGTM! Comprehensive inference output snapshot.

This JSON snapshot captures diverse inference outputs from the Mistral-7B-Instruct model across multiple topics (sustainability, translations, poetry, history, AI ethics, space exploration). All responses show "length" finish reasons, indicating consistent token limit behavior. This provides excellent coverage for regression testing of model inference consistency.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Llama-3.1-8B-Instruct-raw].1.json (1)

1-51: LGTM! Thorough model inference validation snapshot.

This JSON snapshot effectively captures the Llama-3.1-8B-Instruct model's inference behavior across diverse topics (renewable energy, multilingual translations, poetry, historical facts, AI ethics, space exploration). All responses demonstrate consistent "length" finish reasons and high-quality content generation, providing robust regression test coverage for model inference consistency.

tests/model_serving/model_runtime/model_validation/__snapshots__/test_modelvalidation/TestVLLMModelCarRaw.test_oci_model_car_raw_openai_inference[Qwen2.5-7B-Instruct-quantized.w8a8-raw].1.json (1)

1-50: Repeated index values may break downstream snapshot comparisons

Every element in the array reports "index": 0. Most OpenAI-style streaming APIs emit a monotonically increasing index so that consumers can reconstruct the full response deterministically. Using a constant value loses ordering information and can cause flaky diffing when the snapshot is regenerated.

If the test harness does not rely on the field, consider omitting it; otherwise emit the correct incremental value.
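If dropping the field is the chosen route, one possible shape — the function and key names here are illustrative, not taken from the test harness:

```python
def strip_volatile_fields(
    choices: list[dict], volatile: tuple[str, ...] = ("index",)
) -> list[dict]:
    """Remove fields (e.g. "index") that carry no information for snapshot diffs.

    Which fields count as volatile is an assumption; adjust per harness needs.
    """
    return [{k: v for k, v in choice.items() if k not in volatile} for choice in choices]


cleaned = strip_volatile_fields(
    [{"index": 0, "text": "hello", "finish_reason": "length"}]
)
```

Applying this before serializing would keep regenerated snapshots stable regardless of how the runtime numbers its choices.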

tests/model_serving/model_runtime/model_validation/conftest.py (3)

140-157: LGTM! Improved parameter extraction pattern.

The refactoring to extract runtime_argument and gpu_count directly from request.param simplifies the fixture dependency chain and aligns with the dynamic parameter handling approach described in the PR objectives.


160-177: LGTM! Consistent parameter handling.

The addition of args and gpu_count parameters properly enables dynamic configuration per model. The parameter structure correctly matches what the deployment_config fixture expects.


180-197: LGTM! Maintains consistency with raw deployment parameters.

The function updates mirror those in build_raw_params, ensuring both deployment types handle serving arguments and GPU count consistently.

Outdated comment threads:

  • tests/conftest.py
  • tests/model_serving/model_runtime/model_validation/sample_modelcar_config.yaml
  • pytest-logs.txt
  • tests/model_serving/model_runtime/model_validation/conftest.py
@github-actions github-actions bot added size/xl and removed size/xxl labels Aug 5, 2025
@Raghul-M
Contributor

Raghul-M commented Aug 6, 2025

@edwardquarm Can you remove the JSON outputs in the snapshot folder? We don't need them since we're going to change the models anyway. Let's keep the snapshot folder empty.

dbasunag
dbasunag previously approved these changes Aug 6, 2025
@Raghul-M Raghul-M enabled auto-merge (squash) August 6, 2025 14:39
@Raghul-M Raghul-M merged commit 0d9e2c7 into opendatahub-io:main Aug 7, 2025
8 checks passed
@github-actions

github-actions bot commented Aug 7, 2025

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.


5 participants