feat: add tests for llamastack lmeval provider #496
dbasunag merged 3 commits into opendatahub-io:main from
Conversation
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
This change refactors and expands test infrastructure for LlamaStack and vLLM integration in the model explainability test suite. It introduces new pytest fixtures in
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~18 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Supported labels: {'/wip', '/lgtm', '/hold', '/verified', '/build-push-pr-image', '/cherry-pick'}
/verified
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/model_explainability/lm_eval/test_llamastack_lmeval_provider.py (2)
8-8: Remove unused constant.
The `PII_REGEX_SHIELD_ID` constant is defined but never used in this test file. Consider removing it to avoid confusion.
`-PII_REGEX_SHIELD_ID = "regex"`
29-29: Consider tracking the TODO in an issue.
The TODO comment indicates planned functionality for evaluation runs. This would be valuable for comprehensive LMEval provider testing.
Would you like me to help create a GitHub issue to track the implementation of the `run_eval` test?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- .gitignore (1 hunk)
- tests/model_explainability/conftest.py (2 hunks)
- tests/model_explainability/constants.py (1 hunk)
- tests/model_explainability/guardrails/conftest.py (2 hunks)
- tests/model_explainability/guardrails/test_guardrails.py (1 hunk)
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py (6 hunks)
- tests/model_explainability/lm_eval/test_llamastack_lmeval_provider.py (1 hunk)
🧰 Additional context used
🧠 Learnings (15)
📓 Common learnings
Learnt from: Snomaan6846
PR: opendatahub-io/opendatahub-tests#444
File: tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py:48-714
Timestamp: 2025-07-16T12:20:29.672Z
Learning: In tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py, the same get_deployment_config_dict() function is called twice in each pytest.param because different fixtures (mlserver_inference_service and mlserver_serving_runtime) need the same deployment configuration data. This duplication is intentional to provide identical configuration to multiple fixtures.
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:733-770
Timestamp: 2025-07-17T15:42:23.880Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fixtures do not need explicit database dependency fixtures (like db_deployment_1, db_secret_1, etc.) in their function signatures. Pytest's dependency injection automatically handles the fixture dependencies when they reference db_name_1 and db_name_2 parameters. This is the correct pattern for these Model Registry instance fixtures.
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
📚 Learning: in tests/model_registry/rbac/test_mr_rbac_sa.py, bounds checking for model_registry_instance_rest_en...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#429
File: tests/model_registry/rbac/test_mr_rbac_sa.py:45-45
Timestamp: 2025-07-30T14:15:25.605Z
Learning: In tests/model_registry/rbac/test_mr_rbac_sa.py, bounds checking for model_registry_instance_rest_endpoint list access is not needed because upstream fixture validation already ensures endpoints exist before the tests execute. The Model Registry setup process validates endpoint availability, making additional bounds checks redundant.
Applied to files:
- tests/model_explainability/guardrails/test_guardrails.py
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in the opendatahub-tests repository, prefer keeping test parameterization configurations inline rath...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
- tests/model_explainability/guardrails/test_guardrails.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
📚 Learning: in model registry rbac tests, client instantiation tests are designed to verify the ability to creat...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#354
File: tests/model_registry/rbac/test_mr_rbac.py:64-77
Timestamp: 2025-06-16T11:26:53.789Z
Learning: In Model Registry RBAC tests, client instantiation tests are designed to verify the ability to create and use the MR python client, with actual API functionality testing covered by separate existing tests.
Applied to files:
- tests/model_explainability/lm_eval/test_llamastack_lmeval_provider.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
📚 Learning: for trustyai image validation tests: operator image tests require admin_client, related_images_refs,...
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.
Applied to files:
- tests/model_explainability/lm_eval/test_llamastack_lmeval_provider.py
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the db_deployment_1 fixture (and similar duplicated resource fi...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:595-612
Timestamp: 2025-07-17T15:42:04.167Z
Learning: In tests/model_registry/conftest.py, the db_deployment_1 fixture (and similar duplicated resource fixtures) do not require the admin_client parameter or explicit dependencies on related fixtures like db_secret_1, db_pvc_1, and db_service_1, even though the original model_registry_db_deployment fixture includes these parameters.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the model_registry_instance_1 fixture (and similar duplicated m...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:0-0
Timestamp: 2025-07-17T15:42:26.275Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 fixture (and similar duplicated Model Registry instance fixtures) do not require admin_client, db_deployment_1, or db_secret_1 parameters as explicit dependencies, even though these dependencies exist implicitly through the fixture dependency chain.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fix...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:733-770
Timestamp: 2025-07-17T15:42:23.880Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_1 and model_registry_instance_2 fixtures do not need explicit database dependency fixtures (like db_deployment_1, db_secret_1, etc.) in their function signatures. Pytest's dependency injection automatically handles the fixture dependencies when they reference db_name_1 and db_name_2 parameters. This is the correct pattern for these Model Registry instance fixtures.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deplo...
Learnt from: Snomaan6846
PR: opendatahub-io/opendatahub-tests#444
File: tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py:48-714
Timestamp: 2025-07-16T12:20:29.672Z
Learning: In tests/model_serving/model_runtime/mlserver/basic_model_deployment/test_mlserver_basic_model_deployment.py, the same get_deployment_config_dict() function is called twice in each pytest.param because different fixtures (mlserver_inference_service and mlserver_serving_runtime) need the same deployment configuration data. This duplication is intentional to provide identical configuration to multiple fixtures.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the db_service_1 and db_service_2 fixtures do not require the a...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:579-591
Timestamp: 2025-07-17T15:43:04.876Z
Learning: In tests/model_registry/conftest.py, the db_service_1 and db_service_2 fixtures do not require the admin_client parameter for Service resource creation, despite the existing model_registry_db_service fixture using client=admin_client. This inconsistency was confirmed as intentional by user lugi0.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the db_secret_1 and db_secret_2 fixtures do not require the adm...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:666-676
Timestamp: 2025-07-17T15:41:54.284Z
Learning: In tests/model_registry/conftest.py, the db_secret_1 and db_secret_2 fixtures do not require the admin_client parameter in their signatures, unlike some other Secret fixtures in the codebase. The user lugi0 confirmed this is the correct pattern for these specific fixtures.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, the model_registry_instance_rest_endpoint fixture contains a bu...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#429
File: tests/model_registry/rbac/test_mr_rbac_sa.py:45-45
Timestamp: 2025-07-30T14:15:25.605Z
Learning: In tests/model_registry/conftest.py, the model_registry_instance_rest_endpoint fixture contains a built-in assertion `assert len(mr_instances) >= 1` that ensures at least one model registry instance exists before returning the endpoint list. This validation makes bounds checking redundant when accessing the first element of the returned list in test methods.
Applied to files:
tests/model_explainability/guardrails/conftest.py
📚 Learning: in tests/model_registry/rbac/conftest.py, predictable names are intentionally used for test resource...
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#354
File: tests/model_registry/rbac/conftest.py:166-175
Timestamp: 2025-06-16T11:25:39.599Z
Learning: In tests/model_registry/rbac/conftest.py, predictable names are intentionally used for test resources (like RoleBindings and groups) instead of random names. This design choice prioritizes exposing cleanup failures from previous test runs through name collisions rather than masking such issues with random names. The philosophy is that test failures should be observable and informative to help debug underlying infrastructure or cleanup issues.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/guardrails/test_llamastack_fms_provider.py
- tests/model_explainability/conftest.py
📚 Learning: the helper `create_isvc` (used in tests/model_serving utilities) already waits until the created inf...
Learnt from: israel-hdez
PR: opendatahub-io/opendatahub-tests#346
File: tests/model_serving/model_server/inference_graph/conftest.py:85-92
Timestamp: 2025-06-11T16:40:11.593Z
Learning: The helper `create_isvc` (used in tests/model_serving utilities) already waits until the created InferenceService reports Condition READY=True before returning, so additional readiness waits in fixtures are unnecessary.
Applied to files:
- tests/model_explainability/guardrails/conftest.py
- tests/model_explainability/conftest.py
📚 Learning: in tests/model_registry/conftest.py, service resources can be created without explicitly passing the...
Learnt from: lugi0
PR: opendatahub-io/opendatahub-tests#446
File: tests/model_registry/conftest.py:579-591
Timestamp: 2025-07-17T15:43:04.876Z
Learning: In tests/model_registry/conftest.py, Service resources can be created without explicitly passing the admin_client parameter when using the context manager approach with "with Service()". The client parameter is optional for Service resource creation.
Applied to files:
tests/model_explainability/guardrails/conftest.py
🧬 Code Graph Analysis (1)
tests/model_explainability/guardrails/conftest.py (1)
tests/model_explainability/conftest.py (1)
`llamastack_distribution` (56-107)
🔇 Additional comments (9)
tests/model_explainability/constants.py (1)
1-1: LGTM! Good refactoring to centralize the constant.
The centralization of `MNT_MODELS` improves maintainability by providing a single source of truth for this mount path across multiple test files.
.gitignore (1)
168-170: LGTM! Appropriate addition to gitignore.
Adding `CLAUDE.md` under the "AI Assistant Config Files" section is appropriate as these configuration files are typically user-specific and shouldn't be tracked in version control.
tests/model_explainability/guardrails/test_guardrails.py (1)
10-10: LGTM! Proper use of centralized constant.
The import of `MNT_MODELS` from the centralized constants module eliminates code duplication and follows good practice for constant management.
tests/model_explainability/lm_eval/test_llamastack_lmeval_provider.py (1)
32-49: LGTM! Basic benchmark registration test looks solid.
The test correctly verifies model registration, benchmark registration, and validates the expected benchmark properties. The test logic is appropriate for validating the LlamaStack LMEval provider integration.
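The register-then-verify pattern this comment describes can be sketched roughly as follows. This is a minimal illustration only, not the test from the PR: `FakeBenchmarkRegistry`, `Benchmark`, `register_and_verify`, their field names, and all identifier values are hypothetical stand-ins, since the real test talks to a LlamaStack client whose code is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """Minimal record of a registered benchmark (hypothetical field names)."""

    identifier: str
    dataset_id: str
    scoring_functions: tuple[str, ...]
    provider_id: str


class FakeBenchmarkRegistry:
    """In-memory stand-in for a client's benchmark endpoint (hypothetical)."""

    def __init__(self) -> None:
        self._benchmarks: dict[str, Benchmark] = {}

    def register(self, benchmark_id: str, dataset_id: str,
                 scoring_functions: list[str], provider_id: str) -> None:
        # Store the benchmark under its identifier, replacing any earlier entry.
        self._benchmarks[benchmark_id] = Benchmark(
            identifier=benchmark_id,
            dataset_id=dataset_id,
            scoring_functions=tuple(scoring_functions),
            provider_id=provider_id,
        )

    def list_all(self) -> list[Benchmark]:
        return list(self._benchmarks.values())


def register_and_verify(registry: FakeBenchmarkRegistry, benchmark_id: str,
                        dataset_id: str, scoring_function: str,
                        provider_id: str) -> Benchmark:
    """Register one benchmark, then assert it is listed with the expected properties."""
    registry.register(
        benchmark_id=benchmark_id,
        dataset_id=dataset_id,
        scoring_functions=[scoring_function],
        provider_id=provider_id,
    )
    benchmarks = registry.list_all()
    assert len(benchmarks) == 1, f"expected 1 benchmark, found {len(benchmarks)}"
    benchmark = benchmarks[0]
    assert benchmark.identifier == benchmark_id
    assert benchmark.provider_id == provider_id
    assert benchmark.scoring_functions == (scoring_function,)
    return benchmark


# Placeholder values for illustration; none of them are taken from the PR.
registered = register_and_verify(
    FakeBenchmarkRegistry(),
    benchmark_id="example-provider::example-task",
    dataset_id="example-provider::example-task",
    scoring_function="string",
    provider_id="example-provider",
)
```

Asserting on the listed benchmark's properties, rather than only on the absence of an exception from the registration call, is what lets a test like this catch a provider that accepts the request but stores the wrong data.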
tests/model_explainability/guardrails/test_llamastack_fms_provider.py (3)
10-10: LGTM! Consistent with constant centralization.
The import of `MNT_MODELS` from the centralized constants module aligns with the broader refactoring effort.
17-40: LGTM! Fixture refactoring looks appropriate.
The updates to use the consolidated LlamaStack fixtures (`llamastack_distribution`, `llamastack_client`) and additional pytest markers align well with the fixture reusability improvements mentioned in the PR objectives.
54-124: LGTM! Test methods updated correctly for new fixtures.
All test methods properly use the updated `llamastack_client` fixture while maintaining the same test logic. The transition from specific fixtures to more generic, reusable ones is well-executed.
tests/model_explainability/guardrails/conftest.py (1)
284-287: LGTM! Clean fixture parameter refactoring.
The fixture parameter has been properly renamed from `llamastack_distribution_trustyai` to the more generic `llamastack_distribution`, aligning with the fixture consolidation into the parent conftest.py.
tests/model_explainability/conftest.py (1)
180-204: Consider the implications of `wait_for_predictor_pods=False`.
The InferenceService is created with `wait_for_predictor_pods=False`, which means the fixture may return before the predictor pods are ready. This could lead to race conditions in tests that immediately try to use the service.
Consider one of the following:
- Setting `wait_for_predictor_pods=True` if the readiness check is reliable
- Adding explicit pod readiness checks in tests that depend on this fixture
- Documenting why this is set to False if there's a specific reason (e.g., known timeout issues)
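The second option, an explicit readiness check, could take the shape of a small polling helper like the one below. This is a sketch under stated assumptions: `wait_until` and `FakePods` are hypothetical names introduced for illustration, and a real check would query the cluster for pod status instead of counting polls.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0,
               interval: float = 1.0) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout. The predicate is always
    evaluated at least once, even with a timeout of zero.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakePods:
    """Stand-in that reports ready after a fixed number of polls (hypothetical)."""

    def __init__(self, polls_until_ready: int) -> None:
        self.polls = 0
        self._polls_until_ready = polls_until_ready

    def all_ready(self) -> bool:
        # Count each poll and report ready once the threshold is reached.
        self.polls += 1
        return self.polls >= self._polls_until_ready


pods = FakePods(polls_until_ready=3)
became_ready = wait_until(pods.all_ready, timeout=5.0, interval=0.01)
```

A fixture or test could call such a helper before its first request and fail with a clear message when it returns False, which makes a slow predictor show up as a readiness timeout instead of an unrelated connection error.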
.vscode/

# AI Assistant Config Files
CLAUDE.md
Don't be sad @lugi0, it's just to avoid committing my own CLAUDE.md just yet!!
Status of building tag latest: success.
/cherry-pick 2.23
* feat: add tests for llamastack lmeval provider
* add assertions
* change ns name
Cherry pick action created PR #506 successfully 🎉!
add tests for llamastack lmeval provider
Description
How Has This Been Tested?
Running the test on PSI cluster
Merge criteria: