Add HF positive tests with wildcard #1003
Available user actions:
Supported labels: /lgtm, /wip, /cherry-pick, /verified, /hold, /build-push-pr-image
📝 Walkthrough
Adds a pytest fixture to count HuggingFace models matching include/exclude patterns, introduces retry-backed utilities to wait for HF model import and catalog retrieval match (including pod log inspection), and adds parameterized tests validating inclusion/exclusion pattern behavior against the model catalog.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
🧰 Additional context used
📓 Path-based instructions (4): tests/**/*.py, tests/**/test_*.py, tests/**/utils.py, tests/**/conftest.py (📄 CodeRabbit inference engine, AGENTS.md)
🧬 Code graph analysis (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (1)
🪛 Ruff (0.14.11)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
130-130: Unused method argument (ARG002)
133-133: Unused method argument (ARG002)
🔇 Additional comments (1)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py:
- Around lines 59-150: The test `test_hugging_face_models` currently catches `TimeoutExpiredError` and falls back to checking pod logs via `get_model_catalog_pod()` and a hardcoded log string, which masks API failures and couples the test to log text. Remove the `TimeoutExpiredError` fallback so the test relies solely on `wait_for_huggingface_retrival_match()` (increasing its timeout there if needed), and let the test fail if the catalog API times out. Also correct the `huggingface_api` fixture type annotation from `bool` to `HfApi` (or remove the unused parameter) so the fixture typing matches the actual return type.
In @tests/model_registry/model_catalog/huggingface/utils.py:
- Around lines 131-135: Update the docstring for `get_model_catalog_pod` to accurately describe what the function returns: replace the misleading "Get the PostgreSQL pod for model catalog database." with something like "Get the model catalog pod in the given namespace." Keep the rest of the function unchanged (`catalog_pods`, `Pod.get`, `label_selector`, and the return of `catalog_pods[0]`); only the docstring text changes.
🧹 Nitpick comments (1)
tests/model_registry/model_catalog/huggingface/conftest.py (1)
14-36: Consider adding error handling for HuggingFace API calls. The fixture calls `huggingface_api.list_models()` without error handling; network issues or API rate limits could cause test failures that are hard to diagnose.

🛡️ Proposed error handling

```diff
 @pytest.fixture()
 def expected_num_models_from_hf_api(request: pytest.FixtureRequest, huggingface_api: HfApi) -> int:
     excluded_str = request.param.get("excluded_str")
     included_str = request.param.get("included_str")
-    models = huggingface_api.list_models(author=request.param["org_name"], limit=10000)
+    try:
+        models = huggingface_api.list_models(author=request.param["org_name"], limit=10000)
+    except Exception as e:
+        LOGGER.error(f"Failed to fetch models from HuggingFace API: {e}")
+        raise
     model_list = []
```
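The include/exclude semantics this fixture computes can be illustrated independently of the HuggingFace API. The sketch below is hypothetical: `filter_model_ids`, the sample model IDs, and the use of shell-style wildcards via `fnmatch` are illustrative assumptions; the real catalog service defines its own pattern syntax.

```python
from fnmatch import fnmatch


def filter_model_ids(model_ids, included=None, excluded=None):
    """Keep IDs that match the include pattern (when given) and do not
    match the exclude pattern. Patterns are shell-style wildcards."""
    kept = []
    for model_id in model_ids:
        if included is not None and not fnmatch(model_id, included):
            continue  # fails the include filter
        if excluded is not None and fnmatch(model_id, excluded):
            continue  # hits the exclude filter
        kept.append(model_id)
    return kept


ids = [
    "ibm-granite/granite-3.0-2b-instruct",
    "ibm-granite/granite-guardian-3.0-2b",
    "other-org/some-model",
]
print(filter_model_ids(ids, included="ibm-granite/*", excluded="*guardian*"))
# prints ['ibm-granite/granite-3.0-2b-instruct']
```

With substring-style patterns instead of globs, only the two `fnmatch` calls would change.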
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/model_registry/model_catalog/conftest.py
tests/model_registry/model_catalog/huggingface/conftest.py
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
tests/model_registry/model_catalog/huggingface/utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/model_registry/model_catalog/conftest.py (2)
tests/model_registry/utils.py (1)
wait_for_model_catalog_api (718-727)
tests/model_registry/conftest.py (1)
model_registry_rest_headers (399-400)
🪛 Ruff (0.14.10)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
126-126: Unused method argument: updated_catalog_config_map_scope_function
(ARG002)
129-129: Unused method argument: huggingface_api
(ARG002)
🔇 Additional comments (3)
tests/model_registry/model_catalog/conftest.py (1)
389-390: LGTM! Ensures API readiness after teardown. The addition of `wait_for_model_catalog_api` after teardown aligns with the pattern used in other fixtures and ensures the catalog API is fully operational before the next test begins.
tests/model_registry/model_catalog/huggingface/conftest.py (1)
3-5: LGTM! Proper logging setup. Logger initialization follows the established pattern in the codebase.
tests/model_registry/model_catalog/huggingface/utils.py (1)
3-12: LGTM! Clean imports and logging setup. The new imports support HuggingFace integration and retry logic appropriately.
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (1)
40-57: Incorrect type annotation for `huggingface_api` parameter. The `huggingface_api` fixture (defined in conftest.py) returns an `HfApi` instance, not a `bool`. The type annotation should match the actual fixture return type.

📝 Proposed fix

```diff
 def test_huggingface_model_metadata(
     self: Self,
     updated_catalog_config_map: tuple[ConfigMap, str, str],
     model_catalog_rest_url: list[str],
     model_registry_rest_headers: dict[str, str],
     expected_catalog_values: dict[str, str],
-    huggingface_api: bool,
+    huggingface_api: HfApi,
 ):
```

Add the import at the top of the file:

```python
from huggingface_hub import HfApi
```
🧹 Nitpick comments (4)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (1)
60-145: Class docstring missing and test docstring doesn't follow conventions. Per coding guidelines, tests must have docstrings using Given-When-Then format for behavioral clarity. Additionally:
- Class docstring: `TestHFPatternMatching` lacks a docstring explaining what it tests
- Test docstring: The current docstring says "excluded models" but the test covers multiple patterns including wildcards, allowed organizations, and inclusion patterns
- Type annotation: Same `huggingface_api: bool` issue as above; it should be `HfApi`

Note: The static analysis flags `updated_catalog_config_map_scope_function` and `huggingface_api` as unused, but these are required: the former triggers the indirect parametrization fixture, and the latter is a dependency for `expected_num_models_from_hf_api`.

📝 Proposed improvements

```diff
 class TestHFPatternMatching:
+    """Tests for HuggingFace catalog pattern matching including wildcards, org filtering, and inclusion/exclusion."""
+
     @pytest.mark.parametrize(
         ...
     )
     def test_hugging_face_models(
         self: Self,
         updated_catalog_config_map_scope_function: ConfigMap,
         model_catalog_rest_url: list[str],
         model_registry_rest_headers: dict[str, str],
-        huggingface_api: bool,
+        huggingface_api: HfApi,
         expected_num_models_from_hf_api: int,
     ):
         """
-        Test that excluded models do not appear in the catalog API response
+        Test HuggingFace model pattern matching in catalog configurations.
+
+        Given: A catalog configuration with inclusion/exclusion patterns
+        When: Models are imported from HuggingFace
+        Then: The catalog API returns only models matching the configured patterns
         """
```

tests/model_registry/model_catalog/huggingface/utils.py (3)
104-127: Typo in function name and missing docstring.
- Typo: `wait_for_huggingface_retrival_match` should be `wait_for_huggingface_retrieval_match` ("retrieval", not "retrival")
- Missing docstring: Per coding guidelines, utility functions should have Google-format docstrings
- Hardcoded page size: `pageSize=1000` on line 112 could truncate results if the HF organization has more models

📝 Proposed fix

```diff
 @retry(wait_timeout=60, sleep=5)
-def wait_for_huggingface_retrival_match(
+def wait_for_huggingface_retrieval_match(
     source_id: str,
     model_catalog_rest_url: list[str],
     model_registry_rest_headers: dict[str, str],
     expected_num_models_from_hf_api: int,
 ) -> bool | None:
+    """Poll the catalog API until the model count matches the expected HuggingFace count.
+
+    Args:
+        source_id: The catalog source identifier.
+        model_catalog_rest_url: List of catalog API base URLs.
+        model_registry_rest_headers: Headers for API authentication.
+        expected_num_models_from_hf_api: Expected number of models from HuggingFace.
+
+    Returns:
+        True if counts match, None otherwise (triggers retry).
+    """
     # Get all models from the catalog API for the given source
-    url = f"{model_catalog_rest_url[0]}models?source={source_id}&pageSize=1000"
+    url = f"{model_catalog_rest_url[0]}models?source={source_id}&pageSize=10000"
```

Note: The caller in `test_huggingface_model_validation.py` will also need to be updated to use the corrected function name.
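Instead of raising `pageSize`, the count check could walk pages until the token runs out. This is only a sketch: `count_all_models`, `fetch_page`, and the `nextPageToken` field are assumptions standing in for the catalog API's actual paging contract.

```python
def count_all_models(fetch_page):
    """Sum model counts across pages. fetch_page(token) returns a dict
    with 'items' and an optional 'nextPageToken' (absent/empty when done)."""
    total = 0
    token = ""
    while True:
        page = fetch_page(token)
        total += len(page["items"])
        token = page.get("nextPageToken", "")
        if not token:
            return total


# Fake two-page API for illustration
pages = {
    "": {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3]},
}
print(count_all_models(lambda t: pages[t]))  # prints 3
```

In the tests, `fetch_page` would wrap `execute_get_command` with the token appended to the URL.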
130-133: Add docstring to `get_model_catalog_pod`. The function lacks a docstring explaining its purpose and parameters. As noted in past reviews, the previous docstring incorrectly mentioned PostgreSQL.

📝 Proposed fix

```diff
 def get_model_catalog_pod(namespace: str = "rhoai-model-registries") -> Pod:
+    """Get the model catalog pod from the specified namespace.
+
+    Args:
+        namespace: Kubernetes namespace to search. Defaults to "rhoai-model-registries".
+
+    Returns:
+        The first model catalog Pod found.
+
+    Raises:
+        AssertionError: If no model catalog pod is found.
+    """
     catalog_pods = list(Pod.get(namespace=namespace, label_selector="app.kubernetes.io/name=model-catalog"))
     assert catalog_pods, f"No model catalog pod found in namespace {namespace}"
     return catalog_pods[0]
```
136-146: Log level and verbosity issues in `wait_for_hugging_face_model_import`.
- Incorrect log levels: Lines 138 and 142 use `LOGGER.warning` for informational messages. Use `LOGGER.info` instead since these are not warning conditions.
- Excessive logging: Line 145 dumps the entire pod log on every retry iteration, which could be very verbose and flood logs.
- Missing docstring: Add a Google-format docstring per coding guidelines.

📝 Proposed fix

```diff
 @retry(wait_timeout=60, sleep=5)
 def wait_for_hugging_face_model_import(hf_id: str, expected_num_models_from_hf_api: int) -> bool:
-    LOGGER.warning("Checking pod log for model import information")
+    """Wait for HuggingFace model import to complete by checking pod logs.
+
+    Args:
+        hf_id: The HuggingFace catalog source identifier.
+        expected_num_models_from_hf_api: Expected number of models to be imported.
+
+    Returns:
+        True if import is confirmed, False otherwise (triggers retry).
+    """
+    LOGGER.info("Checking pod log for model import information")
     pod = get_model_catalog_pod()
     log = pod.log(container="catalog")
     if f"{hf_id}: loaded {expected_num_models_from_hf_api} models" in log and f"{hf_id}: cleaned up 0 models" in log:
-        LOGGER.warning(f"Found log entry confirming model(s) imported for id: {hf_id}")
+        LOGGER.info(f"Found log entry confirming model(s) imported for id: {hf_id}")
         return True
     else:
-        LOGGER.warning(f"No relevant log entry: {log}")
+        LOGGER.debug("Import not yet complete, retrying...")
         return False
```
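The `@retry(wait_timeout=60, sleep=5)` decorator used throughout these utilities retries the wrapped function until it returns a truthy value or the timeout elapses. Below is a simplified stand-in; the project's actual helper may differ in signature and exception type, which are assumptions here.

```python
import time


def retry(wait_timeout, sleep):
    """Retry the wrapped function until it returns a truthy value,
    raising TimeoutError once wait_timeout seconds have elapsed."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + wait_timeout
            while True:
                result = func(*args, **kwargs)
                if result:
                    return result
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"{func.__name__} did not succeed within {wait_timeout}s")
                time.sleep(sleep)
        return wrapper
    return decorator


attempts = []


@retry(wait_timeout=5, sleep=0.01)
def flaky():
    attempts.append(1)
    return len(attempts) >= 3  # succeeds on the third call


print(flaky(), len(attempts))  # prints: True 3
```

Because falsy returns trigger another attempt, the proposed fixes are careful to return `False` or `None` explicitly on the not-yet-ready path.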
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
tests/model_registry/model_catalog/huggingface/utils.py
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Every test MUST have a docstring explaining what it tests
Apply relevant markers from pytest.ini: tier (smoke, sanity, tier1, tier2), component (model_serving, model_registry, llama_stack), infrastructure (gpu, parallel, slow)
Use Given-When-Then format in test docstrings for behavioral clarity
Use openshift-python-wrapper for all Kubernetes API calls
Kubernetes resource lifecycle MUST use context managers to ensure cleanup
Add type annotations to test code and fixtures (mypy strict enforced)
Write Google-format docstrings for tests and fixtures
Files:
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
tests/model_registry/model_catalog/huggingface/utils.py
tests/**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not import heavy resources at module level; defer to fixture scope
Files:
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
tests/**/utils.py
📄 CodeRabbit inference engine (AGENTS.md)
Component-specific utility functions should be placed in component-level utils.py files
Files:
tests/model_registry/model_catalog/huggingface/utils.py
🧬 Code graph analysis (1)
tests/model_registry/model_catalog/huggingface/utils.py (4)
tests/model_registry/utils.py (1)
execute_get_command (730-738)
tests/model_registry/model_catalog/conftest.py (1)
model_catalog_rest_url (413-422)
tests/model_registry/conftest.py (1)
model_registry_rest_headers (399-400)
tests/model_registry/model_catalog/huggingface/conftest.py (1)
expected_num_models_from_hf_api (14-36)
🪛 Ruff (0.14.11)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
127-127: Unused method argument: updated_catalog_config_map_scope_function
(ARG002)
130-130: Unused method argument: huggingface_api
(ARG002)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (1)
40-57: Pre-existing type annotation issue. Line 46 has `huggingface_api: bool` but the fixture returns `HfApi()` and the function `assert_huggingface_values_matches_model_catalog_api_values` expects `HfApi`. This would fail mypy strict checking.

📝 Suggested fix

```diff
 def test_huggingface_model_metadata(
     self: Self,
     updated_catalog_config_map: tuple[ConfigMap, str, str],
     model_catalog_rest_url: list[str],
     model_registry_rest_headers: dict[str, str],
     expected_catalog_values: dict[str, str],
-    huggingface_api: bool,
+    huggingface_api: HfApi,
 ):
```

Note: You'll need to add the import:

```python
from huggingface_hub import HfApi
```
♻️ Duplicate comments (2)
tests/model_registry/model_catalog/huggingface/utils.py (1)
130-133: Add docstring for the utility function. The function lacks documentation. Per coding guidelines, utility functions should have Google-format docstrings.

📝 Suggested docstring

```diff
 def get_model_catalog_pod(namespace: str = "rhoai-model-registries") -> Pod:
+    """
+    Get the model catalog pod from the specified namespace.
+
+    Args:
+        namespace: Kubernetes namespace to search. Defaults to "rhoai-model-registries".
+
+    Returns:
+        The first matching model catalog Pod.
+
+    Raises:
+        AssertionError: If no model catalog pod is found.
+    """
     catalog_pods = list(Pod.get(namespace=namespace, label_selector="app.kubernetes.io/name=model-catalog"))
     assert catalog_pods, f"No model catalog pod found in namespace {namespace}"
     return catalog_pods[0]
```

tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (1)
125-145: Fix type annotation and improve docstring.
- Incorrect type annotation (line 130): `huggingface_api: bool` should be `HfApi` since the fixture returns `HfApi()`. Alternatively, remove the parameter since it's only used indirectly through `expected_num_models_from_hf_api`.
- Docstring update: The docstring says "excluded models" but the test validates wildcard, allowed organization, exclusion, and inclusion patterns. Per coding guidelines, use Given-When-Then format.
- Unused parameters: `updated_catalog_config_map_scope_function` and `huggingface_api` are flagged by static analysis as unused. They're needed for fixture activation but could be documented or handled via `usefixtures`.

📝 Suggested improvements

```diff
+@pytest.mark.usefixtures("updated_catalog_config_map_scope_function")
 def test_hugging_face_models(
     self: Self,
-    updated_catalog_config_map_scope_function: ConfigMap,
     model_catalog_rest_url: list[str],
     model_registry_rest_headers: dict[str, str],
-    huggingface_api: bool,
     expected_num_models_from_hf_api: int,
 ):
     """
-    Test that excluded models do not appear in the catalog API response
+    Test HuggingFace catalog pattern matching behavior.
+
+    Given: A catalog configuration with include/exclude patterns
+    When: Models are imported from HuggingFace
+    Then: The catalog API returns only models matching the pattern criteria
     """
```
🧹 Nitpick comments (3)
tests/model_registry/model_catalog/huggingface/utils.py (2)
104-127: Add docstring and consider explicit return value. Per coding guidelines, utility functions should have Google-format docstrings; the function is missing documentation explaining its purpose, parameters, and return behavior. Additionally, the function implicitly returns `None` when counts don't match (causing retry), but the return type hints `bool | None`; consider an explicit `return None` for clarity.

📝 Suggested docstring and explicit return

```diff
 @retry(wait_timeout=60, sleep=5)
 def wait_for_huggingface_retrival_match(
     source_id: str,
     model_catalog_rest_url: list[str],
     model_registry_rest_headers: dict[str, str],
     expected_num_models_from_hf_api: int,
 ) -> bool | None:
+    """
+    Poll the catalog API until model count matches expected HuggingFace count.
+
+    Retries for up to 60 seconds with 5-second intervals.
+
+    Args:
+        source_id: The catalog source identifier to query.
+        model_catalog_rest_url: List of catalog API base URLs.
+        model_registry_rest_headers: HTTP headers for authentication.
+        expected_num_models_from_hf_api: Expected number of models from HF API.
+
+    Returns:
+        True if counts match, None otherwise (triggers retry).
+    """
     # Get all models from the catalog API for the given source
     ...
     LOGGER.warning(
         f"Expected {expected_num_models_from_hf_api} "
         "models to be present in response. "
         f"Found {response['size']}. Models in "
         f"response: {models_response}"
     )
+    return None
```
136-146: Add docstring and adjust log levels. The function lacks documentation. Additionally, log levels appear inconsistent:
- Line 138: "Checking pod log" should be `LOGGER.info` (routine operation)
- Line 142: Success confirmation should be `LOGGER.info` (expected outcome)
- Line 145: The failure case appropriately uses `warning`

📝 Suggested improvements

```diff
 @retry(wait_timeout=60, sleep=5)
 def wait_for_hugging_face_model_import(hf_id: str, expected_num_models_from_hf_api: int) -> bool:
-    LOGGER.warning("Checking pod log for model import information")
+    """
+    Wait for HuggingFace model import confirmation in pod logs.
+
+    Polls the model catalog pod logs for import completion message.
+    Retries for up to 60 seconds with 5-second intervals.
+
+    Args:
+        hf_id: The HuggingFace source identifier.
+        expected_num_models_from_hf_api: Expected number of imported models.
+
+    Returns:
+        True if import confirmed in logs, False otherwise (triggers retry).
+    """
+    LOGGER.info("Checking pod log for model import information")
     pod = get_model_catalog_pod()
     log = pod.log(container="catalog")
     if f"{hf_id}: loaded {expected_num_models_from_hf_api} models" in log and f"{hf_id}: cleaned up 0 models" in log:
-        LOGGER.warning(f"Found log entry confirming model(s) imported for id: {hf_id}")
+        LOGGER.info(f"Found log entry confirming model(s) imported for id: {hf_id}")
         return True
     else:
         LOGGER.warning(f"No relevant log entry: {log}")
         return False
```

tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (1)
60-61: Add class docstring and pytest markers. Per coding guidelines:
- Every test class should have a docstring explaining what it tests
- Tests should apply relevant markers from pytest.ini (tier markers like smoke, sanity, tier1, tier2; component markers like model_registry)

📝 Suggested class setup

```diff
+@pytest.mark.tier2
+@pytest.mark.model_registry
 class TestHFPatternMatching:
+    """Test HuggingFace catalog pattern matching for include/exclude model filters."""
+
     @pytest.mark.parametrize(
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
tests/model_registry/model_catalog/huggingface/utils.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py (3)
tests/model_registry/model_catalog/huggingface/utils.py (2)
wait_for_huggingface_retrival_match (105-127)
wait_for_hugging_face_model_import (137-146)
tests/model_registry/conftest.py (1)
model_registry_rest_headers (399-400)
tests/model_registry/model_catalog/huggingface/conftest.py (2)
huggingface_api (9-10)
expected_num_models_from_hf_api (14-36)
tests/model_registry/model_catalog/huggingface/utils.py (4)
tests/model_registry/utils.py (1)
execute_get_command (730-738)
tests/model_registry/model_catalog/conftest.py (1)
model_catalog_rest_url (413-422)
tests/model_registry/conftest.py (1)
model_registry_rest_headers (399-400)
tests/model_registry/model_catalog/huggingface/conftest.py (1)
expected_num_models_from_hf_api (14-36)
🪛 Ruff (0.14.11)
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
127-127: Unused method argument: updated_catalog_config_map_scope_function
(ARG002)
130-130: Unused method argument: huggingface_api
(ARG002)
lugi0
left a comment
I would remove the duplicate function, other than that LGTM
tests/model_registry/model_catalog/huggingface/test_huggingface_model_validation.py
Status of building tag latest: success.
* Add HF positive tests with wildcard
* Updates to add checks to catalog pod log
* Updates based on review comments
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit