feat: Add tests for duplicate models in multiple HF sources #1221
dbasunag merged 7 commits into opendatahub-io:main
Conversation
The following are automatically added/executed:
Available user actions:
Supported labels: {'/verified', '/cherry-pick', '/build-push-pr-image', '/hold', '/wip', '/lgtm'}
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough
Adds a new HF model category.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (1 passed)
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/model_registry/model_catalog/constants.py`:
- Around line 78-79: The comment above the entry
"ibm-granite/granite-4.0-h-small" incorrectly states it is unique globally;
update the comment to state that it is unique only relative to the "mixed" group
(since the same model also appears in HF_MODELS["granite"] around the HF_MODELS
definitions), e.g., change the wording near the list containing
"ibm-granite/granite-4.0-h-small" to indicate uniqueness within
HF_MODELS["mixed"] rather than globally unique.
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 53-58: Add explicit presence checks for MIXED_SOURCE_ID and
OVERLAPPING_SOURCE_ID before iterating status: verify that any(item["id"] ==
MIXED_SOURCE_ID for item in sources) and any(item["id"] == OVERLAPPING_SOURCE_ID
for item in sources) (or equivalent assertions) so the test fails if an expected
source is missing; then keep the existing loop that asserts source["status"] ==
"available" for those IDs. This targets the variables MIXED_SOURCE_ID,
OVERLAPPING_SOURCE_ID and the existing loop over sources.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: bb7afe8c-9a5d-4f94-8326-b92e6ec7550e
📒 Files selected for processing (3)
- tests/model_registry/model_catalog/constants.py
- tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
- tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
♻️ Duplicate comments (2)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)
52-58: ⚠️ Potential issue | 🟠 Major — Assert expected source presence before status validation.
Current logic at lines 53-58 only validates sources that are returned. If one expected source is missing, this test can still pass.
Proposed fix

```diff
- sources = response.get("items", [])
- for source in sources:
-     if source["id"] in [MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID]:
-         assert source["status"] == "available", (
-             f"Source '{source['id']}' has status '{source['status']}', expected 'available'. "
-             f"Error: {source.get('error', 'N/A')}"
-         )
+ sources_by_id = {source["id"]: source for source in response.get("items", [])}
+ for expected_source_id in [MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID]:
+     assert expected_source_id in sources_by_id, (
+         f"Expected source '{expected_source_id}' not found. "
+         f"Available source ids: {list(sources_by_id.keys())}"
+     )
+     source = sources_by_id[expected_source_id]
+     assert source["status"] == "available", (
+         f"Source '{source['id']}' has status '{source['status']}', expected 'available'. "
+         f"Error: {source.get('error', 'N/A')}"
+     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py` around lines 52-58: The test iterates over response.get("items", []) and only asserts status for sources it finds, so missing expected sources like MIXED_SOURCE_ID or OVERLAPPING_SOURCE_ID will be ignored; update the test to first collect returned IDs (e.g., from sources or response) and assert that both MIXED_SOURCE_ID and OVERLAPPING_SOURCE_ID are present, then locate each source by id and assert its status == "available" (including the existing error detail in the failure message) to ensure absence is treated as a test failure.

tests/model_registry/model_catalog/constants.py (1)
78-79: ⚠️ Potential issue | 🟡 Minor — Clarify the uniqueness scope in the inline comment.
At line 78, "Unique to this source" is ambiguous/inaccurate because "ibm-granite/granite-4.0-h-small" also exists in HF_MODELS["granite"] (line 68). Scope it relative to "mixed" or to this specific test setup.

Proposed edit

```diff
- # Unique to this source
+ # Unique relative to "mixed" in this test scenario
  "ibm-granite/granite-4.0-h-small",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_registry/model_catalog/constants.py` around lines 78 - 79, The inline comment "Unique to this source" is ambiguous; update the comment near the HF_MODELS entry for "ibm-granite/granite-4.0-h-small" to clarify the scope (e.g., "Unique to this 'mixed' test source" or "Unique to this test setup, not global") so it explicitly states whether uniqueness is within the 'mixed' collection or only for this test configuration; adjust the comment adjacent to the HF_MODELS (or the variable name used in this file) entry to reflect that scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/model_registry/model_catalog/constants.py`:
- Around line 78-79: The inline comment "Unique to this source" is ambiguous;
update the comment near the HF_MODELS entry for
"ibm-granite/granite-4.0-h-small" to clarify the scope (e.g., "Unique to this
'mixed' test source" or "Unique to this test setup, not global") so it
explicitly states whether uniqueness is within the 'mixed' collection or only
for this test configuration; adjust the comment adjacent to the HF_MODELS (or
the variable name used in this file) entry to reflect that scope.
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 52-58: The test iterates over response.get("items", []) and only
asserts status for sources it finds, so missing expected sources like
MIXED_SOURCE_ID or OVERLAPPING_SOURCE_ID will be ignored; update the test to
first collect returned IDs (e.g., from sources or response) and assert that both
MIXED_SOURCE_ID and OVERLAPPING_SOURCE_ID are present, then locate each source
by id and assert its status == "available" (including the existing error detail
in the failure message) to ensure absence is treated as a test failure.
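The failure mode described above can be shown in isolation. This is a minimal standalone sketch with a fabricated response payload and placeholder source IDs, not the real catalog API: the original loop passes even though one expected source is absent, while indexing by ID and asserting presence first surfaces the gap.

```python
# Fabricated payload and IDs for illustration only.
MIXED_SOURCE_ID = "mixed"
OVERLAPPING_SOURCE_ID = "overlapping"

# The "overlapping" source is missing from the response entirely.
response = {"items": [{"id": "mixed", "status": "available"}]}

# Original pattern: validates only the sources that were returned, so it passes.
for source in response.get("items", []):
    if source["id"] in [MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID]:
        assert source["status"] == "available"

# Fixed pattern: index by id and check presence before checking status.
sources_by_id = {s["id"]: s for s in response.get("items", [])}
missing = [
    sid
    for sid in (MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID)
    if sid not in sources_by_id
]
print(missing)  # ['overlapping'] — the fixed test would fail loudly here
```

The fixed pattern turns a silent pass into an assertion failure that names the missing source.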
ℹ️ Review info
Run ID: b29d7e79-3f7f-4619-836e-bfba329d0b5e
📒 Files selected for processing (3)
- tests/model_registry/model_catalog/constants.py
- tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
- tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)
136-141: Consider defensive access for the source_id key. Line 137 assumes every item has "source_id". If the API schema changes or returns partial data, the test fails with an opaque KeyError rather than an assertion message describing the problem.

Proposed improvement

```diff
  matching_items = response.get("items", [])
- source_ids = {item["source_id"] for item in matching_items}
+ source_ids = {item.get("source_id") for item in matching_items}
+ source_ids.discard(None)  # Remove None if any item lacked the key
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py` around lines 136 - 141, Defensively handle missing "source_id" in matching_items by building source_ids using item.get("source_id") and ignoring None (e.g., {item.get("source_id") for item in matching_items if item.get("source_id") is not None}); if any items lacked "source_id" include that fact in the assertion message so failures aren't a raw KeyError — update the assertion that checks {MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID}.issubset(source_ids) to mention SHARED_MODEL, the expected source IDs, the actual source_ids, and the count/indices of items missing "source_id" (use matching_items and source_ids to compute these diagnostics).
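The `.get()` + `discard(None)` pattern proposed above can be sketched on its own. The items below are fabricated to simulate schema drift (one entry lacking `"source_id"`); they are not real API output:

```python
# Fabricated items; the second entry simulates a response item that
# lacks the "source_id" key.
matching_items = [
    {"source_id": "mixed", "name": "granite"},
    {"name": "granite"},  # no "source_id" key
]

# Direct indexing would raise KeyError on the second item:
#   {item["source_id"] for item in matching_items}

# Defensive version: .get() yields None for missing keys; discard it
# so the set contains only real source IDs.
source_ids = {item.get("source_id") for item in matching_items}
source_ids.discard(None)
print(source_ids)  # {'mixed'}
```

`set.discard` is used rather than `set.remove` because it is a no-op when `None` is absent, so the same code handles both clean and drifted payloads.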
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 109-110: The test can raise AttributeError when
result["externalId"] is present but null; update the handling of external_id in
the test so it never becomes None (e.g., replace external_id =
result.get("externalId", "") with external_id = result.get("externalId") or ""),
then keep the existing assert that checks
external_id.startswith(f"{source_id}:") unchanged; this ensures the assert won't
call startswith on None while preserving behavior when externalId is missing or
null.
---
Nitpick comments:
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 136-141: Defensively handle missing "source_id" in matching_items
by building source_ids using item.get("source_id") and ignoring None (e.g.,
{item.get("source_id") for item in matching_items if item.get("source_id") is
not None}); if any items lacked "source_id" include that fact in the assertion
message so failures aren't a raw KeyError — update the assertion that checks
{MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID}.issubset(source_ids) to mention
SHARED_MODEL, the expected source IDs, the actual source_ids, and the
count/indices of items missing "source_id" (use matching_items and source_ids to
compute these diagnostics).
ℹ️ Review info
Run ID: 1e616027-67a9-40e3-9357-cd2b7ad0786a
📒 Files selected for processing (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
♻️ Duplicate comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)
117-118: ⚠️ Potential issue | 🟡 Minor — AttributeError if externalId is explicitly null in the response.
dict.get(key, default) returns None when the key exists with value None, not the default. If the API returns {"externalId": null}, line 118 raises AttributeError: 'NoneType' object has no attribute 'startswith'.

Proposed fix

```diff
- external_id = result.get("externalId", "")
+ external_id = result.get("externalId") or ""
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py` around lines 117 - 118, The test can raise AttributeError if result["externalId"] exists but is null; update the handling of external_id (the variable assigned from result.get("externalId", "")) so it is coerced to a string or defaulted to "" before calling external_id.startswith(...). Specifically, replace the current get usage with a coalescing approach (e.g., external_id = result.get("externalId") or "") or explicitly check for None, then assert not external_id.startswith(f"{source_id}:") using that safe value.
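The dict.get pitfall behind this finding is easy to demonstrate in isolation. This sketch uses a fabricated JSON payload (the field names mirror the review comment, not a captured API response):

```python
import json

# A payload where "externalId" is present but explicitly null.
result = json.loads('{"name": "granite", "externalId": null}')

# dict.get falls back to the default only when the key is ABSENT;
# a present-but-null key still returns None.
unsafe = result.get("externalId", "")  # None, not ""
safe = result.get("externalId") or ""  # "" — coalesces None to the default

print(unsafe is None)  # True: .startswith on this would raise AttributeError
print(safe.startswith("mixed:"))  # False, and no exception
```

The `or ""` coalescing keeps the existing behavior for a missing key while also covering the explicit-null case.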
🧹 Nitpick comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)
92-105: Consider defensive access for result["name"]. Direct key access raises KeyError with a generic traceback if the API omits "name". A .get() with an explicit assertion yields a clearer failure message.

Optional improvement

```diff
  result = execute_get_command(url=url, headers=model_registry_rest_headers)
- assert result["name"] == SHARED_MODEL, (
+ model_name = result.get("name")
+ assert model_name == SHARED_MODEL, (
-     f"Expected model name '{SHARED_MODEL}', got '{result['name']}' from source '{source_id}'"
+     f"Expected model name '{SHARED_MODEL}', got '{model_name}' from source '{source_id}'"
  )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py` around lines 92 - 105, In test_shared_model_retrievable_per_source, avoid direct dict indexing of result["name"]; change the check to fetch the value with result.get("name") into a local variable (e.g., model_name) and assert it is not None and equals SHARED_MODEL so failures show a clear message; update the assert to something like: model_name = result.get("name") and assert model_name == SHARED_MODEL, f"Expected model name '{SHARED_MODEL}', got '{model_name}' from source '{source_id}'", referencing the test_shared_model_retrievable_per_source function, the result variable, and SHARED_MODEL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 117-118: The test can raise AttributeError if result["externalId"]
exists but is null; update the handling of external_id (the variable assigned
from result.get("externalId", "")) so it is coerced to a string or defaulted to
"" before calling external_id.startswith(...). Specifically, replace the current
get usage with a coalescing approach (e.g., external_id =
result.get("externalId") or "") or explicitly check for None, then assert not
external_id.startswith(f"{source_id}:") using that safe value.
---
Nitpick comments:
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 92-105: In test_shared_model_retrievable_per_source, avoid direct
dict indexing of result["name"]; change the check to fetch the value with
result.get("name") into a local variable (e.g., model_name) and assert it is not
None and equals SHARED_MODEL so failures show a clear message; update the assert
to something like: model_name = result.get("name") and assert model_name ==
SHARED_MODEL, f"Expected model name '{SHARED_MODEL}', got '{model_name}' from
source '{source_id}'", referencing the test_shared_model_retrievable_per_source
function, the result variable, and SHARED_MODEL.
ℹ️ Review info
Run ID: fc74e388-2ffe-45a9-a9f8-a3a3693be112
📒 Files selected for processing (3)
- tests/model_registry/model_catalog/constants.py
- tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
- tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
✅ Files skipped from review due to trivial changes (1)
- tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/model_registry/model_catalog/constants.py
Status of building tag latest: success.
…hub-io#1221)

* feat: Add tests for duplicate models in multiple HF sources
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
* fix: address review comments
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
* fix: address review comments
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
---------
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
Pull Request
Summary
Related Issues
How it has been tested
Additional Requirements
Summary by CodeRabbit