
feat: Add tests for duplicate models in multiple HF sources#1221

Merged
dbasunag merged 7 commits into opendatahub-io:main from dbasunag:hf_dup_model
Mar 17, 2026

Conversation

Collaborator

@dbasunag dbasunag commented Mar 14, 2026

Pull Request

Summary

Related Issues

  • Fixes:
  • JIRA: RHOAIENG-45082

How it has been tested

  • Locally
  • Jenkins

Additional Requirements

  • If this PR introduces a new test image, did you create a PR to mirror it in disconnected environment?
  • If this PR introduces new marker(s)/adds a new component, was relevant ticket created to update relevant Jenkins job?

Summary by CodeRabbit

  • Tests
    • Added tests for duplicate/overlapping models across multiple external sources, verifying shared models appear and are retrievable per source
    • Added checks that external IDs do not expose internal namespace prefixes
    • Added filtering tests (name across sources and name+source label; one marked xfail)
    • Introduced test constants for mixed/overlapping sources, a shared model, and a new overlapping model category with two HF models
  • Chores
    • Cleaned up test docstrings to remove specific issue identifiers

@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove the WIP mark, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove the approval, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR. To un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR is created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image with the tag
    pr-<pr_number> in the quay repository. The image tag is deleted when the PR is merged or closed.
Supported labels

{'/verified', '/cherry-pick', '/build-push-pr-image', '/hold', '/wip', '/lgtm'}

@coderabbitai
Contributor

coderabbitai bot commented Mar 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new HF model category overlapping_mixed, introduces a new multi-source HuggingFace test module validating shared-model behavior across sources, and removes an issue identifier from several test docstrings.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Constants: tests/model_registry/model_catalog/constants.py | Added HF_MODELS entry overlapping_mixed containing ibm-granite/granite-4.0-h-1b (shared with mixed) and ibm-granite/granite-4.0-h-small. Added comment noting same-model across sources should not be silently dropped. |
| Multi-Source Test Suite: tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py | New test module adding MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID, SHARED_MODEL and TestHuggingFaceModelsMultipleSources with tests for source status, shared-model presence in both sources, per-source retrieval, externalId namespace safety, an xfail for name+source filtering, and name-only cross-source filtering. |
| Docstring Cleanup: tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py | Removed references to issue identifier RHOAIENG-47934 from class and method docstrings; no functional changes. |
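
As a rough illustration of the overlap described in the walkthrough, the sketch below computes which models two categories share. It is a hypothetical excerpt: only the two overlapping_mixed entries and the shared model come from this conversation, the remaining contents of HF_MODELS["mixed"] are not shown here, and shared_models is an illustrative helper rather than something from the PR.

```python
# Hypothetical excerpt of HF_MODELS. The real "mixed" list may hold more
# entries; only the model named as shared in the walkthrough is listed here.
HF_MODELS = {
    "mixed": [
        "ibm-granite/granite-4.0-h-1b",
    ],
    "overlapping_mixed": [
        "ibm-granite/granite-4.0-h-1b",  # also present in "mixed"
        "ibm-granite/granite-4.0-h-small",  # absent from "mixed"
    ],
}


def shared_models(first: str, second: str) -> set[str]:
    """Return the model names listed under both categories."""
    return set(HF_MODELS[first]) & set(HF_MODELS[second])


overlap = shared_models("mixed", "overlapping_mixed")
print(sorted(overlap))
```

A set intersection keeps the check independent of list order, which is convenient if either category grows later.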

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is largely incomplete; the Summary section contains only template placeholder text with no actual description of changes or rationale. | Replace the Summary section placeholder with a concise description of what was added, why duplicate models across sources matter, and what behavior is being validated. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding tests for duplicate models across multiple HuggingFace sources. |


Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/model_registry/model_catalog/constants.py`:
- Around line 78-79: The comment above the entry
"ibm-granite/granite-4.0-h-small" implies the model is globally unique;
update the comment to state that it is unique only relative to the "mixed" group
(since the same model also appears in HF_MODELS["granite"] around the HF_MODELS
definitions), e.g., change the wording near the list containing
"ibm-granite/granite-4.0-h-small" to say the model is absent from
HF_MODELS["mixed"] rather than unique across all of HF_MODELS.

In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 53-58: Add explicit presence checks for MIXED_SOURCE_ID and
OVERLAPPING_SOURCE_ID before iterating status: verify that any(item["id"] ==
MIXED_SOURCE_ID for item in sources) and any(item["id"] == OVERLAPPING_SOURCE_ID
for item in sources) (or equivalent assertions) so the test fails if an expected
source is missing; then keep the existing loop that asserts source["status"] ==
"available" for those IDs. This targets the variables MIXED_SOURCE_ID,
OVERLAPPING_SOURCE_ID and the existing loop over sources.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: bb7afe8c-9a5d-4f94-8326-b92e6ec7550e

📥 Commits

Reviewing files that changed from the base of the PR and between 9d4eb3a and ab660eb.

📒 Files selected for processing (3)
  • tests/model_registry/model_catalog/constants.py
  • tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
  • tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py

Comment thread tests/model_registry/model_catalog/constants.py
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)

52-58: ⚠️ Potential issue | 🟠 Major

Assert expected source presence before status validation.

The current logic at lines 53-58 only validates sources that are returned. If an expected source is missing, the test can still pass.

Proposed fix
-        sources = response.get("items", [])
-        for source in sources:
-            if source["id"] in [MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID]:
-                assert source["status"] == "available", (
-                    f"Source '{source['id']}' has status '{source['status']}', expected 'available'. "
-                    f"Error: {source.get('error', 'N/A')}"
-                )
+        sources_by_id = {source["id"]: source for source in response.get("items", [])}
+        for expected_source_id in [MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID]:
+            assert expected_source_id in sources_by_id, (
+                f"Expected source '{expected_source_id}' not found. "
+                f"Available source ids: {list(sources_by_id.keys())}"
+            )
+            source = sources_by_id[expected_source_id]
+            assert source["status"] == "available", (
+                f"Source '{source['id']}' has status '{source['status']}', expected 'available'. "
+                f"Error: {source.get('error', 'N/A')}"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`
around lines 52 - 58, The test iterates over response.get("items", []) and only
asserts status for sources it finds, so missing expected sources like
MIXED_SOURCE_ID or OVERLAPPING_SOURCE_ID will be ignored; update the test to
first collect returned IDs (e.g., from sources or response) and assert that both
MIXED_SOURCE_ID and OVERLAPPING_SOURCE_ID are present, then locate each source
by id and assert its status == "available" (including the existing error detail
in the failure message) to ensure absence is treated as a test failure.
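
The presence-then-status pattern from the proposed fix can be tried outside the test suite with a minimal sketch like the one below. The helper name assert_sources_available, the sample payload, and the source IDs source_a and source_b are assumptions made for illustration; the real test uses MIXED_SOURCE_ID and OVERLAPPING_SOURCE_ID, whose values are not shown in this conversation.

```python
def assert_sources_available(response: dict, expected_ids: list[str]) -> None:
    """Fail if an expected source is absent or its status is not 'available'."""
    sources_by_id = {source["id"]: source for source in response.get("items", [])}
    for expected_id in expected_ids:
        # Presence check first, so a missing source cannot pass silently.
        assert expected_id in sources_by_id, (
            f"Expected source '{expected_id}' not found. "
            f"Available source ids: {sorted(sources_by_id)}"
        )
        source = sources_by_id[expected_id]
        assert source["status"] == "available", (
            f"Source '{expected_id}' has status '{source['status']}', expected 'available'. "
            f"Error: {source.get('error', 'N/A')}"
        )


# Hypothetical payload in which one expected source is missing.
sample_response = {"items": [{"id": "source_a", "status": "available"}]}

failure_message = ""
try:
    assert_sources_available(sample_response, ["source_a", "source_b"])
except AssertionError as exc:
    failure_message = str(exc)

print(failure_message)
```

With the original loop, the same payload would pass because the loop only inspects sources that were returned.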
tests/model_registry/model_catalog/constants.py (1)

78-79: ⚠️ Potential issue | 🟡 Minor

Clarify the uniqueness scope in the inline comment.

At line 78, “Unique to this source” is ambiguous or inaccurate because "ibm-granite/granite-4.0-h-small" also exists in HF_MODELS["granite"] (line 68). Scope it relative to "mixed" or to this specific test setup.

Proposed edit
-        # Unique to this source
+        # Unique relative to "mixed" in this test scenario
         "ibm-granite/granite-4.0-h-small",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_registry/model_catalog/constants.py` around lines 78 - 79, The
inline comment "Unique to this source" is ambiguous; update the comment near the
HF_MODELS entry for "ibm-granite/granite-4.0-h-small" to clarify the scope
(e.g., "Unique to this 'mixed' test source" or "Unique to this test setup, not
global") so it explicitly states whether uniqueness is within the 'mixed'
collection or only for this test configuration; adjust the comment adjacent to
the HF_MODELS (or the variable name used in this file) entry to reflect that
scope.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: b29d7e79-3f7f-4619-836e-bfba329d0b5e

📥 Commits

Reviewing files that changed from the base of the PR and between ab660eb and 9d4cf15.

📒 Files selected for processing (3)
  • tests/model_registry/model_catalog/constants.py
  • tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
  • tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py

fege previously approved these changes Mar 16, 2026
Contributor

@fege fege left a comment

/lgtm

Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
fege previously approved these changes Mar 16, 2026
Contributor

@fege fege left a comment

/lgtm

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)

136-141: Consider defensive access for source_id key.

Line 137 assumes every item has "source_id". If the API schema changes or returns partial data, the test fails with an opaque KeyError rather than an assertion message describing the problem.

Proposed improvement
         matching_items = response.get("items", [])
-        source_ids = {item["source_id"] for item in matching_items}
+        source_ids = {item.get("source_id") for item in matching_items}
+        source_ids.discard(None)  # Remove None if any item lacked the key
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`
around lines 136 - 141, Defensively handle missing "source_id" in matching_items
by building source_ids using item.get("source_id") and ignoring None (e.g.,
{item.get("source_id") for item in matching_items if item.get("source_id") is
not None}); if any items lacked "source_id" include that fact in the assertion
message so failures aren't a raw KeyError — update the assertion that checks
{MIXED_SOURCE_ID, OVERLAPPING_SOURCE_ID}.issubset(source_ids) to mention
SHARED_MODEL, the expected source IDs, the actual source_ids, and the
count/indices of items missing "source_id" (use matching_items and source_ids to
compute these diagnostics).
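
One way to gather the diagnostics this comment asks for is sketched below. The helper collect_source_ids and the sample items are hypothetical; the sketch only assumes that each item is a dict that may or may not carry a "source_id" key.

```python
def collect_source_ids(items: list[dict]) -> tuple[set[str], list[int]]:
    """Return the source IDs found and the indices of items lacking one."""
    source_ids: set[str] = set()
    missing_indices: list[int] = []
    for index, item in enumerate(items):
        source_id = item.get("source_id")
        if source_id is None:
            # Covers both an absent key and an explicit null value.
            missing_indices.append(index)
        else:
            source_ids.add(source_id)
    return source_ids, missing_indices


# Hypothetical response items; the second and fourth lack a usable source_id.
sample_items = [
    {"source_id": "source_a"},
    {"name": "some-model"},
    {"source_id": "source_b"},
    {"source_id": None},
]

found_ids, missing = collect_source_ids(sample_items)
expected = {"source_a", "source_b"}
assert expected.issubset(found_ids), (
    f"Expected sources {sorted(expected)}, got {sorted(found_ids)}; "
    f"{len(missing)} item(s) lacked 'source_id' at indices {missing}"
)
```

Reporting the indices keeps the failure readable without turning a malformed item into a raw KeyError.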
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`:
- Around line 109-110: The test can raise AttributeError when
result["externalId"] is present but null; update the handling of external_id in
the test so it never becomes None (e.g., replace external_id =
result.get("externalId", "") with external_id = result.get("externalId") or ""),
then keep the existing assert that checks
external_id.startswith(f"{source_id}:") unchanged; this ensures the assert won't
call startswith on None while preserving behavior when externalId is missing or
null.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 1e616027-67a9-40e3-9357-cd2b7ad0786a

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7a30c and 0f6a46e.

📒 Files selected for processing (1)
  • tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py

@dbasunag dbasunag changed the title feat: Add tests for dupplicate models in multiple HF sources feat: Add tests for duplicate models in multiple HF sources Mar 16, 2026
Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
@dbasunag dbasunag requested a review from fege March 16, 2026 19:32
SB159 previously approved these changes Mar 16, 2026
Contributor

@SB159 SB159 left a comment

/lgtm

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)

117-118: ⚠️ Potential issue | 🟡 Minor

AttributeError if externalId is explicitly null in response.

dict.get(key, default) returns None when the key exists with value None, not the default. If the API returns {"externalId": null}, line 118 raises AttributeError: 'NoneType' object has no attribute 'startswith'.

Proposed fix
-            external_id = result.get("externalId", "")
+            external_id = result.get("externalId") or ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`
around lines 117 - 118, The test can raise AttributeError if
result["externalId"] exists but is null; update the handling of external_id (the
variable assigned from result.get("externalId", "")) so it is coerced to a
string or defaulted to "" before calling external_id.startswith(...).
Specifically, replace the current get usage with a coalescing approach (e.g.,
external_id = result.get("externalId") or "") or explicitly check for None, then
assert not external_id.startswith(f"{source_id}:") using that safe value.
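
The difference between the two forms is easy to reproduce in isolation. The sketch below uses a hypothetical payload and a hypothetical source_id value; safe_external_id is an illustrative helper, not a function from the test module.

```python
def safe_external_id(result: dict) -> str:
    """Return externalId as a string, treating a missing or null value as ''."""
    return result.get("externalId") or ""


# Payload as it would look after parsing {"externalId": null}.
null_payload = {"externalId": None}

# The default is used only when the key is absent, so this is still None.
default_only = null_payload.get("externalId", "")

# Coalescing with `or` covers both the absent and the null case.
coalesced = safe_external_id(null_payload)
absent = safe_external_id({})

source_id = "source_a"  # hypothetical source identifier
assert not coalesced.startswith(f"{source_id}:")
```

Note that `or ""` also maps other falsy values to an empty string, which is harmless here because only strings are expected.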
🧹 Nitpick comments (1)
tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py (1)

92-105: Consider defensive access for result["name"].

Direct key access raises KeyError with a generic traceback if the API omits "name". A .get() with explicit assertion yields a clearer failure message.

Optional improvement
             result = execute_get_command(url=url, headers=model_registry_rest_headers)
-            assert result["name"] == SHARED_MODEL, (
+            model_name = result.get("name")
+            assert model_name == SHARED_MODEL, (
-                f"Expected model name '{SHARED_MODEL}', got '{result['name']}' from source '{source_id}'"
+                f"Expected model name '{SHARED_MODEL}', got '{model_name}' from source '{source_id}'"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py`
around lines 92 - 105, In test_shared_model_retrievable_per_source, avoid direct
dict indexing of result["name"]; change the check to fetch the value with
result.get("name") into a local variable (e.g., model_name) and assert it is not
None and equals SHARED_MODEL so failures show a clear message; update the assert
to something like: model_name = result.get("name") and assert model_name ==
SHARED_MODEL, f"Expected model name '{SHARED_MODEL}', got '{model_name}' from
source '{source_id}'", referencing the test_shared_model_retrievable_per_source
function, the result variable, and SHARED_MODEL.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: fc74e388-2ffe-45a9-a9f8-a3a3693be112

📥 Commits

Reviewing files that changed from the base of the PR and between 0f6a46e and 1c551e5.

📒 Files selected for processing (3)
  • tests/model_registry/model_catalog/constants.py
  • tests/model_registry/model_catalog/huggingface/test_huggingface_models_multiple_sources.py
  • tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
✅ Files skipped from review due to trivial changes (1)
  • tests/model_registry/model_catalog/huggingface/test_huggingface_source_error_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/model_registry/model_catalog/constants.py

fege previously approved these changes Mar 17, 2026
Contributor

@fege fege left a comment

/lgtm

@dbasunag dbasunag enabled auto-merge (squash) March 17, 2026 10:52
@dbasunag dbasunag dismissed stale reviews from fege and SB159 via f19e334 March 17, 2026 11:32
Contributor

@fege fege left a comment

/lgtm

@dbasunag dbasunag disabled auto-merge March 17, 2026 12:24
@dbasunag dbasunag merged commit 7ebdb05 into opendatahub-io:main Mar 17, 2026
10 checks passed
@dbasunag dbasunag deleted the hf_dup_model branch March 17, 2026 12:24
@github-actions

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

ssaleem-rh pushed a commit to ssaleem-rh/opendatahub-tests that referenced this pull request Mar 23, 2026
…hub-io#1221)

* feat: Add tests for dupplicate models in multiple HF sources

Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>

* fix: address review comments

Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>

* fix: address review comments

Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>

---------

Signed-off-by: Debarati Basu-Nag <dbasunag@redhat.com>
Signed-off-by: Shehan Saleem <ssaleem@redhat.com>
4 participants