
Upgrade tests for llm-d#1206

Merged
threcc merged 1 commit into opendatahub-io:main from threcc:llmd-upgrade-test
Mar 16, 2026

Conversation

@threcc
Contributor

@threcc threcc commented Mar 11, 2026

Pull Request

Summary

Adds upgrade tests for LLMInferenceService (llm-d) following the existing pre/post upgrade pattern, verifying that the LLMISVC, gateway, and inference endpoint survive an operator upgrade without pod restarts or downtime.

Also increases the default LLMISVC wait timeout and makes the teardown flag configurable in the shared conftest.
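The timeout and teardown changes can be sketched as follows. Only the names LLMISvcConfig.wait_timeout, the 180s-to-240s bump, and the teardown flag come from this PR; everything else in the sketch (field names, return shape) is illustrative and not the PR's actual code.

```python
from dataclasses import dataclass


@dataclass
class LLMISvcConfig:
    model_name: str
    wait_timeout: int = 240  # raised from 180 so slower clusters still pass


def create_llmisvc_from_config(config: LLMISvcConfig, teardown: bool = True) -> dict:
    # teardown=False lets a pre-upgrade fixture leave the service in place,
    # so post-upgrade tests can verify it survived the operator upgrade.
    return {
        "model": config.model_name,
        "wait_timeout": config.wait_timeout,
        "teardown": teardown,
    }
```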

Related Issues

How it has been tested

  • Locally
  • Jenkins

Summary by CodeRabbit

  • Tests

    • Added a comprehensive LLMD upgrade test suite with pre/post-upgrade chat checks, resource existence checks, and pod/router stability assertions.
    • Added upgrade fixtures to provision LLMD namespaces, gateways, and inference services for upgrade scenarios.
    • Added utilities to verify gateway acceptance and to detect pod/router restarts.
  • Chores

    • Increased service wait timeout from 180s to 240s and added an optional teardown control for upgrade test resources.

@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image with tag
    pr-<pr_number> in the quay repository. This image tag is deleted when the PR is merged or closed.
Supported labels

{'/hold', '/cherry-pick', '/build-push-pr-image', '/verified', '/lgtm', '/wip'}

mwaykole
mwaykole previously approved these changes Mar 13, 2026
@threcc threcc force-pushed the llmd-upgrade-test branch from 2891f4a to aa77966 on March 13, 2026 16:43
@threcc threcc requested a review from mwaykole March 13, 2026 16:43
@threcc threcc marked this pull request as ready for review March 13, 2026 16:43
Contributor

@SB159 SB159 left a comment


/lgtm

@threcc threcc enabled auto-merge (squash) March 13, 2026 16:49
@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds LLMD upgrade testing: new upgrade fixtures and tests, verification utilities for gateway acceptance and pod restart checks, a configurable teardown flag in the LLMInferenceService test factory, and increases LLMD test config wait_timeout from 180s to 240s.

Changes

  • LLMD test factory & config (tests/model_serving/model_server/llmd/conftest.py, tests/model_serving/model_server/llmd/llmd_configs/config_base.py):
    Added a teardown: bool = True parameter to _create_llmisvc_from_config() and propagated it into the service creation payload; increased LLMISvcConfig.wait_timeout from 180 to 240.
  • Upgrade test fixtures (tests/model_serving/model_server/upgrade/conftest.py):
    Added LLMD-focused fixtures and imports: llmd_namespace_fixture, llmd_gateway_fixture, and llmd_inference_service_fixture to provision the namespace, gateway, and LLMInferenceService for upgrade scenarios.
  • Upgrade tests (tests/model_serving/model_server/upgrade/test_upgrade_llmd.py):
    New test module with pre-upgrade and post-upgrade test classes verifying LLMInferenceService presence, end-to-end chat completion, gateway acceptance, and that LLMD workload/router pods did not restart during upgrade.
  • Upgrade utilities (tests/model_serving/model_server/upgrade/utils.py):
    Added helpers verify_gateway_accepted(), verify_llmd_pods_not_restarted(), and verify_llmd_router_not_restarted() to assert gateway acceptance, check pod container restart counts, and raise on violations.
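A restart check in the spirit of verify_llmd_pods_not_restarted() can be sketched as below, including the None-status guard the review comments ask for. The helper and exception names come from this PR; the exact signatures and the Pod wrapper shape are assumptions (stand-ins for the ocp_resources objects used in the real suite).

```python
class PodContainersRestartError(Exception):
    """Raised when containers restarted more than allowed during an upgrade."""


def verify_pods_not_restarted(pods, max_restarts: int = 0) -> None:
    """Assert that no container in any pod exceeded max_restarts restarts."""
    restarted: dict[str, list[str]] = {}
    for pod in pods:
        # Guard: transient pods (pending/initializing) may have no status yet,
        # so never dereference containerStatuses on a None status.
        status = getattr(pod.instance, "status", None)
        container_statuses = getattr(status, "containerStatuses", None) if status else None
        if not container_statuses:
            continue
        for container in container_statuses:
            if container.restartCount > max_restarts:
                restarted.setdefault(pod.name, []).append(
                    f"{container.name} (restarts: {container.restartCount})"
                )
    if restarted:
        raise PodContainersRestartError(f"Containers restarted during upgrade: {restarted}")
```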

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): the PR description includes a summary, related JIRA issue, and testing checklist, but the 'Additional Requirements' section is entirely omitted. Resolution: add the 'Additional Requirements' section to the PR description with checkboxes for new test images and Jenkins job updates.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): the title clearly summarizes the main change, the addition of upgrade tests for the LLMD component.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/model_serving/model_server/upgrade/conftest.py (1)

837-838: Stop importing private helpers from another conftest.py.

Line 837 imports _create_llmisvc_from_config from a sibling conftest.py. This is a pytest anti-pattern and creates brittle coupling in test bootstrap/collection. Move this helper into a shared non-conftest module and import from there.

As per coding guidelines: REVIEW PRIORITIES: 2. Architectural issues and anti-patterns.
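The refactor suggested here can be sketched as follows. The shared module path (tests/model_serving/model_server/llmd/utils.py), the public name, and the backward-compatible alias are hypothetical, not the layout this PR actually uses.

```python
# tests/model_serving/model_server/llmd/utils.py (hypothetical shared module):
# the factory is promoted out of conftest.py with a public name, so other
# test packages can import it without conftest-to-conftest coupling.

def create_llmisvc_from_config(config, client=None, namespace=None, teardown=True):
    """Build an LLMInferenceService from a config object (sketch only)."""
    ...


# The original llmd/conftest.py could then keep a thin alias so existing
# callers of the private name keep working:
#
#   from tests.model_serving.model_server.llmd.utils import (
#       create_llmisvc_from_config as _create_llmisvc_from_config,
#   )
```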

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/conftest.py` around lines 837 - 838,
The test conftest is importing the private helper _create_llmisvc_from_config
from another conftest (an anti-pattern); extract _create_llmisvc_from_config
(and any related shared helpers like TinyLlamaOciConfig if needed) into a new
shared utility module (e.g., tests.model_serving.model_server.llmd.helpers or
utils), make the helper a non-private exported function if appropriate, then
update this conftest and the original llmd.conftest to import the helper from
that new shared module instead of importing from a conftest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/model_serving/model_server/upgrade/conftest.py`:
- Around line 801-804: The fixture llmd_gateway_fixture currently hardcodes
teardown=False in the pre-upgrade path; change it to honor the runtime teardown
policy by reading the teardown flag from pytestconfig (e.g. use
pytestconfig.getoption("teardown") or the existing teardown config key) and pass
that boolean into the gateway creation/cleanup call instead of False; update
every hardcoded occurrence in the llmd_gateway_fixture pre-upgrade branch (and
the nearby occurrences referenced around the 817-824 block) so cleanup follows
the test-run teardown setting.

In `@tests/model_serving/model_server/upgrade/utils.py`:
- Around line 402-403: The code accesses gateway.instance.status.get(...)
without checking for None which can raise AttributeError for un-reconciled
Gateways; modify the logic around gateway.instance.status (used to compute
conditions and accepted) to first guard for None (e.g., status =
gateway.instance.status or {} or use getattr) then call .get("conditions", [])
on that safe object so conditions and the accepted = any(...) check never
operate on None.
- Around line 343-346: The loop that inspects container restart counts accesses
pod.instance.status.containerStatuses without ensuring pod.instance.status
exists; update the checks in the loop in the function that iterates "for pod in
pods:" (the block referencing pod.instance.status.containerStatuses around the
current diff) to first guard with a truthy check for pod.instance.status (e.g.,
"if pod.instance.status and pod.instance.status.containerStatuses:") before
iterating containerStatuses, and make the identical defensive change inside
verify_isvc_inference_not_restarted() (the check around line 118) so both places
mirror the existing guards used at lines 259 and 312.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 3556eeda-7138-4012-9b1c-d57ee315a1f4

📥 Commits

Reviewing files that changed from the base of the PR and between f986be3 and 7b3d9cb.

📒 Files selected for processing (5)
  • tests/model_serving/model_server/llmd/conftest.py
  • tests/model_serving/model_server/llmd/llmd_configs/config_base.py
  • tests/model_serving/model_server/upgrade/conftest.py
  • tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
  • tests/model_serving/model_server/upgrade/utils.py

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
tests/model_serving/model_server/upgrade/utils.py (3)

378-383: ⚠️ Potential issue | 🟠 Major

Guard router_pod.instance.status before accessing containerStatuses.

Same issue as above. Transient pod states can have None status.

Proposed fix
     restarted_containers: dict[str, list[str]] = {}
-    if router_pod.instance.status.containerStatuses:
+    if router_pod.instance.status and router_pod.instance.status.containerStatuses:
         for container in router_pod.instance.status.containerStatuses:
             if container.restartCount > max_restarts:
                 restarted_containers.setdefault(router_pod.name, []).append(
                     f"{container.name} (restarts: {container.restartCount})"
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/utils.py` around lines 378 - 383,
The code accesses router_pod.instance.status.containerStatuses without ensuring
status exists; update the condition to guard status first (e.g., check
router_pod.instance.status is not None and then check containerStatuses) before
iterating, so in the block around router_pod.instance.status.containerStatuses
you should first verify router_pod.instance.status (and optionally
router_pod.instance.status.containerStatuses) is truthy to avoid AttributeError;
apply this change where restarted_containers is populated for router_pod.name
and container.restartCount is checked.

343-349: ⚠️ Potential issue | 🟠 Major

Guard pod.instance.status before accessing containerStatuses; AttributeError if status is None.

During transient pod states (pending, initializing), pod.instance.status can be None. Lines 259 and 312 in this file guard against this; apply the same pattern here.

Proposed fix
     for pod in pods:
-        if pod.instance.status.containerStatuses:
+        if pod.instance.status and pod.instance.status.containerStatuses:
             for container in pod.instance.status.containerStatuses:
                 if container.restartCount > max_restarts:
                     restarted_containers.setdefault(pod.name, []).append(
                         f"{container.name} (restarts: {container.restartCount})"
                     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/utils.py` around lines 343 - 349,
The loop that inspects pod.instance.status.containerStatuses can raise
AttributeError when pod.instance.status is None; update the code in the pods
iteration (where restarted_containers is populated) to guard on
pod.instance.status (e.g., if not pod.instance.status: continue) before
accessing containerStatuses, mirroring the existing checks used earlier in this
file (see the earlier guards around lines that check pod.instance.status), and
then proceed to iterate container in pod.instance.status.containerStatuses and
compare container.restartCount to max_restarts as before.

402-405: ⚠️ Potential issue | 🟠 Major

Guard gateway.instance.status before calling .get(); AttributeError on un-reconciled Gateway.

A newly created Gateway may have status: None until the controller reconciles it.

Proposed fix
-    conditions = gateway.instance.status.get("conditions", [])
+    status = gateway.instance.status or {}
+    conditions = status.get("conditions", [])
     is_accepted = any(
         condition.get("type") == "Accepted" and condition.get("status") == "True" for condition in conditions
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/utils.py` around lines 402 - 405,
The code assumes gateway.instance.status is a dict and calls .get(), which will
raise AttributeError when status is None; before computing conditions and
is_accepted, add a guard that checks gateway.instance.status is truthy (e.g. if
not gateway.instance.status: treat as no conditions) and only call
gateway.instance.status.get("conditions", []) when status is not None — update
the logic around the variables conditions and is_accepted to handle a None
status case (use an empty list or short-circuit) so functions/variables
referencing gateway.instance.status, conditions, and is_accepted are safe for
un-reconciled Gateways.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 5978c1a2-69fd-41c4-9a2e-f1badcc67272

📥 Commits

Reviewing files that changed from the base of the PR and between 7b3d9cb and 824ba72.

📒 Files selected for processing (2)
  • tests/model_serving/model_server/upgrade/conftest.py
  • tests/model_serving/model_server/upgrade/utils.py

@mwaykole
Member

@threcc fix the sign-off

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
tests/model_serving/model_server/upgrade/utils.py (2)

402-403: ⚠️ Potential issue | 🟠 Major

Guard gateway status access before reading conditions.

Line 402 assumes gateway.instance.status is a dict-like object; if status is absent/transient, this raises AttributeError instead of the intended assertion path. Normalize status first and fail with an explicit assertion.

Proposed fix
-    conditions = gateway.instance.status.get("conditions", [])
+    status = getattr(gateway.instance, "status", None)
+    if not status:
+        raise AssertionError(f"Gateway {gateway.name} has no status yet")
+    conditions = status.get("conditions", []) if hasattr(status, "get") else getattr(status, "conditions", []) or []
In `ocp_resources` Gateway objects, what concrete type is `gateway.instance.status` and is it guaranteed non-null once `gateway.exists` is true?

As per coding guidelines: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/utils.py` around lines 402 - 403,
Normalize and guard access to gateway.instance.status before reading
"conditions": first ensure gateway and gateway.instance exist and assign a safe
dict (e.g., status = getattr(gateway.instance, "status", {}) or {} if None)
before using .get; then compute conditions = status.get("conditions", []) and
is_accepted = any(...). Add an explicit assertion or raise with a clear message
if status is missing or not a mapping (referencing gateway, gateway.instance,
gateway.instance.status, conditions, is_accepted) so the code follows the
intended assertion path rather than raising AttributeError.

343-346: ⚠️ Potential issue | 🟠 Major

Make restart checks fail deterministically with explicit status validation.

Lines 343 and 378 can throw AttributeError before your intended PodContainersRestartError, which obscures test failures during transient reconciliation. Keep fail-fast behavior, but raise explicit test errors when status/containerStatuses are missing.

Proposed fix
 def verify_llmd_pods_not_restarted(
@@
-    for pod in pods:
-        if pod.instance.status.containerStatuses:
-            for container in pod.instance.status.containerStatuses:
-                if container.restartCount > max_restarts:
-                    restarted_containers.setdefault(pod.name, []).append(
-                        f"{container.name} (restarts: {container.restartCount})"
-                    )
+    for pod in pods:
+        status = getattr(pod.instance, "status", None)
+        container_statuses = getattr(status, "containerStatuses", None)
+        if container_statuses is None:
+            raise PodContainersRestartError(
+                f"Missing containerStatuses for pod {pod.name} while verifying restart counts"
+            )
+        for container in container_statuses:
+            if container.restartCount > max_restarts:
+                restarted_containers.setdefault(pod.name, []).append(
+                    f"{container.name} (restarts: {container.restartCount})"
+                )
@@
 def verify_llmd_router_not_restarted(
@@
-    restarted_containers: dict[str, list[str]] = {}
-    if router_pod.instance.status.containerStatuses:
-        for container in router_pod.instance.status.containerStatuses:
-            if container.restartCount > max_restarts:
-                restarted_containers.setdefault(router_pod.name, []).append(
-                    f"{container.name} (restarts: {container.restartCount})"
-                )
+    restarted_containers: dict[str, list[str]] = {}
+    status = getattr(router_pod.instance, "status", None)
+    container_statuses = getattr(status, "containerStatuses", None)
+    if container_statuses is None:
+        raise PodContainersRestartError(
+            f"Missing containerStatuses for router pod {router_pod.name} while verifying restart counts"
+        )
+    for container in container_statuses:
+        if container.restartCount > max_restarts:
+            restarted_containers.setdefault(router_pod.name, []).append(
+                f"{container.name} (restarts: {container.restartCount})"
+            )
For `ocp_resources` Pod objects, can `pod.instance.status` or `status.containerStatuses` be transiently null before reconciliation? What is the recommended defensive access pattern?

As per coding guidelines: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

Also applies to: 378-380

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/utils.py` around lines 343 - 346,
The loop over pods can raise AttributeError when pod.instance.status or
status.containerStatuses is None; update the checks in the code that iterates
pods (the block referencing pod.instance.status, status.containerStatuses,
container.restartCount and max_restarts) to defensively validate presence of
status and containerStatuses before accessing them, and if either is missing
raise the explicit PodContainersRestartError with a clear message (including pod
identity) instead of letting an AttributeError propagate; apply the same
defensive pattern to the similar block that checks container restart counts
later in the file.
🧹 Nitpick comments (1)
tests/model_serving/model_server/upgrade/conftest.py (1)

838-839: Extract _create_llmisvc_from_config to a shared test utilities module.

Line 838 imports a private helper from a sibling conftest.py. While functional, conftest-to-conftest imports create pytest coupling that is fragile as modules evolve. Move _create_llmisvc_from_config to a shared module (e.g., tests/model_serving/model_server/shared/utils.py or similar) and import from there instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/conftest.py` around lines 838 - 839,
Move the helper function `_create_llmisvc_from_config` out of the sibling
conftest into a shared test utilities module and update imports to reference
that shared module instead of importing from another conftest; specifically,
create a shared utils module that exports `_create_llmisvc_from_config` (and any
dependent helpers/constants like `TinyLlamaOciConfig` if needed), change the
import in `conftest.py` to import `_create_llmisvc_from_config` from the new
shared module, and update any other tests that currently import it from the
sibling conftest to use the shared module so tests no longer import directly
from another conftest.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0a46eb0e-4740-4322-9a36-8b8d6e897b93

📥 Commits

Reviewing files that changed from the base of the PR and between 26277f0 and 32c2fda.

📒 Files selected for processing (5)
  • tests/model_serving/model_server/llmd/conftest.py
  • tests/model_serving/model_server/llmd/llmd_configs/config_base.py
  • tests/model_serving/model_server/upgrade/conftest.py
  • tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
  • tests/model_serving/model_server/upgrade/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/model_serving/model_server/upgrade/test_upgrade_llmd.py

Signed-off-by: threcc <trecchiu@redhat.com>
@threcc threcc force-pushed the llmd-upgrade-test branch from 32c2fda to 33e77ac on March 16, 2026 10:21
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/model_serving/model_server/upgrade/conftest.py (1)

838-839: Importing private function _create_llmisvc_from_config creates an isolated cross-module dependency.

The underscore prefix indicates this is a private API. While this is the only external import of this function in the codebase, coupling to implementation details remains fragile.

Since no public factory function exists in the module, consider promoting _create_llmisvc_from_config to public API (create_llmisvc_from_config) or formally documenting its contract as semi-public. The single-point dependency makes this a low-urgency refactor, but aligning the API surface reduces future friction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/model_serving/model_server/upgrade/conftest.py` around lines 838 - 839,
The test imports the private function _create_llmisvc_from_config from
tests.model_serving.model_server.llmd.conftest which creates a fragile
cross-module dependency; update the module to expose a public factory
(create_llmisvc_from_config) or add a documented semi-public alias that forwards
to _create_llmisvc_from_config, then change this test import to use
create_llmisvc_from_config (and keep TinyLlamaOciConfig as-is) so callers rely
on the public symbol instead of the underscored private function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/model_serving/model_server/upgrade/utils.py`:
- Around line 340-352: The current check in the workload restart verifier
silently passes when get_llmd_workload_pods returns an empty list; add an
explicit check after calling get_llmd_workload_pods to raise
PodContainersRestartError (or reuse the same error type) if pods is empty
(similar to verify_llmd_router_not_restarted's behavior), then continue with the
existing loop that populates restarted_containers; reference
get_llmd_workload_pods, restarted_containers, PodContainersRestartError and
verify_llmd_router_not_restarted when implementing this early-empty-pods guard
so missing deployments are reported instead of being ignored.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: cc66a6e1-a19e-4e44-9606-e847568bcda4

📥 Commits

Reviewing files that changed from the base of the PR and between 32c2fda and 33e77ac.

📒 Files selected for processing (5)
  • tests/model_serving/model_server/llmd/conftest.py
  • tests/model_serving/model_server/llmd/llmd_configs/config_base.py
  • tests/model_serving/model_server/upgrade/conftest.py
  • tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
  • tests/model_serving/model_server/upgrade/utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
  • tests/model_serving/model_server/llmd/llmd_configs/config_base.py
  • tests/model_serving/model_server/llmd/conftest.py

@threcc threcc merged commit a8b277f into opendatahub-io:main Mar 16, 2026
11 checks passed
@github-actions

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

@threcc threcc deleted the llmd-upgrade-test branch March 23, 2026 10:48