Conversation
The following are automatically added/executed:
Available user actions:
Supported labels: `/hold`, `/cherry-pick`, `/build-push-pr-image`, `/verified`, `/lgtm`, `/wip`
Force-pushed from 2891f4a to aa77966
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Adds LLMD upgrade testing: new upgrade fixtures and tests, verification utilities for gateway acceptance and pod restart checks, and a configurable teardown flag in the shared conftest.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Tip: You can enable review details to help with troubleshooting, context usage and more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/model_serving/model_server/upgrade/conftest.py (1)
837-838: Stop importing private helpers from another `conftest.py`.

Line 837 imports `_create_llmisvc_from_config` from a sibling `conftest.py`. This is a pytest anti-pattern and creates brittle coupling in test bootstrap/collection. Move this helper into a shared non-conftest module and import from there.

As per coding guidelines, `**`: REVIEW PRIORITIES: 2. Architectural issues and anti-patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/conftest.py` around lines 837 - 838, The test conftest is importing the private helper _create_llmisvc_from_config from another conftest (an anti-pattern); extract _create_llmisvc_from_config (and any related shared helpers like TinyLlamaOciConfig if needed) into a new shared utility module (e.g., tests.model_serving.model_server.llmd.helpers or utils), make the helper a non-private exported function if appropriate, then update this conftest and the original llmd.conftest to import the helper from that new shared module instead of importing from a conftest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/model_serving/model_server/upgrade/conftest.py`:
- Around line 801-804: The fixture llmd_gateway_fixture currently hardcodes
teardown=False in the pre-upgrade path; change it to honor the runtime teardown
policy by reading the teardown flag from pytestconfig (e.g. use
pytestconfig.getoption("teardown") or the existing teardown config key) and pass
that boolean into the gateway creation/cleanup call instead of False; update
every hardcoded occurrence in the llmd_gateway_fixture pre-upgrade branch (and
the nearby occurrences referenced around the 817-824 block) so cleanup follows
the test-run teardown setting.
In `@tests/model_serving/model_server/upgrade/utils.py`:
- Around line 402-403: The code accesses gateway.instance.status.get(...)
without checking for None which can raise AttributeError for un-reconciled
Gateways; modify the logic around gateway.instance.status (used to compute
conditions and accepted) to first guard for None (e.g., status =
gateway.instance.status or {} or use getattr) then call .get("conditions", [])
on that safe object so conditions and the accepted = any(...) check never
operate on None.
- Around line 343-346: The loop that inspects container restart counts accesses
pod.instance.status.containerStatuses without ensuring pod.instance.status
exists; update the checks in the loop in the function that iterates "for pod in
pods:" (the block referencing pod.instance.status.containerStatuses around the
current diff) to first guard with a truthy check for pod.instance.status (e.g.,
"if pod.instance.status and pod.instance.status.containerStatuses:") before
iterating containerStatuses, and make the identical defensive change inside
verify_isvc_inference_not_restarted() (the check around line 118) so both places
mirror the existing guards used at lines 259 and 312.
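The teardown comment above (lines 801-804) can be sketched as follows. This is a minimal illustration, not the repository's code: the option name `"teardown"`, the `resolve_teardown` helper, and the stub config objects are all assumptions based on the review text.

```python
# Sketch of the suggested fix: resolve the teardown flag from the pytest
# config instead of hardcoding False. Names here are assumptions.
from types import SimpleNamespace


def resolve_teardown(pytestconfig) -> bool:
    """Read the run-level teardown policy instead of hardcoding teardown=False."""
    return bool(pytestconfig.getoption("teardown", default=True))


# Stand-ins for pytest's Config object, for illustration only:
no_teardown_cfg = SimpleNamespace(getoption=lambda name, default=True: False)
default_cfg = SimpleNamespace(getoption=lambda name, default=True: default)
```

A fixture would then pass `resolve_teardown(pytestconfig)` into the gateway creation/cleanup call instead of a literal `False`, so cleanup follows the test-run teardown setting.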
---
Nitpick comments:
In `@tests/model_serving/model_server/upgrade/conftest.py`:
- Around line 837-838: The test conftest is importing the private helper
_create_llmisvc_from_config from another conftest (an anti-pattern); extract
_create_llmisvc_from_config (and any related shared helpers like
TinyLlamaOciConfig if needed) into a new shared utility module (e.g.,
tests.model_serving.model_server.llmd.helpers or utils), make the helper a
non-private exported function if appropriate, then update this conftest and the
original llmd.conftest to import the helper from that new shared module instead
of importing from a conftest.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 3556eeda-7138-4012-9b1c-d57ee315a1f4
📒 Files selected for processing (5)
- tests/model_serving/model_server/llmd/conftest.py
- tests/model_serving/model_server/llmd/llmd_configs/config_base.py
- tests/model_serving/model_server/upgrade/conftest.py
- tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
- tests/model_serving/model_server/upgrade/utils.py
Force-pushed from 8cd3dac to 16a7e85
♻️ Duplicate comments (3)
tests/model_serving/model_server/upgrade/utils.py (3)
378-383: ⚠️ Potential issue | 🟠 Major

Guard `router_pod.instance.status` before accessing `containerStatuses`.

Same issue as above. Transient pod states can have `None` status.

Proposed fix

```diff
 restarted_containers: dict[str, list[str]] = {}
-if router_pod.instance.status.containerStatuses:
+if router_pod.instance.status and router_pod.instance.status.containerStatuses:
     for container in router_pod.instance.status.containerStatuses:
         if container.restartCount > max_restarts:
             restarted_containers.setdefault(router_pod.name, []).append(
                 f"{container.name} (restarts: {container.restartCount})"
             )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/utils.py` around lines 378 - 383, The code accesses router_pod.instance.status.containerStatuses without ensuring status exists; update the condition to guard status first (e.g., check router_pod.instance.status is not None and then check containerStatuses) before iterating, so in the block around router_pod.instance.status.containerStatuses you should first verify router_pod.instance.status (and optionally router_pod.instance.status.containerStatuses) is truthy to avoid AttributeError; apply this change where restarted_containers is populated for router_pod.name and container.restartCount is checked.
343-349: ⚠️ Potential issue | 🟠 Major

Guard `pod.instance.status` before accessing `containerStatuses`; AttributeError if status is None.

During transient pod states (pending, initializing), `pod.instance.status` can be `None`. Lines 259 and 312 in this file guard against this; apply the same pattern here.

Proposed fix

```diff
 for pod in pods:
-    if pod.instance.status.containerStatuses:
+    if pod.instance.status and pod.instance.status.containerStatuses:
         for container in pod.instance.status.containerStatuses:
             if container.restartCount > max_restarts:
                 restarted_containers.setdefault(pod.name, []).append(
                     f"{container.name} (restarts: {container.restartCount})"
                 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/utils.py` around lines 343 - 349, The loop that inspects pod.instance.status.containerStatuses can raise AttributeError when pod.instance.status is None; update the code in the pods iteration (where restarted_containers is populated) to guard on pod.instance.status (e.g., if not pod.instance.status: continue) before accessing containerStatuses, mirroring the existing checks used earlier in this file (see the earlier guards around lines that check pod.instance.status), and then proceed to iterate container in pod.instance.status.containerStatuses and compare container.restartCount to max_restarts as before.
402-405: ⚠️ Potential issue | 🟠 Major

Guard `gateway.instance.status` before calling `.get()`; AttributeError on un-reconciled Gateway.

A newly created Gateway may have `status: None` until the controller reconciles it.

Proposed fix

```diff
-conditions = gateway.instance.status.get("conditions", [])
+status = gateway.instance.status or {}
+conditions = status.get("conditions", [])
 is_accepted = any(
     condition.get("type") == "Accepted" and condition.get("status") == "True"
     for condition in conditions
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/utils.py` around lines 402 - 405, The code assumes gateway.instance.status is a dict and calls .get(), which will raise AttributeError when status is None; before computing conditions and is_accepted, add a guard that checks gateway.instance.status is truthy (e.g. if not gateway.instance.status: treat as no conditions) and only call gateway.instance.status.get("conditions", []) when status is not None — update the logic around the variables conditions and is_accepted to handle a None status case (use an empty list or short-circuit) so functions/variables referencing gateway.instance.status, conditions, and is_accepted are safe for un-reconciled Gateways.
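As a runnable illustration of the guard suggested above, the snippet below uses `SimpleNamespace` stubs to stand in for `ocp_resources` Gateway objects; the real status type before and after reconciliation is an assumption here.

```python
# Illustrative only: stubs stand in for ocp_resources Gateway objects.
# Assumption: status is dict-like once populated, None before reconciliation.
from types import SimpleNamespace


def gateway_is_accepted(gateway) -> bool:
    """Guarded form of the Accepted-condition check: safe when status is None."""
    status = gateway.instance.status or {}
    conditions = status.get("conditions", [])
    return any(
        condition.get("type") == "Accepted" and condition.get("status") == "True"
        for condition in conditions
    )


pending_gw = SimpleNamespace(instance=SimpleNamespace(status=None))
accepted_gw = SimpleNamespace(
    instance=SimpleNamespace(
        status={"conditions": [{"type": "Accepted", "status": "True"}]}
    )
)
```

With the `or {}` normalization, an un-reconciled Gateway simply reports "not accepted" instead of raising `AttributeError`.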
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/model_serving/model_server/upgrade/utils.py`:
- Around line 378-383: The code accesses
router_pod.instance.status.containerStatuses without ensuring status exists;
update the condition to guard status first (e.g., check
router_pod.instance.status is not None and then check containerStatuses) before
iterating, so in the block around router_pod.instance.status.containerStatuses
you should first verify router_pod.instance.status (and optionally
router_pod.instance.status.containerStatuses) is truthy to avoid AttributeError;
apply this change where restarted_containers is populated for router_pod.name
and container.restartCount is checked.
- Around line 343-349: The loop that inspects
pod.instance.status.containerStatuses can raise AttributeError when
pod.instance.status is None; update the code in the pods iteration (where
restarted_containers is populated) to guard on pod.instance.status (e.g., if not
pod.instance.status: continue) before accessing containerStatuses, mirroring the
existing checks used earlier in this file (see the earlier guards around lines
that check pod.instance.status), and then proceed to iterate container in
pod.instance.status.containerStatuses and compare container.restartCount to
max_restarts as before.
- Around line 402-405: The code assumes gateway.instance.status is a dict and
calls .get(), which will raise AttributeError when status is None; before
computing conditions and is_accepted, add a guard that checks
gateway.instance.status is truthy (e.g. if not gateway.instance.status: treat as
no conditions) and only call gateway.instance.status.get("conditions", []) when
status is not None — update the logic around the variables conditions and
is_accepted to handle a None status case (use an empty list or short-circuit) so
functions/variables referencing gateway.instance.status, conditions, and
is_accepted are safe for un-reconciled Gateways.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5978c1a2-69fd-41c4-9a2e-f1badcc67272
📒 Files selected for processing (2)
- tests/model_serving/model_server/upgrade/conftest.py
- tests/model_serving/model_server/upgrade/utils.py
@threcc fix the sign-off
Force-pushed from 26277f0 to 32c2fda
♻️ Duplicate comments (2)
tests/model_serving/model_server/upgrade/utils.py (2)
402-403: ⚠️ Potential issue | 🟠 Major

Guard gateway status access before reading conditions.

Line 402 assumes `gateway.instance.status` is a dict-like object; if status is absent/transient, this raises `AttributeError` instead of the intended assertion path. Normalize status first and fail with an explicit assertion.

Proposed fix

```diff
-conditions = gateway.instance.status.get("conditions", [])
+status = getattr(gateway.instance, "status", None)
+if not status:
+    raise AssertionError(f"Gateway {gateway.name} has no status yet")
+conditions = status.get("conditions", []) if hasattr(status, "get") else getattr(status, "conditions", []) or []
```

In `ocp_resources` Gateway objects, what concrete type is `gateway.instance.status` and is it guaranteed non-null once `gateway.exists` is true?

As per coding guidelines, `**`: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/utils.py` around lines 402 - 403, Normalize and guard access to gateway.instance.status before reading "conditions": first ensure gateway and gateway.instance exist and assign a safe dict (e.g., status = getattr(gateway.instance, "status", {}) or {} if None) before using .get; then compute conditions = status.get("conditions", []) and is_accepted = any(...). Add an explicit assertion or raise with a clear message if status is missing or not a mapping (referencing gateway, gateway.instance, gateway.instance.status, conditions, is_accepted) so the code follows the intended assertion path rather than raising AttributeError.
343-346: ⚠️ Potential issue | 🟠 Major

Make restart checks fail deterministically with explicit status validation.

Lines 343 and 378 can throw `AttributeError` before your intended `PodContainersRestartError`, which obscures test failures during transient reconciliation. Keep fail-fast behavior, but raise explicit test errors when status/containerStatuses are missing.

Proposed fix

```diff
 def verify_llmd_pods_not_restarted(
 @@
-    for pod in pods:
-        if pod.instance.status.containerStatuses:
-            for container in pod.instance.status.containerStatuses:
-                if container.restartCount > max_restarts:
-                    restarted_containers.setdefault(pod.name, []).append(
-                        f"{container.name} (restarts: {container.restartCount})"
-                    )
+    for pod in pods:
+        status = getattr(pod.instance, "status", None)
+        container_statuses = getattr(status, "containerStatuses", None)
+        if container_statuses is None:
+            raise PodContainersRestartError(
+                f"Missing containerStatuses for pod {pod.name} while verifying restart counts"
+            )
+        for container in container_statuses:
+            if container.restartCount > max_restarts:
+                restarted_containers.setdefault(pod.name, []).append(
+                    f"{container.name} (restarts: {container.restartCount})"
+                )
 @@
 def verify_llmd_router_not_restarted(
 @@
-    restarted_containers: dict[str, list[str]] = {}
-    if router_pod.instance.status.containerStatuses:
-        for container in router_pod.instance.status.containerStatuses:
-            if container.restartCount > max_restarts:
-                restarted_containers.setdefault(router_pod.name, []).append(
-                    f"{container.name} (restarts: {container.restartCount})"
-                )
+    restarted_containers: dict[str, list[str]] = {}
+    status = getattr(router_pod.instance, "status", None)
+    container_statuses = getattr(status, "containerStatuses", None)
+    if container_statuses is None:
+        raise PodContainersRestartError(
+            f"Missing containerStatuses for router pod {router_pod.name} while verifying restart counts"
+        )
+    for container in container_statuses:
+        if container.restartCount > max_restarts:
+            restarted_containers.setdefault(router_pod.name, []).append(
+                f"{container.name} (restarts: {container.restartCount})"
+            )
```

For `ocp_resources` Pod objects, can `pod.instance.status` or `status.containerStatuses` be transiently null before reconciliation? What is the recommended defensive access pattern?

As per coding guidelines, `**`: REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps.

Also applies to: 378-380
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/utils.py` around lines 343 - 346, The loop over pods can raise AttributeError when pod.instance.status or status.containerStatuses is None; update the checks in the code that iterates pods (the block referencing pod.instance.status, status.containerStatuses, container.restartCount and max_restarts) to defensively validate presence of status and containerStatuses before accessing them, and if either is missing raise the explicit PodContainersRestartError with a clear message (including pod identity) instead of letting an AttributeError propagate; apply the same defensive pattern to the similar block that checks container restart counts later in the file.
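The fail-fast pattern proposed above can be exercised end to end with stub pods. Everything here is illustrative: the stubs, the function name, and the bare `PodContainersRestartError` definition mirror names from the review, not the repository's actual implementations.

```python
# Illustrative sketch of the fail-fast pattern: raise an explicit test error,
# not AttributeError, when a pod's status/containerStatuses are missing.
from types import SimpleNamespace


class PodContainersRestartError(Exception):
    pass


def collect_restarted_containers(pods, max_restarts: int = 0) -> dict[str, list[str]]:
    restarted: dict[str, list[str]] = {}
    for pod in pods:
        status = getattr(pod.instance, "status", None)
        container_statuses = getattr(status, "containerStatuses", None) if status else None
        if container_statuses is None:
            raise PodContainersRestartError(
                f"Missing containerStatuses for pod {pod.name} while verifying restart counts"
            )
        for container in container_statuses:
            if container.restartCount > max_restarts:
                restarted.setdefault(pod.name, []).append(
                    f"{container.name} (restarts: {container.restartCount})"
                )
    return restarted


restarted_pod = SimpleNamespace(
    name="router-0",
    instance=SimpleNamespace(
        status=SimpleNamespace(
            containerStatuses=[SimpleNamespace(name="router", restartCount=2)]
        )
    ),
)
pending_pod = SimpleNamespace(name="router-1", instance=SimpleNamespace(status=None))
```

A pod with `status=None` now produces a targeted test error naming the pod, rather than an opaque `AttributeError` from deep inside the loop.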
🧹 Nitpick comments (1)
tests/model_serving/model_server/upgrade/conftest.py (1)
838-839: Extract `_create_llmisvc_from_config` to a shared test utilities module.

Line 838 imports a private helper from a sibling `conftest.py`. While functional, conftest-to-conftest imports create pytest coupling that is fragile as modules evolve. Move `_create_llmisvc_from_config` to a shared module (e.g., `tests/model_serving/model_server/shared/utils.py` or similar) and import from there instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/conftest.py` around lines 838 - 839, Move the helper function `_create_llmisvc_from_config` out of the sibling conftest into a shared test utilities module and update imports to reference that shared module instead of importing from another conftest; specifically, create a shared utils module that exports `_create_llmisvc_from_config` (and any dependent helpers/constants like `TinyLlamaOciConfig` if needed), change the import in `conftest.py` to import `_create_llmisvc_from_config` from the new shared module, and update any other tests that currently import it from the sibling conftest to use the shared module so tests no longer import directly from another conftest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/model_serving/model_server/upgrade/utils.py`:
- Around line 402-403: Normalize and guard access to gateway.instance.status
before reading "conditions": first ensure gateway and gateway.instance exist and
assign a safe dict (e.g., status = getattr(gateway.instance, "status", {}) or {}
if None) before using .get; then compute conditions = status.get("conditions",
[]) and is_accepted = any(...). Add an explicit assertion or raise with a clear
message if status is missing or not a mapping (referencing gateway,
gateway.instance, gateway.instance.status, conditions, is_accepted) so the code
follows the intended assertion path rather than raising AttributeError.
- Around line 343-346: The loop over pods can raise AttributeError when
pod.instance.status or status.containerStatuses is None; update the checks in
the code that iterates pods (the block referencing pod.instance.status,
status.containerStatuses, container.restartCount and max_restarts) to
defensively validate presence of status and containerStatuses before accessing
them, and if either is missing raise the explicit PodContainersRestartError with
a clear message (including pod identity) instead of letting an AttributeError
propagate; apply the same defensive pattern to the similar block that checks
container restart counts later in the file.
---
Nitpick comments:
In `@tests/model_serving/model_server/upgrade/conftest.py`:
- Around line 838-839: Move the helper function `_create_llmisvc_from_config`
out of the sibling conftest into a shared test utilities module and update
imports to reference that shared module instead of importing from another
conftest; specifically, create a shared utils module that exports
`_create_llmisvc_from_config` (and any dependent helpers/constants like
`TinyLlamaOciConfig` if needed), change the import in `conftest.py` to import
`_create_llmisvc_from_config` from the new shared module, and update any other
tests that currently import it from the sibling conftest to use the shared
module so tests no longer import directly from another conftest.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0a46eb0e-4740-4322-9a36-8b8d6e897b93
📒 Files selected for processing (5)
- tests/model_serving/model_server/llmd/conftest.py
- tests/model_serving/model_server/llmd/llmd_configs/config_base.py
- tests/model_serving/model_server/upgrade/conftest.py
- tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
- tests/model_serving/model_server/upgrade/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
Signed-off-by: threcc <trecchiu@redhat.com>
Force-pushed from 32c2fda to 33e77ac
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/model_serving/model_server/upgrade/conftest.py (1)
838-839: Importing private function `_create_llmisvc_from_config` creates an isolated cross-module dependency.

The underscore prefix indicates this is a private API. While this is the only external import of this function in the codebase, coupling to implementation details remains fragile.

Since no public factory function exists in the module, consider promoting `_create_llmisvc_from_config` to public API (`create_llmisvc_from_config`) or formally documenting its contract as semi-public. The single-point dependency makes this a low-urgency refactor, but aligning the API surface reduces future friction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/model_serving/model_server/upgrade/conftest.py` around lines 838 - 839, The test imports the private function _create_llmisvc_from_config from tests.model_serving.model_server.llmd.conftest which creates a fragile cross-module dependency; update the module to expose a public factory (create_llmisvc_from_config) or add a documented semi-public alias that forwards to _create_llmisvc_from_config, then change this test import to use create_llmisvc_from_config (and keep TinyLlamaOciConfig as-is) so callers rely on the public symbol instead of the underscored private function.
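The promotion suggested above amounts to a thin public wrapper. Only the name `_create_llmisvc_from_config` comes from the review; the body and return value below are placeholders standing in for the real helper in `llmd/conftest.py`.

```python
# Hypothetical refactor shape: a public factory forwarding to the private
# helper, so callers stop importing the underscored name from a conftest.

def _create_llmisvc_from_config(config):
    # Stand-in for the real helper; the actual one creates an LLMInferenceService.
    return {"kind": "LLMInferenceService", "config": config}


def create_llmisvc_from_config(config):
    """Public entry point; the documented contract lives on this name."""
    return _create_llmisvc_from_config(config)
```

Callers (including the upgrade conftest) would then import `create_llmisvc_from_config` from a shared, non-conftest module.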
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/model_serving/model_server/upgrade/utils.py`:
- Around line 340-352: The current check in the workload restart verifier
silently passes when get_llmd_workload_pods returns an empty list; add an
explicit check after calling get_llmd_workload_pods to raise
PodContainersRestartError (or reuse the same error type) if pods is empty
(similar to verify_llmd_router_not_restarted's behavior), then continue with the
existing loop that populates restarted_containers; reference
get_llmd_workload_pods, restarted_containers, PodContainersRestartError and
verify_llmd_router_not_restarted when implementing this early-empty-pods guard
so missing deployments are reported instead of being ignored.
---
Nitpick comments:
In `@tests/model_serving/model_server/upgrade/conftest.py`:
- Around line 838-839: The test imports the private function
_create_llmisvc_from_config from tests.model_serving.model_server.llmd.conftest
which creates a fragile cross-module dependency; update the module to expose a
public factory (create_llmisvc_from_config) or add a documented semi-public
alias that forwards to _create_llmisvc_from_config, then change this test import
to use create_llmisvc_from_config (and keep TinyLlamaOciConfig as-is) so callers
rely on the public symbol instead of the underscored private function.
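The empty-pods guard requested in the inline comment above can be sketched as below. The stubs, the standalone `PodContainersRestartError` definition, and the function body are illustrative, mirroring names from the review rather than the repository's code.

```python
# Sketch of the early-empty-pods guard: report missing workload pods
# explicitly instead of letting the restart check pass vacuously.
from types import SimpleNamespace


class PodContainersRestartError(Exception):
    pass


def verify_llmd_pods_not_restarted(pods, max_restarts: int = 0) -> None:
    if not pods:
        raise PodContainersRestartError(
            "No LLMD workload pods found; expected deployments may be missing"
        )
    restarted = {}
    for pod in pods:
        bad = [
            f"{c.name} (restarts: {c.restartCount})"
            for c in pod.instance.status.containerStatuses
            if c.restartCount > max_restarts
        ]
        if bad:
            restarted[pod.name] = bad
    if restarted:
        raise PodContainersRestartError(f"Containers restarted: {restarted}")


healthy_pod = SimpleNamespace(
    name="decode-0",
    instance=SimpleNamespace(
        status=SimpleNamespace(
            containerStatuses=[SimpleNamespace(name="vllm", restartCount=0)]
        )
    ),
)
```

An empty list from the pod lookup now fails the test loudly, matching the behavior the review attributes to `verify_llmd_router_not_restarted`.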
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: cc66a6e1-a19e-4e44-9606-e847568bcda4
📒 Files selected for processing (5)
- tests/model_serving/model_server/llmd/conftest.py
- tests/model_serving/model_server/llmd/llmd_configs/config_base.py
- tests/model_serving/model_server/upgrade/conftest.py
- tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
- tests/model_serving/model_server/upgrade/utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/model_serving/model_server/upgrade/test_upgrade_llmd.py
- tests/model_serving/model_server/llmd/llmd_configs/config_base.py
- tests/model_serving/model_server/llmd/conftest.py
Status of building tag latest: success.
Pull Request
Summary
Adds upgrade tests for `LLMInferenceService` (llm-d) following the existing pre/post upgrade pattern, verifying that the LLMISVC, gateway, and inference endpoint survive an operator upgrade without pod restarts or downtime.

Also increases the default LLMISVC wait timeout and makes the teardown flag configurable in the shared conftest.
Related Issues
How it has been tested
Summary by CodeRabbit
Tests
Chores