
[RHOAIENG-34594] Add test case for singlenode with estimated prefix cache #907

Merged
threcc merged 16 commits into opendatahub-io:main from threcc:singlenode-precise-prefix
Dec 10, 2025

Conversation


@threcc threcc commented Dec 3, 2025

Description

Add a new test case to cover the single-node estimated prefix cache scenario; see the discussion in #907 (comment).

Jira issue tracker: https://issues.redhat.com/browse/RHOAIENG-34594

How Has This Been Tested?

Ran the test locally against a cluster with a GPU (Tesla T4).

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • Tests
    • Added end-to-end tests validating single-node estimated prefix-cache routing, request distribution, and GPU-aware skipping.
    • Added fixtures for authenticated access and single-node cache-enabled service setup.
    • Introduced test helpers to discover service pods, send authenticated test requests, and verify Prometheus metrics (request counts and cache hit rate).
    • Added constants for health checks and prefix-cache configuration used by the tests.



github-actions bot commented Dec 3, 2025

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To un-block merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image tagged
    pr-<pr_number> in the quay repository. The tag is deleted when the PR is merged or closed.
Supported labels

{'/lgtm', '/build-push-pr-image', '/verified', '/cherry-pick', '/wip', '/hold'}

@threcc threcc force-pushed the singlenode-precise-prefix branch 3 times, most recently from 6222ea4 to 724334f on December 4, 2025 11:18
@github-actions github-actions bot added size/xxl and removed size/xl labels Dec 4, 2025
@threcc threcc marked this pull request as ready for review December 4, 2025 11:19

coderabbitai bot commented Dec 4, 2025

📝 Walkthrough

Walkthrough

Adds LLMD prefix-cache support and tests: new constants and liveness probe, two pytest fixtures, Prometheus-aware LLMD test utilities, and an integration test validating single-node estimated prefix-cache routing and metrics.

Changes

Cohort / File(s) Summary
Fixtures & Config Constants
tests/model_serving/model_server/llmd/conftest.py, tests/model_serving/model_server/llmd/constants.py
Added singlenode_estimated_prefix_cache and authenticated_llmisvc_token fixtures; introduced LLMD_LIVENESS_PROBE, PREFIX_CACHE_BLOCK_SIZE, PREFIX_CACHE_HASH_ALGO, PREFIX_CACHE_HASH_SEED, and ROUTER_SCHEDULER_CONFIG_ESTIMATED_PREFIX_CACHE.
Testing Utilities
tests/model_serving/model_server/llmd/utils.py
Added Prometheus-aware helpers: get_llmd_workload_pods(), get_llmd_router_scheduler_pod(), send_prefix_cache_test_requests(), get_metrics_request_count_per_pod(), get_metrics_prefix_cache_hit_rate(), and verify_estimated_prefix_cache_metrics() plus related imports. Note: similar utility blocks appear duplicated in the file.
Integration Test
tests/model_serving/model_server/llmd/test_singlenode_estimated_prefix_cache.py
New test class TestSingleNodeEstimatedPrefixCache with test_singlenode_estimated_prefix_cache validating gateway/LLM service readiness, router-scheduler and workload pod counts, request dispatch with authenticated token, and Prometheus-based metric verification.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Verify Prometheus queries, metric names, and time/window parameters in utils.py.
  • Inspect and deduplicate repeated utility blocks in utils.py.
  • Review fixture logic and dynamic token resolution in conftest.py.
  • Check pod selection/readiness logic and potential race conditions in the integration test.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly matches the main change: adding a test case for single-node with estimated prefix cache functionality.
Docstring Coverage ✅ Passed Docstring coverage is 92.31% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (8)
tests/model_serving/model_server/llmd/utils.py (4)

197-223: Code duplication with get_llmd_pods inner function.

The logic in get_llmd_workload_pods duplicates the get_llmd_pods inner function defined at lines 112-126 within verify_llmd_no_failed_pods. Consider extracting and reusing get_llmd_workload_pods in verify_llmd_no_failed_pods to eliminate duplication.

 def verify_llmd_no_failed_pods(
     client: DynamicClient,
     llm_service: LLMInferenceService,
     timeout: int = 300,
 ) -> None:
     # ... docstring ...
     from utilities.exceptions import FailedPodsError
     from ocp_resources.resource import Resource

     LOGGER.info(f"Comprehensive health check for LLMInferenceService {llm_service.name}")

     container_wait_base_errors = ["InvalidImageName", "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"]
     container_terminated_base_errors = [Resource.Status.ERROR, "CrashLoopBackOff"]

-    def get_llmd_pods():
-        """Get LLMD workload pods for this LLMInferenceService."""
-        pods = []
-        for pod in Pod.get(
-            dyn_client=client,
-            namespace=llm_service.namespace,
-            label_selector=(
-                f"{Pod.ApiGroup.APP_KUBERNETES_IO}/part-of=llminferenceservice,"
-                f"{Pod.ApiGroup.APP_KUBERNETES_IO}/name={llm_service.name}"
-            ),
-        ):
-            labels = pod.instance.metadata.get("labels", {})
-            if labels.get("kserve.io/component") == "workload":
-                pods.append(pod)
-        return pods
+    def get_llmd_pods():
+        return get_llmd_workload_pods(client=client, llmisvc=llm_service)

     for pods in TimeoutSampler(

264-287: Narrow exception handling and make log window configurable.

  1. The bare Exception catch loses valuable diagnostic context. Consider catching specific exceptions (e.g., kubernetes.client.exceptions.ApiException) or at least logging the exception type.

  2. The hardcoded since_seconds=120 may be insufficient in slower test environments.

-def count_chat_completions_requests_in_pod(pod: Pod) -> int:
+def count_chat_completions_requests_in_pod(pod: Pod, since_seconds: int = 120) -> int:
     """
     Count POST /v1/chat/completions requests in pod logs.

     Args:
         pod: The vLLM workload pod to check
+        since_seconds: Time window for log retrieval (default: 120)

     Returns:
         Number of successful chat completion requests found in logs
     """
     try:
-        logs = pod.log(container="main", since_seconds=120)
+        logs = pod.log(container="main", since_seconds=since_seconds)

         # Match: "POST /v1/chat/completions HTTP/1.1" 200
         pattern = r"POST /v1/chat/completions HTTP/1.1.*200"
         matches = re.findall(pattern, logs)
         # ...
-    except Exception as e:
-        LOGGER.info(f"Failed to count requests for pod {pod.name}: {e}")
+    except Exception as e:  # noqa: BLE001
+        LOGGER.warning(f"Failed to count requests for pod {pod.name}: {type(e).__name__}: {e}")
         return 0

290-323: Remove unused parameters and address potential race condition.

  1. The query and timestamp_before parameters are documented as unused and kept for "signature compatibility." If no callers use them, remove them to avoid confusion.

  2. The hardcoded time.sleep(5) could be insufficient for log propagation in some environments. Consider making it configurable.

  3. If multiple pods show increased request counts (possible in race conditions), only the first is returned without warning.

 def get_pod_that_handled_request(
     workload_pods: list[Pod],
-    query: str,
-    timestamp_before: float,
     baseline_counts: dict[str, int],
+    wait_seconds: float = 5.0,
 ) -> str | None:
     """
     Determine which pod handled a request by counting POST requests.

     Args:
         workload_pods: List of vLLM workload pods
-        query: Not used (kept for signature compatibility)
-        timestamp_before: Not used (kept for signature compatibility)
         baseline_counts: Dict of {pod_name: request_count} before this request
+        wait_seconds: Time to wait for logs to propagate (default: 5.0)

     Returns:
         Pod name that handled the request, or None if not found
     """
-    time.sleep(5)
+    time.sleep(wait_seconds)

     current_counts = {}
     for pod in workload_pods:
         current_counts[pod.name] = count_chat_completions_requests_in_pod(pod)

+    handling_pods = []
     for pod in workload_pods:
         baseline = baseline_counts.get(pod.name, 0)
         current = current_counts.get(pod.name, 0)

         if current > baseline:
             LOGGER.info(f"Pod {pod.name} handled request: {baseline} -> {current} (+{current - baseline})")
-            return pod.name
+            handling_pods.append(pod.name)
+
+    if len(handling_pods) == 1:
+        return handling_pods[0]
+    elif len(handling_pods) > 1:
+        LOGGER.warning(f"Multiple pods show increased counts: {handling_pods}, returning first")
+        return handling_pods[0]

     LOGGER.warning("Could not determine which pod handled request")
     return None

361-393: Rename unused loop variable and consider extracting repeated logic.

  1. The loop variable i is unused; rename to _ per Python convention.

  2. The inference request logic is duplicated between Phase 1 (lines 362-388) and Phase 2 (lines 410-436). Consider extracting a helper function.

-    for i in range(3):
+    for _ in range(3):
         inference_config = {

Optionally, extract the repeated inference/tracking logic:

def _send_and_track_request(
    llmisvc: LLMInferenceService,
    prompt: str,
    token: str,
    workload_pods: list[Pod],
    baseline_counts: dict[str, int],
) -> str | None:
    """Send inference request and track which pod handled it."""
    inference_config = {
        "default_query_model": {
            "query_input": prompt,
            "query_output": r".*",
            "use_regex": True,
        },
        "chat_completions": TINYLLAMA_INFERENCE_CONFIG["chat_completions"],
    }
    verify_inference_response_llmd(
        llm_service=llmisvc,
        inference_config=inference_config,
        inference_type="chat_completions",
        protocol=Protocols.HTTPS,
        use_default_query=True,
        insecure=False,
        model_name=llmisvc.instance.spec.model.name,
        token=token,
        authorized_user=True,
    )
    return get_pod_that_handled_request(workload_pods, baseline_counts)
tests/model_serving/model_server/llmd/conftest.py (2)

372-390: Complex environment variable string is fragile.

The VLLM_ADDITIONAL_ARGS value contains embedded JSON with escaped quotes and Go templates. While functional, this is difficult to maintain and debug.

Consider constructing the JSON programmatically and using json.dumps() for reliability:

import json

kv_transfer_config = json.dumps({"kv_connector": "NixlConnector", "kv_role": "kv_both"})
kv_events_config_template = json.dumps({
    "enable_kv_cache_events": True,
    "publisher": "zmq",
    "endpoint": "tcp://{{ ChildName .ObjectMeta.Name `-epp-service` }}:5557",
    "topic": "kv@${POD_IP}@${MODEL_NAME}",
})

vllm_args = (
    f"--prefix-caching-hash-algo {PREFIX_CACHE_HASH_ALGO} "
    f"--block-size {PREFIX_CACHE_BLOCK_SIZE} "
    f"--kv_transfer_config '{kv_transfer_config}' "
    f"--kv-events-config '{kv_events_config_template}'"
)

161-167: Use LLMD_LIVENESS_PROBE constant consistently.

The llmd_inference_service_gpu fixture (lines 161-167) defines the liveness probe inline with identical values to LLMD_LIVENESS_PROBE. For consistency and DRY, use the constant in both places.

+from tests.model_serving.model_server.llmd.constants import LLMD_LIVENESS_PROBE
 # ... in llmd_inference_service_gpu fixture ...
-    liveness_probe = {
-        "httpGet": {"path": "/health", "port": 8000, "scheme": "HTTPS"},
-        "initialDelaySeconds": 120,
-        "periodSeconds": 30,
-        "timeoutSeconds": 30,
-        "failureThreshold": 5,
-    }
+    liveness_probe = LLMD_LIVENESS_PROBE

Also applies to: 391-391

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (2)

14-25: Move module docstring to the top of the file.

The module docstring is placed after imports (lines 14-25), which is unconventional. Per PEP 257, module docstrings should be the first statement in the file.

+"""
+Test Single-Node Precise Prefix Caching.
+
+This test verifies that the LLM-D router correctly routes inference requests
+based on cache state, maximizing prefix cache hits.
+
+Test configuration:
+- LLMInferenceService with 2 replicas and router enabled
+- Authentication enabled
+- Verify router pod and vLLM pods are running
+- Send multiple requests with shared prefixes and size greater than PREFIX_CACHE_BLOCK_SIZE
+"""
+
 import pytest
 from kubernetes.dynamic import DynamicClient
 from ocp_resources.llm_inference_service import LLMInferenceService

 from tests.model_serving.model_server.llmd.utils import (
     get_llmd_router_scheduler_pod,
     get_llmd_workload_pods,
     verify_gateway_status,
     verify_llm_service_status,
     verify_singlenode_prefix_cache_routing,
 )
 from simple_logger.logger import get_logger

-"""
-Test Single-Node Precise Prefix Caching.
-...
-"""
-
 LOGGER = get_logger(name=__name__)

41-61: Add comment explaining llmd_gateway dependency and consider instance attribute.

  1. The llmd_gateway parameter ensures the gateway fixture runs first. Add a comment for clarity.

  2. Storing the token as a class attribute (TestSingleNodePrecisePrefixCache.auth_token) could cause issues if tests run in parallel with pytest-xdist. Consider using a class-scoped fixture that yields the token instead.

     @pytest.fixture(scope="class", autouse=True)
     def setup_auth(
         self,
-        llmd_gateway,
+        llmd_gateway,  # Required: ensures gateway is ready before auth setup
         singlenode_precise_prefix_cache,
         llmd_s3_service_account,
         llmisvc_auth_token,
         llmisvc_auth_view_role,
         llmisvc_auth_role_binding,
     ):
         """Set up authentication for single-node prefix cache test."""
         # Create token with RBAC resources using factory fixtures
         token = llmisvc_auth_token(
             service_account=llmd_s3_service_account,
             llmisvc=singlenode_precise_prefix_cache,
             view_role_factory=llmisvc_auth_view_role,
             role_binding_factory=llmisvc_auth_role_binding,
         )

-        # Store token as class attribute for use in tests
-        TestSingleNodePrecisePrefixCache.auth_token = token
+        # Store on instance - safer if tests run in parallel
+        self.__class__.auth_token = token
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23e2790 and 28f7e15.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • tests/model_serving/model_server/llmd/conftest.py (3 hunks)
  • tests/model_serving/model_server/llmd/constants.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1 hunks)
  • tests/model_serving/model_server/llmd/utils.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/model_serving/model_server/llmd/utils.py (3)
utilities/constants.py (1)
  • Protocols (97-104)
utilities/exceptions.py (1)
  • PodContainersRestartError (109-110)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (393-454)
tests/model_serving/model_server/llmd/conftest.py (2)
utilities/llmd_utils.py (1)
  • create_llmisvc (149-341)
utilities/constants.py (2)
  • ResourceLimits (241-263)
  • GPU (252-263)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (2)
tests/model_serving/model_server/llmd/utils.py (5)
  • get_llmd_router_scheduler_pod (226-251)
  • get_llmd_workload_pods (197-223)
  • verify_gateway_status (28-49)
  • verify_llm_service_status (52-73)
  • verify_singlenode_prefix_cache_routing (326-443)
tests/conftest.py (1)
  • gpu_count_on_cluster (793-814)
🪛 Ruff (0.14.7)
tests/model_serving/model_server/llmd/utils.py

285-285: Do not catch blind exception: Exception

(BLE001)


292-292: Unused function argument: query

(ARG001)


293-293: Unused function argument: timestamp_before

(ARG001)


362-362: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

tests/model_serving/model_server/llmd/conftest.py

347-347: Unused function argument: llmd_s3_secret

(ARG001)


349-349: Unused function argument: llmd_gateway

(ARG001)

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py

44-44: Unused method argument: llmd_gateway

(ARG002)

🔇 Additional comments (3)
tests/model_serving/model_server/llmd/conftest.py (1)

343-350: Fixture dependencies are intentional for ordering.

The llmd_s3_secret and llmd_gateway parameters appear unused but are required to ensure proper fixture ordering — llmd_s3_secret must exist before llmd_s3_service_account is used, and llmd_gateway must be created before the service.

Consider adding brief comments to clarify this intent for future maintainers.

 @pytest.fixture(scope="class")
 def singlenode_precise_prefix_cache(
     admin_client: DynamicClient,
     unprivileged_model_namespace: Namespace,
-    llmd_s3_secret: Secret,
+    llmd_s3_secret: Secret,  # Required: creates S3 secret before service account
     llmd_s3_service_account: ServiceAccount,
-    llmd_gateway,
+    llmd_gateway,  # Required: ensures gateway is ready before service creation
 ) -> Generator[LLMInferenceService, None, None]:
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)

63-92: LGTM! Well-structured test with clear preconditions.

The test properly:

  • Skips when insufficient GPUs are available
  • Validates infrastructure readiness before routing tests
  • Uses descriptive assertion messages
  • Delegates complex routing validation to a helper function

One minor suggestion: the llmd_gateway parameter appears unused in the method signature but is likely needed for fixture ordering. Add a brief comment or use _ prefix convention.

     def test_singlenode_precise_prefix_cache(
         self,
         unprivileged_client: DynamicClient,
-        llmd_gateway,
+        llmd_gateway,  # noqa: ARG002 - Required for fixture ordering
         singlenode_precise_prefix_cache: LLMInferenceService,
         gpu_count_on_cluster: int,
     ):
tests/model_serving/model_server/llmd/constants.py (1)

1-44: LGTM! Well-organized constants.

The configuration constants are clearly defined and appropriately grouped. The liveness probe settings and scheduler configuration follow the expected API patterns. The PREFIX_CACHE_HASH_SEED is correctly defined as a string "42", which aligns with vLLM's expectation for hash seed values (read from the PYTHONHASHSEED environment variable).
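The constants the review signs off on can be pictured with a short sketch. Only the probe values (shown in the diffs above) and the hash seed being the string "42" are stated in this PR; the block-size and hash-algorithm values below are illustrative assumptions, not the actual contents of constants.py:

```python
# Hypothetical sketch of tests/model_serving/model_server/llmd/constants.py.
# Assumed values: PREFIX_CACHE_BLOCK_SIZE and PREFIX_CACHE_HASH_ALGO are
# placeholders; the probe dict and the string seed "42" come from the review.
PREFIX_CACHE_BLOCK_SIZE = 64          # assumed vLLM --block-size value
PREFIX_CACHE_HASH_ALGO = "sha256"     # assumed --prefix-caching-hash-algo value
PREFIX_CACHE_HASH_SEED = "42"         # a string, not an int: vLLM reads it via PYTHONHASHSEED

LLMD_LIVENESS_PROBE = {
    "httpGet": {"path": "/health", "port": 8000, "scheme": "HTTPS"},
    "initialDelaySeconds": 120,
    "periodSeconds": 30,
    "timeoutSeconds": 30,
    "failureThreshold": 5,
}
```

Keeping the seed as a string avoids an implicit int-to-str conversion when it is injected into the pod environment.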

mwaykole previously approved these changes Dec 4, 2025

@mwaykole mwaykole left a comment


/lgtm


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/model_serving/model_server/llmd/conftest.py (1)

343-441: singlenode_precise_prefix_cache fixture: clean overall; drop redundant secret arg and make gateway dependency explicit

The fixture wiring into create_llmisvc looks solid and matches the existing LLMD patterns. Two small cleanups:

  • llmd_s3_secret is unused here and redundant because llmd_s3_service_account already depends on it; you can safely drop it from the signature.
  • llmd_gateway is intentionally unused but forces the gateway fixture to run; it’s worth touching it explicitly so this is clear and to satisfy ARG001.

For example:

 @pytest.fixture(scope="class")
 def singlenode_precise_prefix_cache(
     admin_client: DynamicClient,
     unprivileged_model_namespace: Namespace,
-    llmd_s3_secret: Secret,
     llmd_s3_service_account: ServiceAccount,
     llmd_gateway,
 ) -> Generator[LLMInferenceService, None, None]:
     """LLMInferenceService fixture for single-node precise prefix cache test."""
 
+    # Ensure llmd_gateway fixture is instantiated even though we don't use it directly here
+    _ = llmd_gateway
+
     with create_llmisvc(
         client=admin_client,
tests/model_serving/model_server/llmd/utils.py (3)

254-287: Broad exception handling in count_chat_completions_requests_in_pod

Catching a blanket Exception here hides all failures (including programming errors) and just returns 0, which could make debugging routing issues harder.

Given this is test-only code and log scraping is inherently flaky, two options:

  • Narrow the exception to the known pod/log retrieval errors, or
  • Keep the broad catch but explicitly mark it as intentional for linting:
-    except Exception as e:
-        LOGGER.info(f"Failed to count requests for pod {pod.name}: {e}")
-        return 0
+    except Exception as e:  # noqa: BLE001 - tolerate log scraping failures in tests
+        LOGGER.info(f"Failed to count requests for pod {pod.name}: {e}")
+        return 0

361-363: Rename unused loop variable in Phase 1 for clarity

The loop index isn’t used; renaming it to _ makes that explicit and satisfies linting:

-    phase1_pods = []
-    for i in range(3):
+    phase1_pods = []
+    for _ in range(3):

197-223: Pod selection logic duplicated with verify_llmd_no_failed_pods helper

get_llmd_workload_pods uses the same label selector and filtering as the inner get_llmd_pods inside verify_llmd_no_failed_pods above. To keep things DRY and avoid future divergence, consider refactoring verify_llmd_no_failed_pods to call this new helper instead of maintaining two copies of the same logic.

No behavior change is required for this PR; this is a good follow-up cleanup target.

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)

41-61: Clarify intentional use of llmd_gateway in setup_auth to satisfy lint

llmd_gateway in setup_auth isn’t referenced, but it’s clearly there to ensure the gateway fixture is instantiated before creating the auth token and service. To make that intent explicit and silence ARG002, you can touch it:

     def setup_auth(
         self,
         llmd_gateway,
         singlenode_precise_prefix_cache,
@@
     ):
         """Set up authentication for single-node prefix cache test."""
-        # Create token with RBAC resources using factory fixtures
+        # Force llmd_gateway fixture instantiation even though we don't use it directly
+        _ = llmd_gateway
+
+        # Create token with RBAC resources using factory fixtures
         token = llmisvc_auth_token(
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28f7e15 and 159fd07.

📒 Files selected for processing (4)
  • tests/model_serving/model_server/llmd/conftest.py (3 hunks)
  • tests/model_serving/model_server/llmd/constants.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1 hunks)
  • tests/model_serving/model_server/llmd/utils.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/model_serving/model_server/llmd/constants.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/model_serving/model_server/llmd/utils.py (2)
utilities/constants.py (1)
  • Protocols (97-104)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (393-454)
tests/model_serving/model_server/llmd/conftest.py (3)
utilities/llmd_utils.py (1)
  • create_llmisvc (149-341)
utilities/constants.py (4)
  • ModelStorage (289-306)
  • ResourceLimits (241-263)
  • GPU (252-263)
  • Timeout (227-238)
utilities/llmd_constants.py (2)
  • ModelStorage (30-37)
  • ModelNames (46-50)
🪛 Ruff (0.14.7)
tests/model_serving/model_server/llmd/utils.py

285-285: Do not catch blind exception: Exception

(BLE001)


292-292: Unused function argument: query

(ARG001)


293-293: Unused function argument: timestamp_before

(ARG001)


362-362: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

tests/model_serving/model_server/llmd/conftest.py

347-347: Unused function argument: llmd_s3_secret

(ARG001)


349-349: Unused function argument: llmd_gateway

(ARG001)

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py

44-44: Unused method argument: llmd_gateway

(ARG002)

@threcc threcc disabled auto-merge December 4, 2025 13:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)

41-61: Consider yielding the token from the fixture instead of using a class attribute.

Storing the token as a class attribute (TestSingleNodePrecisePrefixCache.auth_token) is a less idiomatic pytest pattern. Consider yielding the token from a separate fixture so tests can access it directly via fixture injection. This improves test isolation and follows pytest conventions.

Example approach:

@pytest.fixture(scope="class")
def prefix_cache_auth_token(
    llmd_gateway,
    singlenode_precise_prefix_cache,
    llmd_s3_service_account,
    llmisvc_auth_token,
    llmisvc_auth_view_role,
    llmisvc_auth_role_binding,
):
    """Set up authentication and yield token for single-node prefix cache test."""
    token = llmisvc_auth_token(
        service_account=llmd_s3_service_account,
        llmisvc=singlenode_precise_prefix_cache,
        view_role_factory=llmisvc_auth_view_role,
        role_binding_factory=llmisvc_auth_role_binding,
    )
    yield token

Then inject prefix_cache_auth_token directly into the test method.

tests/model_serving/model_server/llmd/utils.py (1)

301-301: Fix time.sleep keyword argument - will cause TypeError.

time.sleep() does not accept keyword arguments. This will raise TypeError: sleep() takes no keyword arguments at runtime.

-    time.sleep(secs=5)
+    time.sleep(5)  # Allow logs to flush before counting requests
🧹 Nitpick comments (4)
tests/model_serving/model_server/llmd/conftest.py (1)

161-167: Consider reusing LLMD_LIVENESS_PROBE constant for consistency.

This inline liveness probe definition is identical to the new LLMD_LIVENESS_PROBE constant in constants.py. Consider using the constant here to reduce duplication and ensure consistency.

+from tests.model_serving.model_server.llmd.constants import LLMD_LIVENESS_PROBE
 ...
-    liveness_probe = {
-        "httpGet": {"path": "/health", "port": 8000, "scheme": "HTTPS"},
-        "initialDelaySeconds": 120,
-        "periodSeconds": 30,
-        "timeoutSeconds": 30,
-        "failureThreshold": 5,
-    }
+    liveness_probe = LLMD_LIVENESS_PROBE
tests/model_serving/model_server/llmd/utils.py (3)

285-287: Consider catching more specific exceptions.

While the broad exception handling provides resilience for this log-counting helper, consider catching more specific exceptions (e.g., kubernetes.client.exceptions.ApiException) to avoid masking unexpected errors during debugging.


355-355: Rename unused loop variable.

The loop variable i is not used. Rename to _ to indicate intentional non-use.

-    for i in range(3):
+    for _ in range(3):

112-126: Refactor to use get_llmd_workload_pods utility.

The internal get_llmd_pods() function duplicates the logic of the new get_llmd_workload_pods() utility (lines 197-223). Consider refactoring to use the shared utility.

     def get_llmd_pods():
         """Get LLMD workload pods for this LLMInferenceService."""
-        pods = []
-        for pod in Pod.get(
-            dyn_client=client,
-            namespace=llm_service.namespace,
-            label_selector=(
-                f"{Pod.ApiGroup.APP_KUBERNETES_IO}/part-of=llminferenceservice,"
-                f"{Pod.ApiGroup.APP_KUBERNETES_IO}/name={llm_service.name}"
-            ),
-        ):
-            labels = pod.instance.metadata.get("labels", {})
-            if labels.get("kserve.io/component") == "workload":
-                pods.append(pod)
-        return pods
+        return get_llmd_workload_pods(client=client, llmisvc=llm_service)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 159fd07 and 12c2b2f.

📒 Files selected for processing (4)
  • tests/model_serving/model_server/llmd/conftest.py (3 hunks)
  • tests/model_serving/model_server/llmd/constants.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1 hunks)
  • tests/model_serving/model_server/llmd/utils.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/model_serving/model_server/llmd/utils.py (4)
utilities/constants.py (1)
  • Protocols (97-104)
utilities/exceptions.py (1)
  • PodContainersRestartError (109-110)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (393-454)
tests/model_serving/conftest.py (1)
  • protocol (26-36)
tests/model_serving/model_server/llmd/conftest.py (1)
utilities/llmd_utils.py (1)
  • create_llmisvc (149-341)
🪛 Ruff (0.14.7)
tests/model_serving/model_server/llmd/utils.py

285-285: Do not catch blind exception: Exception

(BLE001)


355-355: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

tests/model_serving/model_server/llmd/conftest.py

347-347: Unused function argument: llmd_s3_secret

(ARG001)


349-349: Unused function argument: llmd_gateway

(ARG001)

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py

44-44: Unused method argument: llmd_gateway

(ARG002)

🔇 Additional comments (4)
tests/model_serving/model_server/llmd/conftest.py (1)

343-350: Fixture dependency parameters are correctly used for ordering.

The llmd_s3_secret and llmd_gateway parameters are intentionally declared to ensure proper fixture ordering/dependency resolution even though they're not directly referenced in the function body. This is a valid pytest pattern. Consider adding a brief comment to clarify this intent for future maintainers.
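The pattern can be sketched roughly like this (fixture names match the review, but the bodies are illustrative stand-ins, not the repo's real fixtures):

```python
# Hedged sketch of the fixture-ordering pattern; the bodies are hypothetical.
import pytest


@pytest.fixture
def llmd_s3_secret():
    # Stand-in: would create the S3 secret the model is pulled from.
    yield "llmd-s3-secret"


@pytest.fixture
def llmd_gateway():
    # Stand-in: would provision the inference gateway.
    yield "llmd-gateway"


@pytest.fixture
def single_node_llm_service(llmd_s3_secret, llmd_gateway):
    # llmd_s3_secret and llmd_gateway are requested purely for ordering:
    # pytest guarantees both are set up before this fixture runs and torn
    # down after it, even though their values are never read here.
    yield "single-node-llm-service"
```

A one-line comment like the ones above is usually enough to stop a future cleanup from "fixing" the apparently unused parameters.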

tests/model_serving/model_server/llmd/constants.py (1)

1-44: Well-organized constants module.

The constants are clearly documented and properly structured. The scheduler configuration correctly references the prefix cache parameters for consistency.

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)

63-94: Integration test structure is acceptable, though could be modularized.

While a previous reviewer suggested splitting into smaller tests, the current structure is reasonable for an end-to-end integration test where phases are sequential and interdependent. The assertions at each step provide clear failure points.

tests/model_serving/model_server/llmd/utils.py (1)

319-439: Well-structured routing verification with clear test phases.

The function properly validates prefix cache routing behavior:

  • Phase 1 verifies repeated identical prompts route to the same pod (full cache hit)
  • Phase 2 verifies prompts with shared prefixes route together (partial cache hit)

The baseline counting approach for request attribution is reasonable given the log-based detection method.
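The baseline-delta attribution described above can be sketched as follows (pod names and counts are made up for illustration):

```python
# Hypothetical sketch of the two-phase check; not the repo's real helper.

def requests_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Attribute new requests to pods by subtracting a baseline snapshot."""
    return {pod: after.get(pod, 0) - before.get(pod, 0) for pod in after}


# Phase 1: three identical prompts should all land on one pod (full cache hit).
baseline = {"pod-a": 10, "pod-b": 7}
after_phase1 = {"pod-a": 13, "pod-b": 7}
delta = requests_delta(baseline, after_phase1)
assert [pod for pod, n in delta.items() if n == 3] == ["pod-a"]

# Phase 2: prompts sharing a long prefix should follow the same pod (partial hit).
after_phase2 = {"pod-a": 16, "pod-b": 7}
delta2 = requests_delta(after_phase1, after_phase2)
assert max(delta2, key=delta2.get) == "pod-a"
```

Snapshotting before each phase keeps earlier traffic from polluting the attribution of the current phase's requests.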

@threcc threcc force-pushed the singlenode-precise-prefix branch from 12c2b2f to 3c86b86 Compare December 4, 2025 15:21
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/model_serving/model_server/llmd/utils.py (1)

401-402: Replace time.sleep() with TimeoutSampler and remove invalid noqa directive.

A past reviewer requested replacing the fixed sleep with TimeoutSampler to poll for metrics availability. The noqa: FCN001 directive is also invalid and should be removed.

Based on past review comments, use TimeoutSampler to poll Prometheus until metrics are available rather than blindly sleeping:

-    # Wait for Prometheus to scrape metrics
-    LOGGER.info("Waiting 30 seconds for Prometheus to scrape metrics")
-    time.sleep(30)  # noqa: FCN001
+    # Poll Prometheus for metrics availability
+    LOGGER.info("Waiting for Prometheus to scrape metrics")
+    for sample in TimeoutSampler(
+        wait_timeout=60,
+        sleep=5,
+        func=lambda: get_metrics_request_count_per_pod(prometheus, llmisvc, workload_pods),
+    ):
+        # Stop once non-zero request counts are visible, using the yielded
+        # sample instead of querying Prometheus a second time
+        if sample and sum(sample.values()) > 0:
+            break
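For reference, the poll-until-truthy idea behind TimeoutSampler can be shown with a stdlib-only stand-in (the scrape simulation and helper names below are illustrative, not the repo's utilities):

```python
import time


def poll_until(condition, timeout: float = 60.0, interval: float = 5.0):
    """Minimal stdlib stand-in for TimeoutSampler: call condition() every
    `interval` seconds until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        value = condition()
        if value:
            return value
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")


# Simulated scrapes: per-pod request counts only appear on the third poll.
samples = iter([{}, {"pod-a": 0}, {"pod-a": 3}])


def nonzero_counts():
    counts = next(samples)
    return counts if sum(counts.values()) > 0 else None


result = poll_until(nonzero_counts, timeout=1.0, interval=0.0)
```

Unlike a fixed sleep, this returns as soon as the metric is visible and fails loudly with a timeout instead of silently proceeding on stale data.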
🧹 Nitpick comments (1)
tests/model_serving/model_server/llmd/utils.py (1)

419-419: Prefer next() over list comprehension with index.

Using next() is more idiomatic, avoids materializing an intermediate list, and lets you supply a default (or raise a descriptive error) when no pod matches, rather than an opaque IndexError.

Apply this diff:

-    active_pod = [name for name, count in pods_request_counts.items() if count == expected_requests][0]
+    active_pod = next(name for name, count in pods_request_counts.items() if count == expected_requests)
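A minimal illustration of the suggested pattern (the dict contents are made up):

```python
# Illustrative data, not real pod metrics.
pods_request_counts = {"pod-a": 3, "pod-b": 0, "pod-c": 0}
expected_requests = 3

# next() stops at the first match; the optional default makes "no match"
# explicit instead of surfacing as an opaque IndexError from [...][0].
active_pod = next(
    (name for name, count in pods_request_counts.items() if count == expected_requests),
    None,
)
assert active_pod == "pod-a"
```

Without the default, a missing match raises StopIteration, so pass a default (or wrap the call) when absence is a possible, assertable outcome.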
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbcc5ae and 4d58233.

📒 Files selected for processing (1)
  • tests/model_serving/model_server/llmd/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/model_serving/model_server/llmd/utils.py (6)
tests/conftest.py (1)
  • prometheus (665-673)
utilities/constants.py (1)
  • Protocols (97-104)
utilities/exceptions.py (1)
  • PodContainersRestartError (109-110)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (393-454)
utilities/monitoring.py (1)
  • get_metrics_value (10-24)
tests/model_serving/conftest.py (1)
  • protocol (26-36)
🪛 Flake8 (7.3.0)
tests/model_serving/model_server/llmd/utils.py

[error] 312-312: undefined name 'i'

(F821)

🪛 Ruff (0.14.8)
tests/model_serving/model_server/llmd/utils.py

311-311: Do not catch blind exception: Exception

(BLE001)


312-312: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


312-312: Undefined name i

(F821)


402-402: Unused noqa directive (unknown: FCN001)

Remove unused noqa directive

(RUF100)


419-419: Prefer next(...) over single element slice

Replace with next(...)

(RUF015)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/model_serving/model_server/llmd/utils.py (1)

254-317: Tighten exception handling and log with traceback in prefix‑cache request loop

The loop does the right thing in terms of driving repeated requests, but the except Exception as e + LOGGER.error(...) pattern is both noisy for static analysis and hides the traceback that would be useful when debugging failures.

Given verify_inference_response_llmd documents raising InferenceResponseError and ValueError, you can narrow the catch and use LOGGER.exception:

-from utilities.exceptions import PodContainersRestartError
+from utilities.exceptions import PodContainersRestartError, InferenceResponseError
-        try:
-            verify_inference_response_llmd(
-                llm_service=llmisvc,
-                inference_config=inference_config,
-                inference_type="chat_completions",
-                protocol=Protocols.HTTPS,
-                use_default_query=True,
-                insecure=False,
-                model_name=llmisvc.instance.spec.model.name,
-                token=token,
-                authorized_user=True,
-            )
-            successful_requests += 1
-        except Exception as e:
-            LOGGER.error(f"Request {index + 1} failed: {e}")
-            failed_requests += 1
+        try:
+            verify_inference_response_llmd(
+                llm_service=llmisvc,
+                inference_config=inference_config,
+                inference_type="chat_completions",
+                protocol=Protocols.HTTPS,
+                use_default_query=True,
+                insecure=False,
+                model_name=llmisvc.instance.spec.model.name,
+                token=token,
+                authorized_user=True,
+            )
+            successful_requests += 1
+        except (InferenceResponseError, ValueError) as exc:
+            LOGGER.exception(f"Request {index + 1} failed validation: {exc}")
+            failed_requests += 1

This keeps the “keep going and count failures” behavior, but avoids a blind catch‑all and gives you stack traces in logs, which should also satisfy the BLE001/TRY400 lints.
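The "narrow catch, log with traceback, keep counting" shape can be sketched in isolation (the exception class and request helper below are stand-ins for the repo's real ones):

```python
import logging

logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger("prefix-cache-test")


class InferenceResponseError(Exception):
    """Stand-in for utilities.exceptions.InferenceResponseError."""


def send_request(index: int) -> None:
    # Hypothetical: fail on the second request to exercise the error path.
    if index == 1:
        raise InferenceResponseError("unexpected completion payload")


successful_requests = failed_requests = 0
for index in range(3):
    try:
        send_request(index)
        successful_requests += 1
    except (InferenceResponseError, ValueError):
        # LOGGER.exception records the full traceback alongside the message.
        LOGGER.exception(f"Request {index + 1} failed validation")
        failed_requests += 1
```

An unrelated bug (say, a TypeError in the helper) now propagates immediately instead of being folded into the failure count.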

🧹 Nitpick comments (1)
tests/model_serving/model_server/llmd/utils.py (1)

197-251: Workload/router pod discovery helpers look correct; consider de‑duplicating label selector logic

The new get_llmd_workload_pods and get_llmd_router_scheduler_pod helpers use selectors consistent with the inner get_llmd_pods in verify_llmd_no_failed_pods, and the label keys/values look right for workload vs router-scheduler pods.

To avoid drift if labels change later, consider reusing these helpers (or a shared selector builder) inside verify_llmd_no_failed_pods instead of duplicating the selector and workload filter logic.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d58233 and cab8064.

📒 Files selected for processing (1)
  • tests/model_serving/model_server/llmd/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/model_serving/model_server/llmd/utils.py (4)
utilities/constants.py (1)
  • Protocols (97-104)
utilities/exceptions.py (1)
  • PodContainersRestartError (109-110)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (393-454)
utilities/monitoring.py (1)
  • get_metrics_value (10-24)
🪛 Ruff (0.14.8)
tests/model_serving/model_server/llmd/utils.py

310-310: Do not catch blind exception: Exception

(BLE001)


311-311: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


417-417: Prefer next(...) over single element slice

Replace with next(...)

(RUF015)

Member

@pierDipi pierDipi left a comment


/lgtm
/approve

@pierDipi pierDipi requested a review from dbasunag December 9, 2025 17:43
@threcc threcc enabled auto-merge (squash) December 10, 2025 09:36
@threcc threcc merged commit 2472b3a into opendatahub-io:main Dec 10, 2025
8 checks passed
@github-actions

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

@threcc threcc deleted the singlenode-precise-prefix branch December 11, 2025 14:24
mwaykole pushed a commit to mwaykole/opendatahub-tests that referenced this pull request Jan 23, 2026