[RHOAIENG-34594] Add test case for singlenode with estimated prefix cache #907
threcc merged 16 commits into opendatahub-io:main
Conversation
The following are automatically added/executed:
Available user actions:
Supported labels: {'/lgtm', '/build-push-pr-image', '/verified', '/cherry-pick', '/wip', '/hold'}
6222ea4 to 724334f
📝 Walkthrough

Adds LLMD prefix-cache support and tests: new constants and a liveness probe, two pytest fixtures, Prometheus-aware LLMD test utilities, and an integration test validating single-node estimated prefix-cache routing and metrics.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (8)
tests/model_serving/model_server/llmd/utils.py (4)
**197-223: Code duplication with `get_llmd_pods` inner function.**

The logic in `get_llmd_workload_pods` duplicates the `get_llmd_pods` inner function defined at lines 112-126 within `verify_llmd_no_failed_pods`. Consider extracting and reusing `get_llmd_workload_pods` in `verify_llmd_no_failed_pods` to eliminate duplication.

```diff
 def verify_llmd_no_failed_pods(
     client: DynamicClient,
     llm_service: LLMInferenceService,
     timeout: int = 300,
 ) -> None:
     # ... docstring ...
     from utilities.exceptions import FailedPodsError
     from ocp_resources.resource import Resource

     LOGGER.info(f"Comprehensive health check for LLMInferenceService {llm_service.name}")
     container_wait_base_errors = ["InvalidImageName", "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"]
     container_terminated_base_errors = [Resource.Status.ERROR, "CrashLoopBackOff"]

-    def get_llmd_pods():
-        """Get LLMD workload pods for this LLMInferenceService."""
-        pods = []
-        for pod in Pod.get(
-            dyn_client=client,
-            namespace=llm_service.namespace,
-            label_selector=(
-                f"{Pod.ApiGroup.APP_KUBERNETES_IO}/part-of=llminferenceservice,"
-                f"{Pod.ApiGroup.APP_KUBERNETES_IO}/name={llm_service.name}"
-            ),
-        ):
-            labels = pod.instance.metadata.get("labels", {})
-            if labels.get("kserve.io/component") == "workload":
-                pods.append(pod)
-        return pods
+    def get_llmd_pods():
+        return get_llmd_workload_pods(client=client, llmisvc=llm_service)

     for pods in TimeoutSampler(
```
**264-287: Narrow exception handling and make log window configurable.**

The bare `Exception` catch loses valuable diagnostic context. Consider catching specific exceptions (e.g., `kubernetes.client.exceptions.ApiException`) or at least logging the exception type.

The hardcoded `since_seconds=120` may be insufficient in slower test environments.

```diff
-def count_chat_completions_requests_in_pod(pod: Pod) -> int:
+def count_chat_completions_requests_in_pod(pod: Pod, since_seconds: int = 120) -> int:
     """
     Count POST /v1/chat/completions requests in pod logs.

     Args:
         pod: The vLLM workload pod to check
+        since_seconds: Time window for log retrieval (default: 120)

     Returns:
         Number of successful chat completion requests found in logs
     """
     try:
-        logs = pod.log(container="main", since_seconds=120)
+        logs = pod.log(container="main", since_seconds=since_seconds)
         # Match: "POST /v1/chat/completions HTTP/1.1" 200
         pattern = r"POST /v1/chat/completions HTTP/1.1.*200"
         matches = re.findall(pattern, logs)
         # ...
-    except Exception as e:
-        LOGGER.info(f"Failed to count requests for pod {pod.name}: {e}")
+    except Exception as e:  # noqa: BLE001
+        LOGGER.warning(f"Failed to count requests for pod {pod.name}: {type(e).__name__}: {e}")
         return 0
```
**290-323: Remove unused parameters and address potential race condition.**

The `query` and `timestamp_before` parameters are documented as unused and kept for "signature compatibility." If no callers use them, remove them to avoid confusion.

The hardcoded `time.sleep(5)` could be insufficient for log propagation in some environments. Consider making it configurable.

If multiple pods show increased request counts (possible in race conditions), only the first is returned without warning.

```diff
 def get_pod_that_handled_request(
     workload_pods: list[Pod],
-    query: str,
-    timestamp_before: float,
     baseline_counts: dict[str, int],
+    wait_seconds: float = 5.0,
 ) -> str | None:
     """
     Determine which pod handled a request by counting POST requests.

     Args:
         workload_pods: List of vLLM workload pods
-        query: Not used (kept for signature compatibility)
-        timestamp_before: Not used (kept for signature compatibility)
         baseline_counts: Dict of {pod_name: request_count} before this request
+        wait_seconds: Time to wait for logs to propagate (default: 5.0)

     Returns:
         Pod name that handled the request, or None if not found
     """
-    time.sleep(5)
+    time.sleep(wait_seconds)

     current_counts = {}
     for pod in workload_pods:
         current_counts[pod.name] = count_chat_completions_requests_in_pod(pod)

+    handling_pods = []
     for pod in workload_pods:
         baseline = baseline_counts.get(pod.name, 0)
         current = current_counts.get(pod.name, 0)
         if current > baseline:
             LOGGER.info(f"Pod {pod.name} handled request: {baseline} -> {current} (+{current - baseline})")
-            return pod.name
+            handling_pods.append(pod.name)
+
+    if len(handling_pods) == 1:
+        return handling_pods[0]
+    elif len(handling_pods) > 1:
+        LOGGER.warning(f"Multiple pods show increased counts: {handling_pods}, returning first")
+        return handling_pods[0]

     LOGGER.warning("Could not determine which pod handled request")
     return None
```
**361-393: Rename unused loop variable and consider extracting repeated logic.**

The loop variable `i` is unused; rename to `_` per Python convention.

The inference request logic is duplicated between Phase 1 (lines 362-388) and Phase 2 (lines 410-436). Consider extracting a helper function.

```diff
-    for i in range(3):
+    for _ in range(3):
         inference_config = {
```

Optionally, extract the repeated inference/tracking logic:

```python
def _send_and_track_request(
    llmisvc: LLMInferenceService,
    prompt: str,
    token: str,
    workload_pods: list[Pod],
    baseline_counts: dict[str, int],
) -> str | None:
    """Send inference request and track which pod handled it."""
    inference_config = {
        "default_query_model": {
            "query_input": prompt,
            "query_output": r".*",
            "use_regex": True,
        },
        "chat_completions": TINYLLAMA_INFERENCE_CONFIG["chat_completions"],
    }
    verify_inference_response_llmd(
        llm_service=llmisvc,
        inference_config=inference_config,
        inference_type="chat_completions",
        protocol=Protocols.HTTPS,
        use_default_query=True,
        insecure=False,
        model_name=llmisvc.instance.spec.model.name,
        token=token,
        authorized_user=True,
    )
    return get_pod_that_handled_request(workload_pods, baseline_counts)
```

tests/model_serving/model_server/llmd/conftest.py (2)
**372-390: Complex environment variable string is fragile.**

The `VLLM_ADDITIONAL_ARGS` value contains embedded JSON with escaped quotes and Go templates. While functional, this is difficult to maintain and debug.

Consider constructing the JSON programmatically and using `json.dumps()` for reliability:

```python
import json

kv_transfer_config = json.dumps({"kv_connector": "NixlConnector", "kv_role": "kv_both"})
kv_events_config_template = json.dumps({
    "enable_kv_cache_events": True,
    "publisher": "zmq",
    "endpoint": "tcp://{{ ChildName .ObjectMeta.Name `-epp-service` }}:5557",
    "topic": "kv@${POD_IP}@${MODEL_NAME}",
})

vllm_args = (
    f"--prefix-caching-hash-algo {PREFIX_CACHE_HASH_ALGO} "
    f"--block-size {PREFIX_CACHE_BLOCK_SIZE} "
    f"--kv_transfer_config '{kv_transfer_config}' "
    f"--kv-events-config '{kv_events_config_template}'"
)
```
**161-167: Use `LLMD_LIVENESS_PROBE` constant consistently.**

The `llmd_inference_service_gpu` fixture (lines 161-167) defines the liveness probe inline with identical values to `LLMD_LIVENESS_PROBE`. For consistency and DRY, use the constant in both places.

```diff
+from tests.model_serving.model_server.llmd.constants import LLMD_LIVENESS_PROBE

 # ... in llmd_inference_service_gpu fixture ...
-    liveness_probe = {
-        "httpGet": {"path": "/health", "port": 8000, "scheme": "HTTPS"},
-        "initialDelaySeconds": 120,
-        "periodSeconds": 30,
-        "timeoutSeconds": 30,
-        "failureThreshold": 5,
-    }
+    liveness_probe = LLMD_LIVENESS_PROBE
```

Also applies to: 391-391
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (2)
**14-25: Move module docstring to the top of the file.**

The module docstring is placed after imports (lines 14-25), which is unconventional. Per PEP 257, module docstrings should be the first statement in the file.

```diff
+"""
+Test Single-Node Precise Prefix Caching.
+
+This test verifies that the LLM-D router correctly routes inference requests
+based on cache state, maximizing prefix cache hits.
+
+Test configuration:
+- LLMInferenceService with 2 replicas and router enabled
+- Authentication enabled
+- Verify router pod and vLLM pods are running
+- Send multiple requests with shared prefixes and size greater than PREFIX_CACHE_BLOCK_SIZE
+"""
+
 import pytest
 from kubernetes.dynamic import DynamicClient
 from ocp_resources.llm_inference_service import LLMInferenceService

 from tests.model_serving.model_server.llmd.utils import (
     get_llmd_router_scheduler_pod,
     get_llmd_workload_pods,
     verify_gateway_status,
     verify_llm_service_status,
     verify_singlenode_prefix_cache_routing,
 )
 from simple_logger.logger import get_logger

-"""
-Test Single-Node Precise Prefix Caching.
-...
-"""
-
 LOGGER = get_logger(name=__name__)
```
**41-61: Add comment explaining `llmd_gateway` dependency and consider instance attribute.**

The `llmd_gateway` parameter ensures the gateway fixture runs first. Add a comment for clarity.

Storing the token as a class attribute (`TestSingleNodePrecisePrefixCache.auth_token`) could cause issues if tests run in parallel with `pytest-xdist`. Consider using a class-scoped fixture that yields the token instead.

```diff
 @pytest.fixture(scope="class", autouse=True)
 def setup_auth(
     self,
-    llmd_gateway,
+    llmd_gateway,  # Required: ensures gateway is ready before auth setup
     singlenode_precise_prefix_cache,
     llmd_s3_service_account,
     llmisvc_auth_token,
     llmisvc_auth_view_role,
     llmisvc_auth_role_binding,
 ):
     """Set up authentication for single-node prefix cache test."""
     # Create token with RBAC resources using factory fixtures
     token = llmisvc_auth_token(
         service_account=llmd_s3_service_account,
         llmisvc=singlenode_precise_prefix_cache,
         view_role_factory=llmisvc_auth_view_role,
         role_binding_factory=llmisvc_auth_role_binding,
     )
-    # Store token as class attribute for use in tests
-    TestSingleNodePrecisePrefixCache.auth_token = token
+    # Store on instance - safer if tests run in parallel
+    self.__class__.auth_token = token
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- tests/model_serving/model_server/llmd/conftest.py (3 hunks)
- tests/model_serving/model_server/llmd/constants.py (1 hunks)
- tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1 hunks)
- tests/model_serving/model_server/llmd/utils.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/model_serving/model_server/llmd/utils.py (3)
utilities/constants.py (1)
- Protocols (97-104)

utilities/exceptions.py (1)
- PodContainersRestartError (109-110)

utilities/llmd_utils.py (1)
- verify_inference_response_llmd (393-454)
tests/model_serving/model_server/llmd/conftest.py (2)
utilities/llmd_utils.py (1)
- create_llmisvc (149-341)

utilities/constants.py (2)
- ResourceLimits (241-263)
- GPU (252-263)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (2)
tests/model_serving/model_server/llmd/utils.py (5)
- get_llmd_router_scheduler_pod (226-251)
- get_llmd_workload_pods (197-223)
- verify_gateway_status (28-49)
- verify_llm_service_status (52-73)
- verify_singlenode_prefix_cache_routing (326-443)

tests/conftest.py (1)
- gpu_count_on_cluster (793-814)
🪛 Ruff (0.14.7)
tests/model_serving/model_server/llmd/utils.py
285-285: Do not catch blind exception: Exception
(BLE001)
292-292: Unused function argument: query
(ARG001)
293-293: Unused function argument: timestamp_before
(ARG001)
362-362: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
tests/model_serving/model_server/llmd/conftest.py
347-347: Unused function argument: llmd_s3_secret
(ARG001)
349-349: Unused function argument: llmd_gateway
(ARG001)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py
44-44: Unused method argument: llmd_gateway
(ARG002)
🔇 Additional comments (3)
tests/model_serving/model_server/llmd/conftest.py (1)
**343-350: Fixture dependencies are intentional for ordering.**

The `llmd_s3_secret` and `llmd_gateway` parameters appear unused but are required to ensure proper fixture ordering — `llmd_s3_secret` must exist before `llmd_s3_service_account` is used, and `llmd_gateway` must be created before the service.

Consider adding brief comments to clarify this intent for future maintainers.

```diff
 @pytest.fixture(scope="class")
 def singlenode_precise_prefix_cache(
     admin_client: DynamicClient,
     unprivileged_model_namespace: Namespace,
-    llmd_s3_secret: Secret,
+    llmd_s3_secret: Secret,  # Required: creates S3 secret before service account
     llmd_s3_service_account: ServiceAccount,
-    llmd_gateway,
+    llmd_gateway,  # Required: ensures gateway is ready before service creation
 ) -> Generator[LLMInferenceService, None, None]:
```

tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)
**63-92: LGTM! Well-structured test with clear preconditions.**

The test properly:
- Skips when insufficient GPUs are available
- Validates infrastructure readiness before routing tests
- Uses descriptive assertion messages
- Delegates complex routing validation to a helper function
One minor suggestion: the `llmd_gateway` parameter appears unused in the method signature but is likely needed for fixture ordering. Add a brief comment or use the `_` prefix convention.

```diff
 def test_singlenode_precise_prefix_cache(
     self,
     unprivileged_client: DynamicClient,
-    llmd_gateway,
+    llmd_gateway,  # noqa: ARG002 - Required for fixture ordering
     singlenode_precise_prefix_cache: LLMInferenceService,
     gpu_count_on_cluster: int,
 ):
```

tests/model_serving/model_server/llmd/constants.py (1)
**1-44: LGTM! Well-organized constants.**

The configuration constants are clearly defined and appropriately grouped. The liveness probe settings and scheduler configuration follow the expected API patterns. The `PREFIX_CACHE_HASH_SEED` is correctly defined as a string `"42"`, which aligns with vLLM's expectation for hash seed values (read from the `PYTHONHASHSEED` environment variable).
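Since the seed reaches the server through an environment variable (`PYTHONHASHSEED`), it has to be a string; a small runnable illustration of why (only the constant name comes from the PR, the rest is a sketch):

```python
import os

PREFIX_CACHE_HASH_SEED = "42"  # os.environ values must be str, not int

os.environ["PYTHONHASHSEED"] = PREFIX_CACHE_HASH_SEED
print(os.environ["PYTHONHASHSEED"])  # 42

try:
    os.environ["PYTHONHASHSEED"] = 42  # type: ignore[assignment]
except TypeError as exc:
    # CPython rejects non-string env values outright
    print(f"int rejected: {exc}")
```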
28f7e15 to 159fd07
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/model_serving/model_server/llmd/conftest.py (1)
**343-441: `singlenode_precise_prefix_cache` fixture: clean overall; drop redundant secret arg and make gateway dependency explicit**

The fixture wiring into `create_llmisvc` looks solid and matches the existing LLMD patterns. Two small cleanups:

- `llmd_s3_secret` is unused here and redundant because `llmd_s3_service_account` already depends on it; you can safely drop it from the signature.
- `llmd_gateway` is intentionally unused but forces the gateway fixture to run; it's worth touching it explicitly so this is clear and to satisfy ARG001.

For example:

```diff
 @pytest.fixture(scope="class")
 def singlenode_precise_prefix_cache(
     admin_client: DynamicClient,
     unprivileged_model_namespace: Namespace,
-    llmd_s3_secret: Secret,
     llmd_s3_service_account: ServiceAccount,
     llmd_gateway,
 ) -> Generator[LLMInferenceService, None, None]:
     """LLMInferenceService fixture for single-node precise prefix cache test."""
+    # Ensure llmd_gateway fixture is instantiated even though we don't use it directly here
+    _ = llmd_gateway
+
     with create_llmisvc(
         client=admin_client,
```

tests/model_serving/model_server/llmd/utils.py (3)
**254-287: Broad exception handling in `count_chat_completions_requests_in_pod`**

Catching a blanket `Exception` here hides all failures (including programming errors) and just returns `0`, which could make debugging routing issues harder.

Given this is test-only code and log scraping is inherently flaky, two options:

- Narrow the exception to the known pod/log retrieval errors, or
- Keep the broad catch but explicitly mark it as intentional for linting:

```diff
-    except Exception as e:
-        LOGGER.info(f"Failed to count requests for pod {pod.name}: {e}")
-        return 0
+    except Exception as e:  # noqa: BLE001 - tolerate log scraping failures in tests
+        LOGGER.info(f"Failed to count requests for pod {pod.name}: {e}")
+        return 0
```
**361-363: Rename unused loop variable in Phase 1 for clarity**

The loop index isn't used; renaming it to `_` makes that explicit and satisfies linting:

```diff
 phase1_pods = []
-for i in range(3):
+for _ in range(3):
```
**197-223: Pod selection logic duplicated with `verify_llmd_no_failed_pods` helper**

`get_llmd_workload_pods` uses the same label selector and filtering as the inner `get_llmd_pods` inside `verify_llmd_no_failed_pods` above. To keep things DRY and avoid future divergence, consider refactoring `verify_llmd_no_failed_pods` to call this new helper instead of maintaining two copies of the same logic.

No behavior change is required for this PR; this is a good follow-up cleanup target.
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)
**41-61: Clarify intentional use of `llmd_gateway` in `setup_auth` to satisfy lint**

`llmd_gateway` in `setup_auth` isn't referenced, but it's clearly there to ensure the gateway fixture is instantiated before creating the auth token and service. To make that intent explicit and silence ARG002, you can touch it:

```diff
 def setup_auth(
     self,
     llmd_gateway,
     singlenode_precise_prefix_cache,
     @@
 ):
     """Set up authentication for single-node prefix cache test."""
-    # Create token with RBAC resources using factory fixtures
+    # Force llmd_gateway fixture instantiation even though we don't use it directly
+    _ = llmd_gateway
+
+    # Create token with RBAC resources using factory fixtures
     token = llmisvc_auth_token(
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/model_serving/model_server/llmd/conftest.py (3 hunks)
- tests/model_serving/model_server/llmd/constants.py (1 hunks)
- tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1 hunks)
- tests/model_serving/model_server/llmd/utils.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/model_serving/model_server/llmd/constants.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/model_serving/model_server/llmd/utils.py (2)
utilities/constants.py (1)
- Protocols (97-104)

utilities/llmd_utils.py (1)
- verify_inference_response_llmd (393-454)
tests/model_serving/model_server/llmd/conftest.py (3)
utilities/llmd_utils.py (1)
- create_llmisvc (149-341)

utilities/constants.py (4)
- ModelStorage (289-306)
- ResourceLimits (241-263)
- GPU (252-263)
- Timeout (227-238)

utilities/llmd_constants.py (2)
- ModelStorage (30-37)
- ModelNames (46-50)
🪛 Ruff (0.14.7)
tests/model_serving/model_server/llmd/utils.py
285-285: Do not catch blind exception: Exception
(BLE001)
292-292: Unused function argument: query
(ARG001)
293-293: Unused function argument: timestamp_before
(ARG001)
362-362: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
tests/model_serving/model_server/llmd/conftest.py
347-347: Unused function argument: llmd_s3_secret
(ARG001)
349-349: Unused function argument: llmd_gateway
(ARG001)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py
44-44: Unused method argument: llmd_gateway
(ARG002)
159fd07 to 12c2b2f
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)
**41-61: Consider yielding the token from the fixture instead of using a class attribute.**

Storing the token as a class attribute (`TestSingleNodePrecisePrefixCache.auth_token`) is a less idiomatic pytest pattern. Consider yielding the token from a separate fixture so tests can access it directly via fixture injection. This improves test isolation and follows pytest conventions.

Example approach:

```python
@pytest.fixture(scope="class")
def prefix_cache_auth_token(
    llmd_gateway,
    singlenode_precise_prefix_cache,
    llmd_s3_service_account,
    llmisvc_auth_token,
    llmisvc_auth_view_role,
    llmisvc_auth_role_binding,
):
    """Set up authentication and yield token for single-node prefix cache test."""
    token = llmisvc_auth_token(
        service_account=llmd_s3_service_account,
        llmisvc=singlenode_precise_prefix_cache,
        view_role_factory=llmisvc_auth_view_role,
        role_binding_factory=llmisvc_auth_role_binding,
    )
    yield token
```

Then inject `prefix_cache_auth_token` directly into the test method.

tests/model_serving/model_server/llmd/utils.py (1)
**301-301: Fix `time.sleep` keyword argument - will cause `TypeError`.**

`time.sleep()` does not accept keyword arguments. This will raise `TypeError: sleep() takes no keyword arguments` at runtime.

```diff
-    time.sleep(secs=5)
+    time.sleep(5)  # Allow logs to flush before counting requests
```
🧹 Nitpick comments (4)
tests/model_serving/model_server/llmd/conftest.py (1)
**161-167: Consider reusing `LLMD_LIVENESS_PROBE` constant for consistency.**

This inline liveness probe definition is identical to the new `LLMD_LIVENESS_PROBE` constant in `constants.py`. Consider using the constant here to reduce duplication and ensure consistency.

```diff
+from tests.model_serving.model_server.llmd.constants import LLMD_LIVENESS_PROBE
 ...
-    liveness_probe = {
-        "httpGet": {"path": "/health", "port": 8000, "scheme": "HTTPS"},
-        "initialDelaySeconds": 120,
-        "periodSeconds": 30,
-        "timeoutSeconds": 30,
-        "failureThreshold": 5,
-    }
+    liveness_probe = LLMD_LIVENESS_PROBE
```

tests/model_serving/model_server/llmd/utils.py (3)
**285-287: Consider catching more specific exceptions.**

While the broad exception handling provides resilience for this log-counting helper, consider catching more specific exceptions (e.g., `kubernetes.client.exceptions.ApiException`) to avoid masking unexpected errors during debugging.
**355-355: Rename unused loop variable.**

The loop variable `i` is not used. Rename to `_` to indicate intentional non-use.

```diff
-    for i in range(3):
+    for _ in range(3):
```
**112-126: Refactor to use `get_llmd_workload_pods` utility.**

The internal `get_llmd_pods()` function duplicates the logic of the new `get_llmd_workload_pods()` utility (lines 197-223). Consider refactoring to use the shared utility.

```diff
 def get_llmd_pods():
     """Get LLMD workload pods for this LLMInferenceService."""
-    pods = []
-    for pod in Pod.get(
-        dyn_client=client,
-        namespace=llm_service.namespace,
-        label_selector=(
-            f"{Pod.ApiGroup.APP_KUBERNETES_IO}/part-of=llminferenceservice,"
-            f"{Pod.ApiGroup.APP_KUBERNETES_IO}/name={llm_service.name}"
-        ),
-    ):
-        labels = pod.instance.metadata.get("labels", {})
-        if labels.get("kserve.io/component") == "workload":
-            pods.append(pod)
-    return pods
+    return get_llmd_workload_pods(client=client, llmisvc=llm_service)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/model_serving/model_server/llmd/conftest.py (3 hunks)
- tests/model_serving/model_server/llmd/constants.py (1 hunks)
- tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1 hunks)
- tests/model_serving/model_server/llmd/utils.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/model_serving/model_server/llmd/utils.py (4)
utilities/constants.py (1)
- Protocols (97-104)

utilities/exceptions.py (1)
- PodContainersRestartError (109-110)

utilities/llmd_utils.py (1)
- verify_inference_response_llmd (393-454)

tests/model_serving/conftest.py (1)
- protocol (26-36)
tests/model_serving/model_server/llmd/conftest.py (1)
utilities/llmd_utils.py (1)
- create_llmisvc (149-341)
🪛 Ruff (0.14.7)
tests/model_serving/model_server/llmd/utils.py
285-285: Do not catch blind exception: Exception
(BLE001)
355-355: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
tests/model_serving/model_server/llmd/conftest.py
347-347: Unused function argument: llmd_s3_secret
(ARG001)
349-349: Unused function argument: llmd_gateway
(ARG001)
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py
44-44: Unused method argument: llmd_gateway
(ARG002)
🔇 Additional comments (4)
tests/model_serving/model_server/llmd/conftest.py (1)
**343-350: Fixture dependency parameters are correctly used for ordering.**

The `llmd_s3_secret` and `llmd_gateway` parameters are intentionally declared to ensure proper fixture ordering/dependency resolution even though they're not directly referenced in the function body. This is a valid pytest pattern. Consider adding a brief comment to clarify this intent for future maintainers.

tests/model_serving/model_server/llmd/constants.py (1)
**1-44: Well-organized constants module.**

The constants are clearly documented and properly structured. The scheduler configuration correctly references the prefix cache parameters for consistency.
tests/model_serving/model_server/llmd/test_singlenode_precise_prefix_cache.py (1)
**63-94: Integration test structure is acceptable, though could be modularized.**

While a previous reviewer suggested splitting into smaller tests, the current structure is reasonable for an end-to-end integration test where phases are sequential and interdependent. The assertions at each step provide clear failure points.
tests/model_serving/model_server/llmd/utils.py (1)
**319-439: Well-structured routing verification with clear test phases.**

The function properly validates prefix cache routing behavior:
- Phase 1 verifies repeated identical prompts route to the same pod (full cache hit)
- Phase 2 verifies prompts with shared prefixes route together (partial cache hit)
The baseline counting approach for request attribution is reasonable given the log-based detection method.
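The baseline-count attribution the review describes is simple enough to illustrate standalone (pod names and counts here are invented; in the real utilities the counts come from log scraping or Prometheus):

```python
def pods_with_new_requests(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """Return pods whose request count grew since the baseline snapshot."""
    return [name for name, count in current.items() if count > baseline.get(name, 0)]


# Snapshot counts before the request, re-count after: the delta attributes
# the request to the pod(s) whose counter moved.
baseline = {"vllm-workload-0": 3, "vllm-workload-1": 5}
after_request = {"vllm-workload-0": 4, "vllm-workload-1": 5}

print(pods_with_new_requests(baseline, after_request))  # ['vllm-workload-0']
```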
12c2b2f to 3c86b86
c1c2d9a to 4d58233
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/model_serving/model_server/llmd/utils.py (1)
**401-402: Replace `time.sleep()` with `TimeoutSampler` and remove invalid `noqa` directive.**

A past reviewer requested replacing the fixed sleep with `TimeoutSampler` to poll for metrics availability. The `noqa: FCN001` directive is also invalid and should be removed.

Based on past review comments, use `TimeoutSampler` to poll Prometheus until metrics are available rather than blindly sleeping:

```diff
-    # Wait for Prometheus to scrape metrics
-    LOGGER.info("Waiting 30 seconds for Prometheus to scrape metrics")
-    time.sleep(30)  # noqa: FCN001
+    # Poll Prometheus for metrics availability
+    LOGGER.info("Waiting for Prometheus to scrape metrics")
+    for _ in TimeoutSampler(
+        wait_timeout=60,
+        sleep=5,
+        func=lambda: get_metrics_request_count_per_pod(prometheus, llmisvc, workload_pods),
+        exceptions=(Exception,),
+    ):
+        # Verify we have non-zero metrics before proceeding
+        if sum(get_metrics_request_count_per_pod(prometheus, llmisvc, workload_pods).values()) > 0:
+            break
```
🧹 Nitpick comments (1)
tests/model_serving/model_server/llmd/utils.py (1)
**419-419: Prefer `next()` over list comprehension with index.**

Using `next()` is more idiomatic and provides clearer error handling if no match is found.

Apply this diff:

```diff
-    active_pod = [name for name, count in pods_request_counts.items() if count == expected_requests][0]
+    active_pod = next(name for name, count in pods_request_counts.items() if count == expected_requests)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/model_serving/model_server/llmd/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/model_serving/model_server/llmd/utils.py (6)
tests/conftest.py (1)
- prometheus (665-673)

utilities/constants.py (1)
- Protocols (97-104)

utilities/exceptions.py (1)
- PodContainersRestartError (109-110)

utilities/llmd_utils.py (1)
- verify_inference_response_llmd (393-454)

utilities/monitoring.py (1)
- get_metrics_value (10-24)

tests/model_serving/conftest.py (1)
- protocol (26-36)
🪛 Flake8 (7.3.0)
tests/model_serving/model_server/llmd/utils.py
[error] 312-312: undefined name 'i'
(F821)
🪛 Ruff (0.14.8)
tests/model_serving/model_server/llmd/utils.py
311-311: Do not catch blind exception: Exception
(BLE001)
312-312: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
312-312: Undefined name i
(F821)
402-402: Unused noqa directive (unknown: FCN001)
Remove unused noqa directive
(RUF100)
419-419: Prefer next(...) over single element slice
Replace with next(...)
(RUF015)
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/model_serving/model_server/llmd/utils.py (1)
**254-317: Tighten exception handling and log with traceback in prefix-cache request loop**

The loop does the right thing in terms of driving repeated requests, but the `except Exception as e` + `LOGGER.error(...)` pattern is both noisy for static analysis and hides the traceback that would be useful when debugging failures.

Given `verify_inference_response_llmd` documents raising `InferenceResponseError` and `ValueError`, you can narrow the catch and use `LOGGER.exception`:

```diff
-from utilities.exceptions import PodContainersRestartError
+from utilities.exceptions import PodContainersRestartError, InferenceResponseError

         try:
             verify_inference_response_llmd(
                 llm_service=llmisvc,
                 inference_config=inference_config,
                 inference_type="chat_completions",
                 protocol=Protocols.HTTPS,
                 use_default_query=True,
                 insecure=False,
                 model_name=llmisvc.instance.spec.model.name,
                 token=token,
                 authorized_user=True,
             )
             successful_requests += 1
-        except Exception as e:
-            LOGGER.error(f"Request {index + 1} failed: {e}")
-            failed_requests += 1
+        except (InferenceResponseError, ValueError) as exc:
+            LOGGER.exception(f"Request {index + 1} failed validation: {exc}")
+            failed_requests += 1
```

This keeps the "keep going and count failures" behavior, but avoids a blind catch-all and gives you stack traces in logs, which should also satisfy the BLE001/TRY400 lints.
🧹 Nitpick comments (1)
tests/model_serving/model_server/llmd/utils.py (1)
**197-251: Workload/router pod discovery helpers look correct; consider de-duplicating label selector logic**

The new `get_llmd_workload_pods` and `get_llmd_router_scheduler_pod` helpers use selectors consistent with the inner `get_llmd_pods` in `verify_llmd_no_failed_pods`, and the label keys/values look right for workload vs router-scheduler pods.

To avoid drift if labels change later, consider reusing these helpers (or a shared selector builder) inside `verify_llmd_no_failed_pods` instead of duplicating the selector and workload filter logic.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/model_serving/model_server/llmd/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/model_serving/model_server/llmd/utils.py (4)
utilities/constants.py (1)
- Protocols (97-104)

utilities/exceptions.py (1)
- PodContainersRestartError (109-110)

utilities/llmd_utils.py (1)
- verify_inference_response_llmd (393-454)

utilities/monitoring.py (1)
- get_metrics_value (10-24)
🪛 Ruff (0.14.8)
tests/model_serving/model_server/llmd/utils.py
310-310: Do not catch blind exception: Exception
(BLE001)
311-311: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
417-417: Prefer next(...) over single element slice
Replace with next(...)
(RUF015)
Status of building tag latest: success.
Description
Add a new test case to cover the Single-node Estimated Prefix Cache scenario, check the comments #907 (comment)
Jira issue tracker: https://issues.redhat.com/browse/RHOAIENG-34594
How Has This Been Tested?
Run the test locally vs a cluster with GPU (tesla t4)
Merge criteria:
Summary by CodeRabbit