
Feature: add llmd test and lib as foundation of llmd test #612

Merged
mwaykole merged 15 commits into opendatahub-io:main from mwaykole:llmd
Sep 23, 2025

Conversation

@mwaykole
Member

@mwaykole mwaykole commented Sep 15, 2025

Summary by CodeRabbit

  • New Features

    • End-to-end LLMD tooling to create/manage gateways and LLM inference services, resolve endpoints, run inferences, and validate responses.
  • Tests

    • Added CPU-based LLMD end-to-end tests for OCI and S3 storage, reusable fixtures, readiness/inference checks, pod restart validation, and a new llmd_cpu pytest marker.
  • Documentation

    • Added overview docstring describing LLMD test coverage.
  • Chores

    • Centralized LLMD defaults for gateways, images, storage, resources, and timeouts.

@mwaykole mwaykole requested review from a team and Raghul-M as code owners September 15, 2025 16:54
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 15, 2025

📝 Walkthrough

Walkthrough

Adds a new LLMD (LLM Deployment) test suite and supporting utilities: pytest fixtures, readiness verifiers, end-to-end tests for OCI and S3 storage, deployment/inference helpers, centralized constants, and an OPT125M CPU inference manifest.

Changes

  • LLMD test package (tests/model_serving/model_server/llmd/__init__.py): adds a module docstring describing the LLMD test package.
  • LLMD pytest fixtures (tests/model_serving/model_server/llmd/conftest.py): adds class-scoped pytest fixtures gateway_namespace, llmd_s3_secret, llmd_s3_service_account, llmd_gateway, llmd_inference_service, and llmd_inference_service_s3 to create/manage gateways and LLMInferenceService instances (creation, wait, yield, teardown semantics).
  • LLMD test helpers (tests/model_serving/model_server/llmd/utils.py): adds verification helpers verify_gateway_status, verify_llm_service_status, and verify_llmd_pods_not_restarted (raises on detected container restarts).
  • LLMD tests, OCI CPU (tests/model_serving/model_server/llmd/test_llmd_oci_cpu.py): adds TestLLMDOCICPUInference with a test that verifies gateway/service readiness, runs inference using OPT125M_CPU_INFERENCE_CONFIG (chat_completions HTTP), and asserts pods did not restart; marked llmd_cpu.
  • LLMD tests, S3 (tests/model_serving/model_server/llmd/test_llmd_s3.py): adds TestLLMDS3Inference with a test that verifies gateway/service readiness, runs inference against an S3-backed service using OPT125M_CPU_INFERENCE_CONFIG, requires AWS config, and asserts pods did not restart; marked llmd_cpu.
  • LLMD constants (utilities/llmd_constants.py): adds centralized constants for LLMD defaults (gateway names/namespaces, endpoints, timeouts, vLLM OCI storage and CPU image, default resources, BASIC_LLMD_PARAMS, etc.).
  • LLMD utilities (utilities/llmd_utils.py): adds deployment and inference tooling: the create_llmd_gateway and create_llmisvc context managers, get_llm_inference_url, verify_inference_response_llmd, the LLMUserInference class, and related validation helpers including a response format validator.
  • Inference manifest (utilities/manifests/opt125m_cpu.py): adds the OPT125M_CPU_INFERENCE_CONFIG dict with default_query_model, chat_completions, and completions HTTP templates and response mapping.
  • Pytest config (pytest.ini): adds the pytest marker llmd_cpu to label tests requiring LLMD CPU resources.
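The restart check described for tests/model_serving/model_server/llmd/utils.py can be sketched roughly as follows. This is an illustrative model only, using plain dicts in place of real Kubernetes pod objects; the function names are hypothetical and not the PR's actual implementation:

```python
# Hedged sketch of a "pods not restarted" check; pod data is modeled as
# plain dicts shaped like the Kubernetes Pod API (status.containerStatuses),
# not real client objects.

def find_restarted_containers(pods: list[dict]) -> list[str]:
    """Return 'pod/container' entries whose restartCount is non-zero."""
    restarted = []
    for pod in pods:
        statuses = pod.get("status", {}).get("containerStatuses", []) or []
        for cs in statuses:
            if cs.get("restartCount", 0) > 0:
                restarted.append(f"{pod['metadata']['name']}/{cs['name']}")
    return restarted


def assert_no_restarts(pods: list[dict]) -> None:
    """Raise if any container in the given pods has restarted."""
    restarted = find_restarted_containers(pods)
    if restarted:
        raise RuntimeError(f"Containers restarted: {restarted}")
```

A real implementation would pull pod objects from the cluster via the dynamic client, but the pass/fail logic reduces to this shape.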

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 73.91%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: the title "feat: add llmd test cpu test and lib " is related to the changeset: the PR adds LLMD tests, utilities, constants, and manifests, so it communicates the main intent. The phrasing is slightly redundant ("test cpu test") and has extra whitespace, but it is not misleading or generic.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove the WIP state, comment /wip cancel.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove approval, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified. To un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name>. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image tagged
    pr-<pr_number> in the quay repository; the tag is deleted when the PR is merged or closed.
Supported labels

{'/hold', '/verified', '/lgtm', '/cherry-pick', '/build-push-pr-image', '/wip'}

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/model_serving/model_server/llmd/conftest.py (1)

128-176: S3 model URI likely points to the wrong bucket (hard‑coded in create_llmisvc).
create_llmisvc builds s3://ods-ci-wisdom/{storage_path} when storage_key+storage_path are provided, ignoring the actual models_s3_bucket_name set in your secret. This can 404 models in non‑ods-ci buckets. Prefer passing an explicit storage_uri built from models_s3_bucket_name and storage_path, and do not provide storage_key/storage_path to avoid the hard‑coded path.

 @pytest.fixture(scope="class")
 def llmd_inference_service_s3(
     request: FixtureRequest,
     admin_client: DynamicClient,
     unprivileged_model_namespace: Namespace,
     llmd_s3_secret: Secret,
-    llmd_s3_service_account: ServiceAccount,
+    llmd_s3_service_account: ServiceAccount,
+    models_s3_bucket_name: str,
 ) -> Generator[LLMInferenceService, None, None]:
@@
-    if "storage_key" not in kwargs:
-        kwargs["storage_key"] = llmd_s3_secret.name
-
-    if "storage_path" not in kwargs:
-        kwargs["storage_path"] = DEFAULT_S3_STORAGE_PATH
+    # Resolve storage_path and construct an explicit S3 URI using the configured bucket
+    storage_path = kwargs.get("storage_path", DEFAULT_S3_STORAGE_PATH)
+    storage_uri = f"s3://{models_s3_bucket_name}/{storage_path}"
@@
     with create_llmisvc(
         client=admin_client,
         name=service_name,
         namespace=unprivileged_model_namespace.name,
-        storage_key=kwargs.get("storage_key"),
-        storage_path=kwargs.get("storage_path"),
+        storage_uri=storage_uri,
         container_image=kwargs.get("container_image", VLLM_CPU_IMAGE),
         container_resources=container_resources,
         service_account=llmd_s3_service_account.name,
         wait=True,
         timeout=Timeout.TIMEOUT_15MIN,
         **{
             k: v
             for k, v in kwargs.items()
             if k not in ["name", "storage_key", "storage_path", "container_image", "container_resources"]
         },
     ) as llm_service:
         yield llm_service

If you prefer fixing this at the source, I can propose a patch to create_llmisvc to derive the bucket from the provided secret or accept an explicit bucket parameter.

🧹 Nitpick comments (13)
utilities/manifests/opt125m_cpu.py (2)

9-9: Add leading slash to endpoints for URL-join robustness.
Safer when base URLs are concatenated.

-            "endpoint": "v1/chat/completions",
+            "endpoint": "/v1/chat/completions",
...
-            "endpoint": "v1/completions",
+            "endpoint": "/v1/completions",

Also applies to: 19-19


2-6: Strengthen default_query_model to avoid tautological match.
r'.*' reduces test signal; consider a light regex that checks for 100°C/212°F or “boil/boiling”.
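For illustration, a light pattern along those lines might look like this; the names are hypothetical and not part of the PR:

```python
import re

# Illustrative only: a permissive-but-meaningful check that the model's
# answer to the default boiling-point query mentions 100 °C, 212 °F, or
# some form of "boil", instead of the tautological r'.*'.
BOIL_PATTERN = re.compile(r"(100\s*°?\s*C|212\s*°?\s*F|boil(?:ing|s)?)", re.IGNORECASE)


def response_mentions_boiling_point(text: str) -> bool:
    return bool(BOIL_PATTERN.search(text))
```

This keeps the check tolerant of phrasing differences while still failing on an unrelated or empty completion.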

tests/model_serving/model_server/llmd/utils.py (2)

30-37: Defensive condition parsing to avoid KeyError/None access.
Protect against missing status/keys.

-    conditions = gateway.instance.status.get("conditions", [])
-    for condition in conditions:
-        if condition["type"] == "Programmed" and condition["status"] == "True":
+    status = getattr(gateway.instance, "status", {}) or {}
+    conditions = status.get("conditions") or []
+    for condition in conditions:
+        if condition.get("type") == "Programmed" and condition.get("status") == "True":
             LOGGER.info(f"Gateway {gateway.name} is programmed and ready")
             return True

54-61: Same defensive parsing for LLMInferenceService conditions.

-    conditions = llm_service.instance.status.get("conditions", [])
-    for condition in conditions:
-        if condition["type"] == "Ready" and condition["status"] == "True":
+    status = getattr(llm_service.instance, "status", {}) or {}
+    conditions = status.get("conditions") or []
+    for condition in conditions:
+        if condition.get("type") == "Ready" and condition.get("status") == "True":
             LOGGER.info(f"LLMInferenceService {llm_service.name} is ready")
             return True
tests/model_serving/model_server/llmd/conftest.py (2)

23-26: Silence unused admin_client argument in gateway_namespace.
Keeps fixture concise and linter‑clean.

-@pytest.fixture(scope="class")
-def gateway_namespace(admin_client: DynamicClient) -> str:
-    return DEFAULT_GATEWAY_NAMESPACE
+@pytest.fixture(scope="class")
+def gateway_namespace(_: DynamicClient) -> str:
+    return DEFAULT_GATEWAY_NAMESPACE

105-112: Default resources may be high for shared CI clusters.
Consider making memory/CPU defaults overridable via env/pytest options to reduce scheduling flakiness.

Also applies to: 150-156
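A minimal sketch of such overridable defaults; the LLMD_* environment variable names here are hypothetical, not part of the PR:

```python
import os

# Illustrative sketch: resource defaults that CI can override via
# environment variables (variable names are made up for this example).
def default_container_resources() -> dict:
    return {
        "requests": {
            "cpu": os.environ.get("LLMD_CPU_REQUEST", "100m"),
            "memory": os.environ.get("LLMD_MEMORY_REQUEST", "8Gi"),
        },
        "limits": {
            "memory": os.environ.get("LLMD_MEMORY_LIMIT", "10Gi"),
        },
    }
```

The same could be wired through pytest `--` options instead of env vars; either way, small clusters can dial requests down without code changes.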

utilities/llmd_constants.py (2)

17-23: Avoid “latest” and personal registries; pin immutable image digests.

  • DEFAULT_CONTAINER_IMAGE points to quay.io/pierdipi/vllm-cpu:latest, and VLLM_STORAGE_OCI references a personal namespace. For reproducibility and supply‑chain hygiene, pin digests or versioned tags and move off personal repos if possible.

Example (illustrative):

-VLLM_CPU_IMAGE = "quay.io/pierdipi/vllm-cpu:latest"
+VLLM_CPU_IMAGE = "quay.io/org/vllm-cpu@sha256:<digest>"
-VLLM_STORAGE_OCI = "oci://quay.io/mwaykole/test:opt-125m"
+VLLM_STORAGE_OCI = "oci://quay.io/org/opt-125m:<version>"

9-10: Use these constants in utilities/llmd_utils.py to avoid drift.

KSERVE_GATEWAY_LABEL and KSERVE_INGRESS_GATEWAY are defined here but llmd_utils hard‑codes the same strings. Replace the hard‑coded values to keep things in sync.

I’ve provided a targeted diff in the llmd_utils review.

Also applies to: 22-23

utilities/llmd_utils.py (5)

21-27: De‑duplicate KSERVE labels; use shared constants.

Hard‑coded label keys/values will drift from utilities/llmd_constants.py. Import and use KSERVE_GATEWAY_LABEL and KSERVE_INGRESS_GATEWAY.

Apply:

@@
-from utilities.llmd_constants import (
+from utilities.llmd_constants import (
     DEFAULT_GATEWAY_NAME,
     DEFAULT_GATEWAY_NAMESPACE,
     OPENSHIFT_DEFAULT_GATEWAY_CLASS,
     DEFAULT_LLM_ENDPOINT,
     DEFAULT_TIMEOUT,
+    KSERVE_GATEWAY_LABEL,
+    KSERVE_INGRESS_GATEWAY,
 )
@@
-    if infrastructure is None:
-        infrastructure = {"labels": {"serving.kserve.io/gateway": "kserve-ingress-gateway"}}
+    if infrastructure is None:
+        infrastructure = {"labels": {KSERVE_GATEWAY_LABEL: KSERVE_INGRESS_GATEWAY}}

Also applies to: 71-73


83-85: Over‑broad exception catch.

Catching Exception hides real errors. Prefer specific exceptions (e.g., ApiException/ResourceNotFoundError) or re‑raise after logging.
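As an illustrative pattern (not the PR's code), the narrow-catch shape looks like this, with the expected failure handled and everything else propagating:

```python
import logging

LOGGER = logging.getLogger(__name__)


# Illustrative only: catch the one specific, recoverable error you expect
# (here KeyError stands in for e.g. ResourceNotFoundError) and let any
# other exception propagate with its full traceback.
def safe_lookup(store: dict, key: str):
    try:
        return store[key]
    except KeyError:
        # Expected, recoverable case: resource simply not there yet.
        LOGGER.info("key %s not found, treating as not-ready", key)
        return None
```

The point is that a typo or API failure inside the try block still surfaces loudly, instead of being swallowed by a blanket `except Exception`.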


355-357: Remove unused parameter from _validate_unauthorized_response().

Drop inference arg and adjust callsite.

Apply:

-    if authorized_user is False:
-        _validate_unauthorized_response(res, token, inference)
+    if authorized_user is False:
+        _validate_unauthorized_response(res, token)
-def _validate_unauthorized_response(res: Dict[str, Any], token: Optional[str], inference: LLMUserInference) -> None:
+def _validate_unauthorized_response(res: Dict[str, Any], token: Optional[str]) -> None:

Also applies to: 536-536


298-300: Over‑broad exception catches degrade diagnostics.

These wide catches hide TLS/CA resolution issues. Narrow them (ValueError, ImportError) or log and re‑raise.

Also applies to: 488-490


1-16: Minor: consolidate imports to what’s used.

Optional tidy‑up only if your linter flags it. No behavior change.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a853ac0 and cdd0a82.

📒 Files selected for processing (8)
  • tests/model_serving/model_server/llmd/__init__.py (1 hunks)
  • tests/model_serving/model_server/llmd/conftest.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_llmd_oci_cpu.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_llmd_s3.py (1 hunks)
  • tests/model_serving/model_server/llmd/utils.py (1 hunks)
  • utilities/llmd_constants.py (1 hunks)
  • utilities/llmd_utils.py (1 hunks)
  • utilities/manifests/opt125m_cpu.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-05T10:05:17.642Z
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.

Applied to files:

  • tests/model_serving/model_server/llmd/conftest.py
🧬 Code graph analysis (5)
tests/model_serving/model_server/llmd/conftest.py (4)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/infra.py (1)
  • s3_endpoint_secret (282-348)
utilities/llmd_utils.py (2)
  • create_gateway (33-114)
  • create_llmisvc (118-262)
tests/conftest.py (7)
  • admin_client (68-69)
  • unprivileged_model_namespace (459-466)
  • aws_access_key_id (124-131)
  • aws_secret_access_key (135-142)
  • models_s3_bucket_name (243-250)
  • models_s3_bucket_region (254-261)
  • models_s3_bucket_endpoint (265-272)
utilities/llmd_constants.py (1)
utilities/constants.py (1)
  • Timeout (213-224)
tests/model_serving/model_server/llmd/test_llmd_oci_cpu.py (5)
tests/model_serving/model_server/llmd/utils.py (2)
  • verify_llm_service_status (40-61)
  • verify_gateway_status (16-37)
utilities/constants.py (1)
  • Protocols (92-99)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (305-366)
tests/model_serving/model_server/llmd/conftest.py (2)
  • llmd_gateway (65-84)
  • llmd_inference_service (88-124)
tests/model_serving/conftest.py (1)
  • protocol (26-36)
tests/model_serving/model_server/llmd/test_llmd_s3.py (4)
tests/model_serving/model_server/llmd/utils.py (2)
  • verify_llm_service_status (40-61)
  • verify_gateway_status (16-37)
utilities/constants.py (1)
  • Protocols (92-99)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (305-366)
tests/model_serving/model_server/llmd/conftest.py (2)
  • llmd_gateway (65-84)
  • llmd_inference_service_s3 (128-175)
utilities/llmd_utils.py (4)
utilities/certificates_utils.py (1)
  • get_ca_bundle (69-104)
utilities/constants.py (2)
  • HTTPRequest (112-115)
  • Timeout (213-224)
utilities/exceptions.py (1)
  • InferenceResponseError (24-25)
utilities/infra.py (1)
  • get_services_by_isvc_label (504-533)
🪛 Ruff (0.12.2)
tests/model_serving/model_server/llmd/conftest.py

24-24: Unused function argument: admin_client

(ARG001)

utilities/llmd_utils.py

83-83: Do not catch blind exception: Exception

(BLE001)


189-189: Avoid specifying long messages outside the exception class

(TRY003)


195-195: Avoid specifying long messages outside the exception class

(TRY003)


206-206: Avoid specifying long messages outside the exception class

(TRY003)


298-298: Do not catch blind exception: Exception

(BLE001)


397-397: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


424-424: Avoid specifying long messages outside the exception class

(TRY003)


429-429: Avoid specifying long messages outside the exception class

(TRY003)


488-488: Do not catch blind exception: Exception

(BLE001)


515-515: Avoid specifying long messages outside the exception class

(TRY003)


536-536: Unused function argument: inference

(ARG001)


554-554: Avoid specifying long messages outside the exception class

(TRY003)


575-575: Avoid specifying long messages outside the exception class

(TRY003)


580-580: Avoid specifying long messages outside the exception class

(TRY003)


587-587: Avoid specifying long messages outside the exception class

(TRY003)


628-631: Avoid specifying long messages outside the exception class

(TRY003)


633-633: Avoid specifying long messages outside the exception class

(TRY003)


639-639: Avoid specifying long messages outside the exception class

(TRY003)


642-642: Avoid specifying long messages outside the exception class

(TRY003)


645-645: Avoid specifying long messages outside the exception class

(TRY003)


649-649: Avoid specifying long messages outside the exception class

(TRY003)


653-653: Avoid specifying long messages outside the exception class

(TRY003)


656-656: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
tests/model_serving/model_server/llmd/__init__.py (1)

1-8: LGTM – clear package docstring.
No issues.

tests/model_serving/model_server/llmd/test_llmd_s3.py (2)

23-34: Test flow looks good.
Readiness checks before inference and HTTP path selection via Protocols are sound.


14-18: create_ns supports an explicit name — resolved.
utilities/infra.py defines create_ns(..., name: str | None = None), so the unprivileged_model_namespace name parameter is effective and not a no-op.

tests/model_serving/model_server/llmd/test_llmd_oci_cpu.py (1)

15-34: LGTM – OCI CPU E2E structure is solid.
Good gating on readiness and reuse of common inference config.

utilities/llmd_constants.py (1)

25-28: Double‑check default CPU/memory footprints.

Requests 100m CPU with 8Gi memory and limit 10Gi may be unschedulable on small CI clusters. Confirm these are intentional defaults for “CPU tests.”

utilities/llmd_utils.py (2)

636-659: Good, reusable response shape validator.

The checks align with OpenAI chat completions format and provide actionable errors.


180-189: Fix storage_key usage and parameterize S3 bucket

  • storage_key is accepted but unused; URI is hard-coded to s3://ods-ci-wisdom/{storage_path} — makes the helper non‑portable. (utilities/llmd_utils.py:180-189)
  • Either wire storage_key into the model spec (per the LLMInferenceService CRD) and make the bucket configurable, or remove storage_key from the signature/docs.

Will patch after you confirm the expected CRD fields (bucket/uri field and whether credentials come from a secret name or a service account).

mwaykole and others added 3 commits September 16, 2025 12:58
Signed-off-by: Milind Waykole <mwaykole@redhat.com>
Signed-off-by: Milind Waykole <mwaykole@redhat.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
utilities/llmd_utils.py (5)

469-475: Curl header quoting bug splits args; keep header as‑is.

-        header = HTTPRequest.CONTENT_JSON.replace("-H ", "").replace("'", "")
+        header = HTTPRequest.CONTENT_JSON
         cmd_exec = "curl -i -s"
-        cmd = f"{cmd_exec} -X POST -d '{body}' -H {header} -H 'Accept: application/json'"
+        cmd = f"{cmd_exec} -X POST -d '{body}' {header} -H 'Accept: application/json'"

482-489: Wrong deployment_mode forces --insecure; use “raw-deployment”.

-                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw")
+                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw-deployment")

Also replace blind except Exception with a narrower exception if possible.


512-516: Do not leak Bearer tokens in errors; redact before raising.

-        if res:
-            return out
-        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {cmd}")
+        if res:
+            return out
+        redacted_cmd = re.sub(r'(-H\s+["\']Authorization:\s*Bearer\s+)\S+(["\'])', r'\1<redacted>\2', cmd)
+        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {redacted_cmd}")

526-534: Parse HTTP body and JSON; validators expect structured payload.

-        return {"output": out}
+        # Extract last HTTP body and parse JSON if possible
+        body = out.split("\r\n\r\n")[-1]
+        try:
+            payload = json.loads(body)
+        except json.JSONDecodeError:
+            payload = None
+        return {"output": out, "payload": payload}

566-633: Validate against parsed payload (OpenAI‑style), not raw text.

-def _validate_authorized_response(
+def _validate_authorized_response(
     res: Dict[str, Any],
     inference: LLMUserInference,
     inference_config: Dict[str, Any],
     inference_type: str,
     expected_response_text: Optional[str],
     use_default_query: bool,
     model_name: str,
 ) -> None:
-    """Validate response for authorized users."""
+    """Validate response for authorized users (uses parsed JSON payload when available)."""
@@
-    if inference.inference_response_text_key_name:
-        if inference_type == inference.STREAMING:
-            if output := re.findall(
-                rf"{inference.inference_response_text_key_name}\": \"(.*)\"",
-                res[inference.inference_response_key_name],
-                re.MULTILINE,
-            ):
-                assert "".join(output) == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {output}"
-                )
-        elif inference_type == inference.INFER or use_regex:
-            formatted_res = json.dumps(res[inference.inference_response_text_key_name]).replace(" ", "")
-            if use_regex and expected_response_text is not None:
-                assert re.search(expected_response_text, formatted_res), (
-                    f"Expected: {expected_response_text} not found in: {formatted_res}"
-                )
-            else:
-                formatted_res = json.dumps(res[inference.inference_response_key_name]).replace(" ", "")
-                assert formatted_res == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match output: {formatted_res}"
-                )
-        else:
-            response = res[inference.inference_response_key_name]
-            if isinstance(response, list):
-                response = response[0]
-
-            if isinstance(response, dict):
-                response_text = response[inference.inference_response_text_key_name]
-                assert response_text == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {response_text}"
-                )
-            else:
-                raise InferenceResponseError(
-                    "Inference response output does not match expected output format."
-                    f"Expected: {expected_response_text}.\nResponse: {res}"
-                )
-    else:
-        raise InferenceResponseError(f"Inference response output not found in response. Response: {res}")
+    payload = res.get("payload")
+    if not isinstance(payload, dict):
+        raise InferenceResponseError("Failed to parse JSON payload from response.")
+    validate_llm_response_format(payload)
+    response_text = payload["choices"][0]["message"]["content"]
+    if use_regex:
+        assert re.search(str(expected_response_text), response_text), (
+            f"Expected pattern not found. Pattern: {expected_response_text}"
+        )
+    else:
+        assert response_text == expected_response_text, (
+            f"Expected: {expected_response_text} does not match: {response_text}"
+        )
🧹 Nitpick comments (6)
tests/model_serving/model_server/llmd/conftest.py (4)

24-25: Unused fixture arg: rename to underscore to silence Ruff.

admin_client isn’t used here. Rename to _admin_client (or drop it) to avoid ARG001.

-@pytest.fixture(scope="class")
-def gateway_namespace(admin_client: DynamicClient) -> str:
+@pytest.fixture(scope="class")
+def gateway_namespace(_admin_client: DynamicClient) -> str:
     return DEFAULT_GATEWAY_NAMESPACE

70-76: Guard access to request.param (fixture may be unparametrized).

Accessing request.param without a default raises when the fixture isn’t parametrized.

-    if isinstance(request.param, str):
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
-        gateway_class_name = request.param
+        gateway_class_name = param
         kwargs = {}
     else:
-        gateway_class_name = request.param.get("gateway_class_name", "openshift-default")
-        kwargs = {k: v for k, v in request.param.items() if k != "gateway_class_name"}
+        param = param or {}
+        gateway_class_name = param.get("gateway_class_name", "openshift-default")
+        kwargs = {k: v for k, v in param.items() if k != "gateway_class_name"}

93-101: Same: don’t assume parametrization in llmd_inference_service.

-    if isinstance(request.param, str):
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
-        name_suffix = request.param
+        name_suffix = param
         kwargs = {}
     else:
-        name_suffix = request.param.get("name_suffix", "basic")
-        kwargs = {k: v for k, v in request.param.items() if k != "name_suffix"}
+        param = param or {}
+        name_suffix = param.get("name_suffix", "basic")
+        kwargs = {k: v for k, v in param.items() if k != "name_suffix"}

126-140: Same: guard request.param in S3 fixture.

-    if isinstance(request.param, str):
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
-        name_suffix = request.param
+        name_suffix = param
         kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
     else:
-        name_suffix = request.param.get("name_suffix", "s3")
-        kwargs = {k: v for k, v in request.param.items() if k != "name_suffix"}
+        param = param or {}
+        name_suffix = param.get("name_suffix", "s3")
+        kwargs = {k: v for k, v in param.items() if k != "name_suffix"}
utilities/llmd_utils.py (2)

536-547: Unused arg in unauthorized validator; rename to underscore.

-def _validate_unauthorized_response(res: Dict[str, Any], token: Optional[str], inference: LLMUserInference) -> None:
+def _validate_unauthorized_response(res: Dict[str, Any], token: Optional[str], _inference: LLMUserInference) -> None:

Also consider short‑circuiting on HTTP 401/403 status in headers if present.
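A rough sketch of that short-circuit, assuming the raw `curl -i` output format used elsewhere in this helper (illustrative only; function and regex names are made up):

```python
import re

# curl -i prefixes the body with a status line such as "HTTP/1.1 401
# Unauthorized", so an unauthorized response can be detected from the
# headers before attempting any body parsing.
STATUS_RE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})")


def is_unauthorized(raw_curl_output: str) -> bool:
    match = STATUS_RE.match(raw_curl_output)
    return bool(match) and match.group(1) in {"401", "403"}
```

Checking the status line first gives a crisp failure mode for auth tests instead of a confusing JSON-parse error on an HTML or empty error body.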


73-85: Broad exception catches; narrow where feasible.

Catching Exception hides actionable failures. Prefer specific exceptions from ocp_resources/requests or re‑raise after logging.

Also applies to: 298-300, 488-489

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdd0a82 and 41c26e8.

📒 Files selected for processing (3)
  • pytest.ini (1 hunks)
  • tests/model_serving/model_server/llmd/conftest.py (1 hunks)
  • utilities/llmd_utils.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pytest.ini
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-05T10:05:17.642Z
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.

Applied to files:

  • tests/model_serving/model_server/llmd/conftest.py
🧬 Code graph analysis (2)
tests/model_serving/model_server/llmd/conftest.py (4)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/infra.py (1)
  • s3_endpoint_secret (282-348)
utilities/llmd_utils.py (2)
  • create_gateway (33-114)
  • create_llmisvc (118-262)
tests/conftest.py (7)
  • admin_client (68-69)
  • unprivileged_model_namespace (459-466)
  • aws_access_key_id (124-131)
  • aws_secret_access_key (135-142)
  • models_s3_bucket_name (243-250)
  • models_s3_bucket_region (254-261)
  • models_s3_bucket_endpoint (265-272)
utilities/llmd_utils.py (4)
utilities/certificates_utils.py (1)
  • get_ca_bundle (69-104)
utilities/constants.py (2)
  • HTTPRequest (112-115)
  • Timeout (213-224)
utilities/exceptions.py (1)
  • InferenceResponseError (24-25)
utilities/infra.py (1)
  • get_services_by_isvc_label (504-533)
🪛 Ruff (0.12.2)
tests/model_serving/model_server/llmd/conftest.py

24-24: Unused function argument: admin_client

(ARG001)

utilities/llmd_utils.py

83-83: Do not catch blind exception: Exception

(BLE001)


189-189: Avoid specifying long messages outside the exception class

(TRY003)


195-195: Avoid specifying long messages outside the exception class

(TRY003)


206-206: Avoid specifying long messages outside the exception class

(TRY003)


298-298: Do not catch blind exception: Exception

(BLE001)


397-397: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


424-424: Avoid specifying long messages outside the exception class

(TRY003)


429-429: Avoid specifying long messages outside the exception class

(TRY003)


488-488: Do not catch blind exception: Exception

(BLE001)


515-515: Avoid specifying long messages outside the exception class

(TRY003)


536-536: Unused function argument: inference

(ARG001)


554-554: Avoid specifying long messages outside the exception class

(TRY003)


575-575: Avoid specifying long messages outside the exception class

(TRY003)


580-580: Avoid specifying long messages outside the exception class

(TRY003)


587-587: Avoid specifying long messages outside the exception class

(TRY003)


627-630: Avoid specifying long messages outside the exception class

(TRY003)


632-632: Avoid specifying long messages outside the exception class

(TRY003)


638-638: Avoid specifying long messages outside the exception class

(TRY003)


641-641: Avoid specifying long messages outside the exception class

(TRY003)


644-644: Avoid specifying long messages outside the exception class

(TRY003)


648-648: Avoid specifying long messages outside the exception class

(TRY003)


652-652: Avoid specifying long messages outside the exception class

(TRY003)


655-655: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
tests/model_serving/model_server/llmd/conftest.py (2)

30-47: Incorrect — no variable mismatch in models_s3_bucket_endpoint fixture.
The fixture in tests/conftest.py (def models_s3_bucket_endpoint, lines ~265–272) assigns models_bucket_endpoint = pytestconfig.option.models_s3_bucket_endpoint and returns models_bucket_endpoint; no undefined name is referenced and the S3 fixture should resolve correctly.

Likely an incorrect or invalid review comment.


55-61: Confirm ServiceAccount "secrets" usage is intentional.

Attaching the S3 Secret to the SA here (tests/model_serving/model_server/llmd/conftest.py:55–61) is uncommon—controllers normally read annotated S3 secrets. The repo already passes secrets to SAs in several fixtures (e.g. tests/model_serving/model_server/upgrade/conftest.py, tests/model_serving/model_server/storage/minio/conftest.py). Confirm LLMD pods actually require the secret mounted on the ServiceAccount; if not, remove the secrets=[...] to avoid unnecessary access.

Signed-off-by: Milind Waykole <mwaykole@redhat.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
tests/model_serving/model_server/llmd/conftest.py (3)

24-26: Fix unused fixture arg (Ruff ARG001).

admin_client isn’t used. Rename to _admin_client (keeps dependency ordering) or drop it.

 @pytest.fixture(scope="class")
-def gateway_namespace(admin_client: DynamicClient) -> str:
+def gateway_namespace(_admin_client: DynamicClient) -> str:
     return DEFAULT_GATEWAY_NAMESPACE

55-61: ServiceAccount linkage: verify necessity of secrets field.

Mounting S3 creds is already driven by storage_key on the service. The SA secrets entry may be redundant. Keep if your controller requires it; otherwise, consider dropping to reduce coupling.


149-155: Configurable resource defaults.

10Gi/8Gi for CPU tests is hefty. Consider sourcing from a constant or CLI option to ease running in constrained clusters.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41c26e8 and 4b8630f.

📒 Files selected for processing (1)
  • tests/model_serving/model_server/llmd/conftest.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-05T10:05:17.642Z
Learnt from: adolfo-ab
PR: opendatahub-io/opendatahub-tests#334
File: tests/model_explainability/trustyai_service/test_trustyai_service.py:52-65
Timestamp: 2025-06-05T10:05:17.642Z
Learning: For TrustyAI image validation tests: operator image tests require admin_client, related_images_refs, and trustyai_operator_configmap fixtures, while service image tests would require different fixtures like trustyai_service_with_pvc_storage, model_namespace, and current_client_token.

Applied to files:

  • tests/model_serving/model_server/llmd/conftest.py
🧬 Code graph analysis (1)
tests/model_serving/model_server/llmd/conftest.py (4)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/infra.py (1)
  • s3_endpoint_secret (282-348)
utilities/llmd_utils.py (2)
  • create_gateway (33-114)
  • create_llmisvc (118-262)
tests/conftest.py (7)
  • admin_client (68-69)
  • unprivileged_model_namespace (459-466)
  • aws_access_key_id (124-131)
  • aws_secret_access_key (135-142)
  • models_s3_bucket_name (243-250)
  • models_s3_bucket_region (254-261)
  • models_s3_bucket_endpoint (265-272)
🪛 Ruff (0.12.2)
tests/model_serving/model_server/llmd/conftest.py

24-24: Unused function argument: admin_client

(ARG001)

🔇 Additional comments (1)
tests/model_serving/model_server/llmd/conftest.py (1)

102-104: Nice ordering control.

Forcing llmd_gateway via request.getfixturevalue is a clean way to ensure the gateway exists before the service.

Signed-off-by: Milind Waykole <mwaykole@redhat.com>
@github-actions github-actions bot added size/xl and removed size/xxl labels Sep 16, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (8)
utilities/llmd_utils.py (8)

526-534: Parse JSON body from curl output and return structured payload.

Validators need a parsed payload; extract the last HTTP body and json.loads it.

Apply:

-        out = self.run_inference(
+        out = self.run_inference(
             model_name=model_name,
             inference_input=inference_input,
             use_default_query=use_default_query,
             insecure=insecure,
             token=token,
         )
-        return {"output": out}
+        # Extract last HTTP body (curl -i includes headers)
+        parts = re.split(r"\r?\n\r?\n", out)
+        body = parts[-1] if parts else out
+        try:
+            payload = json.loads(body)
+        except json.JSONDecodeError:
+            payload = None
+        return {"output": out, "payload": payload}
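The header/body split suggested above can be sketched standalone. This is a minimal illustration (the helper name `parse_curl_body` is mine, not from the PR), assuming `curl -i` output where headers and body are separated by a blank line:

```python
import json
import re

def parse_curl_body(raw: str):
    """Split `curl -i` output (headers + body) and parse the JSON body.

    The last blank-line-separated chunk is the body; earlier chunks are
    HTTP header blocks (there may be several if redirects were followed).
    """
    parts = re.split(r"\r?\n\r?\n", raw)
    body = parts[-1] if parts else raw
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None

raw = 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n{"choices": [{"text": "hi"}]}'
payload = parse_curl_body(raw)
```

Returning `None` on a parse failure (rather than raising) lets the validator decide how to report the error, matching the `except json.JSONDecodeError: payload = None` branch in the diff.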

566-633: Authorized response validation should use parsed JSON payload.

Avoid indexing res with dynamic keys; validate against res["payload"] and support regex/exact match.

Apply:

 def _validate_authorized_response(
     res: Dict[str, Any],
     inference: LLMUserInference,
     inference_config: Dict[str, Any],
     inference_type: str,
     expected_response_text: Optional[str],
     use_default_query: bool,
     model_name: str,
 ) -> None:
-    """Validate response for authorized users."""
-
-    use_regex = False
+    """Validate response for authorized users."""
+    payload = res.get("payload")
+    if not isinstance(payload, dict):
+        raise InferenceResponseError("Failed to parse JSON payload from response.")
+
+    # Resolve expected text
+    use_regex = False
     if use_default_query:
         expected_response_text_config = inference_config.get("default_query_model", {})
         use_regex = expected_response_text_config.get("use_regex", False)
 
         if not expected_response_text_config:
             raise ValueError(f"Missing default_query_model config for inference {inference_config}")
 
         if inference_config.get("support_multi_default_queries"):
             query_config = expected_response_text_config.get(inference_type)
             if not query_config:
                 raise ValueError(f"Missing default_query_model config for inference type {inference_type}")
             expected_response_text = query_config.get("query_output", "")
             use_regex = query_config.get("use_regex", False)
         else:
             expected_response_text = expected_response_text_config.get("query_output")
 
         if not expected_response_text:
             raise ValueError(f"Missing response text key for inference {inference_config}")
 
         if isinstance(expected_response_text, str):
-            expected_response_text = Template(template=expected_response_text).safe_substitute(model_name=model_name)
+            expected_response_text = Template(expected_response_text).safe_substitute(model_name=model_name)
         elif isinstance(expected_response_text, dict):
             response_output = expected_response_text.get("response_output")
             if response_output is not None:
-                expected_response_text = Template(template=response_output).safe_substitute(model_name=model_name)
-    if inference.inference_response_text_key_name:
-        if inference_type == inference.STREAMING:
-            if output := re.findall(
-                rf"{inference.inference_response_text_key_name}\": \"(.*)\"",
-                res[inference.inference_response_key_name],
-                re.MULTILINE,
-            ):
-                assert "".join(output) == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {output}"
-                )
-        elif inference_type == inference.INFER or use_regex:
-            formatted_res = json.dumps(res[inference.inference_response_text_key_name]).replace(" ", "")
-            if use_regex and expected_response_text is not None:
-                assert re.search(expected_response_text, formatted_res), (
-                    f"Expected: {expected_response_text} not found in: {formatted_res}"
-                )
-            else:
-                formatted_res = json.dumps(res[inference.inference_response_key_name]).replace(" ", "")
-                assert formatted_res == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match output: {formatted_res}"
-                )
-        else:
-            response = res[inference.inference_response_key_name]
-            if isinstance(response, list):
-                response = response[0]
-
-            if isinstance(response, dict):
-                response_text = response[inference.inference_response_text_key_name]
-                assert response_text == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {response_text}"
-                )
-            else:
-                raise InferenceResponseError(
-                    "Inference response output does not match expected output format."
-                    f"Expected: {expected_response_text}.\nResponse: {res}"
-                )
-    else:
-        raise InferenceResponseError(f"Inference response output not found in response. Response: {res}")
+                expected_response_text = Template(response_output).safe_substitute(model_name=model_name)
+
+    # Extract text from OpenAI-style responses
+    response_text = None
+    if isinstance(payload.get("choices"), list) and payload["choices"]:
+        choice0 = payload["choices"][0]
+        if isinstance(choice0, dict):
+            if isinstance(choice0.get("message"), dict):
+                response_text = choice0["message"].get("content")
+            elif "text" in choice0:
+                response_text = choice0.get("text")
+
+    if response_text is None:
+        raise InferenceResponseError(f"Inference response output not found in payload. Payload: {payload}")
+
+    if use_regex and expected_response_text is not None:
+        assert re.search(str(expected_response_text), str(response_text)), (
+            f"Expected pattern not found. Pattern: {expected_response_text}"
+        )
+    elif expected_response_text is not None:
+        assert str(response_text) == str(expected_response_text), (
+            f"Expected: {expected_response_text} does not match: {response_text}"
+        )
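The extraction step in the diff above can be isolated as a small helper. A sketch (the name `extract_response_text` is illustrative), assuming OpenAI-style payloads where chat completions carry `choices[0].message.content` and legacy completions carry `choices[0].text`:

```python
def extract_response_text(payload: dict):
    """Pull generated text out of an OpenAI-style response payload."""
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        return None
    choice0 = choices[0]
    if not isinstance(choice0, dict):
        return None
    message = choice0.get("message")
    if isinstance(message, dict):
        # Chat Completions shape
        return message.get("content")
    # Legacy Completions shape
    return choice0.get("text")

chat = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
legacy = {"choices": [{"text": "Hello!"}]}
```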

180-189: Hard‑coded S3 bucket blocks non‑default envs. Build URI from inputs.

Allow fully‑qualified s3:// and avoid forcing ods-ci-wisdom.

Apply:

-    if storage_key and storage_path:
-        model_config = {
-            "uri": f"s3://ods-ci-wisdom/{storage_path}",
-        }
+    if storage_key and storage_path:
+        # Accept full s3:// URI or construct from "bucket/prefix"
+        uri = storage_path if str(storage_path).startswith("s3://") else f"s3://{storage_path}"
+        model_config = {"uri": uri}

Optional: include the secret ref if supported by the CRD (e.g., model_config["storageKey"] = storage_key).
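The URI normalization proposed here is a one-liner; a standalone sketch (helper name is illustrative):

```python
def normalize_s3_uri(storage_path: str) -> str:
    """Accept either a fully-qualified s3:// URI or a "bucket/prefix" path."""
    return storage_path if storage_path.startswith("s3://") else f"s3://{storage_path}"
```

This keeps the bucket under the caller's control instead of hard-coding `ods-ci-wisdom`.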


435-437: Template() called with invalid kwarg; will raise TypeError.

Use positional arg.

Apply:

-            template = Template(template=template_str)
+            template = Template(template_str)
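For reference, the positional form with `string.Template.safe_substitute` behaves like this; `safe_substitute` leaves unknown placeholders intact instead of raising `KeyError`:

```python
from string import Template

template_str = "Say hello from $model_name"
rendered = Template(template_str).safe_substitute(model_name="opt-125m")

# Unresolved placeholders survive unchanged:
partial = Template("$model_name at $unknown").safe_substitute(model_name="opt-125m")
```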

469-475: Curl header/body quoting bug + split args; payload injection risk.

  • Do not strip quotes from the header; keep HTTPRequest.CONTENT_JSON as-is.
  • Quote the JSON body via shlex.quote to survive single quotes in content.
  • Add --fail-with-body to fail on non‑2xx while preserving body.

Apply:

-        header = HTTPRequest.CONTENT_JSON.replace("-H ", "").replace("'", "")
-        cmd_exec = "curl -i -s"
-        cmd = f"{cmd_exec} -X POST -d '{body}' -H {header} -H 'Accept: application/json'"
+        header = HTTPRequest.CONTENT_JSON
+        cmd_exec = "curl -i -s --fail-with-body"
+        quoted_body = shlex.quote(body)
+        cmd = f"{cmd_exec} -X POST -d {quoted_body} {header} -H 'Accept: application/json'"
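The `shlex.quote` step above matters because a JSON body can legally contain single quotes, which would break a hand-built `-d '{...}'`. A minimal sketch of the quoting:

```python
import shlex

body = '{"prompt": "it\'s a test"}'
quoted = shlex.quote(body)  # wraps in single quotes, escaping embedded ones
cmd = f"curl -i -s --fail-with-body -X POST -d {quoted}"
```

Round-tripping through `shlex.split` confirms the shell would hand curl the exact original body as a single argument.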

482-489: Wrong deployment_mode to get_ca_bundle() forces insecure.

Use “raw-deployment”; also avoid blind except and log reason.

Apply:

-                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw")
+                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw-deployment")
                 if ca_bundle:
                     cmd += f" --cacert {ca_bundle}"
                 else:
                     cmd += " --insecure"
-            except Exception:
-                cmd += " --insecure"
+            except Exception as e:
+                LOGGER.warning(f"Falling back to --insecure (CA bundle unavailable): {e}")
+                cmd += " --insecure"

512-515: Bearer token leakage in error path; redact before raising.

Sanitize Authorization header in the command string.

Apply:

-        if res:
-            return out
-        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {cmd}")
+        if res:
+            return out
+        redacted_cmd = re.sub(r"(Authorization:\s*Bearer\s+)\S+", r"\1<redacted>", cmd)
+        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {redacted_cmd}")
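The redaction regex from the diff, shown standalone. Note that `\S+` also consumes a closing quote butted against the token, which is acceptable for log redaction since the secret itself is what must not leak:

```python
import re

def redact_bearer(cmd: str) -> str:
    """Mask the token in any `Authorization: Bearer <token>` header."""
    return re.sub(r"(Authorization:\s*Bearer\s+)\S+", r"\1<redacted>", cmd)

cmd = "curl -X POST -H 'Authorization: Bearer sha256~abc123' https://gw/v1/completions"
redacted = redact_bearer(cmd)
```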

589-595: More Template() constructor bugs.

Both calls pass template= kwarg; switch to positional.

Apply:

-            expected_response_text = Template(template=expected_response_text).safe_substitute(model_name=model_name)
+            expected_response_text = Template(expected_response_text).safe_substitute(model_name=model_name)
...
-                expected_response_text = Template(template=response_output).safe_substitute(model_name=model_name)
+                expected_response_text = Template(response_output).safe_substitute(model_name=model_name)
🧹 Nitpick comments (3)
utilities/llmd_utils.py (3)

71-85: Don't swallow broad exceptions when cleaning existing Gateway.

Catching Exception hides real failures (permissions, API errors). Catch the specific not-found exception (or check .exists via API) and log others.

Apply:

-    try:
-        existing_gateway = Gateway(
+    try:
+        existing_gateway = Gateway(
             client=client,
             name=name,
             namespace=namespace,
             api_group="gateway.networking.k8s.io",
         )
         if existing_gateway.exists:
             LOGGER.info(f"Cleaning up existing Gateway {name} in namespace {namespace}")
             existing_gateway.delete(wait=True, timeout=Timeout.TIMEOUT_2MIN)
-    except Exception as e:
+    except Exception as e:
         LOGGER.debug(f"No existing Gateway to clean up: {e}")

Consider narrowing the except to the specific “not found”/lookup error raised by ocp-resources.


295-302: Catching broad Exception hides discovery failures.

Scope to the expected not-found error from get_services_by_isvc_label; log others at warning/error.

Apply:

-    except Exception as e:
+    except Exception as e:
         LOGGER.warning(f"Could not get service for LLMInferenceService {llm_service.name}: {e}")

Consider catching the specific ResourceNotFoundError raised by get_services_by_isvc_label.


536-555: Unused parameter and brittle auth checks.

  • inference arg unused; prefix with _ or remove.
  • Consider case-insensitive header match and structured checks once headers parsing is added.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b8630f and 75978bf.

📒 Files selected for processing (1)
  • utilities/llmd_utils.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
utilities/llmd_utils.py (4)
utilities/certificates_utils.py (1)
  • get_ca_bundle (69-104)
utilities/constants.py (2)
  • HTTPRequest (112-115)
  • Timeout (213-224)
utilities/exceptions.py (1)
  • InferenceResponseError (24-25)
utilities/infra.py (1)
  • get_services_by_isvc_label (504-533)
🪛 Ruff (0.12.2)
utilities/llmd_utils.py

83-83: Do not catch blind exception: Exception

(BLE001)


189-189: Avoid specifying long messages outside the exception class

(TRY003)


195-195: Avoid specifying long messages outside the exception class

(TRY003)


206-206: Avoid specifying long messages outside the exception class

(TRY003)


298-298: Do not catch blind exception: Exception

(BLE001)


397-397: Avoid specifying long messages outside the exception class

(TRY003)


399-399: Avoid specifying long messages outside the exception class

(TRY003)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


424-424: Avoid specifying long messages outside the exception class

(TRY003)


429-429: Avoid specifying long messages outside the exception class

(TRY003)


488-488: Do not catch blind exception: Exception

(BLE001)


515-515: Avoid specifying long messages outside the exception class

(TRY003)


536-536: Unused function argument: inference

(ARG001)


554-554: Avoid specifying long messages outside the exception class

(TRY003)


575-575: Avoid specifying long messages outside the exception class

(TRY003)


580-580: Avoid specifying long messages outside the exception class

(TRY003)


587-587: Avoid specifying long messages outside the exception class

(TRY003)


627-630: Avoid specifying long messages outside the exception class

(TRY003)


632-632: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
utilities/llmd_utils.py (1)

239-250: Confirm LLMInferenceService ctor accepts label vs labels.

LLMInferenceService is imported from ocp_resources (not defined in this repo); the call currently passes label=labels (utilities/llmd_utils.py:239–250). Verify ocp_resources.LLMInferenceService init accepts label — if it expects labels, change to labels=labels to avoid a silent no-op.

Member

@pierDipi pierDipi left a comment


I don't get why we're adding CPU tests here while we could augment the existing tests that run on CI and will catch issues earlier in https://github.com/opendatahub-io/kserve/blob/release-v0.15/test/e2e/llmisvc/test_llm_inference_service.py

Signed-off-by: Milind Waykole <mwaykole@redhat.com>
Signed-off-by: Milind Waykole <mwaykole@redhat.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utilities/llmd_utils.py (1)

579-655: Authorized response validation should use parsed payload (OpenAI format).

Current logic indexes res with dynamic keys and will KeyError; use payload.

 def _validate_authorized_response(
     res: Dict[str, Any],
     inference: LLMUserInference,
     inference_config: Dict[str, Any],
     inference_type: str,
     expected_response_text: Optional[str],
     use_default_query: bool,
     model_name: str,
 ) -> None:
-    """Validate response for authorized users."""
-
-    use_regex = False
-
-    if use_default_query:
-        expected_response_text_config = inference_config.get("default_query_model", {})
-        use_regex = expected_response_text_config.get("use_regex", False)
-
-        if not expected_response_text_config:
-            raise ValueError(f"Missing default_query_model config for inference {inference_config}")
-
-        if inference_config.get("support_multi_default_queries"):
-            query_config = expected_response_text_config.get(inference_type)
-            if not query_config:
-                raise ValueError(f"Missing default_query_model config for inference type {inference_type}")
-            expected_response_text = query_config.get("query_output", "")
-            use_regex = query_config.get("use_regex", False)
-        else:
-            expected_response_text = expected_response_text_config.get("query_output")
-
-        if not expected_response_text:
-            raise ValueError(f"Missing response text key for inference {inference_config}")
-
-        if isinstance(expected_response_text, str):
-            expected_response_text = Template(template=expected_response_text).safe_substitute(model_name=model_name)
-        elif isinstance(expected_response_text, dict):
-            response_output = expected_response_text.get("response_output")
-            if response_output is not None:
-                expected_response_text = Template(template=response_output).safe_substitute(model_name=model_name)
-    if inference.inference_response_text_key_name:
-        if inference_type == inference.STREAMING:
-            if output := re.findall(
-                rf"{inference.inference_response_text_key_name}\": \"(.*)\"",
-                res[inference.inference_response_key_name],
-                re.MULTILINE,
-            ):
-                assert "".join(output) == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {output}"
-                )
-        elif inference_type == inference.INFER or use_regex:
-            formatted_res = json.dumps(res[inference.inference_response_text_key_name]).replace(" ", "")
-            if use_regex and expected_response_text is not None:
-                assert re.search(expected_response_text, formatted_res), (
-                    f"Expected: {expected_response_text} not found in: {formatted_res}"
-                )
-            else:
-                formatted_res = json.dumps(res[inference.inference_response_key_name]).replace(" ", "")
-                assert formatted_res == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match output: {formatted_res}"
-                )
-        else:
-            response = res[inference.inference_response_key_name]
-            if isinstance(response, list):
-                response = response[0]
-
-            if isinstance(response, dict):
-                response_text = response[inference.inference_response_text_key_name]
-                assert response_text == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {response_text}"
-                )
-            else:
-                raise InferenceResponseError(
-                    "Inference response output does not match expected output format."
-                    f"Expected: {expected_response_text}.\nResponse: {res}"
-                )
-    else:
-        raise InferenceResponseError(f"Inference response output not found in response. Response: {res}")
+    """Validate response for authorized users (OpenAI Chat Completions)."""
+    payload = res.get("payload")
+    if not isinstance(payload, dict):
+        raise InferenceResponseError("Failed to parse JSON payload from response.")
+
+    # Resolve expected text
+    use_regex = False
+    if use_default_query:
+        expected_response_text_config = inference_config.get("default_query_model", {})
+        use_regex = expected_response_text_config.get("use_regex", False)
+        if not expected_response_text_config:
+            raise ValueError("Missing default_query_model config")
+        if inference_config.get("support_multi_default_queries"):
+            query_config = expected_response_text_config.get(inference_type) or {}
+            expected_response_text = query_config.get("query_output", "")
+            use_regex = query_config.get("use_regex", use_regex)
+        else:
+            expected_response_text = expected_response_text_config.get("query_output")
+        if not expected_response_text:
+            raise ValueError("Missing response text key")
+        if isinstance(expected_response_text, str):
+            expected_response_text = Template(expected_response_text).safe_substitute(model_name=model_name)
+        elif isinstance(expected_response_text, dict):
+            response_output = expected_response_text.get("response_output", "")
+            expected_response_text = Template(response_output).safe_substitute(model_name=model_name)
+
+    # Validate payload shape and extract content
+    validate_llm_response_format(payload)
+    response_text = payload["choices"][0]["message"]["content"]
+
+    if use_regex and expected_response_text is not None:
+        assert re.search(str(expected_response_text), response_text), (
+            f"Expected pattern not found. Pattern: {expected_response_text}"
+        )
+    else:
+        assert response_text == expected_response_text, (
+            f"Expected: {expected_response_text} does not match: {response_text}"
+        )
🧹 Nitpick comments (1)
tests/model_serving/model_server/llmd/conftest.py (1)

23-26: Remove unused parameter to fix Ruff ARG001.

-@pytest.fixture(scope="class")
-def gateway_namespace(admin_client: DynamicClient) -> str:
+@pytest.fixture(scope="class")
+def gateway_namespace() -> str:
     return DEFAULT_GATEWAY_NAMESPACE
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75978bf and 9c72e80.

📒 Files selected for processing (6)
  • tests/model_serving/model_server/llmd/__init__.py (1 hunks)
  • tests/model_serving/model_server/llmd/conftest.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_llmd_oci_cpu.py (1 hunks)
  • tests/model_serving/model_server/llmd/test_llmd_s3.py (1 hunks)
  • tests/model_serving/model_server/llmd/utils.py (1 hunks)
  • utilities/llmd_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/model_serving/model_server/llmd/__init__.py
  • tests/model_serving/model_server/llmd/test_llmd_oci_cpu.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/model_serving/model_server/llmd/test_llmd_s3.py (5)
tests/model_serving/model_server/llmd/utils.py (3)
  • verify_llm_service_status (44-65)
  • verify_gateway_status (20-41)
  • verify_llmd_pods_not_restarted (68-106)
utilities/constants.py (1)
  • Protocols (92-99)
utilities/llmd_utils.py (1)
  • verify_inference_response_llmd (327-388)
tests/conftest.py (1)
  • admin_client (68-69)
tests/model_serving/model_server/llmd/conftest.py (2)
  • llmd_gateway (65-94)
  • llmd_inference_service_s3 (137-184)
tests/model_serving/model_server/llmd/utils.py (1)
utilities/exceptions.py (1)
  • PodContainersRestartError (109-110)
tests/model_serving/model_server/llmd/conftest.py (4)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/infra.py (1)
  • s3_endpoint_secret (282-348)
utilities/llmd_utils.py (2)
  • create_gateway (33-136)
  • create_llmisvc (140-284)
tests/conftest.py (7)
  • admin_client (68-69)
  • unprivileged_model_namespace (459-466)
  • aws_access_key_id (124-131)
  • aws_secret_access_key (135-142)
  • models_s3_bucket_name (243-250)
  • models_s3_bucket_region (254-261)
  • models_s3_bucket_endpoint (265-272)
utilities/llmd_utils.py (4)
utilities/certificates_utils.py (1)
  • get_ca_bundle (69-104)
utilities/constants.py (2)
  • HTTPRequest (112-115)
  • Timeout (213-224)
utilities/exceptions.py (1)
  • InferenceResponseError (24-25)
utilities/infra.py (1)
  • get_services_by_isvc_label (504-533)
🪛 Ruff (0.13.1)
tests/model_serving/model_server/llmd/conftest.py

24-24: Unused function argument: admin_client

(ARG001)

utilities/llmd_utils.py

102-102: Do not catch blind exception: Exception

(BLE001)


211-211: Avoid specifying long messages outside the exception class

(TRY003)


217-217: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Avoid specifying long messages outside the exception class

(TRY003)


320-320: Do not catch blind exception: Exception

(BLE001)


419-419: Avoid specifying long messages outside the exception class

(TRY003)


421-421: Avoid specifying long messages outside the exception class

(TRY003)


441-441: Avoid specifying long messages outside the exception class

(TRY003)


446-446: Avoid specifying long messages outside the exception class

(TRY003)


451-451: Avoid specifying long messages outside the exception class

(TRY003)


510-510: Do not catch blind exception: Exception

(BLE001)


537-537: Avoid specifying long messages outside the exception class

(TRY003)


558-558: Unused function argument: inference

(ARG001)


576-576: Avoid specifying long messages outside the exception class

(TRY003)


597-597: Avoid specifying long messages outside the exception class

(TRY003)


602-602: Avoid specifying long messages outside the exception class

(TRY003)


609-609: Avoid specifying long messages outside the exception class

(TRY003)


649-652: Avoid specifying long messages outside the exception class

(TRY003)


654-654: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (10)
tests/model_serving/model_server/llmd/conftest.py (3)

78-93: Guard request.param and avoid duplicate kwargs when creating gateway.

-    if isinstance(request.param, str):
-        gateway_class_name = request.param
-        kwargs = {}
-    else:
-        gateway_class_name = request.param.get("gateway_class_name", "openshift-default")
-        kwargs = {k: v for k, v in request.param.items() if k != "gateway_class_name"}
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
+        gateway_class_name = param
+        kwargs = {}
+    elif isinstance(param, dict):
+        gateway_class_name = param.get("gateway_class_name", "openshift-default")
+        kwargs = {k: v for k, v in param.items() if k != "gateway_class_name"}
+    else:
+        gateway_class_name = "openshift-default"
+        kwargs = {}
+    wait_for_condition = kwargs.pop("wait_for_condition", True)
+    timeout = kwargs.pop("timeout", Timeout.TIMEOUT_5MIN)
@@
-        wait_for_condition=True,
-        timeout=Timeout.TIMEOUT_5MIN,
+        wait_for_condition=wait_for_condition,
+        timeout=timeout,
         teardown=False,  # Don't delete gateway in teardown
         **kwargs,
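The guard logic above can be factored into a plain function for clarity (the name `resolve_gateway_param` is illustrative, not from the PR). The point is that `request.param` only exists when the fixture is parametrized, hence the `getattr(request, "param", None)` fallback:

```python
from typing import Any, Dict, Tuple

def resolve_gateway_param(param: Any) -> Tuple[str, Dict[str, Any]]:
    """Resolve pytest indirect parametrization into (gateway_class_name, kwargs)."""
    if isinstance(param, str):
        return param, {}
    if isinstance(param, dict):
        name = param.get("gateway_class_name", "openshift-default")
        return name, {k: v for k, v in param.items() if k != "gateway_class_name"}
    # Fixture used without parametrization: fall back to defaults.
    return "openshift-default", {}
```

Popping `wait_for_condition`/`timeout` out of `kwargs` before the `create_gateway` call, as the diff does, avoids passing the same keyword twice.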

103-133: Same robustness for llmd_inference_service: guard param and pop duplicates.

-    if isinstance(request.param, str):
-        name_suffix = request.param
-        kwargs = {}
-    else:
-        name_suffix = request.param.get("name_suffix", "basic")
-        kwargs = {k: v for k, v in request.param.items() if k != "name_suffix"}
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
+        name_suffix = param
+        kwargs = {}
+    elif isinstance(param, dict):
+        name_suffix = param.get("name_suffix", "basic")
+        kwargs = {k: v for k, v in param.items() if k != "name_suffix"}
+    else:
+        name_suffix = "basic"
+        kwargs = {}
 
-    service_name = kwargs.get("name", f"llm-{name_suffix}")
+    service_name = kwargs.pop("name", f"llm-{name_suffix}")
@@
-    container_resources = kwargs.get(
+    container_resources = kwargs.pop(
         "container_resources",
         {
             "limits": {"cpu": "1", "memory": "10Gi"},
             "requests": {"cpu": "100m", "memory": "8Gi"},
         },
     )
+    storage_uri = kwargs.pop("storage_uri", VLLM_STORAGE_OCI)
+    container_image = kwargs.pop("container_image", VLLM_CPU_IMAGE)
+    wait = kwargs.pop("wait", True)
+    timeout = kwargs.pop("timeout", Timeout.TIMEOUT_15MIN)
@@
-        storage_uri=kwargs.get("storage_uri", VLLM_STORAGE_OCI),
-        container_image=kwargs.get("container_image", VLLM_CPU_IMAGE),
+        storage_uri=storage_uri,
+        container_image=container_image,
         container_resources=container_resources,
-        wait=True,
-        timeout=Timeout.TIMEOUT_15MIN,
-        **{k: v for k, v in kwargs.items() if k != "name"},
+        wait=wait,
+        timeout=timeout,
+        **kwargs,

136-184: S3 fixture: same guard/pop pattern and build full S3 path with bucket.

Prevents crashes and ensures a valid S3 URI is formed.

-    if isinstance(request.param, str):
-        name_suffix = request.param
-        kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
-    else:
-        name_suffix = request.param.get("name_suffix", "s3")
-        kwargs = {k: v for k, v in request.param.items() if k != "name_suffix"}
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
+        name_suffix = param
+        kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
+    elif isinstance(param, dict):
+        name_suffix = param.get("name_suffix", "s3")
+        kwargs = {k: v for k, v in param.items() if k != "name_suffix"}
+    else:
+        name_suffix = "s3"
+        kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
 
-    service_name = kwargs.get("name", f"llm-{name_suffix}")
+    service_name = kwargs.pop("name", f"llm-{name_suffix}")
@@
-    if "storage_key" not in kwargs:
-        kwargs["storage_key"] = llmd_s3_secret.name
-
-    if "storage_path" not in kwargs:
-        kwargs["storage_path"] = DEFAULT_S3_STORAGE_PATH
+    storage_key = kwargs.pop("storage_key", llmd_s3_secret.name)
+    storage_path = kwargs.pop("storage_path", DEFAULT_S3_STORAGE_PATH)
+    # Build "bucket/prefix" if only a prefix was provided.
+    # Resolve the bucket through pytest's request object; importing the
+    # fixture function directly would yield a callable, not the bucket name.
+    if not str(storage_path).startswith("s3://"):
+        bucket = request.getfixturevalue("models_s3_bucket_name")
+        storage_path = f"{bucket.rstrip('/')}/{str(storage_path).lstrip('/')}"
@@
-    container_resources = kwargs.get(
+    container_resources = kwargs.pop(
         "container_resources",
         {
             "limits": {"cpu": "1", "memory": "10Gi"},
             "requests": {"cpu": "100m", "memory": "8Gi"},
         },
     )
-    with create_llmisvc(
+    container_image = kwargs.pop("container_image", VLLM_CPU_IMAGE)
+    wait = kwargs.pop("wait", True)
+    timeout = kwargs.pop("timeout", Timeout.TIMEOUT_15MIN)
+    with create_llmisvc(
         client=admin_client,
         name=service_name,
         namespace=unprivileged_model_namespace.name,
-        storage_key=kwargs.get("storage_key"),
-        storage_path=kwargs.get("storage_path"),
-        container_image=kwargs.get("container_image", VLLM_CPU_IMAGE),
+        storage_key=storage_key,
+        storage_path=storage_path,
+        container_image=container_image,
         container_resources=container_resources,
         service_account=llmd_s3_service_account.name,
-        wait=True,
-        timeout=Timeout.TIMEOUT_15MIN,
-        **{
-            k: v
-            for k, v in kwargs.items()
-            if k not in ["name", "storage_key", "storage_path", "container_image", "container_resources"]
-        },
+        wait=wait,
+        timeout=timeout,
+        **kwargs,
     ) as llm_service:
         yield llm_service
utilities/llmd_utils.py (6)

491-494: Curl header formatting bug splits arguments. Keep quoted -H header.

-        header = HTTPRequest.CONTENT_JSON.replace("-H ", "").replace("'", "")
+        header = HTTPRequest.CONTENT_JSON
         cmd_exec = "curl -i -s"
-        cmd = f"{cmd_exec} -X POST -d '{body}' -H {header} -H 'Accept: application/json'"
+        cmd = f"{cmd_exec} -X POST -d '{body}' {header} -H 'Accept: application/json'"

505-510: Wrong deployment_mode forces insecure.

Use “raw-deployment” to let get_ca_bundle work.

-                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw")
+                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw-deployment")

457-459: Use the positional Template constructor (conventional for string.Template).

-            template = Template(template=template_str)
+            template = Template(template_str)

202-209: Stop hard‑coding the S3 bucket; accept full URI or bucket/prefix.

Current value ignores the environment and will point to the wrong bucket.

-    if storage_key and storage_path:
-        model_config = {
-            "uri": f"s3://ods-ci-wisdom/{storage_path}",
-        }
+    if storage_key and storage_path:
+        # Treat storage_path as "bucket/prefix" or a full "s3://bucket/prefix"
+        uri = storage_path if str(storage_path).startswith("s3://") else f"s3://{storage_path}"
+        model_config = {"uri": uri}
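A minimal standalone sketch of the normalization this diff suggests (the function name is illustrative):

```python
def normalize_s3_uri(storage_path: str) -> str:
    """Accept a full s3:// URI or a bucket/prefix string and return a URI."""
    path = str(storage_path)
    if path.startswith("s3://"):
        return path  # caller already supplied a full URI; keep it untouched
    return f"s3://{path.lstrip('/')}"
```

This keeps caller-supplied full URIs intact while still accepting the bucket/prefix form the fixtures pass in.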

548-555: Parse JSON body from curl output for validators.

Validators expect a parsed payload, not raw headers.

-        out = self.run_inference(
+        out = self.run_inference(
             model_name=model_name,
             inference_input=inference_input,
             use_default_query=use_default_query,
             insecure=insecure,
             token=token,
         )
-        return {"output": out}
+        body = out.split("\r\n\r\n")[-1]
+        try:
+            payload = json.loads(body)
+        except json.JSONDecodeError:
+            payload = None
+        return {"output": out, "payload": payload}
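The header/body split can be checked independently of the suite; the sample response below is synthetic:

```python
import json


def parse_curl_response(raw: str) -> dict:
    """Split a `curl -i` capture at the blank line and try to parse the body."""
    body = raw.split("\r\n\r\n")[-1]
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None  # headers only, or a non-JSON body
    return {"output": raw, "payload": payload}


raw = (
    "HTTP/1.1 200 OK\r\n"
    "content-type: application/json\r\n"
    "\r\n"
    '{"choices": [{"message": {"content": "hello"}}]}'
)
res = parse_curl_response(raw)
```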

534-538: Redact Bearer token from error to avoid leakage.

-        if res:
-            return out
-        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {cmd}")
+        if res:
+            return out
+        redacted_cmd = re.sub(r'(Authorization:\s*Bearer\s+)[^\s\'"]+', r'\1<redacted>', cmd)
+        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {redacted_cmd}")
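A standalone sketch of the redaction; the character class deliberately stops at quotes so a quoted header keeps its closing quote:

```python
import re


def redact_bearer(cmd: str) -> str:
    """Replace the token after 'Bearer' while leaving surrounding quotes alone."""
    return re.sub(r"(Authorization:\s*Bearer\s+)[^\s'\"]+", r"\1<redacted>", cmd)


# Synthetic command; the token value is illustrative.
cmd = "curl -s -H 'Authorization: Bearer sha256~abc123' https://host/v1/chat/completions"
safe = redact_bearer(cmd)
```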
tests/model_serving/model_server/llmd/test_llmd_s3.py (1)

31-38: Confirmed: OPT125M_CPU_INFERENCE_CONFIG has chat_completions.http
OPT125M_CPU_INFERENCE_CONFIG includes a chat_completions -> http section, so get_runtime_config() will not raise.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/model_serving/model_server/llmd/conftest.py (1)

24-25: Silence Ruff ARG001: unused fixture arg.

admin_client isn’t used in the body; it exists only to force fixture ordering. Renaming the parameter would break pytest’s name-based fixture resolution, so suppress the lint instead:

- def gateway_namespace(admin_client: DynamicClient) -> str:
+ def gateway_namespace(admin_client: DynamicClient) -> str:  # noqa: ARG001
utilities/llmd_utils.py (2)

103-105: Avoid blind Exception catch.

Prefer a narrower exception (e.g., specific ocp-resources errors) or drop the try; Gateway.exists already guards existence.


559-566: Silence Ruff ARG001: unused param.

inference isn’t used in _validate_unauthorized_response; rename to underscore.

-def _validate_unauthorized_response(res: Dict[str, Any], token: Optional[str], inference: LLMUserInference) -> None:
+def _validate_unauthorized_response(res: Dict[str, Any], token: Optional[str], _inference: LLMUserInference) -> None:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c72e80 and e8bdd8d.

📒 Files selected for processing (3)
  • tests/model_serving/model_server/llmd/conftest.py (1 hunks)
  • tests/model_serving/model_server/llmd/utils.py (1 hunks)
  • utilities/llmd_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/model_serving/model_server/llmd/utils.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/model_serving/model_server/llmd/conftest.py (4)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/infra.py (1)
  • s3_endpoint_secret (282-348)
utilities/llmd_utils.py (2)
  • create_llmd_gateway (33-137)
  • create_llmisvc (141-285)
tests/conftest.py (7)
  • admin_client (68-69)
  • unprivileged_model_namespace (459-466)
  • aws_access_key_id (124-131)
  • aws_secret_access_key (135-142)
  • models_s3_bucket_name (243-250)
  • models_s3_bucket_region (254-261)
  • models_s3_bucket_endpoint (265-272)
utilities/llmd_utils.py (4)
utilities/certificates_utils.py (1)
  • get_ca_bundle (69-104)
utilities/constants.py (2)
  • HTTPRequest (112-115)
  • Timeout (213-224)
utilities/exceptions.py (1)
  • InferenceResponseError (24-25)
utilities/infra.py (1)
  • get_services_by_isvc_label (504-533)
🪛 Ruff (0.13.1)
tests/model_serving/model_server/llmd/conftest.py

24-24: Unused function argument: admin_client

(ARG001)

utilities/llmd_utils.py

103-103: Do not catch blind exception: Exception

(BLE001)


212-212: Avoid specifying long messages outside the exception class

(TRY003)


218-218: Avoid specifying long messages outside the exception class

(TRY003)


229-229: Avoid specifying long messages outside the exception class

(TRY003)


321-321: Do not catch blind exception: Exception

(BLE001)


420-420: Avoid specifying long messages outside the exception class

(TRY003)


422-422: Avoid specifying long messages outside the exception class

(TRY003)


442-442: Avoid specifying long messages outside the exception class

(TRY003)


447-447: Avoid specifying long messages outside the exception class

(TRY003)


452-452: Avoid specifying long messages outside the exception class

(TRY003)


511-511: Do not catch blind exception: Exception

(BLE001)


538-538: Avoid specifying long messages outside the exception class

(TRY003)


559-559: Unused function argument: inference

(ARG001)


577-577: Avoid specifying long messages outside the exception class

(TRY003)


598-598: Avoid specifying long messages outside the exception class

(TRY003)


603-603: Avoid specifying long messages outside the exception class

(TRY003)


610-610: Avoid specifying long messages outside the exception class

(TRY003)


650-653: Avoid specifying long messages outside the exception class

(TRY003)


655-655: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (11)
utilities/llmd_utils.py (8)

580-656: Validate against parsed payload instead of raw “output” blob.

Current logic indexes res as if it contained structured data; after parsing, use res["payload"] and handle streaming vs non‑streaming.

 def _validate_authorized_response(
     res: Dict[str, Any],
     inference: LLMUserInference,
     inference_config: Dict[str, Any],
     inference_type: str,
     expected_response_text: Optional[str],
     use_default_query: bool,
     model_name: str,
 ) -> None:
-    """Validate response for authorized users."""
-
-    use_regex = False
+    """Validate response for authorized users."""
+    use_regex = False
@@
-    if inference.inference_response_text_key_name:
-        if inference_type == inference.STREAMING:
-            if output := re.findall(
-                rf"{inference.inference_response_text_key_name}\": \"(.*)\"",
-                res[inference.inference_response_key_name],
-                re.MULTILINE,
-            ):
-                assert "".join(output) == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {output}"
-                )
-        elif inference_type == inference.INFER or use_regex:
-            formatted_res = json.dumps(res[inference.inference_response_text_key_name]).replace(" ", "")
-            if use_regex and expected_response_text is not None:
-                assert re.search(expected_response_text, formatted_res), (
-                    f"Expected: {expected_response_text} not found in: {formatted_res}"
-                )
-            else:
-                formatted_res = json.dumps(res[inference.inference_response_key_name]).replace(" ", "")
-                assert formatted_res == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match output: {formatted_res}"
-                )
-        else:
-            response = res[inference.inference_response_key_name]
-            if isinstance(response, list):
-                response = response[0]
-
-            if isinstance(response, dict):
-                response_text = response[inference.inference_response_text_key_name]
-                assert response_text == expected_response_text, (
-                    f"Expected: {expected_response_text} does not match response: {response_text}"
-                )
-            else:
-                raise InferenceResponseError(
-                    "Inference response output does not match expected output format."
-                    f"Expected: {expected_response_text}.\nResponse: {res}"
-                )
-    else:
-        raise InferenceResponseError(f"Inference response output not found in response. Response: {res}")
+    # Streaming: validate against raw output
+    if inference_type == inference.STREAMING:
+        key = inference.inference_response_text_key_name or "content"
+        if output := re.findall(rf'{re.escape(key)}\": \"(.*)\"', res.get("output", ""), re.MULTILINE):
+            assert "".join(output) == expected_response_text, (
+                f"Expected: {expected_response_text} does not match response: {output}"
+            )
+        else:
+            raise InferenceResponseError("Streaming response did not contain expected content.")
+        return
+
+    # Non-streaming: use parsed payload
+    payload = res.get("payload")
+    if not isinstance(payload, dict):
+        raise InferenceResponseError("Failed to parse JSON payload from response.")
+
+    # Default to OpenAI Chat Completions shape
+    try:
+        response_text = payload["choices"][0]["message"]["content"]
+    except Exception as exc:
+        raise InferenceResponseError(f"Unexpected payload shape: {payload}") from exc
+
+    if use_regex and expected_response_text is not None:
+        assert re.search(str(expected_response_text), response_text), (
+            f"Expected pattern not found. Pattern: {expected_response_text}"
+        )
+    else:
+        assert response_text == expected_response_text, (
+            f"Expected: {expected_response_text} does not match: {response_text}"
+        )
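The streaming branch’s regex can be checked against a synthetic chunk stream (the payload lines are illustrative):

```python
import re

# Each streamed line carries a JSON fragment with a "content" key; the regex
# collects every value, and the pieces are concatenated into the full reply.
raw = (
    'data: {"content": "Hel"}\n'
    'data: {"content": "lo"}\n'
    'data: {"content": "!"}\n'
)
key = "content"
chunks = re.findall(rf'{re.escape(key)}\": \"(.*)\"', raw, re.MULTILINE)
joined = "".join(chunks)
```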

492-495: Curl header formatting bug; causes split args.

Don’t strip -H/quotes from CONTENT_JSON; pass it verbatim.

-        header = HTTPRequest.CONTENT_JSON.replace("-H ", "").replace("'", "")
-        cmd_exec = "curl -i -s"
-        cmd = f"{cmd_exec} -X POST -d '{body}' -H {header} -H 'Accept: application/json'"
+        header = HTTPRequest.CONTENT_JSON
+        cmd_exec = "curl -i -s"
+        cmd = f"{cmd_exec} -X POST -d '{body}' {header} -H 'Accept: application/json'"

505-509: Wrong deployment_mode for get_ca_bundle(); forces insecure.

Use “raw-deployment” as per certificates_utils.

-                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw")
+                ca_bundle = get_ca_bundle(client=client, deployment_mode="raw-deployment")

535-539: Avoid Bearer token leakage in error message.

Redact Authorization header before raising.

-        res, out, err = run_command(command=shlex.split(cmd), verify_stderr=False, check=False)
-        if res:
-            return out
-        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {cmd}")
+        res, out, err = run_command(command=shlex.split(cmd), verify_stderr=False, check=False)
+        if res:
+            return out
+        redacted_cmd = re.sub(r'(-H\s+[\'"]Authorization:\s*Bearer\s+)\S+([\'"])', r'\1<redacted>\2', cmd)
+        raise ValueError(f"Inference failed with error: {err}\nOutput: {out}\nCommand: {redacted_cmd}")

540-556: Parse JSON body from curl output and return payload.

Validators need structured payload; “-i” includes headers.

-        out = self.run_inference(
+        out = self.run_inference(
             model_name=model_name,
             inference_input=inference_input,
             use_default_query=use_default_query,
             insecure=insecure,
             token=token,
         )
-        return {"output": out}
+        # Extract last HTTP body (supports multiple responses/redirects)
+        parts = out.split("\r\n\r\n")
+        body = parts[-1] if parts else out
+        try:
+            payload = json.loads(body)
+        except json.JSONDecodeError:
+            payload = None
+        return {"output": out, "payload": payload}

457-460: Template() called with a keyword argument; use the conventional positional form.

Use positional constructor.

-            template = Template(template=template_str)
-            body = template.safe_substitute(model_name=model_name)
+            template = Template(template_str)
+            body = template.safe_substitute(model_name=model_name)

612-617: Fix Template() misuse in expected-text rendering.

Same constructor issue here.

-        if isinstance(expected_response_text, str):
-            expected_response_text = Template(template=expected_response_text).safe_substitute(model_name=model_name)
-        elif isinstance(expected_response_text, dict):
-            response_output = expected_response_text.get("response_output")
-            if response_output is not None:
-                expected_response_text = Template(template=response_output).safe_substitute(model_name=model_name)
+        if isinstance(expected_response_text, str):
+            expected_response_text = Template(expected_response_text).safe_substitute(model_name=model_name)
+        elif isinstance(expected_response_text, dict):
+            response_output = expected_response_text.get("response_output")
+            if response_output is not None:
+                expected_response_text = Template(response_output).safe_substitute(model_name=model_name)
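For reference, safe_substitute as used in these diffs fills what it can and leaves unmatched placeholders verbatim instead of raising KeyError; a quick standalone check:

```python
from string import Template

template_str = '{"model": "$model_name", "prompt": "$prompt"}'
# safe_substitute fills the placeholders it has values for and leaves the
# rest verbatim, so a partially-rendered body never raises KeyError.
body = Template(template_str).safe_substitute(model_name="opt-125m")
```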

203-206: Hard-coded S3 bucket breaks non-default envs; build URI from input.

Currently forces s3://ods-ci-wisdom, ignoring provided bucket/path.

-    if storage_key and storage_path:
-        model_config = {
-            "uri": f"s3://ods-ci-wisdom/{storage_path}",
-        }
+    if storage_key and storage_path:
+        # Accept full s3://... or bucket/prefix form
+        uri = storage_path if str(storage_path).startswith("s3://") else f"s3://{storage_path}"
+        model_config = {"uri": uri}
tests/model_serving/model_server/llmd/conftest.py (3)

79-95: Guard against missing request.param and avoid duplicate kwargs in create_llmd_gateway().

Unconditional access to request.param can raise AttributeError when not parametrized. Also, kwargs can duplicate explicit args (wait_for_condition/timeout), causing TypeError.

-    if isinstance(request.param, str):
-        gateway_class_name = request.param
-        kwargs = {}
-    else:
-        gateway_class_name = request.param.get("gateway_class_name", "openshift-default")
-        kwargs = {k: v for k, v in request.param.items() if k != "gateway_class_name"}
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
+        gateway_class_name = param
+        kwargs = {}
+    elif isinstance(param, dict):
+        gateway_class_name = param.get("gateway_class_name", "openshift-default")
+        kwargs = {k: v for k, v in param.items() if k != "gateway_class_name"}
+    else:
+        gateway_class_name = "openshift-default"
+        kwargs = {}
+    wait_for_condition = kwargs.pop("wait_for_condition", True)
+    timeout = kwargs.pop("timeout", Timeout.TIMEOUT_5MIN)
@@
-    with create_llmd_gateway(
+    with create_llmd_gateway(
         client=admin_client,
         namespace=gateway_namespace,
         gateway_class_name=gateway_class_name,
-        wait_for_condition=True,
-        timeout=Timeout.TIMEOUT_5MIN,
+        wait_for_condition=wait_for_condition,
+        timeout=timeout,
         teardown=False,  # Don't delete gateway in teardown
-        **kwargs,
+        **kwargs,
     ) as gateway:

104-133: Same issues in llmd_inference_service: guard param and pop duplicates.

Potential AttributeError on request.param and duplicate named args (name, storage_uri, container_image, container_resources, wait, timeout).

-    if isinstance(request.param, str):
-        name_suffix = request.param
-        kwargs = {}
-    else:
-        name_suffix = request.param.get("name_suffix", "basic")
-        kwargs = {k: v for k, v in request.param.items() if k != "name_suffix"}
-
-    service_name = kwargs.get("name", f"llm-{name_suffix}")
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
+        name_suffix = param
+        kwargs = {}
+    elif isinstance(param, dict):
+        name_suffix = param.get("name_suffix", "basic")
+        kwargs = {k: v for k, v in param.items() if k != "name_suffix"}
+    else:
+        name_suffix = "basic"
+        kwargs = {}
+
+    service_name = kwargs.pop("name", f"llm-{name_suffix}")
@@
-    container_resources = kwargs.get(
+    container_resources = kwargs.pop(
         "container_resources",
         {
             "limits": {"cpu": "1", "memory": "10Gi"},
             "requests": {"cpu": "100m", "memory": "8Gi"},
         },
     )
+    storage_uri = kwargs.pop("storage_uri", VLLM_STORAGE_OCI)
+    container_image = kwargs.pop("container_image", VLLM_CPU_IMAGE)
+    wait = kwargs.pop("wait", True)
+    timeout = kwargs.pop("timeout", Timeout.TIMEOUT_15MIN)
@@
-    with create_llmisvc(
+    with create_llmisvc(
         client=admin_client,
         name=service_name,
         namespace=unprivileged_model_namespace.name,
-        storage_uri=kwargs.get("storage_uri", VLLM_STORAGE_OCI),
-        container_image=kwargs.get("container_image", VLLM_CPU_IMAGE),
+        storage_uri=storage_uri,
+        container_image=container_image,
         container_resources=container_resources,
-        wait=True,
-        timeout=Timeout.TIMEOUT_15MIN,
-        **{k: v for k, v in kwargs.items() if k != "name"},
+        wait=wait,
+        timeout=timeout,
+        **kwargs,
     ) as llm_service:

145-185: Apply same robustness to S3 fixture and allow wait/timeout override.

Guard request.param and pop duplicate args (name, storage_key, storage_path, container_image, container_resources, wait, timeout).

-    if isinstance(request.param, str):
-        name_suffix = request.param
-        kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
-    else:
-        name_suffix = request.param.get("name_suffix", "s3")
-        kwargs = {k: v for k, v in request.param.items() if k != "name_suffix"}
-
-    service_name = kwargs.get("name", f"llm-{name_suffix}")
+    param = getattr(request, "param", None)
+    if isinstance(param, str):
+        name_suffix = param
+        kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
+    elif isinstance(param, dict):
+        name_suffix = param.get("name_suffix", "s3")
+        kwargs = {k: v for k, v in param.items() if k != "name_suffix"}
+    else:
+        name_suffix = "s3"
+        kwargs = {"storage_path": DEFAULT_S3_STORAGE_PATH}
+
+    service_name = kwargs.pop("name", f"llm-{name_suffix}")
@@
-    if "storage_key" not in kwargs:
-        kwargs["storage_key"] = llmd_s3_secret.name
-
-    if "storage_path" not in kwargs:
-        kwargs["storage_path"] = DEFAULT_S3_STORAGE_PATH
+    storage_key = kwargs.pop("storage_key", llmd_s3_secret.name)
+    storage_path = kwargs.pop("storage_path", DEFAULT_S3_STORAGE_PATH)
@@
-    container_resources = kwargs.get(
+    container_resources = kwargs.pop(
         "container_resources",
         {
             "limits": {"cpu": "1", "memory": "10Gi"},
             "requests": {"cpu": "100m", "memory": "8Gi"},
         },
     )
+    container_image = kwargs.pop("container_image", VLLM_CPU_IMAGE)
+    wait = kwargs.pop("wait", True)
+    timeout = kwargs.pop("timeout", Timeout.TIMEOUT_15MIN)
@@
-    with create_llmisvc(
+    with create_llmisvc(
         client=admin_client,
         name=service_name,
         namespace=unprivileged_model_namespace.name,
-        storage_key=kwargs.get("storage_key"),
-        storage_path=kwargs.get("storage_path"),
-        container_image=kwargs.get("container_image", VLLM_CPU_IMAGE),
+        storage_key=storage_key,
+        storage_path=storage_path,
+        container_image=container_image,
         container_resources=container_resources,
         service_account=llmd_s3_service_account.name,
-        wait=True,
-        timeout=Timeout.TIMEOUT_15MIN,
-        **{
-            k: v
-            for k, v in kwargs.items()
-            if k not in ["name", "storage_key", "storage_path", "container_image", "container_resources"]
-        },
+        wait=wait,
+        timeout=timeout,
+        **kwargs,
     ) as llm_service:

@Raghul-M Raghul-M enabled auto-merge (squash) September 23, 2025 15:17
@mwaykole mwaykole disabled auto-merge September 23, 2025 15:20
@mwaykole mwaykole merged commit e0efa53 into opendatahub-io:main Sep 23, 2025
13 checks passed
@github-actions

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

sheltoncyril pushed a commit to sheltoncyril/opendatahub-tests that referenced this pull request Oct 2, 2025
* Add LLMD model server tests and utilities

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix precommit

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix precommit

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

* fix precommit

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

* fix precommit

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

* fix gateway thing

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix gateway thing

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

* fix: break long line in utils.py to meet flake8 line length requirement

* fix gateway thing

Signed-off-by: Milind Waykole <mwaykole@redhat.com>

---------

Signed-off-by: Milind Waykole <mwaykole@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@mwaykole mwaykole changed the title feat: add llmd test cpu test and lib Feature: add llmd test and lib as foundation of llmd test Nov 16, 2025
mwaykole added a commit to mwaykole/opendatahub-tests that referenced this pull request Jan 23, 2026
(same commit message as the commit referenced above)
