
test: add openai compatible tests for responses, vector stores and files#582

Merged
dbasunag merged 11 commits into opendatahub-io:main from Ygnas:openai-compatible-tests
Sep 18, 2025
Conversation

Contributor

@Ygnas Ygnas commented Sep 4, 2025

  • Add OpenAI-compatible tests for responses, vector stores and files
  • Add new fixture parameter:
    • llama_stack_storage_size to configure storage size
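As a rough illustration of what the new parameter enables (the helper below is hypothetical; only the `llama_stack_storage_size` name and the `10Gi` value appear in this PR):

```python
# Illustrative sketch: an optional storage-size parameter that makes the
# storage block opt-in. build_server_config is a hypothetical helper, not
# the PR's actual fixture code.
from typing import Any, Dict, Optional


def build_server_config(storage_size: Optional[str] = None) -> Dict[str, Any]:
    config: Dict[str, Any] = {"replicas": 1}
    if storage_size:
        # Storage is only configured when a test explicitly requests a size,
        # e.g. llama_stack_storage_size: "10Gi".
        config["storage"] = {"size": storage_size}
    return config
```

Tests that do not pass a size simply get a config without a storage block.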

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Summary by CodeRabbit

  • New Features

    • Configurable storage size for server startup.
    • Built-in vector store provisioning with automatic TorchTune document ingestion.
    • Unified model metadata container for consistent model and embedding selection.
    • API response validation utility for keyword-based checks.
  • Tests

    • Expanded RAG coverage: embeddings, vector I/O, simple/full responses, and vector-store search.
    • Standardized TorchTune expectations and centralized validation flows.
    • New helpers to create response functions using vector search and parameterized server configs.

Contributor

coderabbitai bot commented Sep 4, 2025

📝 Walkthrough

Refactors test server config to read params and optionally include storage; adds ModelInfo and vector-store fixtures with document ingestion; extends RAG tests to use ModelInfo and responses API; adds TorchTune test expectations and API-response validation utilities plus helpers to build response functions.

Changes

Cohort / File(s) and summary of changes:

Test fixtures & server config (tests/llama_stack/conftest.py):
  Derives server_config from request params (uses fixture values via request.getfixturevalue), makes storage optional via llama_stack_storage_size, and adds fixtures: llama_stack_models, vector_store, vector_store_with_docs. Adds _download_and_upload_file (with retry) and a document ingestion flow.

RAG tests (tests/llama_stack/rag/test_rag.py):
  Expands parametrization to include llama_stack_server_config; replaces runtime model discovery with llama_stack_models: ModelInfo; adds tests test_rag_simple_responses, test_rag_full_responses, test_rag_vector_store_search; integrates TorchTune expectations and centralized validation utilities.

RAG utilities (utilities/rag_utils.py):
  Adds the ModelInfo NamedTuple and TypedDicts (TurnExpectation, TurnResult, ValidationSummary, ValidationResult); implements validate_api_responses to invoke a response function across test turns and produce per-turn results and a summary.

Test constants (tests/llama_stack/constants.py):
  Adds the TorchTuneTestExpectation dataclass and the TORCHTUNE_TEST_EXPECTATIONS list with five predefined test cases (questions, expected keywords, descriptions).

Core test param update (tests/llama_stack/core/test_llamastack_core.py):
  Adds llama_stack_storage_size: "10Gi" to the pytest server config parameter set.

Test helpers (tests/llama_stack/utils.py):
  Adds get_torchtune_test_expectations() to convert constants into TurnExpectation dicts and create_response_function(...) to build a responses API callable using llama_stack_models and a VectorStore.
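Going by the summary above, the new pieces in utilities/rag_utils.py fit together roughly like this condensed sketch (field names and the validator's exact shape are assumptions, not the PR's actual code):

```python
# Condensed, hypothetical sketch of ModelInfo plus a keyword-based validator,
# mirroring the shapes described in the walkthrough above.
from typing import Callable, List, NamedTuple, TypedDict


class ModelInfo(NamedTuple):
    model_id: str
    embedding_model_id: str
    embedding_dimension: int


class TurnExpectation(TypedDict):
    question: str
    expected_keywords: List[str]
    description: str


def validate_api_responses(
    response_fn: Callable[..., str],
    test_cases: List[TurnExpectation],
    min_keywords_required: int = 1,
) -> dict:
    results = []
    for case in test_cases:
        # Guard against a None response before lowercasing.
        text = (response_fn(question=case["question"]) or "").lower()
        found = [k for k in case["expected_keywords"] if k.lower() in text]
        results.append({
            "question": case["question"],
            "found_keywords": found,
            "success": len(found) >= min_keywords_required,
        })
    passed = sum(r["success"] for r in results)
    return {"success": passed == len(results), "results": results}
```

The real implementation adds richer per-turn fields and a summary dict; this only shows the control flow.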

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes



Pre-merge checks

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title succinctly and accurately describes the primary change: adding OpenAI-compatible tests that cover responses, vector stores, and files; it is concise, on-topic, and meaningful for a reviewer scanning history. It clearly reflects the PR objectives and matches the changes in tests, fixtures, and utilities described in the diff.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.95%, which is sufficient. The required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.


github-actions bot commented Sep 4, 2025

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To un-block merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image with tag
    pr-<pr_number> in the quay repository. This image tag is deleted on PR merge or close.
Supported labels

{'/verified', '/build-push-pr-image', '/lgtm', '/hold', '/cherry-pick', '/wip'}

@Ygnas Ygnas force-pushed the openai-compatible-tests branch 2 times, most recently from cb6d487 to 9e3a62a on September 4, 2025 at 11:26
@Ygnas Ygnas marked this pull request as ready for review September 4, 2025 11:27
@Ygnas Ygnas requested a review from a team as a code owner September 4, 2025 11:27
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
tests/llama_stack/conftest.py (1)

90-90: Verify ConfigMap name stays in sync.

userConfig.configMapName hardcodes "rag-llama-stack-config-map". If renamed, startup will break. Consider centralizing this name as a constant used by both fixtures.
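One way to apply this suggestion is a shared module-level constant; the constant and function names below are illustrative, and only the ConfigMap name string comes from the comment:

```python
# Illustrative: a single constant shared by the ConfigMap fixture and the
# distribution's userConfig wiring, so a rename happens in exactly one place.
RAG_CONFIG_MAP_NAME = "rag-llama-stack-config-map"


def user_config() -> dict:
    # Both the ConfigMap fixture and the server startup config would
    # reference the same constant instead of repeating the literal.
    return {"configMapName": RAG_CONFIG_MAP_NAME}
```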

tests/llama_stack/rag/test_rag.py (2)

355-360: Align assertion list with message; include 'llm'.

Apply this diff:

-        assert any(
-            answer in content.lower() for answer in ["model", "assistant", "ai", "artificial", "language model"]
-        ), (
-            f"The LLM didn't provide any of the expected answers "
-            f"['model', 'assistant', 'ai', 'llm', 'language model']. Got: {content}"
-        )
+        expected = ["model", "assistant", "ai", "llm", "language model", "artificial"]
+        assert any(token in content.lower() for token in expected), (
+            f"The LLM didn't provide any of the expected answers {expected}. Got: {content}"
+        )

426-428: Prefer xfail over skip to keep the test visible in reports.

Marking as xfail(reason=..., strict=False) keeps signal without hiding regressions when it starts passing.
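The suggested marker, sketched (the reason string is illustrative):

```python
import pytest

# Non-strict xfail keeps the test running and visible in reports instead of
# hiding it behind a skip; if it starts passing, it shows up as XPASS.
xfail_marker = pytest.mark.xfail(reason="feature not yet supported", strict=False)


@xfail_marker
def test_currently_failing_feature():
    assert False  # reported as xfail, not as an error
```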

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a49ee47 and 9e3a62a.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py (2 hunks)
  • tests/llama_stack/rag/test_rag.py (4 hunks)
  • utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/llama_stack/conftest.py (1)
tests/conftest.py (2)
  • admin_client (68-69)
  • model_namespace (100-120)
tests/llama_stack/rag/test_rag.py (2)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (294-325)
utilities/rag_utils.py (2)
  • TurnExpectation (34-37)
  • validate_api_responses (218-309)
🔇 Additional comments (6)
utilities/rag_utils.py (1)

4-4: Importing Callable is correct and necessary.

tests/llama_stack/conftest.py (3)

13-13: ConfigMap import looks good.


85-85: Confirm CLI entrypoint is valid in the image.

Ensure the container image exposes the llama CLI and that llama stack run /etc/llama-stack/run.yaml is the correct invocation.


265-265: Fixture dependency wiring is correct.

tests/llama_stack/rag/test_rag.py (2)

1-1: Adding io is appropriate for in-memory uploads.


67-114: Expectations helper reads well and matches validator usage.

@Ygnas Ygnas force-pushed the openai-compatible-tests branch 2 times, most recently from abc79d7 to 9f3b267 on September 4, 2025 at 12:13
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/llama_stack/rag/test_rag.py (1)

390-402: Guard against None from responses API (prevents crash in validate_api_responses).

response.output_text can be None; validate_api_responses calls .lower(), which will crash. Return a safe string.

-            return response.output_text
+            return response.output_text or ""

Optional: define _response_fn after model_id and vector_store are created for clarity.

🧹 Nitpick comments (5)
tests/llama_stack/rag/test_rag.py (5)

63-72: Retries/timeouts look good; add jitter and surface the URL on failure.

Minor resilience/readability tweak: include the URL in the raised error and add a small backoff jitter to reduce thundering herd in CI.

-            for attempt in range(3):
+            for attempt in range(3):
                 try:
                     response = requests.get(url, timeout=30)
                     response.raise_for_status()
                     downloaded_files.append((file_name, response.content))
                     break
-                except requests.exceptions.RequestException:
-                    if attempt == 2:
-                        raise
-                    time.sleep(1)
+                except requests.exceptions.RequestException as e:
+                    if attempt == 2:
+                        raise RuntimeError(f"Failed to download {url} after {attempt + 1} attempts") from e
+                    time.sleep(1 + 0.2 * attempt)

101-111: Reduce brittleness of QAT expectations (hyphenation varies).

Docs may spell “Quantization-Aware Training” without the hyphen. Consider adding token-level fallbacks to avoid false negatives.

         {
             "question": "How can I optimize model training for quantization?",
             "expected_keywords": [
-                "Quantization-Aware Training",
+                "Quantization-Aware Training",
+                "quantization aware training",
                 "QAT",
                 "training",
                 "fine-tuning",
-                "fake",
-                "quantized",
+                "fake",
+                "quantized",
+                "quantization",
             ],
             "description": "Should provide information about QAT (Quantization-Aware Training)",
         },

353-369: Fix mismatch in assertion vs. error message (“llm” missing from check).

Include "llm" in the allowed tokens to match the failure message.

-        assert any(
-            answer in content.lower() for answer in ["model", "assistant", "ai", "artificial", "language model"]
-        ), (
-            f"The LLM didn't provide any of the expected answers "
-            f"['model', 'assistant', 'ai', 'llm', 'language model']. Got: {content}"
-        )
+        assert any(
+            answer in content.lower()
+            for answer in ["model", "assistant", "ai", "artificial", "language model", "llm"]
+        ), (
+            "The LLM didn't provide any of the expected answers "
+            "['model', 'assistant', 'ai', 'llm', 'language model']."
+            f" Got: {content}"
+        )

371-379: Make capability assertion tolerant to singular/plural.

Allow “answer” or “answers” to avoid brittle failures.

-        assert "answer" in content, "The LLM didn't provide the expected answer to the prompt"
+        assert any(t in content.lower() for t in ("answer", "answers")), \
+            "The LLM didn't provide the expected answer to the prompt"

461-463: Mirror the BytesIO compatibility tweak here (even though skipped).

Use positional initializer for broader interpreter compatibility and consistency with the change above.

-                file_like = io.BytesIO(initial_bytes=file_content)
+                file_like = io.BytesIO(file_content)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9e3a62a and 9f3b267.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py (2 hunks)
  • tests/llama_stack/rag/test_rag.py (4 hunks)
  • utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/llama_stack/conftest.py
  • utilities/rag_utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/llama_stack/rag/test_rag.py (2)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (294-325)
utilities/rag_utils.py (2)
  • TurnExpectation (34-37)
  • validate_api_responses (218-303)
🔇 Additional comments (1)
tests/llama_stack/rag/test_rag.py (1)

415-424: Add wait for vector store indexing to avoid flakiness
File ingestion into vector stores is asynchronous; calling validate_api_responses immediately can yield empty results. Introduce a best-effort wait (e.g. poll vector_stores.files.list and check each item’s status until it’s "completed"/"ready", or use a dedicated client wait/status API if provided) before invoking validation. Confirm that vector_stores.files.list returns a status field (or identify the proper status endpoint) and adjust polling accordingly.
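A best-effort wait along those lines could look like the sketch below; the `files.list` call shape and the status values are assumptions about the client API, which the comment itself says need confirming:

```python
import time


def wait_for_ingestion(client, vector_store_id: str, timeout: float = 120.0,
                       interval: float = 2.0) -> None:
    """Poll until every file attached to the vector store reports a terminal
    status. The status names ("completed", "failed") are assumed here."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        files = client.vector_stores.files.list(vector_store_id=vector_store_id)
        statuses = [f.status for f in files.data]
        if statuses and all(s == "completed" for s in statuses):
            return  # all documents indexed; safe to run search/validation
        if any(s == "failed" for s in statuses):
            raise RuntimeError(f"File ingestion failed: {statuses}")
        time.sleep(interval)
    raise TimeoutError(f"Vector store {vector_store_id} not ready in {timeout}s")
```

Calling this in the vector_store_with_docs fixture, before validation runs, would remove the race the reviewer describes.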

@Ygnas Ygnas force-pushed the openai-compatible-tests branch 3 times, most recently from fed9148 to 06d9487 on September 4, 2025 at 20:13
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
utilities/rag_utils.py (1)

226-311: Fix type/shape mismatch and None-guard in validate_api_responses.

Returned per-turn keys and summary don’t match TurnResult/ValidationSummary and .lower() will crash if response_fn returns None. Align fields and harden against None.

Apply:

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
     verbose: bool = True,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
-        try:
-            response = response_fn(question=question)
-            response_lower = response.lower()
+        try:
+            response_text = response_fn(question=question) or ""
+            response_lower = response_text.lower()
@@
-            result = {
+            result: TurnResult = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
-                "found_keywords": found,
-                "missing_keywords": missing,
-                "response": response,
-                "success": success,
+                "found_keywords": found,
+                "missing_keywords": missing,
+                "response_content": response_text,
+                "response_length": len(response_text),
+                "event_count": 0,
+                "success": success,
+                "error": None,
             }
@@
-            all_results.append({
+            all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
-                "success": False,
-                "error": str(e),
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
+                "success": False,
+                "error": str(e),
             })
@@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-        LOGGER.info(f"Total: {summary['total']}")
-        LOGGER.info(f"Passed: {summary['passed']}")
-        LOGGER.info(f"Failed: {summary['failed']}")
+        LOGGER.info(f"Total turns: {summary['total_turns']}")
+        LOGGER.info(f"Successful: {summary['successful_turns']}")
+        LOGGER.info(f"Failed: {summary['failed_turns']}")
         LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
tests/llama_stack/conftest.py (1)

205-205: Replace zero‑width default in OTEL service name.

The fallback uses U+200B and is invisible; use an explicit value.

-      service_name: "${env.OTEL_SERVICE_NAME:=\u200b}"
+      service_name: "${env.OTEL_SERVICE_NAME:=llama-stack}"
tests/llama_stack/rag/test_rag.py (1)

138-150: Guard against None in responses API.

output_text can be None; returning it unguarded breaks validate_api_responses on .lower().

-            return response.output_text
+            return response.output_text or ""
🧹 Nitpick comments (5)
utilities/rag_utils.py (1)

84-223: Minor: share keyword-match logic to avoid drift.

Consider factoring the common “lower+contains” logic with validate_rag_agent_responses to a helper.
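The shared helper hinted at here could be as small as (names are illustrative):

```python
# Hypothetical shared helper for the case-insensitive keyword matching used
# by both validate_api_responses and validate_rag_agent_responses.
from typing import List, Tuple


def match_keywords(text: str, keywords: List[str]) -> Tuple[List[str], List[str]]:
    lowered = (text or "").lower()
    found = [k for k in keywords if k.lower() in lowered]
    missing = [k for k in keywords if k.lower() not in lowered]
    return found, missing
```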

tests/llama_stack/conftest.py (1)

346-352: Guard model selection with clear errors.

next(...) will raise StopIteration; provide explicit assertions for better triage.

-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    assert llm is not None, "No LLM model found from llama_stack_client.models.list()"
+    embedding_model = next((m for m in models if m.api_model_type == "embedding"), None)
+    assert embedding_model is not None, "No embedding model found from llama_stack_client.models.list()"
+    embedding_dimension = embedding_model.metadata.get("embedding_dimension")
+    assert isinstance(embedding_dimension, int), "Embedding model missing integer 'embedding_dimension' metadata"
+    model_id = llm.identifier
tests/llama_stack/rag/test_rag.py (3)

123-126: Tighten return type of factory.

Use typing.Callable for clarity.

-    ) -> callable:
+    ) -> Callable[..., str]:

Remember to import:

-from typing import List
+from typing import List, Callable

41-74: Remove unused helper to avoid drift with fixture.

_download_torchtune_docs duplicates fixture logic and appears unused in this file.

-    def _download_torchtune_docs(self) -> list[tuple[str, bytes]]:
-        """
-        Helper method to download PyTorch TorchTune documentation files.
-        ...
-        return downloaded_files

268-361: Consider moving vector DB setup/teardown to a fixture.

test_rag_build_rag_agent mixes setup/teardown with assertions. A fixture mirroring vector_store would improve reuse and readability.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9f3b267 and 06d9487.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py (4 hunks)
  • tests/llama_stack/rag/test_rag.py (10 hunks)
  • utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)

📚 Learning: 2025-08-21T19:08:14.982Z
Learnt from: dbasunag
PR: #527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.

Applied to files:
  • tests/llama_stack/conftest.py

📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: #338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.

Applied to files:
  • tests/llama_stack/rag/test_rag.py

🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
utilities/rag_utils.py (2)
  • create_llama_stack_distribution (20-39)
  • ModelInfo (11-16)
tests/llama_stack/utils.py (2)
  • create_llama_stack_distribution (17-37)
  • wait_for_llama_stack_client_ready (41-52)
tests/conftest.py (2)
  • admin_client (68-69)
  • model_namespace (100-120)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (296-327)
tests/llama_stack/rag/test_rag.py (2)
tests/llama_stack/conftest.py (4)
  • llama_stack_client (296-327)
  • llama_stack_models (331-352)
  • vector_store (356-385)
  • vector_store_with_docs (389-449)
utilities/rag_utils.py (3)
  • TurnExpectation (42-45)
  • validate_api_responses (226-311)
  • validate_rag_agent_responses (84-223)

🔇 Additional comments (3)
utilities/rag_utils.py (1)

11-16: ModelInfo looks good.

Clear, minimal container for model metadata; aligns with downstream fixtures.

tests/llama_stack/conftest.py (2)

87-93: ConfigMap-driven run command wiring looks correct.

Command and userConfig binding to run.yaml via ConfigMap are consistent.


388-449: Good: network calls hardened with timeout+retries and resources cleaned up.

Download loop uses timeouts and retries; vector store cleanup is handled.


@Ygnas Ygnas requested a review from dbasunag September 4, 2025 20:24
@Ygnas Ygnas force-pushed the openai-compatible-tests branch 2 times, most recently from 1909846 to ded8179 on September 10, 2025 at 10:50
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
utilities/rag_utils.py (1)

226-311: Align result/summary shapes with TurnResult/ValidationSummary and harden against None responses.

Current keys ("response", "total", "passed", "failed") don’t match the TypedDicts and will break consumers. Also .lower() on a None response will crash. Align fields and add lengths/events.

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
     verbose: bool = True,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
-        try:
-            response = response_fn(question=question)
-            response_lower = response.lower()
+        try:
+            response_text = response_fn(question=question) or ""
+            response_lower = response_text.lower()
@@
-            result = {
+            result: TurnResult = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": found,
                 "missing_keywords": missing,
-                "response": response,
-                "success": success,
+                "response_content": response_text,
+                "response_length": len(response_text),
+                "event_count": 0,
+                "success": success,
+                "error": None,
             }
@@
-            all_results.append({
+            all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
                 "success": False,
                 "error": str(e),
             })
-            if verbose:
-                LOGGER.error(f"ERROR: {str(e)}")
+            if verbose:
+                LOGGER.exception("ERROR validating question %s: %s", question, e)
@@
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-        LOGGER.info(f"Total: {summary['total']}")
-        LOGGER.info(f"Passed: {summary['passed']}")
-        LOGGER.info(f"Failed: {summary['failed']}")
-        LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+        LOGGER.info(f"Total: {summary['total_turns']}")
+        LOGGER.info(f"Passed: {summary['successful_turns']}")
+        LOGGER.info(f"Failed: {summary['failed_turns']}")
+        LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")

Additional: keeping a broad except Exception is acceptable for test utilities, but if you want to satisfy Ruff BLE001, narrow to anticipated exceptions or add # noqa: BLE001. Also prefer logger formatting over f-strings.
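The logger-formatting point in that note, as a minimal sketch:

```python
# Prefer lazy %-style logger formatting over f-strings: arguments are only
# interpolated when the log level is enabled, and LOGGER.exception captures
# the traceback automatically. The logger name here is illustrative.
import logging

records = []


class _Capture(logging.Handler):
    def emit(self, record):
        records.append(record)


LOGGER = logging.getLogger("rag_utils_demo")
LOGGER.addHandler(_Capture())
LOGGER.setLevel(logging.ERROR)

try:
    raise ValueError("boom")
except ValueError as e:  # narrowed from a broad `except Exception`
    LOGGER.exception("ERROR validating question %s: %s", "q1", e)
```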

tests/llama_stack/conftest.py (1)

100-260: Replace zero‑width space default in OTEL service name.

The invisible U+200B default can confuse tooling. Use an explicit, visible default.

-      service_name: "${env.OTEL_SERVICE_NAME:=\u200b}"
+      service_name: "${env.OTEL_SERVICE_NAME:=llama-stack}"
tests/llama_stack/rag/test_rag.py (1)

49-77: Guard against None from responses API and add types to helper.

response.output_text can be None; returning it will break validation on .lower().

-    def _create_response_function(
-        self, llama_stack_client: LlamaStackClient, llama_stack_models, vector_store
-    ) -> callable:
+    def _create_response_function(
+        self, llama_stack_client: LlamaStackClient, llama_stack_models, vector_store
+    ) -> callable:
@@
-        def _response_fn(*, question: str) -> str:
+        def _response_fn(*, question: str) -> str:
             response = llama_stack_client.responses.create(
                 input=question,
                 model=llama_stack_models.model_id,
                 tools=[
@@
-            )
-            return response.output_text
+            )
+            return response.output_text or ""
🧹 Nitpick comments (9)
tests/llama_stack/constants.py (2)

19-25: Make TorchTuneTestExpectation immutable.

Mark the dataclass as frozen to prevent accidental mutation in tests.

-@dataclass
+@dataclass(frozen=True)
 class TorchTuneTestExpectation:

28-67: Consider inlining expectations in the test file for triage simplicity.

Per repo practice (keep parametrization inline), moving these to the test module may ease debugging; or add a brief comment/link here pointing to the tests consuming them.
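Inlined, the expectations might look like the sketch below; the questions and keywords are illustrative placeholders, not the actual TorchTune data:

```python
import pytest


@pytest.mark.parametrize(
    "question, expected_keywords",
    [
        ("What is LoRA?", ["low-rank", "adapter"]),
        ("What does QAT stand for?", ["quantization"]),
    ],
)
def test_torchtune_answers(question: str, expected_keywords: list[str]) -> None:
    # Stand-in for a real responses-API call against the stack.
    response = "LoRA is a low-rank adapter method; QAT is quantization-aware training."
    assert any(kw in response.lower() for kw in expected_keywords)
```

This keeps question, keywords, and assertion visible in one place when a case fails.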

tests/llama_stack/conftest.py (3)

262-279: Silence unused-fixture warnings without breaking dependency ordering.

These fixtures are intentionally unused to enforce ordering. Assign them to _ to satisfy Ruff without renaming parameters (which would break pytest resolution).

 def llama_stack_distribution(
     admin_client: DynamicClient,
     model_namespace: Namespace,
     enabled_llama_stack_operator: DataScienceCluster,
     llama_stack_server_config: Dict[str, Any],
     llama_stack_config_map: ConfigMap,
 ) -> Generator[LlamaStackDistribution, None, None]:
+    # Ensure fixture dependencies are realized; values are unused by design.
+    _ = (enabled_llama_stack_operator, llama_stack_config_map)

312-329: Log full traceback on port-forward setup failures.

Use LOGGER.exception to capture stack traces.

-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
-        raise
+    except Exception as e:
+        LOGGER.exception("Failed to set up port forwarding: %s", e)
+        raise

387-427: Narrow exception handling and log tracebacks when downloading docs.

Catching generic Exception triggers BLE001. Also prefer exception() to include traceback.

-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except requests.exceptions.RequestException as e:
+        LOGGER.exception("Failed to download/upload %s: %s", url, e)
+        raise
tests/llama_stack/rag/test_rag.py (4)

168-177: Case-insensitive assertion for robustness.

Normalize content for the identity check to reduce flakiness.

-        content = response.output_message.content
-        assert content is not None, "LLM response content is None"
-        assert "model" in content, "The LLM didn't provide the expected answer to the prompt"
+        content = (response.output_message.content or "").lower()
+        assert "model" in content, "The LLM didn't provide the expected answer to the prompt"

340-345: Tweak failure message to match test intent.

This test validates the responses API (not the agent). Minor wording fix.

-        assert validation_result["success"], f"RAG agent validation failed. Summary: {validation_result['summary']}"
+        assert validation_result["success"], f"Responses validation failed. Summary: {validation_result['summary']}"

33-47: Optional: drop the helper and inline parametrization.

To align with repo preference, consider inlining these expectations in the test (or parametrizing with pytest) to avoid context switching.


318-325: Unused fixtures in signature.

minio_pod and minio_data_connection aren’t used. If they’re only for side effects, add _ = (minio_pod, minio_data_connection) at the top of the test; otherwise remove them.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06d9487 and ded8179.

📒 Files selected for processing (4)
  • tests/llama_stack/conftest.py (4 hunks)
  • tests/llama_stack/constants.py (2 hunks)
  • tests/llama_stack/rag/test_rag.py (9 hunks)
  • utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.

Applied to files:

  • tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
utilities/rag_utils.py (2)
  • create_llama_stack_distribution (20-39)
  • ModelInfo (11-16)
tests/llama_stack/utils.py (1)
  • create_llama_stack_distribution (17-37)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (297-328)
tests/llama_stack/rag/test_rag.py (3)
utilities/rag_utils.py (2)
  • TurnExpectation (42-45)
  • validate_api_responses (226-311)
tests/llama_stack/conftest.py (4)
  • llama_stack_client (297-328)
  • llama_stack_models (332-353)
  • vector_store (357-384)
  • vector_store_with_docs (430-460)
tests/conftest.py (2)
  • minio_pod (480-520)
  • minio_data_connection (546-558)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py

266-266: Unused function argument: enabled_llama_stack_operator

(ARG001)


268-268: Unused function argument: llama_stack_config_map

(ARG001)


383-383: Do not catch blind exception: Exception

(BLE001)


422-422: Consider moving this statement to an else block

(TRY300)

utilities/rag_utils.py

281-281: Do not catch blind exception: Exception

(BLE001)


293-293: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


293-293: Use explicit conversion flag

Replace with conversion flag

(RUF010)

tests/llama_stack/rag/test_rag.py

320-320: Unused method argument: minio_pod

(ARG002)


321-321: Unused method argument: minio_data_connection

(ARG002)

🔇 Additional comments (2)
utilities/rag_utils.py (1)

11-17: ModelInfo NamedTuple looks good.

tests/llama_stack/conftest.py (1)

88-94: Confirm CRD schema for userConfig/configMapName and command override.

Ensure this matches the operator/CRD version used in CI; otherwise deployment may ignore the config map or entrypoint.

@Ygnas Ygnas requested a review from adolfo-ab September 10, 2025 11:02
@Ygnas Ygnas force-pushed the openai-compatible-tests branch from ded8179 to b48de0d Compare September 10, 2025 16:08
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
utilities/rag_utils.py (1)

226-312: Align validate_api_responses with TurnResult/ValidationSummary; add response_length/event_count; improve logging

The per-turn/result keys don’t match declared TypedDicts (response vs response_content; missing response_length/event_count), and the summary uses different field names (total/passed vs total_turns/successful_turns). Also prefer LOGGER.exception and guard against None responses. This will otherwise break consumers and type checks.

Apply:

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
     verbose: bool = True,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful_turns = 0
@@
-    for idx, test in enumerate(test_cases, 1):
+    for idx, test in enumerate(test_cases, 1):
         question = test["question"]
         expected_keywords = test["expected_keywords"]
         description = test.get("description", "")
@@
-        try:
-            response = response_fn(question=question)
-            response_lower = response.lower()
+        try:
+            response_text = response_fn(question=question) or ""
+            response_lower = response_text.lower()
@@
-            found = [kw for kw in expected_keywords if kw.lower() in response_lower]
-            missing = [kw for kw in expected_keywords if kw.lower() not in response_lower]
-            success = len(found) >= min_keywords_required
+            found = [kw for kw in expected_keywords if kw.lower() in response_lower]
+            missing = [kw for kw in expected_keywords if kw.lower() not in response_lower]
+            success = len(found) >= min_keywords_required
@@
-            if success:
-                successful += 1
+            if success:
+                successful_turns += 1
@@
-            result = {
+            result: TurnResult = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": found,
                 "missing_keywords": missing,
-                "response": response,
-                "success": success,
+                "response_content": response_text,
+                "response_length": len(response_text),
+                "event_count": 0,
+                "success": success,
+                "error": None,
             }
@@
-            all_results.append({
+            LOGGER.exception("ERROR validating question: %s", question)
+            all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
-                "success": False,
-                "error": str(e),
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
+                "success": False,
+                "error": str(e),
             })
@@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total_turns = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total_turns,
+        "successful_turns": successful_turns,
+        "failed_turns": total_turns - successful_turns,
+        "success_rate": successful_turns / total_turns if total_turns > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-        LOGGER.info(f"Total: {summary['total']}")
-        LOGGER.info(f"Passed: {summary['passed']}")
-        LOGGER.info(f"Failed: {summary['failed']}")
+        LOGGER.info(f"Total: {summary['total_turns']}")
+        LOGGER.info(f"Passed: {summary['successful_turns']}")
+        LOGGER.info(f"Failed: {summary['failed_turns']}")
         LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful_turns == total_turns, "results": all_results, "summary": summary})
tests/llama_stack/conftest.py (1)

200-210: Replace zero‑width default in telemetry service_name

The U+200B default is invisible and confuses tooling. Use a visible default.

-      service_name: "${env.OTEL_SERVICE_NAME:=\u200b}"
+      service_name: "${env.OTEL_SERVICE_NAME:=llama-stack}"
🧹 Nitpick comments (5)
tests/llama_stack/conftest.py (5)

262-269: Silence unused-fixture-arg lints without changing behavior

These deps are intentionally unused to enforce ordering. Assign to _ to satisfy linters.

 def llama_stack_distribution(
@@
-    llama_stack_config_map: ConfigMap,
+    llama_stack_config_map: ConfigMap,
 ) -> Generator[LlamaStackDistribution, None, None]:
+    _ = enabled_llama_stack_operator  # ensure fixture execution (lint: ARG001)
+    _ = llama_stack_config_map        # ensure fixture execution (lint: ARG001)
     with create_llama_stack_distribution(

312-329: Preserve traceback on port-forward failure

Use LOGGER.exception to capture stacktrace.

-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
-        raise
+    except Exception:
+        LOGGER.exception("Failed to set up port forwarding")
+        raise

356-385: Vector store: ensure unique name; verify delete signature

  • Name “test_vector_store” may collide across parallel runs; add a short UUID suffix.
  • Please verify vector_stores.delete signature; some clients expect positional id (delete(vector_store.id)).
-    vector_store = llama_stack_client.vector_stores.create(
-        name="test_vector_store",
+    vector_store = llama_stack_client.vector_stores.create(
+        name=f"test_vector_store_{uuid4().hex[:8]}",
         embedding_model=llama_stack_models.embedding_model.identifier,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )
@@
-        llama_stack_client.vector_stores.delete(id=vector_store.id)
+        llama_stack_client.vector_stores.delete(vector_store.id)
         LOGGER.info(f"Deleted vector store {vector_store.id}")

Add import (top of file):

from uuid import uuid4

387-427: File download/upload: cleaner control flow and better diagnostics

Move the success return into an else, and log with traceback on failures. Optional: replace the “not sure why this is needed” comment with a TODO plus reason or pass a filename/content-type to files.create if supported.

-    try:
+    try:
         response = requests.get(url, timeout=30)
         response.raise_for_status()
@@
-        return True
-
-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except Exception:
+        LOGGER.exception("Failed to download and upload file: %s", url)
+        raise
+    else:
+        return True

446-459: Gracefully skip in disconnected environments

Avoid failures on air‑gapped clusters by skipping when a DISCONNECTED/AIRGAPPED flag is set.

 def vector_store_with_docs(llama_stack_client: LlamaStackClient, vector_store: Any) -> Generator[Any, None, None]:
@@
-    # Download TorchTune documentation files
+    # Skip when running in disconnected environments
+    if os.getenv("DISCONNECTED", "").lower() in {"1", "true", "yes"} or os.getenv("AIRGAPPED", "").lower() in {"1", "true", "yes"}:
+        pytest.skip("Skipping doc download in disconnected environment")
+
+    # Download TorchTune documentation files
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ded8179 and b48de0d.

📒 Files selected for processing (4)
  • tests/llama_stack/conftest.py (4 hunks)
  • tests/llama_stack/constants.py (2 hunks)
  • tests/llama_stack/rag/test_rag.py (9 hunks)
  • utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/llama_stack/constants.py
  • tests/llama_stack/rag/test_rag.py
🧰 Additional context used
🧬 Code graph analysis (2)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (297-328)
tests/llama_stack/conftest.py (3)
utilities/rag_utils.py (2)
  • create_llama_stack_distribution (20-39)
  • ModelInfo (11-16)
tests/llama_stack/utils.py (2)
  • create_llama_stack_distribution (17-37)
  • wait_for_llama_stack_client_ready (41-52)
utilities/constants.py (1)
  • Timeout (213-224)
🪛 Ruff (0.12.2)
utilities/rag_utils.py

281-281: Do not catch blind exception: Exception

(BLE001)


293-293: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


293-293: Use explicit conversion flag

Replace with conversion flag

(RUF010)

tests/llama_stack/conftest.py

266-266: Unused function argument: enabled_llama_stack_operator

(ARG001)


268-268: Unused function argument: llama_stack_config_map

(ARG001)


383-383: Do not catch blind exception: Exception

(BLE001)


422-422: Consider moving this statement to an else block

(TRY300)

🔇 Additional comments (1)
utilities/rag_utils.py (1)

11-16: ModelInfo NamedTuple: LGTM

Clear, minimal container for model metadata.

@Ygnas Ygnas force-pushed the openai-compatible-tests branch 2 times, most recently from f51d180 to 70de5d8 Compare September 11, 2025 12:12
@Ygnas Ygnas requested a review from dbasunag September 17, 2025 09:54
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/llama_stack/rag/test_rag.py (1)

135-159: Avoid None dereference; normalize content and compare lowercased.

Second assertion calls .lower() before None check; simplify both to be None-safe.

-        content = response.output_message.content
-        assert content is not None, "LLM response content is None"
-        assert "model" in content, "The LLM didn't provide the expected answer to the prompt"
+        content = response.output_message.content or ""
+        assert "model" in content.lower(), "The LLM didn't provide the expected answer to the prompt"
@@
-        content = response.output_message.content.lower()
-        assert content is not None, "LLM response content is None"
-        assert "answer" in content, "The LLM didn't provide the expected answer to the prompt"
+        content = response.output_message.content or ""
+        assert "answer" in content.lower(), "The LLM didn't provide the expected answer to the prompt"
♻️ Duplicate comments (5)
tests/llama_stack/conftest.py (4)

317-349: Consider skipping on disconnected clusters.

This fixture does network I/O; mark with a custom marker/skip if offline to reduce CI flakes.
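One possible shape for that skip, assuming a `DISCONNECTED` environment variable (the variable name and helper are assumptions; match them to your CI convention):

```python
import os

import pytest


def skip_if_disconnected() -> None:
    # Call at the top of network-dependent fixtures or tests; pytest.skip()
    # raises a Skipped outcome so the rest of the fixture never runs.
    if os.getenv("DISCONNECTED", "").lower() in {"1", "true", "yes"}:
        pytest.skip("cluster is disconnected; skipping network-dependent setup")
```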


215-238: Return embedding model identifier to match ModelInfo typing.

Fixture returns an embedding model object but ModelInfo.embedding_model is str. Also add robust checks (existing review suggested this earlier).

-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
-
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
+    emb = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not emb:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    md = getattr(emb, "metadata", {}) or {}
+    if "embedding_dimension" not in md:
+        raise ValueError("Embedding model metadata missing 'embedding_dimension'")
+    return ModelInfo(model_id=llm.identifier, embedding_model=emb.identifier, embedding_dimension=md["embedding_dimension"])

240-271: Stop using .identifier at call sites; pass string embedding model id.

This removes type: ignore and matches ModelInfo.

     vector_store = llama_stack_client.vector_stores.create(
         name="test_vector_store",
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )

126-145: Fix unbound variable in llama_stack_user_config_configmap; add clearer FileNotFound error.

llama_stack_user_config_configmap_path can be referenced before assignment when unparameterized.

-    if hasattr(request, "param"):
-        if request.param.get("llama_stack_user_config_configmap_path"):
-            llama_stack_user_config_configmap_path = request.param.get("llama_stack_user_config_configmap_path")
-
-    assert llama_stack_user_config_configmap_path is not None, (
+    params = getattr(request, "param", {}) or {}
+    llama_stack_user_config_configmap_path = params.get("llama_stack_user_config_configmap_path")
+    assert llama_stack_user_config_configmap_path, (
         "Fixture parameter llama_stack_user_config_configmap_path is required"
     )
     config_path = Path(__file__).parent / llama_stack_user_config_configmap_path
-
-    with open(config_path, "r") as f:
-        llama_stack_userconfig_configmap_data = f.read()
+    try:
+        with open(config_path, "r") as f:
+            llama_stack_userconfig_configmap_data = f.read()
+    except FileNotFoundError:
+        raise FileNotFoundError(f"User-config ConfigMap file not found at: {config_path}")
utilities/rag_utils.py (1)

226-305: Align validate_api_responses to TurnResult/ValidationSummary shapes; guard None; improve logging.

Results use response instead of response_content and the summary keys don’t match the public types. Also .lower() on None can crash; and errors should use LOGGER.exception.

Apply this diff:

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
-        try:
-            response = response_fn(question=question)
-            response_lower = response.lower()
+        try:
+            response_text = response_fn(question=question) or ""
+            response_lower = response_text.lower()
@@
-            result = {
+            result: TurnResult = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": found,
                 "missing_keywords": missing,
-                "response": response,
+                "response_content": response_text,
+                "response_length": len(response_text),
+                "event_count": 0,
                 "success": success,
+                "error": None,
             }
@@
-        except Exception as e:
-            all_results.append({
+        except Exception as e:
+            all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
                 "success": False,
                 "error": str(e),
             })
-            LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+            LOGGER.exception(f"[{idx}] ERROR while validating API response for: {question}")
@@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
-    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+    LOGGER.info(f"Total turns: {summary['total_turns']}")
+    LOGGER.info(f"Successful: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
+    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast("ValidationResult", {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
🧹 Nitpick comments (4)
tests/llama_stack/utils.py (1)

75-102: Harden response function: type hint and None-safe return.

Return type should reflect a string-producing callable, and guard output_text being None.

-def create_response_function(
-    llama_stack_client: LlamaStackClient, llama_stack_models: ModelInfo, vector_store: VectorStore
-) -> Callable:
+def create_response_function(
+    llama_stack_client: LlamaStackClient, llama_stack_models: ModelInfo, vector_store: VectorStore
+) -> Callable[..., str]:
@@
-    def _response_fn(*, question: str) -> str:
+    def _response_fn(*, question: str) -> str:
@@
-        return response.output_text
+        return response.output_text or ""
tests/llama_stack/conftest.py (2)

180-213: Log full traceback when port-forward fails.

Use LOGGER.exception in except to aid debugging.

-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
+    except Exception:
+        LOGGER.exception("Failed to set up port forwarding")
         raise

273-315: Make temp-file upload cross‑platform and improve error logging.

NamedTemporaryFile reopened on Windows can fail; use delete=False and unlink. Also log traceback on failures and tighten type for vector_store.

-@retry(
+@retry(
     wait_timeout=Timeout.TIMEOUT_1MIN,
     sleep=5,
     exceptions_dict={requests.exceptions.RequestException: [], Exception: []},
 )
-def _download_and_upload_file(url: str, llama_stack_client: LlamaStackClient, vector_store: Any) -> bool:
+def _download_and_upload_file(url: str, llama_stack_client: LlamaStackClient, vector_store: VectorStore) -> bool:
@@
-        with tempfile.NamedTemporaryFile(mode="wb", suffix=f"_{local_file_name}") as temp_file:
-            temp_file.write(response.content)
-            temp_file_path = temp_file.name
-
-            # Upload saved file to LlamaStack
-            with open(temp_file_path, "rb") as file_to_upload:
-                uploaded_file = llama_stack_client.files.create(file=file_to_upload, purpose="assistants")
-
-            # Add file to vector store
-            llama_stack_client.vector_stores.files.create(vector_store_id=vector_store.id, file_id=uploaded_file.id)
-
-        return True
+        with tempfile.NamedTemporaryFile(mode="wb", suffix=f"_{local_file_name}", delete=False) as temp_file:
+            temp_file.write(response.content)
+            temp_file_path = temp_file.name
+        # Upload saved file to LlamaStack
+        with open(temp_file_path, "rb") as file_to_upload:
+            uploaded_file = llama_stack_client.files.create(file=file_to_upload, purpose="assistants")
+        # Add file to vector store
+        llama_stack_client.vector_stores.files.create(vector_store_id=vector_store.id, file_id=uploaded_file.id)
+        os.remove(temp_file_path)
+        return True
@@
-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
+    except Exception:
+        LOGGER.exception("Failed to download and upload file %s", url)
         raise
tests/llama_stack/rag/test_rag.py (1)

36-43: Silence linter about unused fixture parameter llama_stack_user_config_configmap.

It’s intentionally present to ensure setup; add a no-op delete at the top of each test body.

 def test_rag_inference_embeddings(
@@
 ) -> None:
+        # Fixture is intentionally unused, ensures configmap exists
+        del llama_stack_user_config_configmap

Repeat the same two-line snippet at the top of the other tests that include this fixture.

Also applies to: 60-65, 120-125, 161-166, 257-263, 289-296, 318-323

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 753c08b and 19d7ef6.

📒 Files selected for processing (4)
  • tests/llama_stack/conftest.py (4 hunks)
  • tests/llama_stack/rag/test_rag.py (9 hunks)
  • tests/llama_stack/utils.py (2 hunks)
  • utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.

Applied to files:

  • tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.

Applied to files:

  • tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (4)
tests/llama_stack/utils.py (2)
tests/llama_stack/conftest.py (4)
  • llama_stack_client (181-212)
  • vector_store (241-270)
  • llama_stack_distribution (148-162)
  • llama_stack_models (216-237)
utilities/rag_utils.py (2)
  • TurnExpectation (42-45)
  • ModelInfo (11-16)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (181-212)
tests/llama_stack/conftest.py (4)
tests/llama_stack/utils.py (2)
  • create_llama_stack_distribution (20-40)
  • wait_for_llama_stack_client_ready (44-55)
utilities/rag_utils.py (2)
  • create_llama_stack_distribution (20-39)
  • ModelInfo (11-16)
utilities/constants.py (1)
  • Timeout (213-224)
tests/conftest.py (2)
  • admin_client (68-69)
  • model_namespace (100-120)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (5)
  • llama_stack_client (181-212)
  • vector_store (241-270)
  • llama_stack_user_config_configmap (110-144)
  • llama_stack_models (216-237)
  • vector_store_with_docs (318-348)
utilities/rag_utils.py (3)
  • validate_rag_agent_responses (84-223)
  • validate_api_responses (226-305)
  • ModelInfo (11-16)
tests/llama_stack/utils.py (3)
  • get_torchtune_test_expectations (58-72)
  • create_response_function (75-103)
  • _response_fn (90-101)
🪛 Ruff (0.12.2)
utilities/rag_utils.py

277-277: Do not catch blind exception: Exception

(BLE001)


288-288: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


288-288: Use explicit conversion flag

Replace with conversion flag

(RUF010)

tests/llama_stack/conftest.py

269-269: Do not catch blind exception: Exception

(BLE001)


310-310: Consider moving this statement to an else block

(TRY300)

tests/llama_stack/rag/test_rag.py

40-40: Unused method argument: llama_stack_user_config_configmap

(ARG002)


63-63: Unused method argument: llama_stack_user_config_configmap

(ARG002)


123-123: Unused method argument: llama_stack_user_config_configmap

(ARG002)


164-164: Unused method argument: llama_stack_user_config_configmap

(ARG002)


261-261: Unused method argument: llama_stack_user_config_configmap

(ARG002)


293-293: Unused method argument: llama_stack_user_config_configmap

(ARG002)


321-321: Unused method argument: llama_stack_user_config_configmap

(ARG002)

🔇 Additional comments (4)
tests/llama_stack/utils.py (1)

58-73: LGTM on expectation mapping.

Clean conversion to TurnExpectation dicts.

tests/llama_stack/conftest.py (2)

53-63: LGTM on safe params override.

Using params = getattr(..., {}) avoids AttributeError on unparameterized fixtures.
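A stand-alone sketch of that pattern, using SimpleNamespace as a stand-in for pytest's request object:

```python
from types import SimpleNamespace
from typing import Any, Dict

def fixture_params(request: Any) -> Dict[str, Any]:
    # getattr with a default keeps unparameterized fixtures from raising AttributeError
    return getattr(request, "param", {}) or {}

parameterized = SimpleNamespace(param={"llama_stack_storage_size": "10Gi"})
unparameterized = SimpleNamespace()  # fixtures used without indirect params have no .param

size = fixture_params(parameterized).get("llama_stack_storage_size")
missing = fixture_params(unparameterized).get("llama_stack_storage_size")
```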


96-105: LGTM on optional storage/userConfig wiring.

Nice gating behind params; command override is clear.

tests/llama_stack/rag/test_rag.py (1)

305-316: LGTM on responses+vector store validation flow.

Using create_response_function and validate_api_responses is clean. Will pass once shapes in rag_utils are aligned and None is guarded.

Comment on lines 77 to 87
    try:
        llama_stack_client.vector_dbs.register(
            vector_db_id=vector_db_id,
-            embedding_model=embedding_model.identifier,
-            embedding_dimension=embedding_dimension,  # type: ignore
+            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+            embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
            provider_id="milvus",
        )

        # Calculate embeddings
        embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=embedding_model.identifier,
+            model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
            contents=["First chunk of text"],
-            output_dimension=embedding_dimension,  # type: ignore
+            output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
        )


Contributor

⚠️ Potential issue

Fix vector DB register/embedding calls to use string embedding model id.

Prevents attribute errors and removes ignores.

             llama_stack_client.vector_dbs.register(
                 vector_db_id=vector_db_id,
-                embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-                embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+                embedding_model=llama_stack_models.embedding_model,
+                embedding_dimension=llama_stack_models.embedding_dimension,
                 provider_id="milvus",
             )
@@
             embeddings_response = llama_stack_client.inference.embeddings(
-                model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
+                model_id=llama_stack_models.embedding_model,
                 contents=["First chunk of text"],
-                output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+                output_dimension=llama_stack_models.embedding_dimension,
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            try:
-                llama_stack_client.vector_dbs.register(
-                    vector_db_id=vector_db_id,
-                    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-                    embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
-                    provider_id="milvus",
-                )
-                # Calculate embeddings
-                embeddings_response = llama_stack_client.inference.embeddings(
-                    model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
-                    contents=["First chunk of text"],
-                    output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
-                )
+            try:
+                llama_stack_client.vector_dbs.register(
+                    vector_db_id=vector_db_id,
+                    embedding_model=llama_stack_models.embedding_model,
+                    embedding_dimension=llama_stack_models.embedding_dimension,
+                    provider_id="milvus",
+                )
+                # Calculate embeddings
+                embeddings_response = llama_stack_client.inference.embeddings(
+                    model_id=llama_stack_models.embedding_model,
+                    contents=["First chunk of text"],
+                    output_dimension=llama_stack_models.embedding_dimension,
+                )
🤖 Prompt for AI Agents
In tests/llama_stack/rag/test_rag.py around lines 77 to 91, the vector DB
register and embeddings calls are passing a non-string attribute (e.g.
.identifier) with type ignores; replace those with the string embedding model id
instead. Specifically, pass llama_stack_models.embedding_model (the string id)
to vector_dbs.register as embedding_model and to inference.embeddings as
model_id, keep embedding_dimension as-is, and remove the type: ignore comments.

Comment on lines 180 to 186

         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=embedding_model.identifier,
-            embedding_dimension=embedding_dimension,
+            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )

Contributor

⚠️ Potential issue

Use string embedding model id for RAG DB registration.

         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-            embedding_dimension=llama_stack_models.embedding_dimension,
+            embedding_model=llama_stack_models.embedding_model,
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )
📝 Committable suggestion


Suggested change
-        llama_stack_client.vector_dbs.register(
-            vector_db_id=vector_db_id,
-            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-            embedding_dimension=llama_stack_models.embedding_dimension,
-            provider_id="milvus",
-        )
+        llama_stack_client.vector_dbs.register(
+            vector_db_id=vector_db_id,
+            embedding_model=llama_stack_models.embedding_model,
+            embedding_dimension=llama_stack_models.embedding_dimension,
+            provider_id="milvus",
+        )
🤖 Prompt for AI Agents
In tests/llama_stack/rag/test_rag.py around lines 180 to 186, the RAG DB
registration must receive a string ID for the embedding model rather than an
object; change the embedding_model argument to pass the embedding model's string
identifier (for example llama_stack_models.embedding_model.identifier or
str(llama_stack_models.embedding_model.identifier)), remove the unnecessary "#
type: ignore" and ensure the value is a plain str so the vector DB receives the
model id.

@Ygnas Ygnas force-pushed the openai-compatible-tests branch 2 times, most recently from d627a19 to d837947 Compare September 17, 2025 14:51
@Ygnas Ygnas force-pushed the openai-compatible-tests branch from d837947 to 159f222 Compare September 17, 2025 15:02
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/llama_stack/rag/test_rag.py (1)

1-352: Sweep: remove .identifier dereferences for embedding models (fix ModelInfo usage).

  • tests/llama_stack/conftest.py:254 — change embedding_model=llama_stack_models.embedding_model.identifier → embedding_model=llama_stack_models.embedding_model
  • tests/llama_stack/rag/test_rag.py:47, 76, 83, 176 — remove .identifier from llama_stack_models.embedding_model in model_id/embedding_model args
  • tests/llama_stack/core/test_llamastack_core.py:42 — change embedding_model_id = embedding_model.identifier → embedding_model_id = embedding_model

Re-run the ripgrep check to confirm zero matches.

♻️ Duplicate comments (7)
utilities/rag_utils.py (1)

226-305: Align validate_api_responses shapes with TurnResult/ValidationSummary; guard None; improve logging.

  • Per‑turn keys differ from TurnResult (response vs response_content, missing response_length/event_count/error).
  • Summary keys differ from ValidationSummary.
  • .lower() on a None response will crash; guard it.
  • Prefer LOGGER.exception to retain stack traces.

Apply this diff:

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
-        try:
-            response = response_fn(question=question)
-            response_lower = response.lower()
+        try:
+            response_text = response_fn(question=question) or ""
+            response_lower = response_text.lower()
@@
-            result = {
+            result: TurnResult = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": found,
                 "missing_keywords": missing,
-                "response": response,
+                "response_content": response_text,
+                "response_length": len(response_text),
+                "event_count": 0,
                 "success": success,
+                "error": None,
             }
@@
-            all_results.append({
+            all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
-                "success": False,
-                "error": str(e),
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
+                "success": False,
+                "error": str(e),
             })
-            LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+            LOGGER.exception(f"[{idx}] ERROR while validating API response")
@@
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
-    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+    LOGGER.info(f"Total: {summary['total_turns']}")
+    LOGGER.info(f"Passed: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
+    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast("ValidationResult", {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
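A trimmed sketch of the summary computation the diff above targets (only a subset of the TurnResult fields is modeled here; the real TypedDicts live in utilities/rag_utils.py):

```python
from typing import List, TypedDict

class TurnResult(TypedDict):
    question: str
    success: bool
    event_count: int
    response_length: int

class ValidationSummary(TypedDict):
    total_turns: int
    successful_turns: int
    failed_turns: int
    success_rate: float
    total_events: int
    total_response_length: int

def summarize(results: List[TurnResult]) -> ValidationSummary:
    # Aggregate per-turn results into the summary shape the tests assert on
    total = len(results)
    successful = sum(1 for r in results if r["success"])
    return {
        "total_turns": total,
        "successful_turns": successful,
        "failed_turns": total - successful,
        "success_rate": successful / total if total > 0 else 0.0,
        "total_events": sum(r["event_count"] for r in results),
        "total_response_length": sum(r["response_length"] for r in results),
    }

summary = summarize([
    {"question": "q1", "success": True, "event_count": 2, "response_length": 10},
    {"question": "q2", "success": False, "event_count": 0, "response_length": 0},
])
```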
tests/llama_stack/conftest.py (3)

120-131: Fix unbound variable in ConfigMap fixture and simplify param access.

llama_stack_user_config_configmap_path can be referenced before assignment when the fixture is used without params.

-    if hasattr(request, "param"):
-        if request.param.get("llama_stack_user_config_configmap_path"):
-            llama_stack_user_config_configmap_path = request.param.get("llama_stack_user_config_configmap_path")
-
-    assert llama_stack_user_config_configmap_path is not None, (
+    params = getattr(request, "param", {}) or {}
+    llama_stack_user_config_configmap_path = params.get("llama_stack_user_config_configmap_path")
+    assert llama_stack_user_config_configmap_path, (
         "Fixture parameter llama_stack_user_config_configmap_path is required"
     )
     config_path = Path(__file__).parent / llama_stack_user_config_configmap_path
 
-    with open(config_path, "r") as f:
-        llama_stack_userconfig_configmap_data = f.read()
+    llama_stack_userconfig_configmap_data = Path(config_path).read_text()

209-232: Return string ids in ModelInfo and harden model discovery.

Align with utilities.rag_utils.ModelInfo (embedding_model: str) and avoid StopIteration/KeyError.

 def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
@@
-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
-
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
+    emb = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not emb:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    md = getattr(emb, "metadata", {}) or {}
+    if "embedding_dimension" not in md:
+        raise ValueError("Embedding model metadata missing 'embedding_dimension'")
+    return ModelInfo(model_id=llm.identifier, embedding_model=emb.identifier, embedding_dimension=md["embedding_dimension"])
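The guarded next(..., None) pattern above, runnable with stand-in model records (the class names and identifiers here are illustrative, not the real client types):

```python
from typing import List, NamedTuple

class Model(NamedTuple):
    identifier: str
    api_model_type: str
    metadata: dict

class ModelInfo(NamedTuple):
    model_id: str
    embedding_model: str  # string id, not a model object
    embedding_dimension: int

def pick_models(models: List[Model]) -> ModelInfo:
    # next(..., None) plus explicit checks instead of bare next(), which raises StopIteration
    llm = next((m for m in models if m.api_model_type == "llm"), None)
    if llm is None:
        raise ValueError("No LLM model found")
    emb = next((m for m in models if m.api_model_type == "embedding"), None)
    if emb is None:
        raise ValueError("No embedding model found")
    md = emb.metadata or {}
    if "embedding_dimension" not in md:
        raise ValueError("Embedding model metadata missing 'embedding_dimension'")
    return ModelInfo(llm.identifier, emb.identifier, md["embedding_dimension"])

info = pick_models([
    Model("granite", "llm", {}),
    Model("all-minilm", "embedding", {"embedding_dimension": 384}),
])

try:
    pick_models([Model("granite", "llm", {})])
    missing_embedding_raised = False
except ValueError:
    missing_embedding_raised = True
```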

251-256: Pass the string embedding model id; remove type ignore.

     vector_store = llama_stack_client.vector_stores.create(
         name="test_vector_store",
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )
tests/llama_stack/rag/test_rag.py (3)

47-50: Use string embedding model id and remove type ignore.

-        embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=llama_stack_models.embedding_model.identifier,
-            contents=["First chunk of text"],
-            output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
-        )
+        embeddings_response = llama_stack_client.inference.embeddings(
+            model_id=llama_stack_models.embedding_model,
+            contents=["First chunk of text"],
+            output_dimension=llama_stack_models.embedding_dimension,
+        )

74-86: Fix vector DB register and embeddings calls to use string id.

         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-            embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+            embedding_model=llama_stack_models.embedding_model,
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )
@@
-        embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embeddings_response = llama_stack_client.inference.embeddings(
+            model_id=llama_stack_models.embedding_model,
             contents=["First chunk of text"],
-            output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+            output_dimension=llama_stack_models.embedding_dimension,
         )

174-178: Pass string embedding model id for RAG DB registration.

         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-            embedding_dimension=llama_stack_models.embedding_dimension,
+            embedding_model=llama_stack_models.embedding_model,
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )
🧹 Nitpick comments (5)
tests/llama_stack/conftest.py (4)

96-101: Wire in user-config support and keep unparameterized-safe access.

PR description mentions llama_stack_user_config_enabled and ConfigMap usage, but server_config doesn’t act on it. Add optional userConfig and command wiring when the param is set.

     if params.get("llama_stack_storage_size"):
         storage_size = params.get("llama_stack_storage_size")
         server_config["storage"] = {"size": storage_size}
 
+    # Optional: enable custom user config via ConfigMap
+    if params.get("llama_stack_user_config_enabled"):
+        # Ensure ConfigMap exists (triggers the fixture)
+        request.getfixturevalue("llama_stack_user_config_configmap")
+        server_config["userConfig"] = {"configMapName": "llama-stack-user-config"}
+        # Ensure the container runs with explicit run config
+        server_config["containerSpec"].setdefault("command", ["/bin/sh", "-c", "llama stack run /etc/llama-stack/run.yaml"])
+
     return server_config

Please confirm the run.yaml path is correct for your image.


260-265: Avoid blind except; include traceback.

Catching bare Exception hides actionable details. If you need the broad catch for cleanup, at least log the stack trace.

-    except Exception as e:
-        LOGGER.warning(f"Failed to delete vector store {vector_store.id}: {e}")
+    except Exception:
+        LOGGER.warning(f"Failed to delete vector store {vector_store.id}", exc_info=True)  # noqa: BLE001

267-271: Retry only network errors and log with traceback.

Broadened retries on all Exception can mask logic/file errors. Limit retries to RequestException and log properly.

 @retry(
     wait_timeout=Timeout.TIMEOUT_1MIN,
     sleep=5,
-    exceptions_dict={requests.exceptions.RequestException: [], Exception: []},
+    exceptions_dict={requests.exceptions.RequestException: []},
 )
 def _download_and_upload_file(url: str, llama_stack_client: LlamaStackClient, vector_store: Any) -> bool:
@@
-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except requests.exceptions.RequestException:
+        LOGGER.warning(f"Failed to download and upload file {url}", exc_info=True)
+        raise

Also applies to: 306-308
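"Retry only what is retryable" as a generic loop. This is a sketch of the idea, not the repo's retry decorator from utilities; TransientError stands in for requests.exceptions.RequestException:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Stand-in for a network error such as requests.exceptions.RequestException."""

def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    sleep: float = 0.0,
) -> T:
    # Only the listed exception types are retried; anything else propagates immediately
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(sleep)
    raise AssertionError("unreachable")

calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("network blip")
    return "ok"

result = call_with_retry(flaky, retry_on=(TransientError,))
```

A logic bug raising, say, KeyError inside `fn` would surface on the first attempt instead of being retried three times.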


190-206: Use LOGGER.exception to preserve traceback on port-forward setup failure.

-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
+    except Exception:
+        LOGGER.exception("Failed to set up port forwarding")
         raise
tests/llama_stack/rag/test_rag.py (1)

17-22: Avoid no-op storage size param in smoke param set.

An empty string makes the new llama_stack_storage_size ineffective; either drop the key or set a real value (e.g., "10Gi") if you need to exercise the code path.

         pytest.param(
             {"name": "test-llamastack-rag"},
-            {"llama_stack_storage_size": ""},
+            {},  # no storage override in smoke run
         ),
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19d7ef6 and d837947.

📒 Files selected for processing (4)
  • tests/llama_stack/conftest.py (4 hunks)
  • tests/llama_stack/rag/test_rag.py (9 hunks)
  • tests/llama_stack/utils.py (2 hunks)
  • utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/llama_stack/utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.

Applied to files:

  • tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.

Applied to files:

  • tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (4)
tests/llama_stack/utils.py (2)
  • create_llama_stack_distribution (20-40)
  • wait_for_llama_stack_client_ready (44-55)
utilities/rag_utils.py (2)
  • create_llama_stack_distribution (20-39)
  • ModelInfo (11-16)
utilities/constants.py (1)
  • Timeout (213-224)
tests/conftest.py (2)
  • admin_client (68-69)
  • model_namespace (100-120)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (4)
  • llama_stack_client (175-206)
  • vector_store (235-264)
  • llama_stack_models (210-231)
  • vector_store_with_docs (312-342)
utilities/rag_utils.py (2)
  • validate_api_responses (226-305)
  • ModelInfo (11-16)
tests/llama_stack/utils.py (3)
  • get_torchtune_test_expectations (58-72)
  • create_response_function (75-103)
  • _response_fn (90-101)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (175-206)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py

263-263: Do not catch blind exception: Exception

(BLE001)


304-304: Consider moving this statement to an else block

(TRY300)

utilities/rag_utils.py

277-277: Do not catch blind exception: Exception

(BLE001)


288-288: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


288-288: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (6)
tests/llama_stack/rag/test_rag.py (5)

130-132: LGTM — agent creation uses ModelInfo.model_id correctly.


151-153: Assert before lowercasing to avoid potential NoneType errors.

Reorder to check for None before calling .lower().

-        content = response.output_message.content.lower()
-        assert content is not None, "LLM response content is None"
+        assert response.output_message.content is not None, "LLM response content is None"
+        content = response.output_message.content.lower()
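The reorder matters because calling .lower() on None raises AttributeError instead of the intended assertion message; a minimal illustration:

```python
from typing import Optional

def safe_lower(content: Optional[str]) -> str:
    # Assert before dereferencing so a None response fails with a clear message
    assert content is not None, "LLM response content is None"
    return content.lower()

ok = safe_lower("INSERTING Documents")

try:
    safe_lower(None)
    none_raised_assertion = False
except AssertionError:
    none_raised_assertion = True
```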



251-281: LGTM — simple responses smoke covers the endpoint well.

Assertions are minimal and appropriate for smoke.


297-307: Minor: adjust assertion message and rely on aligned result shape.

Change the failure message to reflect API validation (not agent), and after aligning validate_api_responses, the summary keys change; keeping generic Summary: {validation_result['summary']} is fine.

-        assert validation_result["success"], f"RAG agent validation failed. Summary: {validation_result['summary']}"
+        assert validation_result["success"], f"API responses validation failed. Summary: {validation_result['summary']}"



331-351: LGTM — vector store search validations are solid.

Covers shape and content presence without over-constraining.

utilities/rag_utils.py (1)

11-16: Update callers/tests: ModelInfo.embedding_model is now a str

Downstream code treats embedding_model as an object with .identifier — change callers to pass the string id (e.g., m.identifier) and remove any .identifier dereferences.

Check/modify: tests/llama_stack/conftest.py (llama_stack_models, vector_store) and tests/llama_stack/rag/test_rag.py.

Open a follow-up PR to update those tests?

@Ygnas Ygnas requested review from adolfo-ab and jgarciao September 17, 2025 15:05
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (7)
tests/llama_stack/conftest.py (4)

169-191: Make model discovery robust and guard against missing models/metadata.

The function uses next() without defaults and direct metadata access, which can raise StopIteration or KeyError if models are missing or metadata is incomplete.

Apply this diff to add proper error handling:

 def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
     """
     Returns model information from the LlamaStack client.
@@ -183,10 +183,18 @@ def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
         ModelInfo: NamedTuple containing model information
     """
     models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
 
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
+    embedding_model = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not embedding_model:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    
+    md = getattr(embedding_model, "metadata", {}) or {}
+    if "embedding_dimension" not in md:
+        raise ValueError("Embedding model metadata missing 'embedding_dimension'")
 
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    return ModelInfo(
+        model_id=llm.identifier, 
+        embedding_model=embedding_model.identifier, 
+        embedding_dimension=md["embedding_dimension"]
+    )

214-216: Fix embedding model parameter - use string identifier not object.

Based on the ModelInfo typing in utilities/rag_utils.py, embedding_model should be a string, but you're passing an object with .identifier and using # type: ignore.

Apply this diff to use the string identifier:

     vector_store = llama_stack_client.vector_stores.create(
         name="test_vector_store",
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )

221-221: Use correct parameter name for vector store deletion.

The llama_stack_client.vector_stores.delete method expects vector_store_id instead of id according to the OpenAPI specification. The current code already passes vector_store_id=vector_store.id, so no change is needed here. Good implementation!


272-302: Verify network access in air-gapped environments.

This fixture downloads files from external URLs which will fail in air-gapped/disconnected clusters.

Consider adding a parameter or environment variable to skip this fixture in disconnected environments, or provide an alternative mechanism for testing without external dependencies.
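One possible gate, sketched with a hypothetical environment variable (the name ALLOW_EXTERNAL_DOWNLOADS is an assumption, not part of the PR):

```python
def external_downloads_enabled(env: dict) -> bool:
    # Hypothetical knob for disconnected clusters; downloads allowed by default
    return env.get("ALLOW_EXTERNAL_DOWNLOADS", "true").strip().lower() in ("1", "true", "yes")

enabled_by_default = external_downloads_enabled({})
disabled = external_downloads_enabled({"ALLOW_EXTERNAL_DOWNLOADS": "false"})
```

A fixture could then call pytest.skip(...) when the gate is off, leaving the rest of the suite untouched.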

tests/llama_stack/rag/test_rag.py (3)

35-49: Fix embedding model identifier usage.

The test is accessing .identifier on the embedding model, but based on the ModelInfo typing, embedding_model should already be a string.

Apply this diff to use the string directly:

         embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=llama_stack_models.embedding_model.identifier,
+            model_id=llama_stack_models.embedding_model,
             contents=["First chunk of text"],
-            output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+            output_dimension=llama_stack_models.embedding_dimension,
         )

76-85: Fix embedding model identifier usage.

Same issue with accessing .identifier on the embedding model.

Apply this diff:

             llama_stack_client.vector_dbs.register(
                 vector_db_id=vector_db_id,
-                embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+                embedding_model=llama_stack_models.embedding_model,
-                embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+                embedding_dimension=llama_stack_models.embedding_dimension,
                 provider_id="milvus",
             )

             # Calculate embeddings
             embeddings_response = llama_stack_client.inference.embeddings(
-                model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
+                model_id=llama_stack_models.embedding_model,
                 contents=["First chunk of text"],
-                output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+                output_dimension=llama_stack_models.embedding_dimension,
             )

176-177: Fix embedding model identifier usage.

Same issue with accessing .identifier on the embedding model.

Apply this diff:

         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+            embedding_model=llama_stack_models.embedding_model,
-            embedding_dimension=llama_stack_models.embedding_dimension,
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )
🧹 Nitpick comments (2)
tests/llama_stack/conftest.py (1)

227-231: Consider making exceptions more specific in retry decorator.

The retry decorator catches broad Exception types. Consider being more specific about which exceptions should trigger retries.

Apply this diff to be more specific:

 @retry(
     wait_timeout=Timeout.TIMEOUT_1MIN,
     sleep=5,
-    exceptions_dict={requests.exceptions.RequestException: [], Exception: []},
+    exceptions_dict={requests.exceptions.RequestException: [], OSError: []},
 )
utilities/rag_utils.py (1)

226-305: Address inconsistent response field and improve exception handling.

The function has two issues:

  1. Per-turn results use "response" key, but TurnResult TypedDict expects "response_content"
  2. The function catches broad Exception which is flagged by static analysis

Apply this diff to align with types and improve exception handling:

             result = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": found,
                 "missing_keywords": missing,
-                "response": response,
+                "response_content": response,
+                "response_length": len(response),
+                "event_count": 0,
                 "success": success,
+                "error": None,
             }

-        except Exception as e:
+        except (requests.RequestException, ValueError, TypeError) as e:
             all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
                 "success": False,
                 "error": str(e),
             })
-            LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+            LOGGER.exception(f"[{idx}] ERROR: {e}")

     total = len(test_cases)
     summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
         "success_rate": successful / total if total > 0 else 0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
     }

     LOGGER.info("\n" + "=" * 40)
     LOGGER.info("Validation Summary:")
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
+    LOGGER.info(f"Total: {summary['total_turns']}")
+    LOGGER.info(f"Passed: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
     LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
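The keyword check these validators share can be sketched in isolation (names such as `check_keywords` are illustrative, not the repository's actual helpers); note the `or ""` guard, which is what keeps a `None` response from crashing `.lower()`:

```python
from typing import List, TypedDict


class TurnResult(TypedDict):
    question: str
    found_keywords: List[str]
    missing_keywords: List[str]
    success: bool


def check_keywords(question: str, response: str, expected: List[str],
                   min_required: int = 1) -> TurnResult:
    """Case-insensitive substring match of expected keywords in a response."""
    text = (response or "").lower()  # guard against None responses
    found = [kw for kw in expected if kw.lower() in text]
    missing = [kw for kw in expected if kw.lower() not in text]
    return {
        "question": question,
        "found_keywords": found,
        "missing_keywords": missing,
        "success": len(found) >= min_required,
    }
```

A turn then passes when at least `min_required` of its expected keywords appear, mirroring the `success = len(found) >= min_keywords_required` logic above.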
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d837947 and 159f222.

📒 Files selected for processing (4)
  • tests/llama_stack/conftest.py (4 hunks)
  • tests/llama_stack/rag/test_rag.py (9 hunks)
  • tests/llama_stack/utils.py (2 hunks)
  • utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/llama_stack/utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.

Applied to files:

  • tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.

Applied to files:

  • tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
tests/llama_stack/utils.py (2)
  • create_llama_stack_distribution (20-40)
  • wait_for_llama_stack_client_ready (44-55)
utilities/rag_utils.py (2)
  • create_llama_stack_distribution (20-39)
  • ModelInfo (11-16)
utilities/constants.py (1)
  • Timeout (213-224)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (135-166)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (4)
  • llama_stack_client (135-166)
  • vector_store (195-224)
  • llama_stack_models (170-191)
  • vector_store_with_docs (272-302)
utilities/rag_utils.py (3)
  • validate_rag_agent_responses (84-223)
  • validate_api_responses (226-305)
  • ModelInfo (11-16)
tests/llama_stack/utils.py (3)
  • get_torchtune_test_expectations (58-72)
  • create_response_function (75-103)
  • _response_fn (90-101)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py

223-223: Do not catch blind exception: Exception

(BLE001)


264-264: Consider moving this statement to an else block

(TRY300)

utilities/rag_utils.py

277-277: Do not catch blind exception: Exception

(BLE001)


288-288: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


288-288: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (21)
tests/llama_stack/conftest.py (8)

2-2: LGTM!

Properly imported tempfile for file handling in the new document upload functionality.


7-7: LGTM!

Added requests import for HTTP operations in the document download functionality.


11-11: LGTM!

Correctly imported VectorStore type for the new vector store fixtures.


17-17: LGTM!

Added retry decorator import for robust file download operations.


22-22: LGTM!

Added ModelInfo import to use the centralized model information container.


52-60: LGTM!

The parameterized server configuration approach is well-implemented, allowing flexible configuration through request parameters while maintaining safe defaults.


94-98: LGTM!

The storage size configuration is correctly applied when the parameter is provided. The implementation maintains backward compatibility.


232-269: LGTM - Robust file download and upload implementation.

The function correctly handles file downloads with proper error handling, converts RST to TXT extensions, and uploads files to the vector store. The retry decorator and logging provide good resilience.

tests/llama_stack/rag/test_rag.py (10)

7-7: LGTM!

Correctly imported VectorStore type for the new vector store tests.


9-9: LGTM!

Added proper imports for the validation utilities and ModelInfo container.


11-11: LGTM!

Imported helper functions from utils module for test expectations and response function creation.


17-25: LGTM!

Added llama_stack_server_config to parameterization with empty storage size. This enables the server configuration to be properly injected into tests.


130-132: LGTM!

Correctly uses ModelInfo for agent creation with the model ID.


185-185: LGTM!

Correctly uses ModelInfo for RAG agent creation.


220-220: LGTM!

Uses the centralized test expectations from the helper function.


251-281: LGTM!

Simple responses test is well-structured and properly validates basic LLM functionality. Good use of parameterized test cases with expected keywords.


283-308: LGTM!

Excellent integration test that combines vector store with documents and the responses API. Uses the centralized helper functions and validation utilities effectively.


310-351: LGTM!

Comprehensive vector store search test that validates the search functionality with multiple queries. Good validation of response structure and content.

utilities/rag_utils.py (3)

4-4: LGTM!

Added necessary imports for the new functionality including Callable and NamedTuple.


11-16: LGTM!

The ModelInfo NamedTuple is well-designed with proper type annotations. The embedding_model field correctly typed as str to hold the model identifier.


42-74: LGTM!

The TypedDict definitions provide excellent type safety for the validation system. All fields are properly typed and documented through their usage context.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
utilities/rag_utils.py (1)

226-306: Align validate_api_responses with TurnResult/ValidationSummary and guard None responses

The current keys don’t match the declared TypedDicts and .lower() on None will crash. Align shapes, compute lengths, and use LOGGER.exception.

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful_turns = 0
@@
-        try:
-            response = response_fn(question=question)
-            response_lower = response.lower()
+        try:
+            response_text = response_fn(question=question) or ""
+            response_lower = response_text.lower()
@@
             found = [kw for kw in expected_keywords if kw.lower() in response_lower]
             missing = [kw for kw in expected_keywords if kw.lower() not in response_lower]
             success = len(found) >= min_keywords_required
@@
-            if success:
-                successful += 1
+            if success:
+                successful_turns += 1
@@
-            result = {
+            result: TurnResult = {
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": found,
                 "missing_keywords": missing,
-                "response": response,
+                "response_content": response_text,
+                "response_length": len(response_text),
+                "event_count": 0,
                 "success": success,
+                "error": None,
             }
@@
-            all_results.append({
+            all_results.append({
                 "question": question,
                 "description": description,
                 "expected_keywords": expected_keywords,
                 "found_keywords": [],
                 "missing_keywords": expected_keywords,
-                "response": "",
-                "success": False,
-                "error": str(e),
+                "response_content": "",
+                "response_length": 0,
+                "event_count": 0,
+                "success": False,
+                "error": str(e),
             })
-            LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+            LOGGER.exception("[validate_api_responses] Error on case %s", idx)
@@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful_turns,
+        "failed_turns": total - successful_turns,
+        "success_rate": successful_turns / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
-    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+    LOGGER.info(f"Total turns: {summary['total_turns']}")
+    LOGGER.info(f"Successful: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
+    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast("ValidationResult", {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful_turns == total, "results": all_results, "summary": summary})
tests/llama_stack/conftest.py (1)

169-191: Return string ids in ModelInfo and make discovery robust

ModelInfo.embedding_model is typed as str, but an object is returned. Also guard against missing models and bad metadata.

 def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
@@
-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
-
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
+    emb = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not emb:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    md = getattr(emb, "metadata", {}) or {}
+    dim = md.get("embedding_dimension")
+    if not isinstance(dim, int):
+        raise ValueError("Embedding model metadata missing or invalid 'embedding_dimension'")
+    return ModelInfo(model_id=llm.identifier, embedding_model=str(emb.identifier), embedding_dimension=dim)
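The defensive discovery pattern above can be exercised without a live client by substituting plain objects for the model entries (the `FakeModel` stand-in below is hypothetical; the real entries come from `llama_stack_client.models.list()`):

```python
from typing import List, NamedTuple, Optional


class ModelInfo(NamedTuple):
    model_id: str
    embedding_model: str  # string identifier, not the model object
    embedding_dimension: int


class FakeModel(NamedTuple):
    """Hypothetical stand-in for a llama-stack model entry."""
    identifier: str
    api_model_type: str
    metadata: Optional[dict] = None


def discover_models(models: List[FakeModel]) -> ModelInfo:
    """Pick the first LLM and embedding model, validating metadata."""
    llm = next((m for m in models if m.api_model_type == "llm"), None)
    if llm is None:
        raise ValueError("No LLM model found")
    emb = next((m for m in models if m.api_model_type == "embedding"), None)
    if emb is None:
        raise ValueError("No embedding model found")
    dim = (emb.metadata or {}).get("embedding_dimension")
    if not isinstance(dim, int):
        raise ValueError("Embedding model missing 'embedding_dimension'")
    return ModelInfo(model_id=llm.identifier,
                     embedding_model=emb.identifier,
                     embedding_dimension=dim)
```

Returning the string identifier (rather than the model object) is what lets callers drop the `.identifier` accesses and `# type: ignore` comments flagged elsewhere in this review.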
🧹 Nitpick comments (10)
utilities/rag_utils.py (1)

186-199: Log exceptions with stack traces

Use LOGGER.exception to aid debugging.

-        except Exception as e:
-            LOGGER.error(f"Error processing turn '{question}': {str(e)}")
+        except Exception:
+            LOGGER.exception("Error processing turn %r", question)
             turn_result = {
tests/llama_stack/conftest.py (4)

211-216: Pass the string embedding model id (drop .identifier and type: ignore)

Matches ModelInfo typing and avoids runtime/type ignores.

     vector_store = llama_stack_client.vector_stores.create(
         name="test_vector_store",
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )

165-166: Include traceback on port‑forward setup failure

LOGGER.exception emits the stack trace; easier triage.

-        LOGGER.error(f"Failed to set up port forwarding: {e}")
+        LOGGER.exception("Failed to set up port forwarding")

266-268: Narrow exception type and log with traceback in downloader

Avoid catching bare Exception; capture request and file I/O errors; include stack trace.

-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except (requests.exceptions.RequestException, OSError):
+        LOGGER.exception("Failed to download and upload file %s", url)
+        raise

287-301: Optional: ensure files are indexed before tests proceed

If vector-store ingestion is async, consider polling file status or search readiness to reduce flakiness.

tests/llama_stack/rag/test_rag.py (5)

47-50: Use string embedding model id and remove type ignore

Aligns with ModelInfo typing and avoids attribute errors.

-        embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=llama_stack_models.embedding_model.identifier,
-            contents=["First chunk of text"],
-            output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
-        )
+        embeddings_response = llama_stack_client.inference.embeddings(
+            model_id=llama_stack_models.embedding_model,
+            contents=["First chunk of text"],
+            output_dimension=llama_stack_models.embedding_dimension,
+        )

74-86: Fix vector DB register/embeddings to use string embedding model id

Remove .identifier and type: ignore.

             llama_stack_client.vector_dbs.register(
                 vector_db_id=vector_db_id,
-                embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-                embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+                embedding_model=llama_stack_models.embedding_model,
+                embedding_dimension=llama_stack_models.embedding_dimension,
                 provider_id="milvus",
             )
@@
-            embeddings_response = llama_stack_client.inference.embeddings(
-                model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
-                contents=["First chunk of text"],
-                output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
-            )
+            embeddings_response = llama_stack_client.inference.embeddings(
+                model_id=llama_stack_models.embedding_model,
+                contents=["First chunk of text"],
+                output_dimension=llama_stack_models.embedding_dimension,
+            )

174-178: Use string embedding model id for RAG DB registration

Consistent typing; fewer surprises.

         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-            embedding_dimension=llama_stack_models.embedding_dimension,
+            embedding_model=llama_stack_models.embedding_model,
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )

297-306: Ensure response_fn never returns None (prevents .lower() crash in validator)

create_response_function currently returns response.output_text, which can be None. Either keep your validator fix (recommended) or update the helper to coerce to str.

Proposed change in tests/llama_stack/utils.py:

-        return response.output_text
+        return response.output_text or ""

331-351: Minor: consider minimal retry around vector store search

If indexing is eventual, add a short retry/backoff to reduce flakes in CI.


🔇 Additional comments (1)
tests/llama_stack/rag/test_rag.py (1)

16-23: Confirm intention: empty storage size disables storage override

An empty string is falsy and won’t set server_config["storage"]. If you intended to provision storage, pass a size (e.g., "10Gi").
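A minimal sketch of that conditional (function and key names here are illustrative, not the fixture's actual shape):

```python
def build_server_config(storage_size: str = "") -> dict:
    """Only include a storage block when a non-empty size is given."""
    config = {"replicas": 1}
    if storage_size:  # empty string is falsy -> no storage override
        config["storage"] = {"size": storage_size}
    return config
```

So parameterizing with `""` deliberately exercises the default (no storage override); passing `"10Gi"` would provision storage instead.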

@dbasunag dbasunag merged commit 94670b3 into opendatahub-io:main Sep 18, 2025
11 checks passed
@github-actions

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

sheltoncyril pushed a commit to sheltoncyril/opendatahub-tests that referenced this pull request Oct 2, 2025
…les (opendatahub-io#582)

* test: add openai compatible tests for responses, vector stores and files api

* feat: Add fixture parameters for LLamaStackDistribution creation

Add fixture parameters:
- llama_stack_user_config_enabled and llama_stack_user_config_configmap
to configure custom llama-stack user configuration
- llama_stack_storage_size to configure storage size

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

* fix: fix possible bug when no parameters are provided to the fixture

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

* fix: fix possible bug when no parameters are provided to the fixture

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

* fix: fix parameter name when calling vector_stores.delete

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

* fix: address review suggestions

---------

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Co-authored-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Co-authored-by: Jorge <jgarciao@users.noreply.github.com>
mwaykole pushed a commit to mwaykole/opendatahub-tests that referenced this pull request Jan 23, 2026
…les (opendatahub-io#582)


7 participants