Skip to content

fix: Fix rag tests after rebase#591

Merged
jgarciao merged 5 commits intoopendatahub-io:mainfrom
jgarciao:fix-rag-tests-after-rebase
Sep 9, 2025
Merged

fix: Fix rag tests after rebase#591
jgarciao merged 5 commits intoopendatahub-io:mainfrom
jgarciao:fix-rag-tests-after-rebase

Conversation

@jgarciao
Copy link
Copy Markdown
Contributor

@jgarciao jgarciao commented Sep 8, 2025

  • Modify llama-stack RAG tests to use again MaaS. They were failing when using Qwen because that model doesn't have tool calling enabled
  • Fix typing issues in llama-stack core tests

Summary by CodeRabbit

  • Tests
    • Strengthened core and RAG test coverage with clearer assertions and explicit typing.
    • Simplified RAG test setup by removing external storage dependencies, reducing fixture requirements.
    • Updated RAG agent expectations to verify multiple intents in responses (answer, translate, summarize, chat).
    • Embedding tests now auto-detect embedding dimensions from the model.
    • Maintained vector DB ingestion/retrieval coverage with streamlined test signatures.

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
When using Qwen the tests are failing, as it seems not to have
tool calling enabled

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
@jgarciao jgarciao requested a review from a team as a code owner September 8, 2025 10:48
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 8, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Test updates across core and RAG suites: added explicit type annotations and return types in core tests, plus extra assertions in inference. RAG tests removed MinIO-related fixtures and parameterization, adjusted assertions, and now derive embedding dimensions dynamically. No new public APIs; only test signatures and expectations changed.

Changes

Cohort / File(s) Summary of changes
Core tests
tests/llama_stack/core/test_llamastack_core.py
Added explicit parameter type annotations (Pod, Secret, LlamaStackClient) and -> None returns to four tests; test_inference gained assertions validating non-empty content containing "ACK".
RAG tests
tests/llama_stack/rag/test_rag.py
Removed MinIO-related fixtures from class params and test signatures; tests now rely only on llama_stack_client. Updated test_rag_simple_agent to assert multiple lowercase keywords. Modified embeddings test to derive dimension from model at runtime. Maintained vector DB ingestion/retrieval flow with updated signatures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa4fb4 and 8feaece.

📒 Files selected for processing (2)
  • tests/llama_stack/core/test_llamastack_core.py (4 hunks)
  • tests/llama_stack/rag/test_rag.py (5 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 8, 2025

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it from the PR comment /wip cancel to the PR.
  • To block merging of a PR, add /hold in a comment. To un-block merging of PR comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    lgtm label removed on each new commit push.
  • To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
    verified label removed on each new commit push.
  • To Cherry-pick a merged PR /cherry-pick <target_branch_name> to the PR. If <target_branch_name> is valid,
    and the current PR is merged, a cherry-picked PR would be created and linked to the current PR.
  • To build and push image to quay, add /build-push-pr-image in a comment. This would create an image with tag
    pr-<pr_number> to quay repository. This image tag, however would be deleted on PR merge or close action.
Supported labels

{'/verified', '/cherry-pick', '/hold', '/lgtm', '/wip', '/build-push-pr-image'}

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/llama_stack/rag/test_rag.py (1)

113-149: Fix None-check before calling .lower() and de-duplicate assertions.

Current order can raise on None. Also condense repetitive keyword checks.

-        content = response.output_message.content.lower()
-        assert content is not None, "LLM response content is None"
-        assert "answer" in content, "The LLM didn't provide the expected answer to the prompt"
-        assert "translate" in content, "The LLM didn't provide the expected answer to the prompt"
-        assert "summarize" in content, "The LLM didn't provide the expected answer to the prompt"
-        assert "chat" in content, "The LLM didn't provide the expected answer to the prompt"
+        content_raw = response.output_message.content
+        assert content_raw is not None, "LLM response content is None"
+        content = content_raw.lower()
+        for kw in ("answer", "translate", "summarize", "chat"):
+            assert kw in content, f"The LLM didn't mention expected capability: {kw}"
♻️ Duplicate comments (3)
tests/llama_stack/core/test_llamastack_core.py (3)

44-46: Same: drop unused fixture params here.

Follow the same @usefixtures refactor and remove minio_pod/minio_data_connection from this signature.

-    def test_model_register(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_model_register(self, llama_stack_client: LlamaStackClient) -> None:

52-54: Same: drop unused fixture params here.

-    def test_model_list(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_model_list(self, llama_stack_client: LlamaStackClient) -> None:

64-66: Same: drop unused fixture params here.

-    def test_inference(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_inference(self, llama_stack_client: LlamaStackClient) -> None:
🧹 Nitpick comments (5)
tests/llama_stack/core/test_llamastack_core.py (2)

25-27: Use @usefixtures instead of unused fixture params (silences Ruff ARG002 and clarifies intent).

The fixtures minio_pod and minio_data_connection are only needed for side effects; they aren’t referenced. Prefer @pytest.mark.usefixtures(...) and drop them from the signatures.

Apply:

@@
-@pytest.mark.rawdeployment
-@pytest.mark.smoke
-class TestLlamaStackCore:
-    def test_lls_server_initial_state(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+@pytest.mark.rawdeployment
+@pytest.mark.smoke
+@pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+class TestLlamaStackCore:
+    def test_lls_server_initial_state(self, llama_stack_client: LlamaStackClient) -> None:

If you adopt this, also remove now-unused imports:

-from ocp_resources.pod import Pod
-from ocp_resources.secret import Secret

Alternative minimal change: keep signatures and add per-arg ignores:

-    def test_lls_server_initial_state(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_lls_server_initial_state(  # noqa: ARG002
+        self,
+        minio_pod: Pod,  # noqa: ARG002
+        minio_data_connection: Secret,  # noqa: ARG002
+        llama_stack_client: LlamaStackClient,
+    ) -> None:

4-7: Type-only imports: optional clean-up if you adopt @usefixtures.

If you remove Pod/Secret from test signatures, drop these imports to prevent unused-import lint. If you keep them for typing only, consider from __future__ import annotations or if TYPE_CHECKING: guards.

tests/llama_stack/rag/test_rag.py (3)

32-47: Remove the # type: ignore by validating/casting embedding_dimension.

Make the expectation explicit and avoid masking type issues.

-        embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=embedding_model.identifier,
-            contents=["First chunk of text"],
-            output_dimension=embedding_dimension,  # type: ignore
-        )
+        ed = embedding_model.metadata.get("embedding_dimension")
+        assert isinstance(ed, int) and ed > 0, "Invalid embedding_dimension in model metadata"
+        embeddings_response = llama_stack_client.inference.embeddings(
+            model_id=embedding_model.identifier,
+            contents=["First chunk of text"],
+            output_dimension=ed,
+        )

54-54: Apply the same explicit check/cast for embedding_dimension in this test.

Prevents silent misconfigurations and removes implicit typing assumptions.

-        embedding_model = next(m for m in models if m.api_model_type == "embedding")
-        embedding_dimension = embedding_model.metadata["embedding_dimension"]
+        embedding_model = next(m for m in models if m.api_model_type == "embedding")
+        ed = embedding_model.metadata.get("embedding_dimension")
+        assert isinstance(ed, int) and ed > 0, "Invalid embedding_dimension in model metadata"
@@
-            llama_stack_client.vector_dbs.register(
+            llama_stack_client.vector_dbs.register(
                 vector_db_id=vector_db_id,
                 embedding_model=embedding_model.identifier,
-                embedding_dimension=embedding_dimension,  # type: ignore
+                embedding_dimension=ed,
                 provider_id="milvus",
             )
@@
-            embeddings_response = llama_stack_client.inference.embeddings(
+            embeddings_response = llama_stack_client.inference.embeddings(
                 model_id=embedding_model.identifier,
                 contents=["First chunk of text"],
-                output_dimension=embedding_dimension,  # type: ignore
+                output_dimension=ed,
             )

152-152: Ensure the chosen LLM supports tool-calling (prefer MaaS).

To avoid regressions when a non-tool-enabled model is first in the list, select a tool-capable model (e.g., MaaS) if available.

-        model_id = next(m for m in models if m.api_model_type == "llm").identifier
+        llm_models = [m for m in models if m.api_model_type == "llm"]
+        # Prefer MaaS (tool-calling) models if present; fallback to the first LLM.
+        preferred = next((m for m in llm_models if "maas" in m.identifier.lower()), llm_models[0])
+        model_id = preferred.identifier

To double-check what models are exposed and their identifiers, you can print them during a failing run or log them at debug level.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 491db86 and 656ebb8.

📒 Files selected for processing (2)
  • tests/llama_stack/core/test_llamastack_core.py (4 hunks)
  • tests/llama_stack/rag/test_rag.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/llama_stack/rag/test_rag.py (1)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (128-159)
tests/llama_stack/core/test_llamastack_core.py (4)
tests/llama_stack/conftest.py (1)
  • llama_stack_client (128-159)
utilities/constants.py (1)
  • MinIo (273-333)
tests/conftest.py (2)
  • minio_pod (480-520)
  • minio_data_connection (546-558)
tests/llama_stack/constants.py (2)
  • LlamaStackProviders (4-14)
  • Inference (7-8)
🪛 Ruff (0.12.2)
tests/llama_stack/rag/test_rag.py

145-145: Use of assert detected

(S101)


146-146: Use of assert detected

(S101)


147-147: Use of assert detected

(S101)


148-148: Use of assert detected

(S101)


149-149: Use of assert detected

(S101)

tests/llama_stack/core/test_llamastack_core.py

26-26: Unused method argument: minio_pod

(ARG002)


26-26: Unused method argument: minio_data_connection

(ARG002)


45-45: Unused method argument: minio_pod

(ARG002)


45-45: Unused method argument: minio_data_connection

(ARG002)


50-50: Use of assert detected

(S101)


53-53: Unused method argument: minio_pod

(ARG002)


53-53: Unused method argument: minio_data_connection

(ARG002)


65-65: Unused method argument: minio_pod

(ARG002)


65-65: Unused method argument: minio_data_connection

(ARG002)

🔇 Additional comments (1)
tests/llama_stack/rag/test_rag.py (1)

15-16: LGTM: simplified parameterization.

Switching to only model_namespace aligns with the MaaS-based setup and removes MinIO coupling.

@jgarciao jgarciao merged commit 5eee07b into opendatahub-io:main Sep 9, 2025
7 of 8 checks passed
@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 9, 2025

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

mwaykole pushed a commit to mwaykole/opendatahub-tests that referenced this pull request Jan 23, 2026
* fix: fix typing issues in llama-stack core tests

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

* fix: Modify llama-stack RAG tests to use MaaS

When using Qwen the tests are failing, as it seems not to have
tool calling enabled

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

---------

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants