fix: Fix rag tests after rebase by jgarciao · Pull Request #591 · opendatahub-io/opendatahub-tests

jgarciao · 2025-09-08T10:48:28Z

Modify llama-stack RAG tests to use again MaaS. They were failing when using Qwen because that model doesn't have tool calling enabled
Fix typing issues in llama-stack core tests

Summary by CodeRabbit

Tests
- Strengthened core and RAG test coverage with clearer assertions and explicit typing.
- Simplified RAG test setup by removing external storage dependencies, reducing fixture requirements.
- Updated RAG agent expectations to verify multiple intents in responses (answer, translate, summarize, chat).
- Embedding tests now auto-detect embedding dimensions from the model.
- Maintained vector DB ingestion/retrieval coverage with streamlined test signatures.

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

When using Qwen the tests are failing, as it seems not to have tool calling enabled Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

coderabbitai · 2025-09-08T10:48:36Z

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Test updates across core and RAG suites: added explicit type annotations and return types in core tests, plus extra assertions in inference. RAG tests removed MinIO-related fixtures and parameterization, adjusted assertions, and now derive embedding dimensions dynamically. No new public APIs; only test signatures and expectations changed.

Changes

Cohort / File(s)	Summary of changes
Core tests `tests/llama_stack/core/test_llamastack_core.py`	Added explicit parameter type annotations (Pod, Secret, LlamaStackClient) and `-> None` returns to four tests; `test_inference` gained assertions validating non-empty content containing "ACK".
RAG tests `tests/llama_stack/rag/test_rag.py`	Removed MinIO-related fixtures from class params and test signatures; tests now rely only on `llama_stack_client`. Updated `test_rag_simple_agent` to assert multiple lowercase keywords. Modified embeddings test to derive dimension from model at runtime. Maintained vector DB ingestion/retrieval flow with updated signatures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa4fb4 and 8feaece.

📒 Files selected for processing (2)

tests/llama_stack/core/test_llamastack_core.py (4 hunks)
tests/llama_stack/rag/test_rag.py (5 hunks)

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-09-08T10:48:49Z

The following are automatically added/executed:

PR size label.
Run pre-commit
Run tox
Add PR author as the PR assignee
Build image based on the PR

Available user actions:

To mark a PR as WIP, add /wip in a comment. To remove it from the PR comment /wip cancel to the PR.
To block merging of a PR, add /hold in a comment. To un-block merging of PR comment /hold cancel.
To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
lgtm label removed on each new commit push.
To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
verified label removed on each new commit push.
To Cherry-pick a merged PR /cherry-pick <target_branch_name> to the PR. If <target_branch_name> is valid,
and the current PR is merged, a cherry-picked PR would be created and linked to the current PR.
To build and push image to quay, add /build-push-pr-image in a comment. This would create an image with tag
pr-<pr_number> to quay repository. This image tag, however would be deleted on PR merge or close action.

Supported labels

{'/verified', '/cherry-pick', '/hold', '/lgtm', '/wip', '/build-push-pr-image'}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/llama_stack/rag/test_rag.py (1)

113-149: Fix None-check before calling .lower() and de-duplicate assertions.

Current order can raise on None. Also condense repetitive keyword checks.

-        content = response.output_message.content.lower()
-        assert content is not None, "LLM response content is None"
-        assert "answer" in content, "The LLM didn't provide the expected answer to the prompt"
-        assert "translate" in content, "The LLM didn't provide the expected answer to the prompt"
-        assert "summarize" in content, "The LLM didn't provide the expected answer to the prompt"
-        assert "chat" in content, "The LLM didn't provide the expected answer to the prompt"
+        content_raw = response.output_message.content
+        assert content_raw is not None, "LLM response content is None"
+        content = content_raw.lower()
+        for kw in ("answer", "translate", "summarize", "chat"):
+            assert kw in content, f"The LLM didn't mention expected capability: {kw}"

♻️ Duplicate comments (3)

tests/llama_stack/core/test_llamastack_core.py (3)

44-46: Same: drop unused fixture params here.

Follow the same @usefixtures refactor and remove minio_pod/minio_data_connection from this signature.

-    def test_model_register(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_model_register(self, llama_stack_client: LlamaStackClient) -> None:

52-54: Same: drop unused fixture params here.

-    def test_model_list(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_model_list(self, llama_stack_client: LlamaStackClient) -> None:

64-66: Same: drop unused fixture params here.

-    def test_inference(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_inference(self, llama_stack_client: LlamaStackClient) -> None:

🧹 Nitpick comments (5)

tests/llama_stack/core/test_llamastack_core.py (2)

25-27: Use @usefixtures instead of unused fixture params (silences Ruff ARG002 and clarifies intent).

The fixtures minio_pod and minio_data_connection are only needed for side effects; they aren’t referenced. Prefer @pytest.mark.usefixtures(...) and drop them from the signatures.

Apply:
@@
-@pytest.mark.rawdeployment
-@pytest.mark.smoke
-class TestLlamaStackCore:
-    def test_lls_server_initial_state(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+@pytest.mark.rawdeployment
+@pytest.mark.smoke
+@pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+class TestLlamaStackCore:
+    def test_lls_server_initial_state(self, llama_stack_client: LlamaStackClient) -> None:
If you adopt this, also remove now-unused imports:
-from ocp_resources.pod import Pod
-from ocp_resources.secret import Secret
Alternative minimal change: keep signatures and add per-arg ignores:
-    def test_lls_server_initial_state(
-        self, minio_pod: Pod, minio_data_connection: Secret, llama_stack_client: LlamaStackClient
-    ) -> None:
+    def test_lls_server_initial_state(  # noqa: ARG002
+        self,
+        minio_pod: Pod,  # noqa: ARG002
+        minio_data_connection: Secret,  # noqa: ARG002
+        llama_stack_client: LlamaStackClient,
+    ) -> None:
4-7: Type-only imports: optional clean-up if you adopt @usefixtures.

If you remove Pod/Secret from test signatures, drop these imports to prevent unused-import lint. If you keep them for typing only, consider from __future__ import annotations or if TYPE_CHECKING: guards.

tests/llama_stack/rag/test_rag.py (3)

32-47: Remove the # type: ignore by validating/casting embedding_dimension.

Make the expectation explicit and avoid masking type issues.

-        embeddings_response = llama_stack_client.inference.embeddings(
-            model_id=embedding_model.identifier,
-            contents=["First chunk of text"],
-            output_dimension=embedding_dimension,  # type: ignore
-        )
+        ed = embedding_model.metadata.get("embedding_dimension")
+        assert isinstance(ed, int) and ed > 0, "Invalid embedding_dimension in model metadata"
+        embeddings_response = llama_stack_client.inference.embeddings(
+            model_id=embedding_model.identifier,
+            contents=["First chunk of text"],
+            output_dimension=ed,
+        )

54-54: Apply the same explicit check/cast for embedding_dimension in this test.

Prevents silent misconfigurations and removes implicit typing assumptions.

-        embedding_model = next(m for m in models if m.api_model_type == "embedding")
-        embedding_dimension = embedding_model.metadata["embedding_dimension"]
+        embedding_model = next(m for m in models if m.api_model_type == "embedding")
+        ed = embedding_model.metadata.get("embedding_dimension")
+        assert isinstance(ed, int) and ed > 0, "Invalid embedding_dimension in model metadata"
@@
-            llama_stack_client.vector_dbs.register(
+            llama_stack_client.vector_dbs.register(
                 vector_db_id=vector_db_id,
                 embedding_model=embedding_model.identifier,
-                embedding_dimension=embedding_dimension,  # type: ignore
+                embedding_dimension=ed,
                 provider_id="milvus",
             )
@@
-            embeddings_response = llama_stack_client.inference.embeddings(
+            embeddings_response = llama_stack_client.inference.embeddings(
                 model_id=embedding_model.identifier,
                 contents=["First chunk of text"],
-                output_dimension=embedding_dimension,  # type: ignore
+                output_dimension=ed,
             )

152-152: Ensure the chosen LLM supports tool-calling (prefer MaaS).

To avoid regressions when a non-tool-enabled model is first in the list, select a tool-capable model (e.g., MaaS) if available.

-        model_id = next(m for m in models if m.api_model_type == "llm").identifier
+        llm_models = [m for m in models if m.api_model_type == "llm"]
+        # Prefer MaaS (tool-calling) models if present; fallback to the first LLM.
+        preferred = next((m for m in llm_models if "maas" in m.identifier.lower()), llm_models[0])
+        model_id = preferred.identifier

To double-check what models are exposed and their identifiers, you can print them during a failing run or log them at debug level.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 491db86 and 656ebb8.

📒 Files selected for processing (2)

tests/llama_stack/core/test_llamastack_core.py (4 hunks)
tests/llama_stack/rag/test_rag.py (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

tests/llama_stack/rag/test_rag.py (1)

tests/llama_stack/conftest.py (1)

llama_stack_client (128-159)

tests/llama_stack/core/test_llamastack_core.py (4)

tests/llama_stack/conftest.py (1)

llama_stack_client (128-159)

utilities/constants.py (1)

MinIo (273-333)

tests/conftest.py (2)

minio_pod (480-520)

minio_data_connection (546-558)

tests/llama_stack/constants.py (2)

LlamaStackProviders (4-14)

Inference (7-8)

🪛 Ruff (0.12.2)

tests/llama_stack/rag/test_rag.py

145-145: Use of assert detected

(S101)

146-146: Use of assert detected

(S101)

147-147: Use of assert detected

(S101)

148-148: Use of assert detected

(S101)

149-149: Use of assert detected

(S101)

tests/llama_stack/core/test_llamastack_core.py

26-26: Unused method argument: minio_pod

(ARG002)

26-26: Unused method argument: minio_data_connection

(ARG002)

45-45: Unused method argument: minio_pod

(ARG002)

45-45: Unused method argument: minio_data_connection

(ARG002)

50-50: Use of assert detected

(S101)

53-53: Unused method argument: minio_pod

(ARG002)

53-53: Unused method argument: minio_data_connection

(ARG002)

65-65: Unused method argument: minio_pod

(ARG002)

65-65: Unused method argument: minio_data_connection

(ARG002)

🔇 Additional comments (1)

tests/llama_stack/rag/test_rag.py (1)

15-16: LGTM: simplified parameterization.

Switching to only model_namespace aligns with the MaaS-based setup and removes MinIO coupling.

tests/llama_stack/rag/test_rag.py

github-actions · 2025-09-09T11:13:39Z

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

* fix: fix typing issues in llama-stack core tests Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com> * fix: Modify llama-stack RAG tests to use MaaS When using Qwen the tests are failing, as it seems not to have tool calling enabled Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com> --------- Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

jgarciao added 2 commits September 8, 2025 12:36

fix: fix typing issues in llama-stack core tests

d480b97

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

fix: Modify llama-stack RAG tests to use MaaS

656ebb8

When using Qwen the tests are failing, as it seems not to have tool calling enabled Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>

jgarciao requested a review from a team as a code owner September 8, 2025 10:48

jgarciao requested review from adolfo-ab and jiripetrlik September 8, 2025 10:48

github-actions bot added the size/s label Sep 8, 2025

github-actions bot assigned jgarciao Sep 8, 2025

coderabbitai bot reviewed Sep 8, 2025

View reviewed changes

jiripetrlik reviewed Sep 8, 2025

View reviewed changes

tests/llama_stack/rag/test_rag.py Show resolved Hide resolved

rhods-ci-bot added the commented-by-jiripetrlik label Sep 8, 2025

Merge branch 'main' into fix-rag-tests-after-rebase

1b9a536

rhods-ci-bot added commented-by-jgarciao and removed commented-by-jiripetrlik labels Sep 8, 2025

jiripetrlik approved these changes Sep 8, 2025

View reviewed changes

rhods-ci-bot added the lgtm-by-jiripetrlik label Sep 8, 2025

Merge branch 'main' into fix-rag-tests-after-rebase

84bbf55

rhods-ci-bot removed commented-by-jgarciao lgtm-by-jiripetrlik labels Sep 8, 2025

adolfo-ab approved these changes Sep 9, 2025

View reviewed changes

rhods-ci-bot added the lgtm-by-adolfo-ab label Sep 9, 2025

Merge branch 'main' into fix-rag-tests-after-rebase

8feaece

rhods-ci-bot removed the lgtm-by-adolfo-ab label Sep 9, 2025

jgarciao merged commit 5eee07b into opendatahub-io:main Sep 9, 2025
7 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Fix rag tests after rebase#591

fix: Fix rag tests after rebase#591
jgarciao merged 5 commits intoopendatahub-io:mainfrom
jgarciao:fix-rag-tests-after-rebase

jgarciao commented Sep 8, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 8, 2025 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Uh oh!

github-actions bot commented Sep 8, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Sep 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jgarciao commented Sep 8, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Uh oh!

github-actions bot commented Sep 8, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Sep 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jgarciao commented Sep 8, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 8, 2025 •

edited

Loading