test: add openai compatible tests for responses, vector stores and files #582

dbasunag merged 11 commits into opendatahub-io:main from
Conversation
📝 Walkthrough

Refactors the test server config to read params and optionally include storage; adds ModelInfo and vector-store fixtures with document ingestion; extends the RAG tests to use ModelInfo and the responses API; adds TorchTune test expectations and API-response validation utilities, plus helpers to build response functions.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing touches
🧪 Generate unit tests
Tip: 👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs. Please see the documentation for more information. Example:

```yaml
reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).
```

Pre-merge checks: ✅ Passed checks (3 passed)
The following are automatically added/executed:
Available user actions:
Supported labels: {'/verified', '/build-push-pr-image', '/lgtm', '/hold', '/cherry-pick', '/wip'}
Force-pushed: cb6d487 to 9e3a62a.
Actionable comments posted: 4
🧹 Nitpick comments (3)
tests/llama_stack/conftest.py (1)
90-90: Verify the ConfigMap name stays in sync. userConfig.configMapName hardcodes "rag-llama-stack-config-map". If renamed, startup will break. Consider centralizing this name as a constant used by both fixtures.
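The suggested centralization could look like the following sketch; the constant and helper names are hypothetical (not from the PR), and the dict shape only mirrors the userConfig block this comment refers to.

```python
# Hypothetical shared constant: both the ConfigMap fixture and the
# LlamaStackDistribution fixture would reference it, so a rename
# touches exactly one line.
RAG_CONFIG_MAP_NAME = "rag-llama-stack-config-map"


def build_user_config(config_map_name: str = RAG_CONFIG_MAP_NAME) -> dict:
    # Shape mirrors the userConfig block from the review comment;
    # the surrounding keys are assumptions for illustration.
    return {"userConfig": {"configMapName": config_map_name}}
```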
tests/llama_stack/rag/test_rag.py (2)
355-360: Align the assertion list with the message; include 'llm'. Apply this diff:

```diff
-        assert any(
-            answer in content.lower() for answer in ["model", "assistant", "ai", "artificial", "language model"]
-        ), (
-            f"The LLM didn't provide any of the expected answers "
-            f"['model', 'assistant', 'ai', 'llm', 'language model']. Got: {content}"
-        )
+        expected = ["model", "assistant", "ai", "llm", "language model", "artificial"]
+        assert any(token in content.lower() for token in expected), (
+            f"The LLM didn't provide any of the expected answers {expected}. Got: {content}"
+        )
```
426-428: Prefer xfail over skip to keep the test visible in reports. Marking it as `xfail(reason=..., strict=False)` keeps the signal without hiding regressions when it starts passing.
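As a minimal sketch of the suggestion (the test name and reason string are hypothetical placeholders), the skip would become:

```python
import pytest


# strict=False: an unexpected pass is reported as XPASS instead of failing,
# so the test stays visible in reports without hiding a regression.
@pytest.mark.xfail(reason="responses API not yet stable in this distribution", strict=False)
def test_responses_api_feature():
    raise NotImplementedError("exercise the feature here")
```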
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- tests/llama_stack/conftest.py (2 hunks)
- tests/llama_stack/rag/test_rag.py (4 hunks)
- utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/llama_stack/conftest.py (1)
tests/conftest.py (2)
admin_client (68-69), model_namespace (100-120)
tests/llama_stack/rag/test_rag.py (2)
tests/llama_stack/conftest.py (1)
llama_stack_client (294-325)
utilities/rag_utils.py (2)
TurnExpectation (34-37), validate_api_responses (218-309)
🔇 Additional comments (6)
utilities/rag_utils.py (1)
4-4: Importing Callable is correct and necessary.

tests/llama_stack/conftest.py (3)
13-13: ConfigMap import looks good.
85-85: Confirm the CLI entrypoint is valid in the image. Ensure the container image exposes the `llama` CLI and that `llama stack run /etc/llama-stack/run.yaml` is the correct invocation.
265-265: Fixture dependency wiring is correct.

tests/llama_stack/rag/test_rag.py (2)
1-1: Adding io is appropriate for in-memory uploads.
67-114: Expectations helper reads well and matches validator usage.
Force-pushed: abc79d7 to 9f3b267.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/llama_stack/rag/test_rag.py (1)
390-402: Guard against None from the responses API (prevents a crash in validate_api_responses). `response.output_text` can be None; validate_api_responses calls `.lower()`, which will crash. Return a safe string.

```diff
-        return response.output_text
+        return response.output_text or ""
```

Optional: define `_response_fn` after `model_id` and `vector_store` are created for clarity.
🧹 Nitpick comments (5)
tests/llama_stack/rag/test_rag.py (5)
63-72: Retries/timeouts look good; add jitter and surface the URL on failure. Minor resilience/readability tweak: include the URL in the raised error and add a small backoff jitter to reduce thundering herd in CI.

```diff
 for attempt in range(3):
     try:
         response = requests.get(url, timeout=30)
         response.raise_for_status()
         downloaded_files.append((file_name, response.content))
         break
-    except requests.exceptions.RequestException:
-        if attempt == 2:
-            raise
-        time.sleep(1)
+    except requests.exceptions.RequestException as e:
+        if attempt == 2:
+            raise RuntimeError(f"Failed to download {url} after {attempt + 1} attempts") from e
+        time.sleep(1 + 0.2 * attempt)
```
101-111: Reduce brittleness of the QAT expectations (hyphenation varies). Docs may spell "Quantization-Aware Training" without the hyphen. Consider adding token-level fallbacks to avoid false negatives.

```diff
 {
     "question": "How can I optimize model training for quantization?",
     "expected_keywords": [
         "Quantization-Aware Training",
+        "quantization aware training",
         "QAT",
         "training",
         "fine-tuning",
         "fake",
         "quantized",
+        "quantization",
     ],
     "description": "Should provide information about QAT (Quantization-Aware Training)",
 },
```
353-369: Fix the mismatch between assertion and error message ("llm" missing from the check). Include "llm" in the allowed tokens to match the failure message.

```diff
-        assert any(
-            answer in content.lower() for answer in ["model", "assistant", "ai", "artificial", "language model"]
-        ), (
-            f"The LLM didn't provide any of the expected answers "
-            f"['model', 'assistant', 'ai', 'llm', 'language model']. Got: {content}"
-        )
+        assert any(
+            answer in content.lower()
+            for answer in ["model", "assistant", "ai", "artificial", "language model", "llm"]
+        ), (
+            "The LLM didn't provide any of the expected answers "
+            "['model', 'assistant', 'ai', 'llm', 'language model']."
+            f" Got: {content}"
+        )
```
371-379: Make the capability assertion tolerant to singular/plural. Allow "answer" or "answers" to avoid brittle failures.

```diff
-        assert "answer" in content, "The LLM didn't provide the expected answer to the prompt"
+        assert any(t in content.lower() for t in ("answer", "answers")), \
+            "The LLM didn't provide the expected answer to the prompt"
```
461-463: Mirror the BytesIO compatibility tweak here (even though the test is skipped). Use the positional initializer for broader interpreter compatibility and consistency with the change above.

```diff
-        file_like = io.BytesIO(initial_bytes=file_content)
+        file_like = io.BytesIO(file_content)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- tests/llama_stack/conftest.py (2 hunks)
- tests/llama_stack/rag/test_rag.py (4 hunks)
- utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/llama_stack/conftest.py
- utilities/rag_utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/llama_stack/rag/test_rag.py (2)
tests/llama_stack/conftest.py (1)
llama_stack_client (294-325)
utilities/rag_utils.py (2)
TurnExpectation (34-37), validate_api_responses (218-303)
🔇 Additional comments (1)
tests/llama_stack/rag/test_rag.py (1)
415-424: Add a wait for vector store indexing to avoid flakiness. File ingestion into vector stores is asynchronous; calling `validate_api_responses` immediately can yield empty results. Introduce a best-effort wait (e.g. poll `vector_stores.files.list` and check each item's `status` until it's "completed"/"ready", or use a dedicated client wait/status API if provided) before invoking validation. Confirm that `vector_stores.files.list` returns a `status` field (or identify the proper status endpoint) and adjust polling accordingly.
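A best-effort wait along these lines could look like the sketch below; the attribute chain `vector_stores.files.list`, its keyword argument, and the terminal status values are assumptions about the client API, as the comment itself notes they need confirming.

```python
import time


def wait_for_vector_store_indexing(client, vector_store_id: str, timeout: float = 120.0, interval: float = 5.0) -> None:
    """Poll file statuses until every ingested document reports a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        files = client.vector_stores.files.list(vector_store_id=vector_store_id)
        statuses = [getattr(f, "status", None) for f in files]
        # Assumed terminal states; adjust once the real status field is confirmed.
        if statuses and all(s in ("completed", "ready") for s in statuses):
            return
        time.sleep(interval)
    raise TimeoutError(f"vector store {vector_store_id} not indexed within {timeout}s")
```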
Force-pushed: fed9148 to 06d9487.
Actionable comments posted: 1
♻️ Duplicate comments (3)
utilities/rag_utils.py (1)
226-311: Fix the type/shape mismatch and None-guard in validate_api_responses. Returned per-turn keys and summary don't match TurnResult/ValidationSummary, and `.lower()` will crash if response_fn returns None. Align fields and harden against None.

Apply:

```diff
 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
     verbose: bool = True,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
-    try:
-        response = response_fn(question=question)
-        response_lower = response.lower()
+    try:
+        response_text = response_fn(question=question) or ""
+        response_lower = response_text.lower()
@@
-    result = {
+    result: TurnResult = {
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
-        "found_keywords": found,
-        "missing_keywords": missing,
-        "response": response,
-        "success": success,
+        "found_keywords": found,
+        "missing_keywords": missing,
+        "response_content": response_text,
+        "response_length": len(response_text),
+        "event_count": 0,
+        "success": success,
+        "error": None,
     }
@@
-    all_results.append({
+    all_results.append({
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": [],
         "missing_keywords": expected_keywords,
-        "response": "",
-        "success": False,
-        "error": str(e),
+        "response_content": "",
+        "response_length": 0,
+        "event_count": 0,
+        "success": False,
+        "error": str(e),
     })
@@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
+    LOGGER.info(f"Total turns: {summary['total_turns']}")
+    LOGGER.info(f"Successful: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
     LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
```

tests/llama_stack/conftest.py (1)
205-205: Replace the zero-width default in the OTEL service name. The fallback uses U+200B and is invisible; use an explicit value.

```diff
- service_name: "${env.OTEL_SERVICE_NAME:=\u200b}"
+ service_name: "${env.OTEL_SERVICE_NAME:=llama-stack}"
```

tests/llama_stack/rag/test_rag.py (1)
138-150: Guard against None in the responses API. output_text can be None; returning it unguarded breaks validate_api_responses on `.lower()`.

```diff
-        return response.output_text
+        return response.output_text or ""
```
🧹 Nitpick comments (5)
utilities/rag_utils.py (1)
84-223: Minor: share the keyword-match logic to avoid drift. Consider factoring the common "lower + contains" logic shared with validate_rag_agent_responses into a helper.
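A shared helper that both validators could delegate to might look like the following sketch; the function name and return shape are hypothetical, not from the PR.

```python
from typing import Optional


def match_keywords(text: Optional[str], expected_keywords: list, min_required: int = 1) -> dict:
    # Single source of truth for the case-insensitive containment check,
    # with a None-safe default so callers can pass raw API responses.
    text_lower = (text or "").lower()
    found = [kw for kw in expected_keywords if kw.lower() in text_lower]
    missing = [kw for kw in expected_keywords if kw.lower() not in text_lower]
    return {"found": found, "missing": missing, "success": len(found) >= min_required}
```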
tests/llama_stack/conftest.py (1)
346-352: Guard model selection with clear errors. next(...) will raise StopIteration; provide explicit assertions for better triage.

```diff
-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    assert llm is not None, "No LLM model found from llama_stack_client.models.list()"
+    embedding_model = next((m for m in models if m.api_model_type == "embedding"), None)
+    assert embedding_model is not None, "No embedding model found from llama_stack_client.models.list()"
+    embedding_dimension = embedding_model.metadata.get("embedding_dimension")
+    assert isinstance(embedding_dimension, int), "Embedding model missing integer 'embedding_dimension' metadata"
+    model_id = llm.identifier
```

tests/llama_stack/rag/test_rag.py (3)
123-126: Tighten the return type of the factory. Use typing.Callable for clarity.

```diff
-    ) -> callable:
+    ) -> Callable[..., str]:
```

Remember to import:

```diff
-from typing import List
+from typing import List, Callable
```
41-74: Remove the unused helper to avoid drift with the fixture. _download_torchtune_docs duplicates fixture logic and appears unused in this file.

```diff
-    def _download_torchtune_docs(self) -> list[tuple[str, bytes]]:
-        """
-        Helper method to download PyTorch TorchTune documentation files.
-        ...
-        return downloaded_files
```

268-361: Consider moving vector DB setup/teardown to a fixture. test_rag_build_rag_agent mixes setup/teardown with assertions. A fixture mirroring vector_store would improve reuse and readability.

📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📥 Commits
Reviewing files that changed from the base of the PR and between 9f3b267a6092f97ed0126c02cf5db756f7e7b731 and 06d9487ed226f71c8f0995283806241db7facb6c.
📒 Files selected for processing (3)
- tests/llama_stack/conftest.py (4 hunks)
- tests/llama_stack/rag/test_rag.py (10 hunks)
- utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-21T19:08:14.982Z
Learnt from: dbasunag
PR: #527
File: tests/model_registry/rbac/conftest.py:358-359
Timestamp: 2025-08-21T19:08:14.982Z
Learning: In tests/model_registry/rbac/conftest.py, the model catalog configmap only exists in one specific namespace (py_config["model_registry_namespace"]) regardless of where model registry instances are deployed, making it a shared resource that serves all model registry instances across namespaces.
Applied to files:
- tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: #338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
- tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
utilities/rag_utils.py (2)
create_llama_stack_distribution (20-39), ModelInfo (11-16)
tests/llama_stack/utils.py (2)
create_llama_stack_distribution (17-37), wait_for_llama_stack_client_ready (41-52)
tests/conftest.py (2)
admin_client (68-69), model_namespace (100-120)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client (296-327)
tests/llama_stack/rag/test_rag.py (2)
tests/llama_stack/conftest.py (4)
llama_stack_client (296-327), llama_stack_models (331-352), vector_store (356-385), vector_store_with_docs (389-449)
utilities/rag_utils.py (3)
TurnExpectation (42-45), validate_api_responses (226-311), validate_rag_agent_responses (84-223)
🔇 Additional comments (3)
utilities/rag_utils.py (1)
11-16: ModelInfo looks good. Clear, minimal container for model metadata; aligns with downstream fixtures.
tests/llama_stack/conftest.py (2)
87-93: ConfigMap-driven run command wiring looks correct. Command and userConfig binding to run.yaml via the ConfigMap are consistent.
388-449: Good: network calls hardened with timeout+retries and resources cleaned up. The download loop uses timeouts and retries; vector store cleanup is handled.
Force-pushed: 1909846 to ded8179.
Actionable comments posted: 0
♻️ Duplicate comments (3)
utilities/rag_utils.py (1)
226-311: Align result/summary shapes with TurnResult/ValidationSummary and harden against None responses. Current keys ("response", "total", "passed", "failed") don't match the TypedDicts and will break consumers. Also, `.lower()` on a None response will crash. Align fields and add lengths/events.

```diff
 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
     verbose: bool = True,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
-    try:
-        response = response_fn(question=question)
-        response_lower = response.lower()
+    try:
+        response_text = response_fn(question=question) or ""
+        response_lower = response_text.lower()
@@
-    result = {
+    result: TurnResult = {
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": found,
         "missing_keywords": missing,
-        "response": response,
-        "success": success,
+        "response_content": response_text,
+        "response_length": len(response_text),
+        "event_count": 0,
+        "success": success,
+        "error": None,
     }
@@
-    all_results.append({
+    all_results.append({
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": [],
         "missing_keywords": expected_keywords,
-        "response": "",
+        "response_content": "",
+        "response_length": 0,
+        "event_count": 0,
         "success": False,
         "error": str(e),
     })
-    if verbose:
-        LOGGER.error(f"ERROR: {str(e)}")
+    if verbose:
+        LOGGER.exception("ERROR validating question %s: %s", question, e)
@@
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
-    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+    LOGGER.info(f"Total: {summary['total_turns']}")
+    LOGGER.info(f"Passed: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
+    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
```

Additional: keeping a broad `except Exception` is acceptable for test utilities, but if you want to satisfy Ruff BLE001, narrow to anticipated exceptions or add `# noqa: BLE001`. Also prefer logger formatting over f-strings.

tests/llama_stack/conftest.py (1)
100-260: Replace the zero-width space default in the OTEL service name. The invisible U+200B default can confuse tooling. Use an explicit, visible default.

```diff
- service_name: "${env.OTEL_SERVICE_NAME:=\u200b}"
+ service_name: "${env.OTEL_SERVICE_NAME:=llama-stack}"
```

tests/llama_stack/rag/test_rag.py (1)
49-77: Guard against None from the responses API and add types to the helper. `response.output_text` can be None; returning it will break validation on `.lower()`.

```diff
 def _create_response_function(
     self, llama_stack_client: LlamaStackClient, llama_stack_models, vector_store
 ) -> callable:
@@
     def _response_fn(*, question: str) -> str:
         response = llama_stack_client.responses.create(
             input=question,
             model=llama_stack_models.model_id,
             tools=[
@@
         )
-        return response.output_text
+        return response.output_text or ""
```
🧹 Nitpick comments (9)
tests/llama_stack/constants.py (2)
19-25: Make TorchTuneTestExpectation immutable. Mark the dataclass as frozen to prevent accidental mutation in tests.

```diff
-@dataclass
+@dataclass(frozen=True)
 class TorchTuneTestExpectation:
```
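For illustration, with `frozen=True` any accidental mutation fails loudly instead of silently leaking state between tests; the class and field names below are illustrative stand-ins.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ExpectationExample:
    question: str
    expected_keywords: tuple


exp = ExpectationExample(question="What is QAT?", expected_keywords=("QAT",))
try:
    exp.question = "mutated"  # frozen dataclasses reject attribute assignment
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```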
28-67: Consider inlining the expectations in the test file for triage simplicity. Per repo practice (keep parametrization inline), moving these to the test module may ease debugging; or add a brief comment/link here pointing to the tests consuming them.
tests/llama_stack/conftest.py (3)
262-279: Silence unused-fixture warnings without breaking dependency ordering. These fixtures are intentionally unused to enforce ordering. Assign them to `_` to satisfy Ruff without renaming parameters (which would break pytest resolution).

```diff
 def llama_stack_distribution(
     admin_client: DynamicClient,
     model_namespace: Namespace,
     enabled_llama_stack_operator: DataScienceCluster,
     llama_stack_server_config: Dict[str, Any],
     llama_stack_config_map: ConfigMap,
 ) -> Generator[LlamaStackDistribution, None, None]:
+    # Ensure fixture dependencies are realized; values are unused by design.
+    _ = (enabled_llama_stack_operator, llama_stack_config_map)
     with create_llama_stack_distribution(
```
312-329: Log the full traceback on port-forward setup failures. Use `LOGGER.exception` to capture stack traces.

```diff
-        except Exception as e:
-            LOGGER.error(f"Failed to set up port forwarding: {e}")
-            raise
+        except Exception as e:
+            LOGGER.exception("Failed to set up port forwarding: %s", e)
+            raise
```
387-427: Narrow exception handling and log tracebacks when downloading docs. Catching generic Exception triggers BLE001. Also prefer `exception()` to include the traceback.

```diff
-        except (requests.exceptions.RequestException, Exception) as e:
-            LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-            raise
+        except requests.exceptions.RequestException as e:
+            LOGGER.exception("Failed to download/upload %s: %s", url, e)
+            raise
```

tests/llama_stack/rag/test_rag.py (4)
168-177: Make the assertion case-insensitive for robustness. Normalize content for the identity check to reduce flakiness.

```diff
-        content = response.output_message.content
+        content = (response.output_message.content or "").lower()
         assert content is not None, "LLM response content is None"
         assert "model" in content, "The LLM didn't provide the expected answer to the prompt"
```
340-345: Tweak the failure message to match the test's intent. This test validates the responses API (not the agent). Minor wording fix.

```diff
-        assert validation_result["success"], f"RAG agent validation failed. Summary: {validation_result['summary']}"
+        assert validation_result["success"], f"Responses validation failed. Summary: {validation_result['summary']}"
```
33-47: Optional: drop the helper and inline the parametrization. To align with repo preference, consider inlining these expectations in the test (or parametrizing with pytest) to avoid context switching.
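Inlined via pytest parametrization, this could read roughly as follows; the question/keyword values and the test name are illustrative placeholders, not the PR's actual expectations.

```python
import pytest


@pytest.mark.parametrize(
    ("question", "expected_keywords"),
    [
        ("What is LoRA finetuning?", ["lora", "fine-tuning"]),
        ("How can I optimize model training for quantization?", ["qat", "quantization"]),
    ],
)
def test_torchtune_expectation(question, expected_keywords):
    # The real test would call the response function and match keywords;
    # this assertion only demonstrates the inline parametrization shape.
    assert question and expected_keywords
```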
318-325: Unused fixtures in the signature. `minio_pod` and `minio_data_connection` aren't used. If they're only for side effects, add `_ = (minio_pod, minio_data_connection)` at the top of the test; otherwise remove them.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tests/llama_stack/conftest.py (4 hunks)
- tests/llama_stack/constants.py (2 hunks)
- tests/llama_stack/rag/test_rag.py (9 hunks)
- utilities/rag_utils.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
utilities/rag_utils.py (2)
create_llama_stack_distribution (20-39), ModelInfo (11-16)
tests/llama_stack/utils.py (1)
create_llama_stack_distribution (17-37)
utilities/constants.py (1)
Timeout(213-224)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client(297-328)
tests/llama_stack/rag/test_rag.py (3)
utilities/rag_utils.py (2)
TurnExpectation (42-45), validate_api_responses (226-311)
tests/llama_stack/conftest.py (4)
llama_stack_client (297-328), llama_stack_models (331-352), vector_store (356-385), vector_store_with_docs (389-449)
tests/conftest.py (2)
minio_pod (480-520), minio_data_connection (546-558)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py
266-266: Unused function argument: enabled_llama_stack_operator
(ARG001)
268-268: Unused function argument: llama_stack_config_map
(ARG001)
383-383: Do not catch blind exception: Exception
(BLE001)
422-422: Consider moving this statement to an else block
(TRY300)
utilities/rag_utils.py
281-281: Do not catch blind exception: Exception
(BLE001)
293-293: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
293-293: Use explicit conversion flag
Replace with conversion flag
(RUF010)
tests/llama_stack/rag/test_rag.py
320-320: Unused method argument: minio_pod
(ARG002)
321-321: Unused method argument: minio_data_connection
(ARG002)
🔇 Additional comments (2)
utilities/rag_utils.py (1)
11-17: ModelInfo NamedTuple looks good.

tests/llama_stack/conftest.py (1)
88-94: Confirm the CRD schema for userConfig/configMapName and the command override. Ensure this matches the operator/CRD version used in CI; otherwise the deployment may ignore the config map or entrypoint.
Force-pushed: ded8179 to b48de0d.
Actionable comments posted: 1
♻️ Duplicate comments (2)
utilities/rag_utils.py (1)
226-312: Align validate_api_responses with TurnResult/ValidationSummary; add response_length/event_count; improve logging. The per-turn/result keys don't match the declared TypedDicts (response vs response_content; missing response_length/event_count), and the summary uses different field names (total/passed vs total_turns/successful_turns). Also prefer LOGGER.exception and guard against None responses. This will otherwise break consumers and type checks.

Apply:

```diff
 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
     verbose: bool = True,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful_turns = 0
@@
     for idx, test in enumerate(test_cases, 1):
         question = test["question"]
         expected_keywords = test["expected_keywords"]
         description = test.get("description", "")
@@
-    try:
-        response = response_fn(question=question)
-        response_lower = response.lower()
+    try:
+        response_text = response_fn(question=question) or ""
+        response_lower = response_text.lower()
@@
         found = [kw for kw in expected_keywords if kw.lower() in response_lower]
         missing = [kw for kw in expected_keywords if kw.lower() not in response_lower]
         success = len(found) >= min_keywords_required
@@
-    if success:
-        successful += 1
+    if success:
+        successful_turns += 1
@@
-    result = {
+    result: TurnResult = {
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": found,
         "missing_keywords": missing,
-        "response": response,
-        "success": success,
+        "response_content": response_text,
+        "response_length": len(response_text),
+        "event_count": 0,
+        "success": success,
+        "error": None,
     }
@@
-    all_results.append({
+    LOGGER.exception("ERROR validating question: %s", question)
+    all_results.append({
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": [],
         "missing_keywords": expected_keywords,
-        "response": "",
-        "success": False,
-        "error": str(e),
+        "response_content": "",
+        "response_length": 0,
+        "event_count": 0,
+        "success": False,
+        "error": str(e),
     })
@@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total_turns = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total_turns,
+        "successful_turns": successful_turns,
+        "failed_turns": total_turns - successful_turns,
+        "success_rate": successful_turns / total_turns if total_turns > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
+    LOGGER.info(f"Total: {summary['total_turns']}")
+    LOGGER.info(f"Passed: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
     LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful_turns == total_turns, "results": all_results, "summary": summary})
```

tests/llama_stack/conftest.py (1)
200-210: Replace the zero-width default in telemetry service_name. The U+200B default is invisible and confuses tooling. Use a visible default.

```diff
- service_name: "${env.OTEL_SERVICE_NAME:=\u200b}"
+ service_name: "${env.OTEL_SERVICE_NAME:=llama-stack}"
```
🧹 Nitpick comments (5)
tests/llama_stack/conftest.py (5)
262-269: Silence unused-fixture-arg lints without changing behavior

These deps are intentionally unused to enforce ordering. Assign them to `_` to satisfy linters.

```diff
 def llama_stack_distribution(
@@
-    llama_stack_config_map: ConfigMap,
+    llama_stack_config_map: ConfigMap,
 ) -> Generator[LlamaStackDistribution, None, None]:
+    _ = enabled_llama_stack_operator  # ensure fixture execution (lint: ARG001)
+    _ = llama_stack_config_map  # ensure fixture execution (lint: ARG001)
     with create_llama_stack_distribution(
```
312-329: Preserve traceback on port-forward failure

Use LOGGER.exception to capture the stack trace.

```diff
-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
-        raise
+    except Exception:
+        LOGGER.exception("Failed to set up port forwarding")
+        raise
```
356-385: Vector store: ensure unique name; verify delete signature
- Name “test_vector_store” may collide across parallel runs; add a short UUID suffix.
- Please verify vector_stores.delete signature; some clients expect positional id (delete(vector_store.id)).
```diff
-    vector_store = llama_stack_client.vector_stores.create(
-        name="test_vector_store",
+    vector_store = llama_stack_client.vector_stores.create(
+        name=f"test_vector_store_{uuid4().hex[:8]}",
         embedding_model=llama_stack_models.embedding_model.identifier,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )
@@
-    llama_stack_client.vector_stores.delete(id=vector_store.id)
+    llama_stack_client.vector_stores.delete(vector_store.id)
     LOGGER.info(f"Deleted vector store {vector_store.id}")
```
from uuid import uuid4
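For the unique-suffix suggestion above, a minimal stdlib-only sketch of the naming pattern (the `unique_store_name` helper and the prefix default are illustrative, not part of the repo):

```python
from uuid import uuid4


def unique_store_name(prefix: str = "test_vector_store") -> str:
    # Append a short random hex suffix so parallel runs don't collide
    return f"{prefix}_{uuid4().hex[:8]}"


name = unique_store_name()
```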
387-427: File download/upload: cleaner control flow and better diagnostics

Move the success return into an else block, and log with traceback on failures. Optional: replace the "not sure why this is needed" comment with a TODO plus reason, or pass a filename/content-type to files.create if supported.

```diff
     try:
         response = requests.get(url, timeout=30)
         response.raise_for_status()
@@
-        return True
-
-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except (requests.exceptions.RequestException, Exception):
+        LOGGER.exception("Failed to download and upload file: %s", url)
+        raise
+    else:
+        return True
```
446-459: Gracefully skip in disconnected environments

Avoid failures on air-gapped clusters by skipping when a DISCONNECTED/AIRGAPPED flag is set.

```diff
 def vector_store_with_docs(llama_stack_client: LlamaStackClient, vector_store: Any) -> Generator[Any, None, None]:
@@
-    # Download TorchTune documentation files
+    # Skip when running in disconnected environments
+    if os.getenv("DISCONNECTED", "").lower() in {"1", "true", "yes"} or os.getenv("AIRGAPPED", "").lower() in {"1", "true", "yes"}:
+        pytest.skip("Skipping doc download in disconnected environment")
+
+    # Download TorchTune documentation files
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/llama_stack/conftest.py(4 hunks)tests/llama_stack/constants.py(2 hunks)tests/llama_stack/rag/test_rag.py(9 hunks)utilities/rag_utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/llama_stack/constants.py
- tests/llama_stack/rag/test_rag.py
🧰 Additional context used
🧬 Code graph analysis (2)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client(297-328)
tests/llama_stack/conftest.py (3)
utilities/rag_utils.py (2)
create_llama_stack_distribution(20-39)ModelInfo(11-16)tests/llama_stack/utils.py (2)
create_llama_stack_distribution(17-37)wait_for_llama_stack_client_ready(41-52)utilities/constants.py (1)
Timeout(213-224)
🪛 Ruff (0.12.2)
utilities/rag_utils.py
281-281: Do not catch blind exception: Exception
(BLE001)
293-293: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
293-293: Use explicit conversion flag
Replace with conversion flag
(RUF010)
tests/llama_stack/conftest.py
266-266: Unused function argument: enabled_llama_stack_operator
(ARG001)
268-268: Unused function argument: llama_stack_config_map
(ARG001)
383-383: Do not catch blind exception: Exception
(BLE001)
422-422: Consider moving this statement to an else block
(TRY300)
🔇 Additional comments (1)
utilities/rag_utils.py (1)
11-16: ModelInfo NamedTuple: LGTM. Clear, minimal container for model metadata.
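For reference, a minimal sketch of the NamedTuple shape the review comments converge on, with `embedding_model` as a plain string id (field names follow the discussion above; the sample values are made up):

```python
from typing import NamedTuple


class ModelInfo(NamedTuple):
    # String identifiers, per the suggestion to avoid .identifier at call sites
    model_id: str
    embedding_model: str
    embedding_dimension: int


info = ModelInfo(model_id="llm-1", embedding_model="emb-1", embedding_dimension=384)
```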
f51d180 to 70de5d8
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/llama_stack/rag/test_rag.py (1)
135-159: Avoid None dereference; normalize content and compare lowercased.

The second assertion calls .lower() before the None check; simplify both to be None-safe.

```diff
-    content = response.output_message.content
-    assert content is not None, "LLM response content is None"
-    assert "model" in content, "The LLM didn't provide the expected answer to the prompt"
+    content = response.output_message.content or ""
+    assert "model" in content.lower(), "The LLM didn't provide the expected answer to the prompt"
@@
-    content = response.output_message.content.lower()
-    assert content is not None, "LLM response content is None"
-    assert "answer" in content, "The LLM didn't provide the expected answer to the prompt"
+    content = response.output_message.content or ""
+    assert "answer" in content.lower(), "The LLM didn't provide the expected answer to the prompt"
```
♻️ Duplicate comments (5)
tests/llama_stack/conftest.py (4)
317-349: Consider skipping on disconnected clusters.

This fixture does network I/O; mark it with a custom marker or skip when offline to reduce CI flakes.
215-238: Return embedding model identifier to match ModelInfo typing.

The fixture returns an embedding model object, but ModelInfo.embedding_model is str. Also add robust checks (an earlier review suggested this).

```diff
-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
-
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
+    emb = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not emb:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    md = getattr(emb, "metadata", {}) or {}
+    if "embedding_dimension" not in md:
+        raise ValueError("Embedding model metadata missing 'embedding_dimension'")
+    return ModelInfo(model_id=llm.identifier, embedding_model=emb.identifier, embedding_dimension=md["embedding_dimension"])
```
240-271: Stop using .identifier at call sites; pass string embedding model id.

This removes the type: ignore and matches ModelInfo.

```diff
     vector_store = llama_stack_client.vector_stores.create(
         name="test_vector_store",
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )
```
126-145: Fix unbound variable in llama_stack_user_config_configmap; add clearer FileNotFound error.

llama_stack_user_config_configmap_path can be referenced before assignment when unparameterized.

```diff
-    if hasattr(request, "param"):
-        if request.param.get("llama_stack_user_config_configmap_path"):
-            llama_stack_user_config_configmap_path = request.param.get("llama_stack_user_config_configmap_path")
-
-    assert llama_stack_user_config_configmap_path is not None, (
+    params = getattr(request, "param", {}) or {}
+    llama_stack_user_config_configmap_path = params.get("llama_stack_user_config_configmap_path")
+    assert llama_stack_user_config_configmap_path, (
         "Fixture parameter llama_stack_user_config_configmap_path is required"
     )
     config_path = Path(__file__).parent / llama_stack_user_config_configmap_path
-
-    with open(config_path, "r") as f:
-        llama_stack_userconfig_configmap_data = f.read()
+    try:
+        with open(config_path, "r") as f:
+            llama_stack_userconfig_configmap_data = f.read()
+    except FileNotFoundError:
+        raise FileNotFoundError(f"User-config ConfigMap file not found at: {config_path}")
```
utilities/rag_utils.py (1)
226-305: Align validate_api_responses to TurnResult/ValidationSummary shapes; guard None; improve logging.

Results use response instead of response_content, and the summary keys don't match the public types. Also, .lower() on None can crash, and errors should use LOGGER.exception.
Apply this diff:
```diff
 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
 ) -> ValidationResult:
@@
-    all_results = []
+    all_results: List[TurnResult] = []
     successful = 0
@@
     try:
-        response = response_fn(question=question)
-        response_lower = response.lower()
+        response_text = response_fn(question=question) or ""
+        response_lower = response_text.lower()
@@
     result = {
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": found,
         "missing_keywords": missing,
-        "response": response,
+        "response_content": response_text,
+        "response_length": len(response_text),
+        "event_count": 0,
         "success": success,
+        "error": None,
     }
@@
     except Exception as e:
         all_results.append({
             "question": question,
             "description": description,
             "expected_keywords": expected_keywords,
             "found_keywords": [],
             "missing_keywords": expected_keywords,
-            "response": "",
+            "response_content": "",
+            "response_length": 0,
+            "event_count": 0,
             "success": False,
             "error": str(e),
         })
-        LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+        LOGGER.exception(f"[{idx}] ERROR while validating API response for: {question}")
@@
     total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
-    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+    LOGGER.info(f"Total turns: {summary['total_turns']}")
+    LOGGER.info(f"Successful: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
+    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast("ValidationResult", {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
```
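The keyword-matching core of the suggested rewrite can be exercised in isolation (the sample response text and keywords below are invented):

```python
from typing import List, Tuple


def check_keywords(
    response_text: str, expected_keywords: List[str], min_required: int = 1
) -> Tuple[List[str], List[str], bool]:
    # Case-insensitive containment check, mirroring the reviewed logic;
    # "or ''" guards against a None response
    response_lower = (response_text or "").lower()
    found = [kw for kw in expected_keywords if kw.lower() in response_lower]
    missing = [kw for kw in expected_keywords if kw.lower() not in response_lower]
    return found, missing, len(found) >= min_required


found, missing, ok = check_keywords(
    "TorchTune supports LoRA and QLoRA fine-tuning.", ["lora", "qlora", "adapter"]
)
# found == ["lora", "qlora"], missing == ["adapter"], ok is True
```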
🧹 Nitpick comments (4)
tests/llama_stack/utils.py (1)
75-102: Harden response function: type hint and None-safe return.

The return type should reflect a string-producing callable, and guard against output_text being None.

```diff
-def create_response_function(
-    llama_stack_client: LlamaStackClient, llama_stack_models: ModelInfo, vector_store: VectorStore
-) -> Callable:
+def create_response_function(
+    llama_stack_client: LlamaStackClient, llama_stack_models: ModelInfo, vector_store: VectorStore
+) -> Callable[..., str]:
@@
     def _response_fn(*, question: str) -> str:
@@
-        return response.output_text
+        return response.output_text or ""
```
tests/llama_stack/conftest.py (2)
180-213: Log full traceback when port-forward fails.

Use LOGGER.exception in the except block to aid debugging.

```diff
-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
+    except Exception:
+        LOGGER.exception("Failed to set up port forwarding")
         raise
```
273-315: Make temp-file upload cross-platform and improve error logging.

A NamedTemporaryFile reopened while still open can fail on Windows; use delete=False and unlink afterwards. Also log the traceback on failures and tighten the type for vector_store.

```diff
 @retry(
     wait_timeout=Timeout.TIMEOUT_1MIN,
     sleep=5,
     exceptions_dict={requests.exceptions.RequestException: [], Exception: []},
 )
-def _download_and_upload_file(url: str, llama_stack_client: LlamaStackClient, vector_store: Any) -> bool:
+def _download_and_upload_file(url: str, llama_stack_client: LlamaStackClient, vector_store: VectorStore) -> bool:
@@
-    with tempfile.NamedTemporaryFile(mode="wb", suffix=f"_{local_file_name}") as temp_file:
-        temp_file.write(response.content)
-        temp_file_path = temp_file.name
-
-    # Upload saved file to LlamaStack
-    with open(temp_file_path, "rb") as file_to_upload:
-        uploaded_file = llama_stack_client.files.create(file=file_to_upload, purpose="assistants")
-
-    # Add file to vector store
-    llama_stack_client.vector_stores.files.create(vector_store_id=vector_store.id, file_id=uploaded_file.id)
-
-    return True
+    with tempfile.NamedTemporaryFile(mode="wb", suffix=f"_{local_file_name}", delete=False) as temp_file:
+        temp_file.write(response.content)
+        temp_file_path = temp_file.name
+    # Upload saved file to LlamaStack
+    with open(temp_file_path, "rb") as file_to_upload:
+        uploaded_file = llama_stack_client.files.create(file=file_to_upload, purpose="assistants")
+    # Add file to vector store
+    llama_stack_client.vector_stores.files.create(vector_store_id=vector_store.id, file_id=uploaded_file.id)
+    os.remove(temp_file_path)
+    return True
@@
-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
+    except (requests.exceptions.RequestException, Exception):
+        LOGGER.exception(f"Failed to download and upload file {url}")
         raise
```
tests/llama_stack/rag/test_rag.py (1)
36-43: Silence linter about unused fixture parameter llama_stack_user_config_configmap.

It's intentionally present to ensure setup; add a no-op delete at the top of each test body.

```diff
 def test_rag_inference_embeddings(
@@
 ) -> None:
+    # Fixture is intentionally unused, ensures configmap exists
+    del llama_stack_user_config_configmap
```

Repeat the same two-line snippet at the top of the other tests that include this fixture.
Also applies to: 60-65, 120-125, 161-166, 257-263, 289-296, 318-323
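Relatedly, the Windows-safe temp-file pattern suggested for `_download_and_upload_file` above (delete=False plus explicit unlink) can be sketched standalone with stdlib only; the payload bytes are a stand-in for the downloaded document:

```python
import os
import tempfile

payload = b"example document bytes"  # stand-in for the downloaded content

# delete=False lets us close the handle before reopening it,
# which Windows requires for a second open() on the same path
with tempfile.NamedTemporaryFile(mode="wb", suffix="_doc.rst", delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

with open(tmp_path, "rb") as f:
    data = f.read()

os.remove(tmp_path)  # explicit cleanup since delete=False
```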
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/llama_stack/conftest.py(4 hunks)tests/llama_stack/rag/test_rag.py(9 hunks)tests/llama_stack/utils.py(2 hunks)utilities/rag_utils.py(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.
Applied to files:
tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (4)
tests/llama_stack/utils.py (2)
tests/llama_stack/conftest.py (4)
llama_stack_client(181-212)vector_store(241-270)llama_stack_distribution(148-162)llama_stack_models(216-237)utilities/rag_utils.py (2)
TurnExpectation(42-45)ModelInfo(11-16)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client(181-212)
tests/llama_stack/conftest.py (4)
tests/llama_stack/utils.py (2)
create_llama_stack_distribution(20-40)wait_for_llama_stack_client_ready(44-55)utilities/rag_utils.py (2)
create_llama_stack_distribution(20-39)ModelInfo(11-16)utilities/constants.py (1)
Timeout(213-224)tests/conftest.py (2)
admin_client(68-69)model_namespace(100-120)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (5)
llama_stack_client(181-212)vector_store(241-270)llama_stack_user_config_configmap(110-144)llama_stack_models(216-237)vector_store_with_docs(318-348)utilities/rag_utils.py (3)
validate_rag_agent_responses(84-223)validate_api_responses(226-305)ModelInfo(11-16)tests/llama_stack/utils.py (3)
get_torchtune_test_expectations(58-72)create_response_function(75-103)_response_fn(90-101)
🪛 Ruff (0.12.2)
utilities/rag_utils.py
277-277: Do not catch blind exception: Exception
(BLE001)
288-288: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
288-288: Use explicit conversion flag
Replace with conversion flag
(RUF010)
tests/llama_stack/conftest.py
269-269: Do not catch blind exception: Exception
(BLE001)
310-310: Consider moving this statement to an else block
(TRY300)
tests/llama_stack/rag/test_rag.py
40-40: Unused method argument: llama_stack_user_config_configmap
(ARG002)
63-63: Unused method argument: llama_stack_user_config_configmap
(ARG002)
123-123: Unused method argument: llama_stack_user_config_configmap
(ARG002)
164-164: Unused method argument: llama_stack_user_config_configmap
(ARG002)
261-261: Unused method argument: llama_stack_user_config_configmap
(ARG002)
293-293: Unused method argument: llama_stack_user_config_configmap
(ARG002)
321-321: Unused method argument: llama_stack_user_config_configmap
(ARG002)
🔇 Additional comments (4)
tests/llama_stack/utils.py (1)
58-73: LGTM on expectation mapping. Clean conversion to TurnExpectation dicts.
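A minimal sketch of the mapping such a helper performs, using the `question`/`expected_keywords`/`description` keys from `TurnExpectation` (the sample tuples below are invented, not the real TorchTune expectations):

```python
from typing import List, TypedDict


class TurnExpectation(TypedDict):
    question: str
    expected_keywords: List[str]
    description: str


raw = [
    ("What is LoRA?", ["low-rank", "adapter"], "LoRA basics"),
    ("What does tune download do?", ["checkpoint", "download"], "CLI usage"),
]

# Convert raw tuples into the TurnExpectation dict shape
expectations: List[TurnExpectation] = [
    {"question": q, "expected_keywords": kws, "description": desc}
    for q, kws, desc in raw
]
```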
tests/llama_stack/conftest.py (2)
53-63: LGTM on safe params override. Using params = getattr(..., {}) avoids AttributeError on unparameterized fixtures.
96-105: LGTM on optional storage/userConfig wiring. Nice gating behind params; the command override is clear.
tests/llama_stack/rag/test_rag.py (1)
305-316: LGTM on responses+vector store validation flow. Using create_response_function and validate_api_responses is clean. Will pass once shapes in rag_utils are aligned and None is guarded.
```diff
         try:
             llama_stack_client.vector_dbs.register(
                 vector_db_id=vector_db_id,
-                embedding_model=embedding_model.identifier,
-                embedding_dimension=embedding_dimension,  # type: ignore
+                embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+                embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
                 provider_id="milvus",
             )

             # Calculate embeddings
             embeddings_response = llama_stack_client.inference.embeddings(
-                model_id=embedding_model.identifier,
+                model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
                 contents=["First chunk of text"],
-                output_dimension=embedding_dimension,  # type: ignore
+                output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
             )
```
Fix vector DB register/embedding calls to use string embedding model id.
Prevents attribute errors and removes ignores.
llama_stack_client.vector_dbs.register(
vector_db_id=vector_db_id,
- embedding_model=llama_stack_models.embedding_model.identifier, # type: ignore
- embedding_dimension=llama_stack_models.embedding_dimension, # type: ignore
+ embedding_model=llama_stack_models.embedding_model,
+ embedding_dimension=llama_stack_models.embedding_dimension,
provider_id="milvus",
)
@@
embeddings_response = llama_stack_client.inference.embeddings(
- model_id=llama_stack_models.embedding_model.identifier, # type: ignore
+ model_id=llama_stack_models.embedding_model,
contents=["First chunk of text"],
- output_dimension=llama_stack_models.embedding_dimension, # type: ignore
+ output_dimension=llama_stack_models.embedding_dimension,
     )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        try:
            llama_stack_client.vector_dbs.register(
                vector_db_id=vector_db_id,
                embedding_model=llama_stack_models.embedding_model,
                embedding_dimension=llama_stack_models.embedding_dimension,
                provider_id="milvus",
            )

            # Calculate embeddings
            embeddings_response = llama_stack_client.inference.embeddings(
                model_id=llama_stack_models.embedding_model,
                contents=["First chunk of text"],
                output_dimension=llama_stack_models.embedding_dimension,
            )
```
🤖 Prompt for AI Agents
In tests/llama_stack/rag/test_rag.py around lines 77 to 91, the vector DB
register and embeddings calls are passing a non-string attribute (e.g.
.identifier) with type ignores; replace those with the string embedding model id
instead. Specifically, pass llama_stack_models.embedding_model (the string id)
to vector_dbs.register as embedding_model and to inference.embeddings as
model_id, keep embedding_dimension as-is, and remove the type: ignore comments.
```diff
         llama_stack_client.vector_dbs.register(
             vector_db_id=vector_db_id,
-            embedding_model=embedding_model.identifier,
-            embedding_dimension=embedding_dimension,
+            embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+            embedding_dimension=llama_stack_models.embedding_dimension,
             provider_id="milvus",
         )
```
Use string embedding model id for RAG DB registration.
llama_stack_client.vector_dbs.register(
vector_db_id=vector_db_id,
- embedding_model=llama_stack_models.embedding_model.identifier, # type: ignore
- embedding_dimension=llama_stack_models.embedding_dimension,
+ embedding_model=llama_stack_models.embedding_model,
+ embedding_dimension=llama_stack_models.embedding_dimension,
provider_id="milvus",
     )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        llama_stack_client.vector_dbs.register(
            vector_db_id=vector_db_id,
            embedding_model=llama_stack_models.embedding_model,
            embedding_dimension=llama_stack_models.embedding_dimension,
            provider_id="milvus",
        )
```
🤖 Prompt for AI Agents
In tests/llama_stack/rag/test_rag.py around lines 180 to 186, the RAG DB
registration must receive a string ID for the embedding model rather than an
object; change the embedding_model argument to pass the embedding model's string
identifier (for example llama_stack_models.embedding_model.identifier or
str(llama_stack_models.embedding_model.identifier)), remove the unnecessary "#
type: ignore" and ensure the value is a plain str so the vector DB receives the
model id.
d627a19 to d837947
d837947 to 159f222
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/llama_stack/rag/test_rag.py (1)
1-352: Sweep: remove .identifier dereferences for embedding models (fix ModelInfo usage).
- tests/llama_stack/conftest.py:254 — change embedding_model=llama_stack_models.embedding_model.identifier → embedding_model=llama_stack_models.embedding_model
- tests/llama_stack/rag/test_rag.py:47, 76, 83, 176 — remove `.identifier` from llama_stack_models.embedding_model in model_id/embedding_model args
- tests/llama_stack/core/test_llamastack_core.py:42 — change embedding_model_id = embedding_model.identifier → embedding_model_id = embedding_model
Re-run the ripgrep check to confirm zero matches.
♻️ Duplicate comments (7)
utilities/rag_utils.py (1)
226-305: Align validate_api_responses shapes with TurnResult/ValidationSummary; guard None; improve logging.
- Per‑turn keys differ from TurnResult (response vs response_content, missing response_length/event_count/error).
- Summary keys differ from ValidationSummary.
- .lower() on a None response will crash; guard it.
- Prefer LOGGER.exception to retain stack traces.
Apply this diff:
```diff
 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
 ) -> ValidationResult:
@@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful = 0
@@
     try:
-        response = response_fn(question=question)
-        response_lower = response.lower()
+        response_text = response_fn(question=question) or ""
+        response_lower = response_text.lower()
@@
-        result = {
+        result: TurnResult = {
             "question": question,
             "description": description,
             "expected_keywords": expected_keywords,
             "found_keywords": found,
             "missing_keywords": missing,
-            "response": response,
+            "response_content": response_text,
+            "response_length": len(response_text),
+            "event_count": 0,
             "success": success,
+            "error": None,
         }
@@
-        all_results.append({
+        all_results.append({
             "question": question,
             "description": description,
             "expected_keywords": expected_keywords,
             "found_keywords": [],
             "missing_keywords": expected_keywords,
-            "response": "",
-            "success": False,
-            "error": str(e),
+            "response_content": "",
+            "response_length": 0,
+            "event_count": 0,
+            "success": False,
+            "error": str(e),
         })
@@
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful,
+        "failed_turns": total - successful,
+        "success_rate": successful / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
@@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
-    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
+    LOGGER.info(f"Total: {summary['total_turns']}")
+    LOGGER.info(f"Passed: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
+    LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
@@
-    return cast("ValidationResult", {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful == total, "results": all_results, "summary": summary})
```
tests/llama_stack/conftest.py (3)
120-131: Fix unbound variable in ConfigMap fixture and simplify param access.

llama_stack_user_config_configmap_path can be referenced before assignment when the fixture is used without params.

```diff
-    if hasattr(request, "param"):
-        if request.param.get("llama_stack_user_config_configmap_path"):
-            llama_stack_user_config_configmap_path = request.param.get("llama_stack_user_config_configmap_path")
-
-    assert llama_stack_user_config_configmap_path is not None, (
+    params = getattr(request, "param", {}) or {}
+    llama_stack_user_config_configmap_path = params.get("llama_stack_user_config_configmap_path")
+    assert llama_stack_user_config_configmap_path, (
         "Fixture parameter llama_stack_user_config_configmap_path is required"
     )
     config_path = Path(__file__).parent / llama_stack_user_config_configmap_path
-    with open(config_path, "r") as f:
-        llama_stack_userconfig_configmap_data = f.read()
+    llama_stack_userconfig_configmap_data = Path(config_path).read_text()
```
209-232: Return string ids in ModelInfo and harden model discovery.

Align with utilities.rag_utils.ModelInfo (embedding_model: str) and avoid StopIteration/KeyError.

```diff
 def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
@@
-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
-
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
+    emb = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not emb:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    md = getattr(emb, "metadata", {}) or {}
+    if "embedding_dimension" not in md:
+        raise ValueError("Embedding model metadata missing 'embedding_dimension'")
+    return ModelInfo(model_id=llm.identifier, embedding_model=emb.identifier, embedding_dimension=md["embedding_dimension"])
```
251-256: Pass the string embedding model id; remove type ignore.

```diff
     vector_store = llama_stack_client.vector_stores.create(
         name="test_vector_store",
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
         embedding_dimension=llama_stack_models.embedding_dimension,
     )
```
tests/llama_stack/rag/test_rag.py (3)
47-50: Use string embedding model id and remove type ignore.

```diff
-    embeddings_response = llama_stack_client.inference.embeddings(
-        model_id=llama_stack_models.embedding_model.identifier,
-        contents=["First chunk of text"],
-        output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
-    )
+    embeddings_response = llama_stack_client.inference.embeddings(
+        model_id=llama_stack_models.embedding_model,
+        contents=["First chunk of text"],
+        output_dimension=llama_stack_models.embedding_dimension,
+    )
```
74-86: Fix vector DB register and embeddings calls to use string id.

```diff
     llama_stack_client.vector_dbs.register(
         vector_db_id=vector_db_id,
-        embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-        embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+        embedding_model=llama_stack_models.embedding_model,
+        embedding_dimension=llama_stack_models.embedding_dimension,
         provider_id="milvus",
     )
@@
-    embeddings_response = llama_stack_client.inference.embeddings(
-        model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embeddings_response = llama_stack_client.inference.embeddings(
+        model_id=llama_stack_models.embedding_model,
         contents=["First chunk of text"],
-        output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+        output_dimension=llama_stack_models.embedding_dimension,
     )
```
174-178: Pass the string embedding model id for RAG DB registration.

 llama_stack_client.vector_dbs.register(
     vector_db_id=vector_db_id,
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-    embedding_dimension=llama_stack_models.embedding_dimension,
+    embedding_model=llama_stack_models.embedding_model,
+    embedding_dimension=llama_stack_models.embedding_dimension,
     provider_id="milvus",
 )
🧹 Nitpick comments (5)
tests/llama_stack/conftest.py (4)
96-101: Wire in user-config support and keep unparameterized-safe access.

The PR description mentions llama_stack_user_config_enabled and ConfigMap usage, but server_config doesn't act on it. Add optional userConfig and command wiring when the param is set.

 if params.get("llama_stack_storage_size"):
     storage_size = params.get("llama_stack_storage_size")
     server_config["storage"] = {"size": storage_size}
+# Optional: enable custom user config via ConfigMap
+if params.get("llama_stack_user_config_enabled"):
+    # Ensure ConfigMap exists (triggers the fixture)
+    request.getfixturevalue("llama_stack_user_config_configmap")
+    server_config["userConfig"] = {"configMapName": "llama-stack-user-config"}
+    # Ensure the container runs with explicit run config
+    server_config["containerSpec"].setdefault("command", ["/bin/sh", "-c", "llama stack run /etc/llama-stack/run.yaml"])
 return server_config

Please confirm the run.yaml path is correct for your image.
260-265: Avoid blind except; include traceback.

Catching bare Exception hides actionable details. If you need the broad catch for cleanup, at least log the stack trace.

-    except Exception as e:
-        LOGGER.warning(f"Failed to delete vector store {vector_store.id}: {e}")
+    except Exception:
+        LOGGER.warning(f"Failed to delete vector store {vector_store.id}", exc_info=True)  # noqa: BLE001
267-271: Retry only network errors and log with traceback.

Broadened retries on all Exception can mask logic/file errors. Limit retries to RequestException and log properly.

 @retry(
     wait_timeout=Timeout.TIMEOUT_1MIN,
     sleep=5,
-    exceptions_dict={requests.exceptions.RequestException: [], Exception: []},
+    exceptions_dict={requests.exceptions.RequestException: []},
 )
 def _download_and_upload_file(url: str, llama_stack_client: LlamaStackClient, vector_store: Any) -> bool:
 @@
-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except requests.exceptions.RequestException:
+        LOGGER.warning(f"Failed to download and upload file {url}", exc_info=True)
+        raise

Also applies to: 306-308
190-206: Use LOGGER.exception to preserve traceback on port-forward setup failure.

-    except Exception as e:
-        LOGGER.error(f"Failed to set up port forwarding: {e}")
+    except Exception:
+        LOGGER.exception("Failed to set up port forwarding")
         raise

tests/llama_stack/rag/test_rag.py (1)
17-22: Avoid a no-op storage size param in the smoke param set.

An empty string makes the new llama_stack_storage_size ineffective; either drop the key or set a real value (e.g., "10Gi") if you need to exercise the code path.

 pytest.param(
     {"name": "test-llamastack-rag"},
-    {"llama_stack_storage_size": ""},
+    {},  # no storage override in smoke run
 ),
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/llama_stack/conftest.py (4 hunks)
tests/llama_stack/rag/test_rag.py (9 hunks)
tests/llama_stack/utils.py (2 hunks)
utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llama_stack/utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.
Applied to files:
tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (4)
tests/llama_stack/utils.py (2)
create_llama_stack_distribution(20-40)wait_for_llama_stack_client_ready(44-55)utilities/rag_utils.py (2)
create_llama_stack_distribution(20-39)ModelInfo(11-16)utilities/constants.py (1)
Timeout(213-224)tests/conftest.py (2)
admin_client(68-69)model_namespace(100-120)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (4)
llama_stack_client(175-206)vector_store(235-264)llama_stack_models(210-231)vector_store_with_docs(312-342)utilities/rag_utils.py (2)
validate_api_responses(226-305)ModelInfo(11-16)tests/llama_stack/utils.py (3)
get_torchtune_test_expectations(58-72)create_response_function(75-103)_response_fn(90-101)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client(175-206)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py
263-263: Do not catch blind exception: Exception
(BLE001)
304-304: Consider moving this statement to an else block
(TRY300)
utilities/rag_utils.py
277-277: Do not catch blind exception: Exception
(BLE001)
288-288: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
288-288: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (6)
tests/llama_stack/rag/test_rag.py (5)
130-132: LGTM — agent creation uses ModelInfo.model_id correctly.
151-153: Assert before lowercasing to avoid potential NoneType errors. Reorder to check for None before calling .lower().

-    content = response.output_message.content.lower()
-    assert content is not None, "LLM response content is None"
+    assert response.output_message.content is not None, "LLM response content is None"
+    content = response.output_message.content.lower()
251-281: LGTM — simple responses smoke covers the endpoint well. Assertions are minimal and appropriate for smoke.
297-307: Minor: adjust assertion message and rely on aligned result shape. Change the failure message to reflect API validation (not agent); after aligning validate_api_responses the summary keys change, but keeping the generic Summary: {validation_result['summary']} is fine.

-    assert validation_result["success"], f"RAG agent validation failed. Summary: {validation_result['summary']}"
+    assert validation_result["success"], f"API responses validation failed. Summary: {validation_result['summary']}"
331-351: LGTM — vector store search validations are solid. Covers shape and content presence without over-constraining.
utilities/rag_utils.py (1)
11-16: Update callers/tests: ModelInfo.embedding_model is now a str. Downstream code still treats embedding_model as an object with .identifier; change callers to pass the string id (e.g., m.identifier) and remove any .identifier dereferences.
Check/modify: tests/llama_stack/conftest.py (llama_stack_models, vector_store) and tests/llama_stack/rag/test_rag.py.
Open a follow-up PR to update those tests?
Actionable comments posted: 0
♻️ Duplicate comments (7)
tests/llama_stack/conftest.py (4)
169-191: Make model discovery robust and guard against missing models/metadata.

The function uses next() without defaults and direct metadata access, which can raise StopIteration or KeyError if models are missing or metadata is incomplete. Apply this diff to add proper error handling:

 def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
     """
     Returns model information from the LlamaStack client.
 @@ -183,10 +183,18 @@ def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
         ModelInfo: NamedTuple containing model information
     """
     models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")

-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
+    embedding_model = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not embedding_model:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+
+    md = getattr(embedding_model, "metadata", {}) or {}
+    if "embedding_dimension" not in md:
+        raise ValueError("Embedding model metadata missing 'embedding_dimension'")

-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    return ModelInfo(
+        model_id=llm.identifier,
+        embedding_model=embedding_model.identifier,
+        embedding_dimension=md["embedding_dimension"],
+    )
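The guarded-lookup point above generalizes beyond this fixture. A minimal, self-contained sketch (model names here are made up) of why `next()` without a default is risky and how the two-argument form fixes it:

```python
# Hypothetical sketch: next() on an exhausted generator raises StopIteration;
# passing a default lets us raise a clearer, test-friendly error instead.
from typing import Any, List, Optional


class FakeModel:
    def __init__(self, identifier: str, api_model_type: str) -> None:
        self.identifier = identifier
        self.api_model_type = api_model_type


def find_model(models: List[FakeModel], api_model_type: str) -> Any:
    # next(iterable, default) returns the default instead of raising
    match: Optional[FakeModel] = next((m for m in models if m.api_model_type == api_model_type), None)
    if match is None:
        raise ValueError(f"No model of type {api_model_type!r} found")
    return match


models = [FakeModel("granite-llm", "llm")]
assert find_model(models, "llm").identifier == "granite-llm"
try:
    find_model(models, "embedding")
except ValueError as exc:
    assert "embedding" in str(exc)
```

The explicit `ValueError` surfaces in pytest output as a readable failure reason rather than a bare `StopIteration` inside fixture setup.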
214-216: Fix embedding model parameter: use the string identifier, not the object.

Based on the ModelInfo typing in utilities/rag_utils.py, embedding_model should be a string, but you're passing an object with .identifier and using # type: ignore. Apply this diff to use the string identifier:

 vector_store = llama_stack_client.vector_stores.create(
     name="test_vector_store",
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embedding_model=llama_stack_models.embedding_model,
     embedding_dimension=llama_stack_models.embedding_dimension,
 )
221-221: Vector store deletion parameter name.

The llama_stack_client.vector_stores.delete method expects vector_store_id (not id), per the OpenAPI specification where the path parameter is named vector_store_id. The current code already passes vector_store_id=vector_store.id, so it is correct as written; no change needed. Good implementation!
272-302: Verify network access in air-gapped environments. This fixture downloads files from external URLs, which will fail in air-gapped/disconnected clusters.
Consider adding a parameter or environment variable to skip this fixture in disconnected environments, or provide an alternative mechanism for testing without external dependencies.
tests/llama_stack/rag/test_rag.py (3)
35-49: Fix embedding model identifier usage.

The test is accessing .identifier on the embedding model, but based on the ModelInfo typing, embedding_model should already be a string. Apply this diff to use the string directly:

 embeddings_response = llama_stack_client.inference.embeddings(
-    model_id=llama_stack_models.embedding_model.identifier,
+    model_id=llama_stack_models.embedding_model,
     contents=["First chunk of text"],
-    output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+    output_dimension=llama_stack_models.embedding_dimension,
 )
76-85: Fix embedding model identifier usage.

Same issue with accessing .identifier on the embedding model. Apply this diff:

 llama_stack_client.vector_dbs.register(
     vector_db_id=vector_db_id,
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embedding_model=llama_stack_models.embedding_model,
-    embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+    embedding_dimension=llama_stack_models.embedding_dimension,
     provider_id="milvus",
 )
 # Calculate embeddings
 embeddings_response = llama_stack_client.inference.embeddings(
-    model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
+    model_id=llama_stack_models.embedding_model,
     contents=["First chunk of text"],
-    output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+    output_dimension=llama_stack_models.embedding_dimension,
 )
176-177: Fix embedding model identifier usage.

Same issue with accessing .identifier on the embedding model. Apply this diff:

 llama_stack_client.vector_dbs.register(
     vector_db_id=vector_db_id,
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embedding_model=llama_stack_models.embedding_model,
-    embedding_dimension=llama_stack_models.embedding_dimension,
+    embedding_dimension=llama_stack_models.embedding_dimension,
     provider_id="milvus",
 )
🧹 Nitpick comments (2)
tests/llama_stack/conftest.py (1)
227-231: Consider making exceptions more specific in the retry decorator.

The retry decorator catches broad Exception types. Be more specific about which exceptions should trigger retries:

 @retry(
     wait_timeout=Timeout.TIMEOUT_1MIN,
     sleep=5,
-    exceptions_dict={requests.exceptions.RequestException: [], Exception: []},
+    exceptions_dict={requests.exceptions.RequestException: [], OSError: []},
 )
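To make the "narrow what you retry" point concrete independent of the project's `retry` decorator from timeout_sampler (whose API differs), here is a generic sketch with illustrative names only:

```python
# Minimal retry decorator that retries ONLY the listed exception types;
# anything else (e.g., a logic bug raising TypeError) propagates immediately.
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on(
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    sleep: float = 0.0,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        def wrapper(*args, **kwargs) -> T:
            last_exc: Optional[BaseException] = None
            for _ in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions as exc:  # only these are retried
                    last_exc = exc
                    time.sleep(sleep)
            assert last_exc is not None
            raise last_exc
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on((OSError,), attempts=3)
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient network hiccup")
    return "ok"


assert flaky() == "ok"
assert calls["n"] == 3  # two transient failures, one success
```

The design choice mirrors the review suggestion: transient I/O errors (RequestException, OSError) are retried, while programming errors fail fast and keep their original traceback.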
226-305: Address the inconsistent response field and improve exception handling.

The function has two issues:
- Per-turn results use the "response" key, but the TurnResult TypedDict expects "response_content"
- The function catches broad Exception, which is flagged by static analysis

Apply this diff to align with types and improve exception handling:

 result = {
     "question": question,
     "description": description,
     "expected_keywords": expected_keywords,
     "found_keywords": found,
     "missing_keywords": missing,
-    "response": response,
+    "response_content": response,
+    "response_length": len(response),
+    "event_count": 0,
     "success": success,
+    "error": None,
 }
-except Exception as e:
+except (requests.RequestException, ValueError, TypeError) as e:
     all_results.append({
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": [],
         "missing_keywords": expected_keywords,
-        "response": "",
+        "response_content": "",
+        "response_length": 0,
+        "event_count": 0,
         "success": False,
         "error": str(e),
     })
-    LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+    LOGGER.exception(f"[{idx}] ERROR: {e}")
 total = len(test_cases)
 summary = {
-    "total": total,
-    "passed": successful,
-    "failed": total - successful,
+    "total_turns": total,
+    "successful_turns": successful,
+    "failed_turns": total - successful,
     "success_rate": successful / total if total > 0 else 0,
+    "total_events": sum(r["event_count"] for r in all_results),
+    "total_response_length": sum(r["response_length"] for r in all_results),
 }
 LOGGER.info("\n" + "=" * 40)
 LOGGER.info("Validation Summary:")
-LOGGER.info(f"Total: {summary['total']}")
-LOGGER.info(f"Passed: {summary['passed']}")
-LOGGER.info(f"Failed: {summary['failed']}")
+LOGGER.info(f"Total: {summary['total_turns']}")
+LOGGER.info(f"Passed: {summary['successful_turns']}")
+LOGGER.info(f"Failed: {summary['failed_turns']}")
 LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/llama_stack/conftest.py (4 hunks)
tests/llama_stack/rag/test_rag.py (9 hunks)
tests/llama_stack/utils.py (2 hunks)
utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llama_stack/utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.
Applied to files:
tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
tests/llama_stack/utils.py (2)
create_llama_stack_distribution(20-40)wait_for_llama_stack_client_ready(44-55)utilities/rag_utils.py (2)
create_llama_stack_distribution(20-39)ModelInfo(11-16)utilities/constants.py (1)
Timeout(213-224)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client(135-166)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (4)
llama_stack_client(135-166)vector_store(195-224)llama_stack_models(170-191)vector_store_with_docs(272-302)utilities/rag_utils.py (3)
validate_rag_agent_responses(84-223)validate_api_responses(226-305)ModelInfo(11-16)tests/llama_stack/utils.py (3)
get_torchtune_test_expectations(58-72)create_response_function(75-103)_response_fn(90-101)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py
223-223: Do not catch blind exception: Exception
(BLE001)
264-264: Consider moving this statement to an else block
(TRY300)
utilities/rag_utils.py
277-277: Do not catch blind exception: Exception
(BLE001)
288-288: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
288-288: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (21)
tests/llama_stack/conftest.py (8)
2-2: LGTM! Properly imported tempfile for file handling in the new document upload functionality.

7-7: LGTM! Added the requests import for HTTP operations in the document download functionality.

11-11: LGTM! Correctly imported the VectorStore type for the new vector store fixtures.

17-17: LGTM! Added the retry decorator import for robust file download operations.

22-22: LGTM! Added the ModelInfo import to use the centralized model information container.
52-60: LGTM! The parameterized server configuration approach is well-implemented, allowing flexible configuration through request parameters while maintaining safe defaults.

94-98: LGTM! The storage size configuration is correctly applied when the parameter is provided. The implementation maintains backward compatibility.

232-269: LGTM - Robust file download and upload implementation. The function correctly handles file downloads with proper error handling, converts RST to TXT extensions, and uploads files to the vector store. The retry decorator and logging provide good resilience.
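The RST-to-TXT conversion step the fixture performs can be isolated as a small pure function. A sketch with hypothetical names (the actual fixture's helper is `_download_and_upload_file`; the URL below is made up):

```python
# Derive an upload-friendly file name from a docs URL: foo.rst -> foo.txt.
# Keeping this logic pure makes it trivially unit-testable, separate from
# the HTTP download and vector-store upload side effects.
def rst_url_to_txt_name(url: str) -> str:
    name = url.rsplit("/", 1)[-1]  # last path segment
    if name.endswith(".rst"):
        name = name[: -len(".rst")] + ".txt"
    return name


assert rst_url_to_txt_name("https://example.com/docs/qat_overview.rst") == "qat_overview.txt"
assert rst_url_to_txt_name("https://example.com/docs/readme.txt") == "readme.txt"
```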
tests/llama_stack/rag/test_rag.py (10)
7-7: LGTM! Correctly imported the VectorStore type for the new vector store tests.

9-9: LGTM! Added proper imports for the validation utilities and the ModelInfo container.
11-11: LGTM!Imported helper functions from utils module for test expectations and response function creation.
17-25: LGTM! Added llama_stack_server_config to parameterization with empty storage size. This enables the server configuration to be properly injected into tests.

130-132: LGTM! Correctly uses ModelInfo for agent creation with the model ID.

185-185: LGTM! Correctly uses ModelInfo for RAG agent creation.
220-220: LGTM! Uses the centralized test expectations from the helper function.

251-281: LGTM! Simple responses test is well-structured and properly validates basic LLM functionality. Good use of parameterized test cases with expected keywords.

283-308: LGTM! Excellent integration test that combines vector store with documents and the responses API. Uses the centralized helper functions and validation utilities effectively.

310-351: LGTM! Comprehensive vector store search test that validates the search functionality with multiple queries. Good validation of response structure and content.
utilities/rag_utils.py (3)
4-4: LGTM! Added necessary imports for the new functionality, including Callable and NamedTuple.

11-16: LGTM! The ModelInfo NamedTuple is well-designed with proper type annotations; the embedding_model field is correctly typed as str to hold the model identifier.
42-74: LGTM! The TypedDict definitions provide excellent type safety for the validation system. All fields are properly typed and documented through their usage context.
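For readers unfamiliar with the pattern being praised here, a minimal sketch of how a TypedDict keeps a validation payload's shape honest under a type checker; the field names mirror the review discussion (TurnResult) but are illustrative, not the exact definitions in utilities/rag_utils.py:

```python
# TypedDict declares the exact keys and value types a plain dict must carry,
# so mypy/pyright flag missing or misnamed keys (e.g., "response" vs
# "response_content") at check time while runtime behavior stays a dict.
from typing import List, TypedDict


class TurnResult(TypedDict):
    question: str
    found_keywords: List[str]
    missing_keywords: List[str]
    response_content: str
    success: bool


def summarize(results: List[TurnResult]) -> dict:
    passed = sum(1 for r in results if r["success"])
    return {"total": len(results), "passed": passed, "failed": len(results) - passed}


results: List[TurnResult] = [
    {
        "question": "What is torchtune?",
        "found_keywords": ["fine-tuning"],
        "missing_keywords": [],
        "response_content": "torchtune is a PyTorch fine-tuning library",
        "success": True,
    }
]
assert summarize(results) == {"total": 1, "passed": 1, "failed": 0}
```

This is exactly the class of bug the earlier "response" vs "response_content" comment caught: with TypedDict, the mismatch is a static error rather than a silent extra key.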
Actionable comments posted: 0
♻️ Duplicate comments (2)
utilities/rag_utils.py (1)
226-306: Align validate_api_responses with TurnResult/ValidationSummary and guard None responses.

The current keys don't match the declared TypedDicts, and .lower() on None will crash. Align shapes, compute lengths, and use LOGGER.exception.

 def validate_api_responses(
     response_fn: Callable[..., str],
     test_cases: List[TurnExpectation],
     min_keywords_required: int = 1,
 ) -> ValidationResult:
 @@
-    all_results = []
-    successful = 0
+    all_results: List[TurnResult] = []
+    successful_turns = 0
 @@
-    try:
-        response = response_fn(question=question)
-        response_lower = response.lower()
+    try:
+        response_text = response_fn(question=question) or ""
+        response_lower = response_text.lower()
 @@
     found = [kw for kw in expected_keywords if kw.lower() in response_lower]
     missing = [kw for kw in expected_keywords if kw.lower() not in response_lower]
     success = len(found) >= min_keywords_required
 @@
-    if success:
-        successful += 1
+    if success:
+        successful_turns += 1
 @@
-    result = {
+    result: TurnResult = {
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": found,
         "missing_keywords": missing,
-        "response": response,
+        "response_content": response_text,
+        "response_length": len(response_text),
+        "event_count": 0,
         "success": success,
+        "error": None,
     }
 @@
     all_results.append({
         "question": question,
         "description": description,
         "expected_keywords": expected_keywords,
         "found_keywords": [],
         "missing_keywords": expected_keywords,
-        "response": "",
-        "success": False,
-        "error": str(e),
+        "response_content": "",
+        "response_length": 0,
+        "event_count": 0,
+        "success": False,
+        "error": str(e),
     })
-    LOGGER.error(f"[{idx}] ERROR: {str(e)}")
+    LOGGER.exception("[validate_api_responses] Error on case %s", idx)
 @@
-    total = len(test_cases)
-    summary = {
-        "total": total,
-        "passed": successful,
-        "failed": total - successful,
-        "success_rate": successful / total if total > 0 else 0,
-    }
+    total = len(test_cases)
+    summary: ValidationSummary = {
+        "total_turns": total,
+        "successful_turns": successful_turns,
+        "failed_turns": total - successful_turns,
+        "success_rate": successful_turns / total if total > 0 else 0.0,
+        "total_events": sum(r["event_count"] for r in all_results),
+        "total_response_length": sum(r["response_length"] for r in all_results),
+    }
 @@
-    LOGGER.info(f"Total: {summary['total']}")
-    LOGGER.info(f"Passed: {summary['passed']}")
-    LOGGER.info(f"Failed: {summary['failed']}")
+    LOGGER.info(f"Total turns: {summary['total_turns']}")
+    LOGGER.info(f"Successful: {summary['successful_turns']}")
+    LOGGER.info(f"Failed: {summary['failed_turns']}")
     LOGGER.info(f"Success rate: {summary['success_rate']:.1%}")
 @@
-    return cast("ValidationResult", {"success": successful == total, "results": all_results, "summary": summary})
+    return cast(ValidationResult, {"success": successful_turns == total, "results": all_results, "summary": summary})

tests/llama_stack/conftest.py (1)
169-191: Return string ids in ModelInfo and make discovery robust.

ModelInfo.embedding_model is typed as str, but an object is returned. Also guard against missing models and bad metadata.

 def llama_stack_models(llama_stack_client: LlamaStackClient) -> ModelInfo:
 @@
-    models = llama_stack_client.models.list()
-    model_id = next(m for m in models if m.api_model_type == "llm").identifier
-
-    embedding_model = next(m for m in models if m.api_model_type == "embedding")
-    embedding_dimension = embedding_model.metadata["embedding_dimension"]
-
-    return ModelInfo(model_id=model_id, embedding_model=embedding_model, embedding_dimension=embedding_dimension)
+    models = llama_stack_client.models.list()
+    llm = next((m for m in models if m.api_model_type == "llm"), None)
+    if not llm:
+        raise ValueError("No LLM model found in LlamaStackClient.models.list()")
+    emb = next((m for m in models if m.api_model_type == "embedding"), None)
+    if not emb:
+        raise ValueError("No embedding model found in LlamaStackClient.models.list()")
+    md = getattr(emb, "metadata", {}) or {}
+    dim = md.get("embedding_dimension")
+    if not isinstance(dim, int):
+        raise ValueError("Embedding model metadata missing or invalid 'embedding_dimension'")
+    return ModelInfo(model_id=llm.identifier, embedding_model=str(emb.identifier), embedding_dimension=dim)
🧹 Nitpick comments (10)
utilities/rag_utils.py (1)
186-199: Log exceptions with stack traces.

Use LOGGER.exception to aid debugging.

-    except Exception as e:
-        LOGGER.error(f"Error processing turn '{question}': {str(e)}")
+    except Exception:
+        LOGGER.exception("Error processing turn %r", question)
     turn_result = {

tests/llama_stack/conftest.py (4)
211-216: Pass the string embedding model id (drop .identifier and type: ignore).

Matches ModelInfo typing and avoids runtime/type ignores.

 vector_store = llama_stack_client.vector_stores.create(
     name="test_vector_store",
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embedding_model=llama_stack_models.embedding_model,
     embedding_dimension=llama_stack_models.embedding_dimension,
 )
165-166: Include traceback on port-forward setup failure.

LOGGER.exception emits the stack trace; easier triage.

-    LOGGER.error(f"Failed to set up port forwarding: {e}")
+    LOGGER.exception("Failed to set up port forwarding")
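The practical difference between the two calls is easy to demonstrate in isolation; `logging.Logger.exception` logs at ERROR level and appends the active traceback automatically, with no `exc_info=True` needed:

```python
# Capture log output in-memory to show that logger.exception includes the
# full traceback, while logger.error alone would print only the message.
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("pf-demo")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the demo output out of the root logger

try:
    raise RuntimeError("port-forward failed")
except RuntimeError:
    logger.exception("Failed to set up port forwarding")

output = stream.getvalue()
assert "Failed to set up port forwarding" in output
assert "Traceback" in output  # stack trace included automatically
```

Inside an `except` block, `logger.exception(...)` is equivalent to `logger.error(..., exc_info=True)`, which is why the diff above can drop the `{e}` interpolation without losing information.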
266-268: Narrow the exception type and log with traceback in the downloader.

Avoid catching bare Exception; capture request and file I/O errors, and include the stack trace.

-    except (requests.exceptions.RequestException, Exception) as e:
-        LOGGER.warning(f"Failed to download and upload file {url}: {e}")
-        raise
+    except (requests.exceptions.RequestException, OSError):
+        LOGGER.exception("Failed to download and upload file %s", url)
+        raise
287-301: Optional: ensure files are indexed before tests proceed. If vector-store ingestion is async, consider polling file status or search readiness to reduce flakiness.
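The polling suggestion above can be sketched generically; the exact file-status API of LlamaStackClient isn't shown in this PR, so the predicate below is a stand-in you would replace with a real status check (e.g., "all uploaded files report completed"):

```python
# Generic readiness poller: call a predicate until it returns True or a
# deadline passes. Useful for async ingestion where uploads return before
# documents are searchable.
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0, interval: float = 0.01) -> bool:
    """Poll predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


# Simulated ingestion: status flips to "completed" on the third poll.
statuses = iter(["in_progress", "in_progress", "completed"])
current = {"status": "in_progress"}


def ingestion_done() -> bool:
    current["status"] = next(statuses, current["status"])
    return current["status"] == "completed"


assert wait_until(ingestion_done, timeout=1.0) is True
```

In the fixture, the predicate would wrap whatever per-file status listing the client exposes, and a `False` return (timeout) would fail the fixture with a clear message instead of letting search tests flake.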
tests/llama_stack/rag/test_rag.py (5)
47-50: Use the string embedding model id and remove the type ignore.

Aligns with ModelInfo typing and avoids attribute errors.

 embeddings_response = llama_stack_client.inference.embeddings(
-    model_id=llama_stack_models.embedding_model.identifier,
+    model_id=llama_stack_models.embedding_model,
     contents=["First chunk of text"],
-    output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+    output_dimension=llama_stack_models.embedding_dimension,
 )
74-86: Fix vector DB register/embeddings to use the string embedding model id.

Remove .identifier and type: ignore.

 llama_stack_client.vector_dbs.register(
     vector_db_id=vector_db_id,
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
-    embedding_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+    embedding_model=llama_stack_models.embedding_model,
+    embedding_dimension=llama_stack_models.embedding_dimension,
     provider_id="milvus",
 )
 @@
-    embeddings_response = llama_stack_client.inference.embeddings(
-        model_id=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embeddings_response = llama_stack_client.inference.embeddings(
+        model_id=llama_stack_models.embedding_model,
         contents=["First chunk of text"],
-        output_dimension=llama_stack_models.embedding_dimension,  # type: ignore
+        output_dimension=llama_stack_models.embedding_dimension,
     )
174-178: Use the string embedding model id for RAG DB registration.

Consistent typing; fewer surprises.

 llama_stack_client.vector_dbs.register(
     vector_db_id=vector_db_id,
-    embedding_model=llama_stack_models.embedding_model.identifier,  # type: ignore
+    embedding_model=llama_stack_models.embedding_model,
     embedding_dimension=llama_stack_models.embedding_dimension,
     provider_id="milvus",
 )
297-306: Ensure response_fn never returns None (prevents the .lower() crash in the validator).

create_response_function currently returns response.output_text, which can be None. Either keep your validator fix (recommended) or update the helper to coerce to str. Proposed change in tests/llama_stack/utils.py:

-    return response.output_text
+    return response.output_text or ""
331-351: Minor: consider a minimal retry around vector store search. If indexing is eventual, add a short retry/backoff to reduce flakes in CI.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/llama_stack/conftest.py (4 hunks)
tests/llama_stack/rag/test_rag.py (9 hunks)
tests/llama_stack/utils.py (2 hunks)
utilities/rag_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llama_stack/utils.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T11:39:37.356Z
Learnt from: jgarciao
PR: opendatahub-io/opendatahub-tests#582
File: tests/llama_stack/conftest.py:258-265
Timestamp: 2025-09-16T11:39:37.356Z
Learning: The llama_stack_client.vector_stores.delete method expects the parameter name `vector_store_id` instead of `id` when deleting vector stores. This is based on the OpenAPI specification where the path parameter is named `vector_store_id`.
Applied to files:
tests/llama_stack/conftest.py
📚 Learning: 2025-06-06T12:22:57.057Z
Learnt from: dbasunag
PR: opendatahub-io/opendatahub-tests#338
File: tests/model_registry/rbac/test_mr_rbac.py:24-53
Timestamp: 2025-06-06T12:22:57.057Z
Learning: In the opendatahub-tests repository, prefer keeping test parameterization configurations inline rather than extracting them to separate variables/constants, as it makes triaging easier by avoiding the need to jump between different parts of the file to understand the test setup.
Applied to files:
tests/llama_stack/rag/test_rag.py
🧬 Code graph analysis (3)
tests/llama_stack/conftest.py (3)
tests/llama_stack/utils.py (2)
create_llama_stack_distribution(20-40)wait_for_llama_stack_client_ready(44-55)utilities/rag_utils.py (2)
create_llama_stack_distribution(20-39)ModelInfo(11-16)utilities/constants.py (1)
Timeout(213-224)
tests/llama_stack/rag/test_rag.py (3)
tests/llama_stack/conftest.py (4)
llama_stack_client(135-166)vector_store(195-224)llama_stack_models(170-191)vector_store_with_docs(272-302)utilities/rag_utils.py (3)
validate_rag_agent_responses(84-223)validate_api_responses(226-305)ModelInfo(11-16)tests/llama_stack/utils.py (3)
get_torchtune_test_expectations(58-72)create_response_function(75-103)_response_fn(90-101)
utilities/rag_utils.py (1)
tests/llama_stack/conftest.py (1)
llama_stack_client(135-166)
🪛 Ruff (0.12.2)
tests/llama_stack/conftest.py
223-223: Do not catch blind exception: Exception
(BLE001)
264-264: Consider moving this statement to an else block
(TRY300)
utilities/rag_utils.py
277-277: Do not catch blind exception: Exception
(BLE001)
288-288: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
288-288: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (1)
tests/llama_stack/rag/test_rag.py (1)
16-23: Confirm intention: an empty storage size disables the storage override. An empty string is falsy and won't set server_config["storage"]. If you intended to provision storage, pass a size (e.g., "10Gi").
Status of building tag latest: success.
…les (opendatahub-io#582)

* test: add openai compatible tests for responses, vector stores and files api
* feat: Add fixture parameters for LLamaStackDistribution creation
  Add fixture parameters:
  - llama_stack_user_config_enabled and lama_stack_user_config_configmap to configure custom llama-stack user configuration
  - llama_stack_storage_size to configure storage size
  Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
* fix: fix possible bug when no parameters are provided to the fixture
  Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
* fix: fix possible bug when no parameters are provided to the fixture
  Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
* fix: fix parameter name when calling vector_stores.delete
  Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
* fix: address review suggestions

---------

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Co-authored-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Co-authored-by: Jorge <jgarciao@users.noreply.github.com>