Enhance vector store testing using Dataset #1296

jgarciao merged 8 commits into opendatahub-io:main from
Conversation
📝 Walkthrough

Adds a path-based Dataset abstraction and test corpora for IBM finance Q1–Q4 2025, and introduces a class-scoped vector store fixture.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/llama_stack/utils.py (1)
82-82: Loose typing on `attributes` parameter.

`attributes: Any` is inconsistent with `vector_store_create_file_from_path` (line 265), which uses `dict[str, str | int | float | bool] | None`. Align the types for consistency and type-checker benefits.

Fix type annotation:

```diff
- attributes: Any,
+ attributes: dict[str, str | int | float | bool] | None = None,
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tests/llama_stack/utils.py` at line 82, update the loose typing for the attributes parameter to match vector_store_create_file_from_path: replace `attributes: Any` with `attributes: dict[str, str | int | float | bool] | None` in the function signature (and any related internal type hints) so the parameter uses the same precise union mapping as vector_store_create_file_from_path for consistency and better type checking.

tests/llama_stack/vector_io/test_vector_stores.py (1)
199-199: Unhandled `StopIteration` if no vector-mode QA records exist.

`next()` on an empty generator raises `StopIteration`, which pytest won't report as a clear assertion failure. While unlikely with controlled test data, consider providing a default or an explicit assertion.

Option: use `next()` with a default and assert:

```diff
- vector_question = next(r.question for r in dataset.load_qa(retrieval_mode="vector"))
+ vector_records = dataset.load_qa(retrieval_mode="vector")
+ assert vector_records, f"Dataset has no QA records with retrieval_mode='vector'"
+ vector_question = vector_records[0].question
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tests/llama_stack/vector_io/test_vector_stores.py` at line 199, the test currently assigns `vector_question = next(r.question for r in dataset.load_qa(retrieval_mode="vector"))`, which will raise StopIteration if no vector-mode QA records are returned; change this to use a safe default and an explicit assertion: call `next(..., None)` on the generator returned by `dataset.load_qa(retrieval_mode="vector")` to get a possibly None vector_question, then assert that vector_question is not None (with a descriptive message) before using it, so missing vector-mode results produce a clear test failure instead of an unhandled exception.

tests/llama_stack/conftest.py (1)
836-850: Silent precedence when both `dataset` and `doc_sources` are provided.

The docstring states these parameters are "mutually exclusive," but providing both causes `dataset` to silently win. Consider logging a warning or raising an error if both are non-None to catch test misconfiguration early.

Optional: add an explicit check:

```diff
+ if dataset and doc_sources:
+     LOGGER.warning(
+         "Both 'dataset' and 'doc_sources' provided to vector_store fixture; "
+         "using 'dataset' (doc_sources ignored)"
+     )
  if dataset or doc_sources:
      try:
          if dataset:
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tests/llama_stack/conftest.py` around lines 836-850, the code silently prefers dataset when both dataset and doc_sources are provided in the block that calls vector_store_upload_dataset and vector_store_upload_doc_sources; add an explicit check at the start of that branch to detect both non-None and either raise a ValueError (to fail fast) or emit a clear warning via the test logger indicating the mutually exclusive parameters were both supplied, then proceed only with one path (or bail out). Reference the dataset and doc_sources variables and the call sites vector_store_upload_dataset and vector_store_upload_doc_sources so you can add the guard immediately before those calls.
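The fail-fast variant suggested in the prompt can be sketched as a plain helper. This is illustrative only: the `params` dict and the `resolve_upload_source` name are stand-ins, not the fixture's actual API.

```python
def resolve_upload_source(params: dict) -> tuple:
    """Return exactly one of (dataset, doc_sources), failing fast when both are set."""
    dataset = params.get("dataset")
    doc_sources = params.get("doc_sources")
    if dataset is not None and doc_sources is not None:
        raise ValueError("'dataset' and 'doc_sources' are mutually exclusive; provide only one")
    return dataset, doc_sources
```

In the real fixture the same check would sit right before the calls to vector_store_upload_dataset / vector_store_upload_doc_sources.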
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/dataset/corpus/finance/documents.json`:
- Line 47: The "publication_date" value for the Q4 2025 document is wrong
(1738022400); update the JSON object's "publication_date" field for the Q4 2025
entry to the correct Unix timestamp for the expected release date (e.g.,
1769558400 for Jan 28, 2026) so the test data matches Q4 2025 timing.
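The suggested timestamp can be double-checked with the standard library, assuming the release date is taken as midnight UTC:

```python
from datetime import datetime, timezone

# Unix timestamp for the expected Q4 2025 earnings release date, Jan 28, 2026 (00:00 UTC)
q4_2025_release = int(datetime(2026, 1, 28, tzinfo=timezone.utc).timestamp())
print(q4_2025_release)  # 1769558400
```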
In `@tests/llama_stack/utils.py`:
- Line 293: Fix the typo in the log messages where the filename separator is
missing: update the LOGGER.info calls that reference uploaded_file.filename and
vector_store.id (the line with LOGGER.info(f"Adding uploaded file
(filename{uploaded_file.filename} to vector store {vector_store.id}") and the
similar call around line 303) to include "filename=" before the interpolated
uploaded_file.filename so the f-string becomes
"...(filename={uploaded_file.filename}..." to produce correct logs.
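The fix amounts to adding the missing `=` inside the f-string; a minimal reproduction, where the `uploaded_file` and `vector_store` objects are stand-ins for the real client objects:

```python
from types import SimpleNamespace

uploaded_file = SimpleNamespace(filename="ibm_2025_q4_earnings.pdf")
vector_store = SimpleNamespace(id="vs_abc123")

# Before: f"Adding uploaded file (filename{uploaded_file.filename} ..." -- separator missing
message = f"Adding uploaded file (filename={uploaded_file.filename}) to vector store {vector_store.id}"
print(message)
```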
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 3ed501ad-d4f6-4583-8f7d-4e706efe6371
📒 Files selected for processing (9)
- tests/llama_stack/conftest.py
- tests/llama_stack/constants.py
- tests/llama_stack/dataset/README.md
- tests/llama_stack/dataset/corpus/finance/documents.json
- tests/llama_stack/dataset/ground_truth/finance_qa.jsonl
- tests/llama_stack/datasets.py
- tests/llama_stack/utils.py
- tests/llama_stack/vector_io/test_vector_stores.py
- tests/llama_stack/vector_io/upgrade/test_upgrade_vector_store_rag.py
💤 Files with no reviewable changes (1)
- tests/llama_stack/constants.py
f63a80e to f4de94e (Compare)

/build-push-pr-image

Status of building tag pr-1296: success.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/llama_stack/vector_io/test_vector_stores.py (1)
122-131: ⚠️ Potential issue | 🟠 Major

Add a bounded wait before the exact completed-file count assertion to prevent flaky races.

Line 128 asserts an exact count immediately after listing with `filter="completed"`. If ingestion/indexing visibility is asynchronous, this will intermittently fail despite successful uploads. Poll until the expected count is reached (with a timeout), then assert.

Suggested fix:

```diff
+import time
 ...
-        completed_files = list(
-            unprivileged_llama_stack_client.vector_stores.files.list(
-                vector_store_id=store_id,
-                filter="completed",
-            )
-        )
-        assert len(completed_files) == len(dataset.documents), (
+        expected = len(dataset.documents)
+        deadline = time.time() + 60
+        while True:
+            completed_files = list(
+                unprivileged_llama_stack_client.vector_stores.files.list(
+                    vector_store_id=store_id,
+                    filter="completed",
+                )
+            )
+            if len(completed_files) == expected or time.time() >= deadline:
+                break
+            time.sleep(2)
+
+        assert len(completed_files) == expected, (
             f"Expected {len(dataset.documents)} completed vector store file(s) in {store_id!r} after upload, "
             f"found {len(completed_files)}"
         )
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tests/llama_stack/vector_io/test_vector_stores.py` around lines 122-131, the test currently asserts the exact completed file count immediately after calling unprivileged_llama_stack_client.vector_stores.files.list with filter="completed", which can race with asynchronous ingestion; change the test to poll until len(completed_files) == len(dataset.documents) (or until a short timeout) by repeatedly calling unprivileged_llama_stack_client.vector_stores.files.list(vector_store_id=store_id, filter="completed") with a small sleep between attempts, and then assert the count once satisfied (fail if the timeout is reached); locate this logic near the existing completed_files block and update the assertion to use the polling loop referencing store_id and dataset.documents.
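The polling pattern in the diff can also be factored into a reusable helper. This is a sketch under the same 60 s timeout / 2 s interval assumptions, with `list_completed` standing in for the real `vector_stores.files.list` call:

```python
import time

def wait_for_completed_count(list_completed, expected, timeout=60.0, interval=2.0):
    """Poll list_completed() until it yields `expected` items or the deadline passes."""
    deadline = time.time() + timeout
    while True:
        items = list(list_completed())
        if len(items) == expected or time.time() >= deadline:
            return items
        time.sleep(interval)

# Demo with a fake listing whose results grow on each call
state = {"completed": 0}
def fake_list():
    state["completed"] = min(state["completed"] + 1, 3)
    return ["file"] * state["completed"]

files = wait_for_completed_count(fake_list, expected=3, timeout=5, interval=0)
assert len(files) == 3
```

Returning the last observed list (rather than raising) lets the caller's assertion produce the descriptive failure message.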
🧹 Nitpick comments (1)
tests/llama_stack/vector_io/test_vector_stores.py (1)
18-30: Remove duplicated dataset sources in test parametrization to avoid drift bugs.

`dataset` is carried both in `vector_store["dataset"]` and as a separate parametrized argument. A future edit can accidentally mismatch them and produce false failures (the fixture uploads one dataset, assertions read another). Keep a single source of truth per test case.

Suggested refactor:

```diff
-    "unprivileged_model_namespace, llama_stack_server_config, vector_store, dataset",
+    "unprivileged_model_namespace, llama_stack_server_config, vector_store",
 ...
-        {"vector_io_provider": "milvus", "dataset": IBM_2025_Q4_EARNINGS},
-        IBM_2025_Q4_EARNINGS,
+        {"vector_io_provider": "milvus", "dataset": IBM_2025_Q4_EARNINGS},
 ...
     def test_vector_stores_file_upload(
         self,
         unprivileged_llama_stack_client: LlamaStackClient,
         vector_store: VectorStore,
-        dataset: Dataset,
     ) -> None:
+        dataset: Dataset = vector_store.metadata["dataset"]  # or expose via fixture return type
```

Also applies to: 41-43, 53-55, 65-67, 77-79, 89-91
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/llama_stack/vector_io/test_vector_stores.py` around lines 18 - 30, The test parametrization duplicates the dataset by including it inside the vector_store dict and also as the separate "dataset" param, which risks drift; remove the "dataset": ... entries from each vector_store dict in tests/llama_stack/vector_io/test_vector_stores.py (the pytest.param entries that currently include a vector_store dict and a separate dataset argument) so the single source of truth is the standalone "dataset" parameter, and update the corresponding pytest.param tuples at the other occurrences (the subsequent cases mentioned) to rely only on the top-level dataset argument while leaving vector_store keys like "vector_io_provider" unchanged.
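An alternative that keeps the existing fixture contract is to derive both values from one table, so the dict entry and the standalone argument can never diverge. A sketch; `IBM_2025_Q4_EARNINGS` is a placeholder dict here, not the real Dataset object:

```python
# One table of (provider, dataset) pairs drives the whole parametrization
IBM_2025_Q4_EARNINGS = {"name": "ibm-2025-q4-earnings"}

CASES = [
    ("milvus", IBM_2025_Q4_EARNINGS),
    ("faiss", IBM_2025_Q4_EARNINGS),
]

# Build the pytest.param-style tuples from the single table
params = [
    ({"vector_io_provider": provider, "dataset": dataset}, dataset)
    for provider, dataset in CASES
]

# The dict entry and the standalone argument are the same object by construction
assert all(vector_store["dataset"] is dataset for vector_store, dataset in params)
```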
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/vector_io/test_vector_stores.py`:
- Around line 158-161: The test forcibly sets search_modes = ["vector"] when
provider_id == "faiss", but then immediately does queries =
queries_by_mode[search_mode] which raises KeyError if there are no vector QA
entries; add an explicit assertion before the loop such as assert "vector" in
queries_by_mode when provider_id == "faiss" to fail the test with a clear
message (reference symbols: provider_id, search_modes, queries_by_mode,
search_mode).
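The guard described in the prompt can be sketched as a small helper (names mirror the symbols referenced above; the dict values are placeholders):

```python
def select_search_modes(provider_id, queries_by_mode):
    """Restrict FAISS to vector search, failing with a clear message if none exist."""
    search_modes = list(queries_by_mode)
    if provider_id == "faiss":
        assert "vector" in queries_by_mode, (
            f"provider 'faiss' only supports vector search, but queries_by_mode "
            f"has no 'vector' entries (available modes: {sorted(queries_by_mode)})"
        )
        search_modes = ["vector"]
    return search_modes

assert select_search_modes("faiss", {"vector": ["q1"], "keyword": ["q2"]}) == ["vector"]
assert select_search_modes("milvus", {"keyword": ["q2"]}) == ["keyword"]
```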
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 18ebf898-e65b-4303-8e30-59c7877a6459
📒 Files selected for processing (4)
- tests/llama_stack/conftest.py
- tests/llama_stack/dataset/corpus/finance/documents.json
- tests/llama_stack/utils.py
- tests/llama_stack/vector_io/test_vector_stores.py
✅ Files skipped from review due to trivial changes (1)
- tests/llama_stack/dataset/corpus/finance/documents.json
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/llama_stack/conftest.py
- tests/llama_stack/utils.py
Introduces a lightweight dataset loader (datasets.py) with DatasetDocumentQA and DatasetDocumentMetadata dataclasses, a documents.json manifest for per-file attribute metadata, and QA records across IBM quarterly earnings press releases.

Refactor the vector store fixture to support Dataset instances for document uploads, improving test setup flexibility. Update vector store upload functions to handle datasets and their associated attributes. Remove deprecated constants and adjust test cases to use the new dataset structure.

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Made-with: Cursor
ccc4497 to 8a51abe (Compare)

/build-push-pr-image
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 813-815: Replace the silent-priority behavior where both dataset
and doc_sources are read from params (the lines with dataset: Dataset | None =
params.get("dataset") and doc_sources: list[str] | None =
params.get("doc_sources")) with an explicit exclusivity check: after reading
both values, if both are not None raise a ValueError (or pytest.fail) explaining
that only one of "dataset" or "doc_sources" may be provided; apply the same
check to the other occurrence handling these variables (the block around the
second occurrence) so tests fail fast on misconfiguration instead of silently
preferring dataset.
- Around line 725-737: The dataset fixture should validate that request.param
exists and is the expected Dataset type and fail fast with a clear error; update
the dataset fixture (the function named dataset that takes request:
FixtureRequest and returns Dataset) to first check that hasattr(request,
"param") (or catch AttributeError), then verify isinstance(request.param,
Dataset), and if either check fails raise a descriptive error (e.g.,
pytest.UsageError or ValueError) stating that the fixture must be
indirect-parametrized with a Dataset instance so failures are explicit.
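The validation logic can be sketched independently of pytest. The `Dataset` dataclass here is a placeholder for the real one in tests/llama_stack/datasets.py, and the built-in error types stand in for `pytest.UsageError`:

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass(frozen=True)
class Dataset:  # placeholder for tests/llama_stack/datasets.py's Dataset
    name: str

def validate_dataset_param(request):
    """Fail fast if the fixture was not indirect-parametrized with a Dataset."""
    if not hasattr(request, "param"):
        raise ValueError("the 'dataset' fixture must be indirect-parametrized with a Dataset instance")
    if not isinstance(request.param, Dataset):
        raise TypeError(f"expected a Dataset instance, got {type(request.param).__name__}")
    return request.param

# Demo with a SimpleNamespace standing in for pytest's FixtureRequest
ok = validate_dataset_param(SimpleNamespace(param=Dataset(name="finance")))
assert ok.name == "finance"
```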
In `@tests/llama_stack/dataset/README.md`:
- Around line 116-119: Rewrite the step list in
tests/llama_stack/dataset/README.md to remove the repetitive "Create" lead-in
for steps 2–4: keep step 1 as-is, then use a single lead-in sentence such as
"Then add the following files and configuration for the subject:" and list the
items `corpus/<subject>/documents.json` (with schema note),
`ground_truth/<subject>_qa.jsonl` (with schema note), and a new Dataset entry in
tests/llama_stack/datasets.py following the FINANCE_DATASET pattern; ensure each
bullet stays concise and preserves the original file names and guidance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 85014930-7c1f-4230-9483-768c1010e72b
📒 Files selected for processing (9)
- tests/llama_stack/conftest.py
- tests/llama_stack/constants.py
- tests/llama_stack/dataset/README.md
- tests/llama_stack/dataset/corpus/finance/documents.json
- tests/llama_stack/dataset/ground_truth/finance_qa.jsonl
- tests/llama_stack/datasets.py
- tests/llama_stack/utils.py
- tests/llama_stack/vector_io/test_vector_stores.py
- tests/llama_stack/vector_io/upgrade/test_upgrade_vector_store_rag.py
💤 Files with no reviewable changes (1)
- tests/llama_stack/constants.py
✅ Files skipped from review due to trivial changes (1)
- tests/llama_stack/dataset/corpus/finance/documents.json
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/llama_stack/vector_io/upgrade/test_upgrade_vector_store_rag.py
- tests/llama_stack/dataset/ground_truth/finance_qa.jsonl
- tests/llama_stack/vector_io/test_vector_stores.py
- tests/llama_stack/datasets.py
- tests/llama_stack/utils.py
Status of building tag pr-1296: success.
Refactor the vector store fixture to support Dataset instances for document uploads, improving test setup flexibility. Update vector store upload functions to handle datasets and their associated attributes. Remove deprecated constants and adjust test cases to use the new dataset structure. Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com> Made-with: Cursor
8a51abe to 70609e5 (Compare)

/build-push-pr-image

Status of building tag pr-1296: success.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/vector_io/test_vector_stores.py`:
- Around line 156-157: After filtering search_modes for FAISS, ensure we don't
silently proceed with an empty list: check the variable search_modes (after the
provider_id == "faiss" branch) and either assert it's non-empty or call
pytest.skip with a clear message so the test fails or is skipped rather than
silently passing; update the block that contains provider_id and search_modes to
include this check immediately after the filtering so the subsequent for loop
over search_modes is guarded.
- Line 202: The test risks a StopIteration when calling next(...) over
dataset.load_qa(retrieval_mode="vector"); update the test to safely handle empty
results by consuming the generator into a list or using next(..., None) and then
assert the result is not None with a clear failure message (reference:
dataset.load_qa, vector_question variable, retrieval_mode="vector") so the test
fails with a descriptive assertion instead of raising StopIteration.
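The safe pattern the prompt describes, sketched with plain dicts standing in for QA records:

```python
def first_vector_question(records) -> str:
    """Return the first vector-mode question, with a clear failure on empty input."""
    question = next((r["question"] for r in records), None)
    assert question is not None, "dataset has no QA records with retrieval_mode='vector'"
    return question

assert first_vector_question(iter([{"question": "What was Q4 revenue?"}])) == "What was Q4 revenue?"
```

The two-argument form `next(iterator, default)` returns the default instead of raising StopIteration, so pytest reports the assertion message rather than an opaque exception.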
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: fd21a939-0e1b-4ef8-b0c1-dfd725add207
📒 Files selected for processing (3)
tests/llama_stack/conftest.pytests/llama_stack/dataset/README.mdtests/llama_stack/vector_io/test_vector_stores.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llama_stack/dataset/README.md
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com> Made-with: Cursor
Use f-strings to fix structlog not applying %s/%d-style substitution.

Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Made-with: Cursor
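The motivation can be illustrated with plain strings: %-style placeholders depend on the logging backend performing the substitution, while f-strings interpolate before the logger is ever involved. A generic illustration, not structlog-specific code:

```python
filename = "ibm_2025_q1_earnings.pdf"

percent_style = "Uploading %s"      # left literal if the backend never interpolates it
f_style = f"Uploading {filename}"   # interpolated eagerly, independent of the backend

print(percent_style)  # Uploading %s
print(f_style)        # Uploading ibm_2025_q1_earnings.pdf
```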
ChristianZaccaria
left a comment
/lgtm /approve
This is a great PR to move to a dataset-driven model, where document corpus and QA ground truth are defined once and reused across providers and tests. This also offers greater modularity for the tests.
Great work Jorge! - I only left one nit. Thanks!
Status of building tag latest: success.
Refactors vector store tests to use questions and answers stored in tests/llama_stack/dataset, via the Dataset class defined in tests/llama_stack/datasets.py.

The modified README.md files contain more information about how to use it.
Summary by CodeRabbit

- Tests
- Documentation