[LEADS-389] Support user-defined metadata in evaluation data format for GDS quality grading and traceability#257
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds three new Pydantic models ( ChangesMetadata models, loading, propagation, and persistence
Sequence Diagram(s)sequenceDiagram
participant Runner as run_evaluation
participant Validator as DataValidator
participant API as api.evaluate
participant Pipeline as EvaluationPipeline
participant Persist as save_evaluation_data
Runner->>Validator: instantiate()
Runner->>Validator: load_evaluation_data()
Validator-->>Runner: evaluation_data
Runner->>Runner: dataset_metadata = validator.dataset_metadata
Runner->>API: evaluate(config, data, output_dir, original_data_path, dataset_metadata)
API->>Pipeline: run_evaluation(data, original_data_path, dataset_metadata)
Pipeline->>Persist: _save_amended_data(..., dataset_metadata)
Persist->>Persist: if dataset_metadata: wrap in {metadata, conversations}
Persist-->>Pipeline: amended output path
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
README.md (1)
323-337:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument dataset-level metadata in the input schema section as well.
This update documents conversation/turn metadata, but the new dataset-level metadata shape (root
metadata+conversations) is still not described in this section, which makes the new feature hard to discover and use correctly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 323 - 337, The README.md input schema section documents conversation-level and turn-level metadata but lacks documentation for the root dataset-level metadata structure. Add a new section in the input schema documentation that describes the top-level dataset fields, including the root `metadata` field (for dataset-level metadata) and the `conversations` field (as a list of conversation objects). This should be positioned before or alongside the existing Conversation Data Fields and Turn Data Fields sections to make the complete schema hierarchy clear and discoverable.src/lightspeed_evaluation/core/output/data_persistence.py (1)
65-70:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse structured logging instead of
The success and failure paths should emit logger-based structured events so they honor runtime log config and preserve traceback context.
Suggested change
+import logging from datetime import UTC, datetime from pathlib import Path from typing import Any, Optional @@ from lightspeed_evaluation.core.constants import DEFAULT_OUTPUT_DIR from lightspeed_evaluation.core.models import EvaluationData from lightspeed_evaluation.core.models.data import DatasetMetadata +logger = logging.getLogger(__name__) + @@ - print(f"💾 Amended evaluation data saved to: {amended_data_path}") + logger.info( + "amended_evaluation_data_saved", + extra={"amended_data_path": str(amended_data_path)}, + ) return str(amended_data_path) except (OSError, yaml.YAMLError) as e: - print(f"❌ Failed to save amended evaluation data: {e}") + logger.error( + "failed_to_save_amended_evaluation_data", + extra={ + "original_data_path": original_data_path, + "output_dir": output_dir, + }, + exc_info=e, + ) return NoneAs per coding guidelines,
src/lightspeed_evaluation/**/*.py: Use structured logging with appropriate log levels in Python code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lightspeed_evaluation/core/output/data_persistence.py` around lines 65 - 70, Replace the print statements in the success and failure paths of the amended evaluation data saving logic with structured logging calls. The success message (currently using print with the 💾 emoji) should be replaced with a logger.info() call, and the failure message (currently using print with the ❌ emoji) in the except block catching OSError and yaml.YAMLError should be replaced with a logger.error() call that includes the exception details. This ensures the code honors runtime log configuration and preserves traceback context as per the coding guidelines for the lightspeed_evaluation module.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/api.py`:
- Around line 42-47: The evaluate() function at lines 42-47 is missing the
original_data_path parameter that needs to be passed through the pipeline. Add
original_data_path as an optional parameter to the evaluate() function signature
(similar to how dataset_metadata is already included), then ensure this path is
forwarded to the internal pipeline functions so that _save_amended_data in
pipeline.py can use it instead of returning early. The same parameter addition
is also needed at lines 67-71 for consistency across all related function
signatures that handle data evaluation.
In `@src/lightspeed_evaluation/core/system/validator.py`:
- Around line 223-227: The `self.dataset_metadata` instance variable is not
being reset between file loads in the DataValidator class, causing state leakage
when the same validator instance parses multiple files. Add a reset of
`self.dataset_metadata` at the beginning of the file loading process, before the
call to `self._extract_conversations_and_metadata(raw_data)` at line 223-227.
This same reset pattern must also be applied at the other affected locations
(lines 259-261 and 262-286) to ensure dataset_metadata is cleared at the start
of each load operation, preventing old metadata from contaminating subsequent
file parses.
- Around line 270-277: The code at line 272 unpacks `raw_data["metadata"]`
directly into `DatasetMetadata(**raw_data["metadata"])` without validating it is
a dictionary. When YAML contains `metadata: null` or any non-mapping value, the
`**` operator raises an uncaught `TypeError` that escapes the `except
ValidationError` block. Add a guard before the unpacking to check if
`raw_data["metadata"]` is a dictionary, and if not, either raise a
`DataValidationError` explicitly or handle the non-dict case appropriately so
all error paths are caught and converted to `DataValidationError`.
In `@tests/unit/core/models/test_data.py`:
- Line 750: Remove the pyright: ignore[reportCallIssue] suppression comments
from the model instantiation calls and instead use model_validate() with
dictionary inputs to test invalid payloads. In
tests/unit/core/models/test_data.py at lines 750, 802, and 854, replace direct
constructor calls that pass unknown fields (like
TurnMetadata(unknown_field="value")) with model_validate() calls that accept a
dictionary, allowing the validation logic to naturally catch and handle the
invalid input without requiring type-check suppressions.
---
Outside diff comments:
In `@README.md`:
- Around line 323-337: The README.md input schema section documents
conversation-level and turn-level metadata but lacks documentation for the root
dataset-level metadata structure. Add a new section in the input schema
documentation that describes the top-level dataset fields, including the root
`metadata` field (for dataset-level metadata) and the `conversations` field (as
a list of conversation objects). This should be positioned before or alongside
the existing Conversation Data Fields and Turn Data Fields sections to make the
complete schema hierarchy clear and discoverable.
In `@src/lightspeed_evaluation/core/output/data_persistence.py`:
- Around line 65-70: Replace the print statements in the success and failure
paths of the amended evaluation data saving logic with structured logging calls.
The success message (currently using print with the 💾 emoji) should be replaced
with a logger.info() call, and the failure message (currently using print with
the ❌ emoji) in the except block catching OSError and yaml.YAMLError should be
replaced with a logger.error() call that includes the exception details. This
ensures the code honors runtime log configuration and preserves traceback
context as per the coding guidelines for the lightspeed_evaluation module.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56a58b73-8e0e-410f-8e6e-c741aba15806
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
README.mdsrc/lightspeed_evaluation/__init__.pysrc/lightspeed_evaluation/api.pysrc/lightspeed_evaluation/core/models/__init__.pysrc/lightspeed_evaluation/core/models/data.pysrc/lightspeed_evaluation/core/output/data_persistence.pysrc/lightspeed_evaluation/core/system/validator.pysrc/lightspeed_evaluation/pipeline/evaluation/pipeline.pysrc/lightspeed_evaluation/runner/evaluation.pytests/unit/core/models/test_data.pytests/unit/core/output/test_data_persistence.pytests/unit/core/system/test_validator.pytests/unit/runner/test_evaluation.pytests/unit/test_api.py
fe8027e to
f5f6721
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
VladimirKadlec
left a comment
There was a problem hiding this comment.
I suggest not defining the schema and allow any data to be entered.
Sure I will do the suggested changes, initially I thought we need to add GDS documentation fields to our framework. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/core/models/test_data.py (1)
727-839:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRun Black on this file before merge.
CI is currently failing because this file is not Black-formatted. Please format it (for example:
uv run black tests/unit/core/models/test_data.py) to unblock merge.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/core/models/test_data.py` around lines 727 - 839, The test file containing TestTurnMetadata, TestConversationMetadata, and TestDatasetMetadata classes is not formatted according to Black's code style standards, which is causing CI to fail. Run the Black code formatter on this file to automatically fix all formatting issues and ensure compliance with the project's style guidelines.Source: Pipeline failures
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/unit/core/models/test_data.py`:
- Around line 727-839: The test file containing TestTurnMetadata,
TestConversationMetadata, and TestDatasetMetadata classes is not formatted
according to Black's code style standards, which is causing CI to fail. Run the
Black code formatter on this file to automatically fix all formatting issues and
ensure compliance with the project's style guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: add30d99-afff-4436-8d48-aa883e0edb65
📒 Files selected for processing (14)
README.mdsrc/lightspeed_evaluation/__init__.pysrc/lightspeed_evaluation/api.pysrc/lightspeed_evaluation/core/models/__init__.pysrc/lightspeed_evaluation/core/models/data.pysrc/lightspeed_evaluation/core/output/data_persistence.pysrc/lightspeed_evaluation/core/system/validator.pysrc/lightspeed_evaluation/pipeline/evaluation/pipeline.pysrc/lightspeed_evaluation/runner/evaluation.pytests/unit/core/models/test_data.pytests/unit/core/output/test_data_persistence.pytests/unit/core/system/test_validator.pytests/unit/runner/test_evaluation.pytests/unit/test_api.py
✅ Files skipped from review due to trivial changes (2)
- README.md
- src/lightspeed_evaluation/core/models/init.py
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/unit/test_api.py
- tests/unit/runner/test_evaluation.py
- src/lightspeed_evaluation/runner/evaluation.py
- src/lightspeed_evaluation/init.py
- tests/unit/core/output/test_data_persistence.py
- src/lightspeed_evaluation/core/system/validator.py
- src/lightspeed_evaluation/api.py
- src/lightspeed_evaluation/core/output/data_persistence.py
|
@bsatapat-jpg @VladimirKadlec class TurnMetadata(BaseModel):
model_config = ConfigDict(extra="forbid")
data_source: Optional[str] = None
human_verified: Optional[bool] = None
verified_by: Optional[str] = None
persona: Optional[str] = None
additional_metadata: Optional[dict[str, Any]] = None |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/data.py (1)
16-16: ⚡ Quick winUse Google-style docstrings for new public metadata models.
Line 16, Line 45, and Line 77 currently use short summary docstrings; for public APIs in
src/**, please switch these to Google-style docstrings.As per coding guidelines,
src/**/*.py: “Use Google-style docstrings for all public APIs in Python.”Also applies to: 45-45, 77-77
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lightspeed_evaluation/core/models/data.py` at line 16, The docstrings at lines 16, 45, and 77 in the data.py file are using short summary format instead of Google-style docstrings required for public APIs in src/**. Convert each of these docstrings to Google-style format by expanding the short summary docstring (for example, the one describing "Optional user-defined metadata for a single turn") to include proper sections like Args, Returns, Attributes, or Raises as appropriate for each public metadata model, following the Google Python style guide conventions.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lightspeed_evaluation/core/models/data.py`:
- Line 16: The docstrings at lines 16, 45, and 77 in the data.py file are using
short summary format instead of Google-style docstrings required for public APIs
in src/**. Convert each of these docstrings to Google-style format by expanding
the short summary docstring (for example, the one describing "Optional
user-defined metadata for a single turn") to include proper sections like Args,
Returns, Attributes, or Raises as appropriate for each public metadata model,
following the Google Python style guide conventions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9d671b82-4603-4fcc-8344-00b779a5f810
📒 Files selected for processing (14)
README.mdsrc/lightspeed_evaluation/__init__.pysrc/lightspeed_evaluation/api.pysrc/lightspeed_evaluation/core/models/__init__.pysrc/lightspeed_evaluation/core/models/data.pysrc/lightspeed_evaluation/core/output/data_persistence.pysrc/lightspeed_evaluation/core/system/validator.pysrc/lightspeed_evaluation/pipeline/evaluation/pipeline.pysrc/lightspeed_evaluation/runner/evaluation.pytests/unit/core/models/test_data.pytests/unit/core/output/test_data_persistence.pytests/unit/core/system/test_validator.pytests/unit/runner/test_evaluation.pytests/unit/test_api.py
🚧 Files skipped from review as they are similar to previous changes (11)
- src/lightspeed_evaluation/init.py
- src/lightspeed_evaluation/core/models/init.py
- src/lightspeed_evaluation/api.py
- README.md
- tests/unit/runner/test_evaluation.py
- src/lightspeed_evaluation/core/output/data_persistence.py
- tests/unit/test_api.py
- tests/unit/core/output/test_data_persistence.py
- src/lightspeed_evaluation/runner/evaluation.py
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
- src/lightspeed_evaluation/core/system/validator.py
Updated the code to handle the schema. PTAL. |
asamal4
left a comment
There was a problem hiding this comment.
Thanks !!
I have suggested few changes, let me know WDYT ?
If we agree then I can update the doc later.
fc7bbbf to
dcfd2aa
Compare
Thanks for the feedback. I have done the below changes:
|
…or GDS quality grading and traceability
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes
New Features
metadatasupport for dataset, conversation, and turn levels in evaluation inputs/outputs.metadata, while maintaining list-root backward compatibility.ConversationMetadata,DatasetMetadata, andTurnMetadataas top-level public attributes.Documentation
metadatafields and API-population behavior.Tests