Skip to content

[LEADS-389] Support user-defined metadata in evaluation data format for GDS quality grading and traceability#257

Merged
asamal4 merged 1 commit into
lightspeed-core:mainfrom
bsatapat-jpg:dev_1
Jun 18, 2026
Merged

[LEADS-389] Support user-defined metadata in evaluation data format for GDS quality grading and traceability#257
asamal4 merged 1 commit into
lightspeed-core:mainfrom
bsatapat-jpg:dev_1

Conversation

@bsatapat-jpg

@bsatapat-jpg bsatapat-jpg commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added explicit optional metadata support for dataset, conversation, and turn levels in evaluation inputs/outputs.
    • Evaluation now preserves dataset-level metadata through execution and amended output persistence.
    • YAML inputs/outputs now support dict-root format with optional dataset metadata, while maintaining list-root backward compatibility.
    • Exposed ConversationMetadata, DatasetMetadata, and TurnMetadata as top-level public attributes.
  • Documentation

    • Updated README “Input File Data Structure Details” to document the new metadata fields and API-population behavior.
  • Tests

    • Expanded unit tests for metadata model validation, YAML parsing across shapes, and metadata persistence/round-trips.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds three new Pydantic models (TurnMetadata, ConversationMetadata, DatasetMetadata) as optional fields on TurnData and EvaluationData. Extends DataValidator to parse both list-root (legacy) and dict-root (new) YAML formats, extracting dataset-level metadata. Propagates dataset_metadata through the runner → evaluate API → EvaluationPipelinesave_evaluation_data call chain, conditionally outputting YAML as a dict with metadata when present or as a list for backward compatibility. Updates public module exports and README docs.

Changes

Metadata models, loading, propagation, and persistence

Layer / File(s) Summary
Metadata models, field attachments, and module exports
src/lightspeed_evaluation/core/models/data.py, src/lightspeed_evaluation/core/models/__init__.py, src/lightspeed_evaluation/__init__.py, tests/unit/core/models/test_data.py
Defines TurnMetadata, ConversationMetadata, and DatasetMetadata Pydantic models with extra="forbid" and optional additional_metadata dict; attaches them as optional fields to TurnData and EvaluationData; re-exports all three from the models package and top-level module; includes unit tests for model initialization, field integration, and JSON round-trip serialization.
DataValidator: dual YAML root format and dataset_metadata extraction
src/lightspeed_evaluation/core/system/validator.py, tests/unit/core/system/test_validator.py
Adds dataset_metadata instance attribute to DataValidator; introduces _extract_conversations_and_metadata helper to validate and handle both list-root (legacy) and dict-root (new) YAML structures with optional metadata; refactors _load_and_parse_yaml to delegate root-shape handling; raises DataValidationError for invalid structures or missing required conversations key; comprehensive test suite covers metadata parsing, backward compatibility, and error conditions.
save_evaluation_data: conditional dict vs list YAML output
src/lightspeed_evaluation/core/output/data_persistence.py, tests/unit/core/output/test_data_persistence.py
Extends save_evaluation_data with optional dataset_metadata parameter; outputs a dict with metadata and conversations keys when provided, or a plain list for backward compatibility; includes tests validating both output shapes and round-trip serialization of dataset-level, conversation-level, and turn-level metadata.
Runner → API → Pipeline: threading dataset_metadata
src/lightspeed_evaluation/runner/evaluation.py, src/lightspeed_evaluation/api.py, src/lightspeed_evaluation/pipeline/evaluation/pipeline.py, tests/unit/runner/test_evaluation.py, tests/unit/test_api.py
Runner instantiates DataValidator, calls load_evaluation_data to extract evaluation_data, and captures dataset_metadata from the validator; passes both to evaluate() API call. evaluate() extends its signature with optional dataset_metadata and original_data_path parameters, forwarding them to pipeline.run_evaluation(). Pipeline extends its method signature and internal _save_amended_data helper to accept and forward dataset_metadata to save_evaluation_data(). Integration tests verify correct threading throughout the call chain.
README documentation
README.md
Adds optional metadata field rows to the Conversation and Turn data fields tables, documenting ConversationMetadata and TurnMetadata types with API-populated set to for Turn.

Sequence Diagram(s)

sequenceDiagram
  participant Runner as run_evaluation
  participant Validator as DataValidator
  participant API as api.evaluate
  participant Pipeline as EvaluationPipeline
  participant Persist as save_evaluation_data

  Runner->>Validator: instantiate()
  Runner->>Validator: load_evaluation_data()
  Validator-->>Runner: evaluation_data
  Runner->>Runner: dataset_metadata = validator.dataset_metadata
  Runner->>API: evaluate(config, data, output_dir, original_data_path, dataset_metadata)
  API->>Pipeline: run_evaluation(data, original_data_path, dataset_metadata)
  Pipeline->>Persist: _save_amended_data(..., dataset_metadata)
  Persist->>Persist: if dataset_metadata: wrap in {metadata, conversations}
  Persist-->>Pipeline: amended output path
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lightspeed-core/lightspeed-evaluation#231: Both PRs modify EvaluationPipeline in src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (main: adding dataset_metadata/original_data_path forwarding into amended-output persistence; retrieved: refactoring the pipeline to use AgentDrivers instead of APIDataAmender), so the changes touch the same pipeline execution code paths.

Suggested reviewers

  • asamal4
  • VladimirKadlec
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: introducing user-defined metadata support in evaluation data format for GDS quality grading and traceability, which aligns with the comprehensive metadata infrastructure additions across the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
README.md (1)

323-337: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document dataset-level metadata in the input schema section as well.

This update documents conversation/turn metadata, but the new dataset-level metadata shape (root metadata + conversations) is still not described in this section, which makes the new feature hard to discover and use correctly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 323 - 337, The README.md input schema section
documents conversation-level and turn-level metadata but lacks documentation for
the root dataset-level metadata structure. Add a new section in the input schema
documentation that describes the top-level dataset fields, including the root
`metadata` field (for dataset-level metadata) and the `conversations` field (as
a list of conversation objects). This should be positioned before or alongside
the existing Conversation Data Fields and Turn Data Fields sections to make the
complete schema hierarchy clear and discoverable.
src/lightspeed_evaluation/core/output/data_persistence.py (1)

65-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use structured logging instead of print in core persistence paths.

The success and failure paths should emit logger-based structured events so they honor runtime log config and preserve traceback context.

Suggested change
+import logging
 from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any, Optional
@@
 from lightspeed_evaluation.core.constants import DEFAULT_OUTPUT_DIR
 from lightspeed_evaluation.core.models import EvaluationData
 from lightspeed_evaluation.core.models.data import DatasetMetadata

+logger = logging.getLogger(__name__)
+
@@
-        print(f"💾 Amended evaluation data saved to: {amended_data_path}")
+        logger.info(
+            "amended_evaluation_data_saved",
+            extra={"amended_data_path": str(amended_data_path)},
+        )
         return str(amended_data_path)

     except (OSError, yaml.YAMLError) as e:
-        print(f"❌ Failed to save amended evaluation data: {e}")
+        logger.error(
+            "failed_to_save_amended_evaluation_data",
+            extra={
+                "original_data_path": original_data_path,
+                "output_dir": output_dir,
+            },
+            exc_info=e,
+        )
         return None

As per coding guidelines, src/lightspeed_evaluation/**/*.py: Use structured logging with appropriate log levels in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/output/data_persistence.py` around lines 65 -
70, Replace the print statements in the success and failure paths of the amended
evaluation data saving logic with structured logging calls. The success message
(currently using print with the 💾 emoji) should be replaced with a
logger.info() call, and the failure message (currently using print with the ❌
emoji) in the except block catching OSError and yaml.YAMLError should be
replaced with a logger.error() call that includes the exception details. This
ensures the code honors runtime log configuration and preserves traceback
context as per the coding guidelines for the lightspeed_evaluation module.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/api.py`:
- Around line 42-47: The evaluate() function at lines 42-47 is missing the
original_data_path parameter that needs to be passed through the pipeline. Add
original_data_path as an optional parameter to the evaluate() function signature
(similar to how dataset_metadata is already included), then ensure this path is
forwarded to the internal pipeline functions so that _save_amended_data in
pipeline.py can use it instead of returning early. The same parameter addition
is also needed at lines 67-71 for consistency across all related function
signatures that handle data evaluation.

In `@src/lightspeed_evaluation/core/system/validator.py`:
- Around line 223-227: The `self.dataset_metadata` instance variable is not
being reset between file loads in the DataValidator class, causing state leakage
when the same validator instance parses multiple files. Add a reset of
`self.dataset_metadata` at the beginning of the file loading process, before the
call to `self._extract_conversations_and_metadata(raw_data)` at line 223-227.
This same reset pattern must also be applied at the other affected locations
(lines 259-261 and 262-286) to ensure dataset_metadata is cleared at the start
of each load operation, preventing old metadata from contaminating subsequent
file parses.
- Around line 270-277: The code at line 272 unpacks `raw_data["metadata"]`
directly into `DatasetMetadata(**raw_data["metadata"])` without validating it is
a dictionary. When YAML contains `metadata: null` or any non-mapping value, the
`**` operator raises an uncaught `TypeError` that escapes the `except
ValidationError` block. Add a guard before the unpacking to check if
`raw_data["metadata"]` is a dictionary, and if not, either raise a
`DataValidationError` explicitly or handle the non-dict case appropriately so
all error paths are caught and converted to `DataValidationError`.

In `@tests/unit/core/models/test_data.py`:
- Line 750: Remove the pyright: ignore[reportCallIssue] suppression comments
from the model instantiation calls and instead use model_validate() with
dictionary inputs to test invalid payloads. In
tests/unit/core/models/test_data.py at lines 750, 802, and 854, replace direct
constructor calls that pass unknown fields (like
TurnMetadata(unknown_field="value")) with model_validate() calls that accept a
dictionary, allowing the validation logic to naturally catch and handle the
invalid input without requiring type-check suppressions.

---

Outside diff comments:
In `@README.md`:
- Around line 323-337: The README.md input schema section documents
conversation-level and turn-level metadata but lacks documentation for the root
dataset-level metadata structure. Add a new section in the input schema
documentation that describes the top-level dataset fields, including the root
`metadata` field (for dataset-level metadata) and the `conversations` field (as
a list of conversation objects). This should be positioned before or alongside
the existing Conversation Data Fields and Turn Data Fields sections to make the
complete schema hierarchy clear and discoverable.

In `@src/lightspeed_evaluation/core/output/data_persistence.py`:
- Around line 65-70: Replace the print statements in the success and failure
paths of the amended evaluation data saving logic with structured logging calls.
The success message (currently using print with the 💾 emoji) should be replaced
with a logger.info() call, and the failure message (currently using print with
the ❌ emoji) in the except block catching OSError and yaml.YAMLError should be
replaced with a logger.error() call that includes the exception details. This
ensures the code honors runtime log configuration and preserves traceback
context as per the coding guidelines for the lightspeed_evaluation module.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56a58b73-8e0e-410f-8e6e-c741aba15806

📥 Commits

Reviewing files that changed from the base of the PR and between dcb1360 and effe602.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • README.md
  • src/lightspeed_evaluation/__init__.py
  • src/lightspeed_evaluation/api.py
  • src/lightspeed_evaluation/core/models/__init__.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/output/data_persistence.py
  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/runner/evaluation.py
  • tests/unit/core/models/test_data.py
  • tests/unit/core/output/test_data_persistence.py
  • tests/unit/core/system/test_validator.py
  • tests/unit/runner/test_evaluation.py
  • tests/unit/test_api.py

Comment thread src/lightspeed_evaluation/api.py
Comment thread src/lightspeed_evaluation/core/system/validator.py
Comment thread src/lightspeed_evaluation/core/system/validator.py Outdated
Comment thread tests/unit/core/models/test_data.py Outdated
@bsatapat-jpg bsatapat-jpg force-pushed the dev_1 branch 2 times, most recently from fe8027e to f5f6721 Compare June 16, 2026 05:46
@bsatapat-jpg

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@VladimirKadlec VladimirKadlec left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not defining the schema and allow any data to be entered.

Comment thread src/lightspeed_evaluation/core/models/data.py Outdated
Comment thread src/lightspeed_evaluation/core/models/data.py
Comment thread src/lightspeed_evaluation/core/models/data.py
@bsatapat-jpg

Copy link
Copy Markdown
Collaborator Author

I suggest not defining the schema and allow any data to be entered.

Sure I will do the suggested changes, initially I thought we need to add GDS documentation fields to our framework.
Thanks for clarifying it :)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/core/models/test_data.py (1)

727-839: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run Black on this file before merge.

CI is currently failing because this file is not Black-formatted. Please format it (for example: uv run black tests/unit/core/models/test_data.py) to unblock merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/core/models/test_data.py` around lines 727 - 839, The test file
containing TestTurnMetadata, TestConversationMetadata, and TestDatasetMetadata
classes is not formatted according to Black's code style standards, which is
causing CI to fail. Run the Black code formatter on this file to automatically
fix all formatting issues and ensure compliance with the project's style
guidelines.

Source: Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/unit/core/models/test_data.py`:
- Around line 727-839: The test file containing TestTurnMetadata,
TestConversationMetadata, and TestDatasetMetadata classes is not formatted
according to Black's code style standards, which is causing CI to fail. Run the
Black code formatter on this file to automatically fix all formatting issues and
ensure compliance with the project's style guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: add30d99-afff-4436-8d48-aa883e0edb65

📥 Commits

Reviewing files that changed from the base of the PR and between f5f6721 and ce8eb06.

📒 Files selected for processing (14)
  • README.md
  • src/lightspeed_evaluation/__init__.py
  • src/lightspeed_evaluation/api.py
  • src/lightspeed_evaluation/core/models/__init__.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/output/data_persistence.py
  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/runner/evaluation.py
  • tests/unit/core/models/test_data.py
  • tests/unit/core/output/test_data_persistence.py
  • tests/unit/core/system/test_validator.py
  • tests/unit/runner/test_evaluation.py
  • tests/unit/test_api.py
✅ Files skipped from review due to trivial changes (2)
  • README.md
  • src/lightspeed_evaluation/core/models/init.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/unit/test_api.py
  • tests/unit/runner/test_evaluation.py
  • src/lightspeed_evaluation/runner/evaluation.py
  • src/lightspeed_evaluation/init.py
  • tests/unit/core/output/test_data_persistence.py
  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/api.py
  • src/lightspeed_evaluation/core/output/data_persistence.py

VladimirKadlec
VladimirKadlec previously approved these changes Jun 17, 2026

@VladimirKadlec VladimirKadlec left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@asamal4

asamal4 commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

@bsatapat-jpg @VladimirKadlec
We already discussed this in refinement and agreed - We need a schema so that the fields are consistent, this will be helpful to process the data. Otherwise someone would add human_verified and others may add humanverified. Additionally we would add a schema free property. something like below - (just a sample, actual fields will be as per GDS)

  class TurnMetadata(BaseModel):
      model_config = ConfigDict(extra="forbid")

      data_source: Optional[str] = None
      human_verified: Optional[bool] = None
      verified_by: Optional[str] = None
      persona: Optional[str] = None
      additional_metadata: Optional[dict[str, Any]] = None

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/data.py (1)

16-16: ⚡ Quick win

Use Google-style docstrings for new public metadata models.

Line 16, Line 45, and Line 77 currently use short summary docstrings; for public APIs in src/**, please switch these to Google-style docstrings.

As per coding guidelines, src/**/*.py: “Use Google-style docstrings for all public APIs in Python.”

Also applies to: 45-45, 77-77

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/models/data.py` at line 16, The docstrings at
lines 16, 45, and 77 in the data.py file are using short summary format instead
of Google-style docstrings required for public APIs in src/**. Convert each of
these docstrings to Google-style format by expanding the short summary docstring
(for example, the one describing "Optional user-defined metadata for a single
turn") to include proper sections like Args, Returns, Attributes, or Raises as
appropriate for each public metadata model, following the Google Python style
guide conventions.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lightspeed_evaluation/core/models/data.py`:
- Line 16: The docstrings at lines 16, 45, and 77 in the data.py file are using
short summary format instead of Google-style docstrings required for public APIs
in src/**. Convert each of these docstrings to Google-style format by expanding
the short summary docstring (for example, the one describing "Optional
user-defined metadata for a single turn") to include proper sections like Args,
Returns, Attributes, or Raises as appropriate for each public metadata model,
following the Google Python style guide conventions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9d671b82-4603-4fcc-8344-00b779a5f810

📥 Commits

Reviewing files that changed from the base of the PR and between ce8eb06 and 0a35088.

📒 Files selected for processing (14)
  • README.md
  • src/lightspeed_evaluation/__init__.py
  • src/lightspeed_evaluation/api.py
  • src/lightspeed_evaluation/core/models/__init__.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/output/data_persistence.py
  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/runner/evaluation.py
  • tests/unit/core/models/test_data.py
  • tests/unit/core/output/test_data_persistence.py
  • tests/unit/core/system/test_validator.py
  • tests/unit/runner/test_evaluation.py
  • tests/unit/test_api.py
🚧 Files skipped from review as they are similar to previous changes (11)
  • src/lightspeed_evaluation/init.py
  • src/lightspeed_evaluation/core/models/init.py
  • src/lightspeed_evaluation/api.py
  • README.md
  • tests/unit/runner/test_evaluation.py
  • src/lightspeed_evaluation/core/output/data_persistence.py
  • tests/unit/test_api.py
  • tests/unit/core/output/test_data_persistence.py
  • src/lightspeed_evaluation/runner/evaluation.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/core/system/validator.py

@bsatapat-jpg

Copy link
Copy Markdown
Collaborator Author

@bsatapat-jpg @VladimirKadlec We already discussed this in refinement and agreed - We need a schema so that the fields are consistent, this will be helpful to process the data. Otherwise someone would add human_verified and others may add humanverified. Additionally we would add a schema free property. something like below - (just a sample, actual fields will be as per GDS)

  class TurnMetadata(BaseModel):
      model_config = ConfigDict(extra="forbid")

      data_source: Optional[str] = None
      human_verified: Optional[bool] = None
      verified_by: Optional[str] = None
      persona: Optional[str] = None
      additional_metadata: Optional[dict[str, Any]] = None

Updated the code to handle the schema. PTAL.
Thanks in advance

@asamal4 asamal4 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !!
I have suggested few changes, let me know WDYT ?
If we agree then I can update the doc later.

Comment thread src/lightspeed_evaluation/core/models/data.py Outdated
Comment thread src/lightspeed_evaluation/core/models/data.py Outdated
Comment thread src/lightspeed_evaluation/core/models/data.py
Comment thread src/lightspeed_evaluation/core/models/data.py Outdated
Comment thread src/lightspeed_evaluation/core/models/data.py
@bsatapat-jpg bsatapat-jpg force-pushed the dev_1 branch 3 times, most recently from fc7bbbf to dcfd2aa Compare June 18, 2026 05:52
@bsatapat-jpg

Copy link
Copy Markdown
Collaborator Author

Thanks !! I have suggested few changes, let me know WDYT ? If we agree then I can update the doc later.

Thanks for the feedback. I have done the below changes:

  • Removed TurnMetadata entirely

  • Moved fields to ConversationMetadata

  • Removed notes

  • Updated scenario_category description

  • Added description and jtbd_source to DatasetMetadata as requested.

@bsatapat-jpg bsatapat-jpg requested a review from asamal4 June 18, 2026 06:02

@asamal4 asamal4 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asamal4 asamal4 merged commit e3882f1 into lightspeed-core:main Jun 18, 2026
17 checks passed
@bsatapat-jpg bsatapat-jpg deleted the dev_1 branch June 24, 2026 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants