chore: spec update 260626 by asamal4 · Pull Request #261 · lightspeed-core/lightspeed-evaluation

asamal4 · 2026-06-26T11:04:08Z

Description

(Release Readiness) Updated spec files as per latest development.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude

Related Tickets & Documents

Related Issue #
Closes # LEADS-456

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Documentation
- Updated evaluation and agent-driver docs to better describe supported workflows, configuration options, and runtime behavior.
- Clarified how metrics, scoring, token tracking, and error handling work across turn-level and conversation-level evaluation.
- Added guidance for new storage/reporting options, including Langfuse support and visualization settings.
- Reworked formatting rules and structure across spec files for easier maintenance and clearer reference.

coderabbitai · 2026-06-26T11:04:21Z

Warning

Review limit reached

@asamal4, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 14 minutes and 12 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b3ec5198-f788-45dc-be99-67107dbf8b8f

📥 Commits

Reviewing files that changed from the base of the PR and between 9d23554 and 8fbfaec.

📒 Files selected for processing (12)

.ai/spec/README.md
.ai/spec/how/agent-drivers.md
.ai/spec/how/configuration-and-models.md
.ai/spec/how/metrics-implementation.md
.ai/spec/how/output-and-storage.md
.ai/spec/how/project-structure.md
.ai/spec/what/agent-drivers.md
.ai/spec/what/evaluation-pipeline.md
.ai/spec/what/llm-and-judges.md
.ai/spec/what/metrics.md
.ai/spec/what/output-and-reporting.md
.ai/spec/what/system-overview.md

Walkthrough

The PR updates .ai/spec docs to describe revised evaluation pipeline rules, agent driver behavior, metric and judge semantics, storage/reporting behavior, and related module maps. It also changes the spec README convention for behavioral rules from numbered lists to bullets.

Changes

Evaluation docs and module maps

| Layer / File(s) | Summary |
|---|---|
| Execution model and pipeline: `.ai/spec/what/system-overview.md`, `.ai/spec/what/evaluation-pipeline.md` | Evaluation levels, metric resolution, turn/conversation processing, error handling, and thread-pool defaults are rewritten as bullet rules. |
| Agent driver behavior and config: `.ai/spec/what/agent-drivers.md`, `.ai/spec/how/agent-drivers.md` | HTTP and proposal drivers are split into separate behavioral and data-flow descriptions, with updated config tables, proposal lifecycle rules, and implementation mappings. |
| Metric and judge rules: `.ai/spec/what/metrics.md`, `.ai/spec/what/llm-and-judges.md`, `.ai/spec/how/metrics-implementation.md` | Metric selection, prerequisites, dispatch, token tracking, embeddings, and proposal-status metric references are updated. |
| Storage and reporting: `.ai/spec/what/output-and-reporting.md`, `.ai/spec/how/output-and-storage.md` | Reporting formats, storage backend configuration, and Langfuse lifecycle/score handling are expanded. |
| Spec conventions and module maps: `.ai/spec/README.md`, `.ai/spec/how/configuration-and-models.md`, `.ai/spec/how/project-structure.md` | The behavioral-rule formatting convention switches to bullets, and module maps are revised for agent configs and package responsibilities. |

Estimated review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

lightspeed-core/lightspeed-evaluation#249: Introduced the earlier .ai/spec documentation baseline that this PR revises.

Suggested reviewers

VladimirKadlec
xmican10

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title is generic and doesn't clearly describe the spec changes, so it doesn't convey the main update. | Use a more specific title naming the affected spec area or feature changes, such as the new agent, metrics, or storage documentation updates. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

.ai/spec/what/agent-drivers.md (1)
58-69: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Standardize required-field notation in configuration tables.

ProposalAgentConfig.namespace uses "(required)" in the Default column, while other required fields (e.g., HttpApiAgentConfig.model, Shared.default.agent) use "—". Use consistent notation across all configuration tables.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/what/agent-drivers.md around lines 58 - 69, Standardize the
required-field notation in ProposalAgentConfig so it matches the rest of the
configuration tables. Update the ProposalAgentConfig table entry for
agents.<id>.namespace to use the same required placeholder convention as
HttpApiAgentConfig.model and Shared.default.agent, keeping the rest of the
ProposalAgentConfig fields unchanged.
.ai/spec/what/system-overview.md (2)
21-24: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Clarify metric resolution precedence to avoid ambiguity.

The description "Override keys win, but non-overlapping system default keys are preserved" is accurate but could be more explicit about the merge order. Consider stating the precedence hierarchy explicitly (system defaults → level-specific overrides) to match the implementation in metrics-implementation.md.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/what/system-overview.md around lines 21 - 24, Clarify the metric
metadata merge order in the system overview so it explicitly states the
precedence hierarchy as system defaults first, then level-specific overrides,
with override keys taking priority while preserving non-overlapping default
keys. Update the metric resolution wording in the overview to match the behavior
described by the metrics metadata rules and remove any ambiguity about how
turn-level and conversation-level metadata are applied.
28-31: 🧹 Nitpick | 🔵 Trivial

Document missing conversation-level skip_on_failure override in system-overview.md

The distinction between metric evaluation failures (respect skip_on_failure) and Agent API errors (always cascade) is correctly and consistently documented across system-overview.md and evaluation-pipeline.md.

However, evaluation-pipeline.md documents a Conversation-level skip_on_failure override (per-conversation setting), which is absent from the configuration table and execution model description in system-overview.md. This omission should be addressed to ensure developers are aware of the per-conversation override capability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/what/system-overview.md around lines 28 - 31, The system-overview
documentation is missing the per-conversation skip-on-failure override described
in evaluation-pipeline.md. Update the configuration table and the execution
model section in system-overview.md to mention that conversation-level settings
can override the global skip_on_failure behavior for a specific conversation,
and align the wording with the existing conversation/turn evaluation rules.
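
A small sketch of the two rules this comment refers to: a conversation-level `skip_on_failure` value overriding the global one, and Agent API errors cascading regardless of that setting. Both function names and the use of `None` for "not set" are assumptions made for illustration; the real pipeline may model this differently.

```python
from typing import Optional


def effective_skip_on_failure(
    global_setting: bool,
    conversation_setting: Optional[bool] = None,
) -> bool:
    """Return the skip_on_failure value that applies to one conversation."""
    # Assumption: None means the conversation did not set its own value,
    # so the global setting applies.
    if conversation_setting is None:
        return global_setting
    return conversation_setting


def should_skip_remaining(
    *, api_error: bool, metric_failed: bool, skip_on_failure: bool
) -> bool:
    """Decide whether the remaining turns of a conversation are skipped."""
    # Agent API errors always cascade, whatever skip_on_failure says.
    if api_error:
        return True
    # Metric evaluation failures cascade only when skip_on_failure is enabled.
    return metric_failed and skip_on_failure
```

Keeping the override resolution separate from the cascade decision makes it easy to see that the per-conversation setting only affects metric failures.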
.ai/spec/how/project-structure.md (1)
18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Fix redundant acronym: "CLI interface" → "CLI".

"CLI" already stands for "command-line interface"; appending "interface" is tautological.
📝 Suggested fix
-| `pipeline/evaluation/cli.py` | `CLIClient`, `KubeCLI` | Abstract CLI interface and Kubernetes (oc/kubectl) implementation |
+| `pipeline/evaluation/cli.py` | `CLIClient`, `KubeCLI` | Abstract CLI and Kubernetes (oc/kubectl) implementation |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/how/project-structure.md at line 18, The project-structure entry
for pipeline/evaluation/cli.py uses the redundant phrase “CLI interface”; update
that description to just “CLI” so it is concise and non-tautological. Adjust the
table row text referencing CLIClient and KubeCLI to keep the wording consistent
with the rest of the document while preserving the meaning of the abstract
command-line tooling section.
Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/what/agent-drivers.md:
- Around line 22-31: The ProposalDriver spec is inconsistent about the
auto_approve gate: it says approval happens once the Proposal CR exists, but the
implementation waits for the Analyzed=True condition before creating the
ProposalApproval CR. Update the ProposalDriver description to reflect the actual
polling condition and use the same terminology as the driver flow so the
documented lifecycle matches the behavior of ProposalDriver and its auto_approve
path.

In @.ai/spec/what/evaluation-pipeline.md:
- Around line 14-20: The “saved to storage atomically” wording in the
Conversation Processing section is too strong for the current behavior in
ConversationProcessor, since StorageError is only warned on and processing
continues. Reword the description to reflect best-effort batch persistence (for
example, “batch saved to storage”) unless the storage layer actually guarantees
atomic commits, and keep the terminology consistent with the turn-results
collection and save flow.

In @.ai/spec/what/output-and-reporting.md:
- Around line 52-54: The storage configuration table currently implies that all
SQL backends use storage[].host, which is misleading for SQLite. Update the
description in the output-and-reporting spec so storage[].host is documented as
the database host for remote SQL backends (postgres, mysql) or Langfuse host
URL, while making clear that SQLite uses storage[].database for the local file
path and does not require host. Keep the wording aligned with the existing
storage[].type and storage[].database entries so the backend-specific
requirements are unambiguous.
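
The backend-specific requirements described above could be expressed as a validation step along these lines. This is a sketch under stated assumptions: the `StorageEntry` class, the `validate_storage_entry` function, the `"sqlite"` and `"langfuse"` type strings, and the choice to treat `host` as mandatory for remote backends are all hypothetical and not confirmed by the spec.

```python
from dataclasses import dataclass
from typing import Optional

# Backend names follow the review comment; exact type strings are assumed.
REMOTE_SQL_TYPES = {"postgres", "mysql"}


@dataclass
class StorageEntry:
    """Hypothetical model of one storage[] entry."""

    type: str
    database: Optional[str] = None
    host: Optional[str] = None


def validate_storage_entry(entry: StorageEntry) -> None:
    """Raise ValueError if a field needed by the backend type is missing."""
    if entry.type == "sqlite":
        # SQLite needs only a local file path in `database`; no host.
        if not entry.database:
            raise ValueError("sqlite requires 'database' (local file path)")
    elif entry.type in REMOTE_SQL_TYPES:
        # Assumption: remote SQL backends need a database host.
        if not entry.host:
            raise ValueError(f"{entry.type} requires 'host' (database host)")
    elif entry.type == "langfuse":
        # Assumption: Langfuse needs its host URL.
        if not entry.host:
            raise ValueError("langfuse requires 'host' (Langfuse host URL)")
```

For example, `validate_storage_entry(StorageEntry(type="sqlite", database="eval.db"))` passes without a host, while a `postgres` entry without `host` raises `ValueError`.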

---

Nitpick comments:
In @.ai/spec/how/project-structure.md:
- Line 18: The project-structure entry for pipeline/evaluation/cli.py uses the
redundant phrase “CLI interface”; update that description to just “CLI” so it is
concise and non-tautological. Adjust the table row text referencing CLIClient
and KubeCLI to keep the wording consistent with the rest of the document while
preserving the meaning of the abstract command-line tooling section.

In @.ai/spec/what/agent-drivers.md:
- Around line 58-69: Standardize the required-field notation in
ProposalAgentConfig so it matches the rest of the configuration tables. Update
the ProposalAgentConfig table entry for agents.<id>.namespace to use the same
required placeholder convention as HttpApiAgentConfig.model and
Shared.default.agent, keeping the rest of the ProposalAgentConfig fields
unchanged.

In @.ai/spec/what/system-overview.md:
- Around line 21-24: Clarify the metric metadata merge order in the system
overview so it explicitly states the precedence hierarchy as system defaults
first, then level-specific overrides, with override keys taking priority while
preserving non-overlapping default keys. Update the metric resolution wording in
the overview to match the behavior described by the metrics metadata rules and
remove any ambiguity about how turn-level and conversation-level metadata are
applied.
- Around line 28-31: The system-overview documentation is missing the
per-conversation skip-on-failure override described in evaluation-pipeline.md.
Update the configuration table and the execution model section in
system-overview.md to mention that conversation-level settings can override the
global skip_on_failure behavior for a specific conversation, and align the
wording with the existing conversation/turn evaluation rules.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 87b6b351-25e3-4bdf-b60a-d326d3101c9d

📥 Commits

Reviewing files that changed from the base of the PR and between 30f5c59 and 9d23554.

bsatapat-jpg

LGTM. Thanks

coderabbitai Bot reviewed Jun 26, 2026

Comment thread .ai/spec/what/agent-drivers.md

Comment thread .ai/spec/what/evaluation-pipeline.md

Comment thread .ai/spec/what/output-and-reporting.md Outdated

chore: spec update 260626

8fbfaec

asamal4 force-pushed the spec-update-2606 branch from 9d23554 to 8fbfaec on June 26, 2026 11:49

bsatapat-jpg approved these changes Jun 26, 2026

asamal4 merged commit 27b4aa2 into lightspeed-core:main Jun 26, 2026
17 checks passed

coderabbitai Bot mentioned this pull request Jun 28, 2026

chore: consistent framework description and naming #267

Open

15 tasks
