chore: spec update 260626 #261

Merged
asamal4 merged 1 commit into lightspeed-core:main from asamal4:spec-update-2606
Jun 26, 2026
Conversation

@asamal4 asamal4 commented Jun 26, 2026

Collaborator

Description

(Release Readiness) Updated spec files as per latest development.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Documentation
    • Updated evaluation and agent-driver docs to better describe supported workflows, configuration options, and runtime behavior.
    • Clarified how metrics, scoring, token tracking, and error handling work across turn-level and conversation-level evaluation.
    • Added guidance for new storage/reporting options, including Langfuse support and visualization settings.
    • Reworked formatting rules and structure across spec files for easier maintenance and clearer reference.

coderabbitai Bot commented Jun 26, 2026

Contributor

Warning

Review limit reached

@asamal4, we couldn't start this review because you've reached your PR review rate limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b3ec5198-f788-45dc-be99-67107dbf8b8f

📥 Commits

Reviewing files that changed from the base of the PR and between 9d23554 and 8fbfaec.

📒 Files selected for processing (12)
  • .ai/spec/README.md
  • .ai/spec/how/agent-drivers.md
  • .ai/spec/how/configuration-and-models.md
  • .ai/spec/how/metrics-implementation.md
  • .ai/spec/how/output-and-storage.md
  • .ai/spec/how/project-structure.md
  • .ai/spec/what/agent-drivers.md
  • .ai/spec/what/evaluation-pipeline.md
  • .ai/spec/what/llm-and-judges.md
  • .ai/spec/what/metrics.md
  • .ai/spec/what/output-and-reporting.md
  • .ai/spec/what/system-overview.md

Walkthrough

The PR updates .ai/spec docs to describe revised evaluation pipeline rules, agent driver behavior, metric and judge semantics, storage/reporting behavior, and related module maps. It also changes the spec README convention for behavioral rules from numbered lists to bullets.

Changes

Evaluation docs and module maps

| Layer / File(s) | Summary |
| --- | --- |
| Execution model and pipeline (.ai/spec/what/system-overview.md, .ai/spec/what/evaluation-pipeline.md) | Evaluation levels, metric resolution, turn/conversation processing, error handling, and thread-pool defaults are rewritten as bullet rules. |
| Agent driver behavior and config (.ai/spec/what/agent-drivers.md, .ai/spec/how/agent-drivers.md) | HTTP and proposal drivers are split into separate behavioral and data-flow descriptions, with updated config tables, proposal lifecycle rules, and implementation mappings. |
| Metric and judge rules (.ai/spec/what/metrics.md, .ai/spec/what/llm-and-judges.md, .ai/spec/how/metrics-implementation.md) | Metric selection, prerequisites, dispatch, token tracking, embeddings, and proposal-status metric references are updated. |
| Storage and reporting (.ai/spec/what/output-and-reporting.md, .ai/spec/how/output-and-storage.md) | Reporting formats, storage backend configuration, and Langfuse lifecycle/score handling are expanded. |
| Spec conventions and module maps (.ai/spec/README.md, .ai/spec/how/configuration-and-models.md, .ai/spec/how/project-structure.md) | The behavioral-rule formatting convention switches to bullets, and module maps are revised for agent configs and package responsibilities. |

Estimated review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • VladimirKadlec
  • xmican10
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title is generic and doesn't clearly describe the spec changes, so it doesn't convey the main update. | Use a more specific title naming the affected spec area or feature changes, such as the new agent, metrics, or storage documentation updates. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🧹 Nitpick comments (4)
.ai/spec/what/agent-drivers.md (1)

58-69: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Standardize required-field notation in configuration tables.

ProposalAgentConfig.namespace uses "(required)" in the Default column, while other required fields (e.g., HttpApiAgentConfig.model, Shared.default.agent) use "—". Use consistent notation across all configuration tables.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/what/agent-drivers.md around lines 58 - 69, Standardize the
required-field notation in ProposalAgentConfig so it matches the rest of the
configuration tables. Update the ProposalAgentConfig table entry for
agents.<id>.namespace to use the same required placeholder convention as
HttpApiAgentConfig.model and Shared.default.agent, keeping the rest of the
ProposalAgentConfig fields unchanged.
.ai/spec/what/system-overview.md (2)

21-24: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Clarify metric resolution precedence to avoid ambiguity.

The description "Override keys win, but non-overlapping system default keys are preserved" is accurate but could be more explicit about the merge order. Consider stating the precedence hierarchy explicitly (system defaults → level-specific overrides) to match the implementation in metrics-implementation.md.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/what/system-overview.md around lines 21 - 24, Clarify the metric
metadata merge order in the system overview so it explicitly states the
precedence hierarchy as system defaults first, then level-specific overrides,
with override keys taking priority while preserving non-overlapping default
keys. Update the metric resolution wording in the overview to match the behavior
described by the metrics metadata rules and remove any ambiguity about how
turn-level and conversation-level metadata are applied.

28-31: 🧹 Nitpick | 🔵 Trivial

Document missing conversation-level skip_on_failure override in system-overview.md

The distinction between metric evaluation failures (respect skip_on_failure) and Agent API errors (always cascade) is correctly and consistently documented across system-overview.md and evaluation-pipeline.md.

However, evaluation-pipeline.md documents a Conversation-level skip_on_failure override (per-conversation setting), which is absent from the configuration table and execution model description in system-overview.md. This omission should be addressed to ensure developers are aware of the per-conversation override capability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/what/system-overview.md around lines 28 - 31, The system-overview
documentation is missing the per-conversation skip-on-failure override described
in evaluation-pipeline.md. Update the configuration table and the execution
model section in system-overview.md to mention that conversation-level settings
can override the global skip_on_failure behavior for a specific conversation,
and align the wording with the existing conversation/turn evaluation rules.
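
One way to read the two rules in this comment is sketched below. Both function names are hypothetical, and treating "cascade" as "stop processing the remaining work" is an interpretation made for the example rather than something the spec text here confirms.

```python
from typing import Optional


def effective_skip_on_failure(
    global_value: bool, conversation_value: Optional[bool] = None
) -> bool:
    """Hypothetical helper: a per-conversation setting, when present, wins."""
    if conversation_value is None:
        return global_value
    return conversation_value


def stops_remaining_work(is_agent_api_error: bool, skip_on_failure: bool) -> bool:
    """Assumed reading of the rule: Agent API errors always cascade,
    while metric evaluation failures cascade only if skip_on_failure is set."""
    if is_agent_api_error:
        return True
    return skip_on_failure
```

For example, `effective_skip_on_failure(False, True)` yields the conversation's value, while omitting the second argument falls back to the global one.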
.ai/spec/how/project-structure.md (1)

18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Fix redundant acronym: "CLI interface" → "CLI".

"CLI" already stands for "command-line interface"; appending "interface" is tautological.

📝 Suggested fix
-| `pipeline/evaluation/cli.py` | `CLIClient`, `KubeCLI` | Abstract CLI interface and Kubernetes (oc/kubectl) implementation |
+| `pipeline/evaluation/cli.py` | `CLIClient`, `KubeCLI` | Abstract CLI and Kubernetes (oc/kubectl) implementation |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/spec/how/project-structure.md at line 18, The project-structure entry
for pipeline/evaluation/cli.py uses the redundant phrase “CLI interface”; update
that description to just “CLI” so it is concise and non-tautological. Adjust the
table row text referencing CLIClient and KubeCLI to keep the wording consistent
with the rest of the document while preserving the meaning of the abstract
command-line tooling section.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.ai/spec/what/agent-drivers.md:
- Around line 22-31: The ProposalDriver spec is inconsistent about the
auto_approve gate: it says approval happens once the Proposal CR exists, but the
implementation waits for the Analyzed=True condition before creating the
ProposalApproval CR. Update the ProposalDriver description to reflect the actual
polling condition and use the same terminology as the driver flow so the
documented lifecycle matches the behavior of ProposalDriver and its auto_approve
path.
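
The gate described above (wait for `Analyzed=True`, then approve) might look roughly like this. It is a sketch under assumptions: the helper names, the `fetch` callable, and the timeout values are hypothetical, and the resource is modeled as a plain dict using the usual Kubernetes `status.conditions` layout instead of a real client object.

```python
import time
from typing import Any, Callable, Dict


def has_condition(resource: Dict[str, Any], cond_type: str) -> bool:
    """True if the resource reports the given condition with status "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == cond_type and c.get("status") == "True" for c in conditions
    )


def wait_for_analyzed(
    fetch: Callable[[], Dict[str, Any]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `fetch` until Analyzed=True or the timeout passes.

    Returns True when the condition is seen; only then would an
    auto-approve path go on to create the approval resource.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if has_condition(fetch(), "Analyzed"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The point of the sketch is the ordering: existence of the Proposal alone is not enough, the poll has to observe the condition before approval is attempted.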

In @.ai/spec/what/evaluation-pipeline.md:
- Around line 14-20: The “saved to storage atomically” wording in the
Conversation Processing section is too strong for the current behavior in
ConversationProcessor, since StorageError is only warned on and processing
continues. Reword the description to reflect best-effort batch persistence (for
example, “batch saved to storage”) unless the storage layer actually guarantees
atomic commits, and keep the terminology consistent with the turn-results
collection and save flow.
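
The difference between "atomic" and best-effort persistence can be illustrated with a small sketch. The `StorageError` class below is a stand-in defined for the example, and `save_batch_best_effort` and its `save` callable are hypothetical names; the only behavior taken from the comment is that a storage failure is logged as a warning and processing continues.

```python
import logging
from typing import Any, Callable, Sequence

logger = logging.getLogger(__name__)


class StorageError(Exception):
    """Stand-in for the project's storage exception (defined here for the sketch)."""


def save_batch_best_effort(
    results: Sequence[Any], save: Callable[[Sequence[Any]], None]
) -> bool:
    """Try to persist one batch; warn and carry on if storage fails.

    Nothing here rolls back or retries, so the save is best-effort,
    not an atomic commit.
    """
    try:
        save(results)
        return True
    except StorageError as exc:
        logger.warning("Could not persist %d result(s): %s", len(results), exc)
        return False
```

A caller that ignores the return value keeps evaluating even when nothing was written, which is why "batch saved to storage" describes the behavior more accurately than "saved atomically".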

In @.ai/spec/what/output-and-reporting.md:
- Around line 52-54: The storage configuration table currently implies that all
SQL backends use storage[].host, which is misleading for SQLite. Update the
description in the output-and-reporting spec so storage[].host is documented as
the database host for remote SQL backends (postgres, mysql) or Langfuse host
URL, while making clear that SQLite uses storage[].database for the local file
path and does not require host. Keep the wording aligned with the existing
storage[].type and storage[].database entries so the backend-specific
requirements are unambiguous.
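
The backend-specific requirements could be expressed as a small check like the one below. This is illustrative only: the function name, the exact `type` strings (`"sqlite"`, `"postgres"`, `"mysql"`, `"langfuse"`), and the assumption that remote SQL backends also need `database` are not confirmed by the spec text quoted here.

```python
from typing import Any, List, Mapping

# Assumed type identifiers for remote SQL backends.
REMOTE_SQL_BACKENDS = {"postgres", "mysql"}


def missing_storage_fields(entry: Mapping[str, Any]) -> List[str]:
    """Return the fields a storage entry lacks for its backend type."""
    backend = entry.get("type")
    if backend == "sqlite":
        required = ["database"]          # local file path; no host needed
    elif backend in REMOTE_SQL_BACKENDS:
        required = ["host", "database"]  # database host plus database name
    elif backend == "langfuse":
        required = ["host"]              # Langfuse host URL
    else:
        required = []                    # unknown types are not checked here
    return [name for name in required if not entry.get(name)]
```

With this reading, a SQLite entry that sets only `database` is complete, while a Postgres entry without `host` is reported as missing it.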

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 87b6b351-25e3-4bdf-b60a-d326d3101c9d

📥 Commits

Reviewing files that changed from the base of the PR and between 30f5c59 and 9d23554.

Comment thread .ai/spec/what/agent-drivers.md
Comment thread .ai/spec/what/evaluation-pipeline.md
Comment thread .ai/spec/what/output-and-reporting.md Outdated
@asamal4 asamal4 force-pushed the spec-update-2606 branch from 9d23554 to 8fbfaec Compare June 26, 2026 11:49

@bsatapat-jpg bsatapat-jpg left a comment

Collaborator

LGTM. Thanks

@asamal4 asamal4 merged commit 27b4aa2 into lightspeed-core:main Jun 26, 2026
17 checks passed

2 participants