
feat: Evaluator Agent — GAN-inspired generator/evaluator feedback loop#901

Draft
ryaneggz wants to merge 1 commit into development from feat/evaluator-agent

Conversation

@ryaneggz
Collaborator

Summary

  • Adds spec and implementation plan for the Evaluator Agent feature — a GAN-inspired generator/evaluator feedback loop that separates generation from evaluation using dedicated evaluator assistants
  • Introduces EvaluatorConfig on the Assistant schema with structured grading criteria (named dimensions, weights, thresholds) and a configurable generate-evaluate-critique-regenerate orchestration loop
  • Spec covers backend schemas, LLMController orchestration, default evaluator prompt, frontend display, Alembic migration, and Playwright MCP integration for UI testing
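The structured grading criteria described above could be sketched roughly as follows. This is a minimal illustration using stdlib dataclasses (the actual spec may use Pydantic-style schemas); the field names beyond `EvaluationCriterion` and `EvaluatorConfig` — `weight`, `threshold`, `evaluator_assistant_id`, `max_iterations`, and the `passes` helper — are assumptions, not the committed schema:

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationCriterion:
    """One named grading dimension, e.g. 'accuracy' or 'tone'."""
    name: str
    weight: float = 1.0     # relative importance in the weighted overall score (assumed default)
    threshold: float = 0.7  # minimum passing score in [0, 1] for this dimension (assumed default)

@dataclass
class EvaluatorConfig:
    """Attached to an Assistant; points at a dedicated evaluator assistant."""
    evaluator_assistant_id: str  # hypothetical field name for the paired evaluator
    criteria: list[EvaluationCriterion] = field(default_factory=list)
    max_iterations: int = 3      # cap on generate-evaluate-critique-regenerate cycles

    def passes(self, scores: dict[str, float]) -> bool:
        """A draft passes only if every dimension meets its threshold."""
        return all(scores.get(c.name, 0.0) >= c.threshold for c in self.criteria)

cfg = EvaluatorConfig(
    "asst_eval_1",  # placeholder id
    criteria=[EvaluationCriterion("accuracy", weight=2.0, threshold=0.8),
              EvaluationCriterion("tone")],
)
print(cfg.passes({"accuracy": 0.9, "tone": 0.75}))  # → True
```

Per-dimension thresholds (rather than a single aggregate cutoff) let the evaluator reject a draft that scores well overall but fails one critical dimension.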

Closes #899

Spec & Plan

Implementation Phases

  1. Schema & Data Layer (evaluation models + migration)
  2. Orchestration Loop (generate-evaluate-critique-regenerate in LLMController)
  3. Default Evaluator Prompt (skeptical-posture system prompt)
  4. Frontend Display (EvaluationBadge component)
  5. Testing & Documentation (unit/integration tests + example notebook)
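The phase-2 orchestration loop could look roughly like this. It is a sketch only: `generate` and `evaluate` stand in for the real LLMController calls to the generator and evaluator assistants, and the `(passed, critique)` return shape is an assumption:

```python
def run_with_evaluation(generate, evaluate, prompt, max_iterations=3):
    """GAN-inspired generate-evaluate-critique-regenerate loop:
    the generator drafts a response, the evaluator grades it, and the
    critique is fed back into regeneration until the draft passes or
    the iteration cap is reached."""
    critique = None
    for iteration in range(1, max_iterations + 1):
        draft = generate(prompt, critique)  # regenerate, conditioned on prior critique
        passed, critique = evaluate(draft)  # grade the draft against the criteria
        if passed:
            return draft, iteration
    return draft, max_iterations            # best-effort draft at the cap

# Toy stand-ins for the generator/evaluator pair:
def toy_generate(prompt, critique):
    return prompt.upper() if critique else prompt

def toy_evaluate(draft):
    return (draft.isupper(), "shout it")    # pass only once the draft is uppercase

print(run_with_evaluation(toy_generate, toy_evaluate, "hello"))  # → ('HELLO', 2)
```

Returning the final draft even on failure (rather than raising) matches the max-iterations termination behavior called out in the test plan below.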

Test plan

  • Unit tests for EvaluationCriterion, EvaluatorConfig, EvaluationResult schema validation
  • Unit tests for loop termination at max iterations
  • Integration test for end-to-end evaluator rejection and generator retry
  • Integration test for Playwright MCP evaluator interaction
  • Frontend tests for EvaluationBadge rendering states
  • Manual test: create generator + evaluator pair, run conversation, verify scores display
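The max-iterations termination test from the plan could be sketched like this, against a self-contained stub of the loop (the stub and all names here are hypothetical, not the real LLMController interface):

```python
def run_loop(evaluate, max_iterations):
    """Minimal stub of the orchestration loop: stops on a passing
    evaluation or at the iteration cap, whichever comes first."""
    for i in range(1, max_iterations + 1):
        if evaluate(f"draft-{i}"):
            return i
    return max_iterations

def test_terminates_at_max_iterations():
    calls = []
    def always_reject(draft):
        calls.append(draft)   # record each evaluator invocation
        return False
    assert run_loop(always_reject, max_iterations=3) == 3
    assert len(calls) == 3    # evaluator invoked exactly once per iteration

test_terminates_at_max_iterations()
print("ok")
```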

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
@coderabbitai
Contributor

coderabbitai Bot commented Mar 25, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50e2859e-02ab-4ff6-82a7-19bd16f9ef70

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

