
feat: Evaluator Agent — GAN-inspired generator/evaluator feedback loop#901

Draft
ryaneggz wants to merge 1 commit into development from feat/evaluator-agent

Conversation

@ryaneggz
Collaborator

Summary

  • Adds spec and implementation plan for the Evaluator Agent feature — a GAN-inspired generator/evaluator feedback loop that separates generation from evaluation using dedicated evaluator assistants
  • Introduces EvaluatorConfig on the Assistant schema with structured grading criteria (named dimensions, weights, thresholds) and a configurable generate-evaluate-critique-regenerate orchestration loop
  • Spec covers backend schemas, LLMController orchestration, default evaluator prompt, frontend display, Alembic migration, and Playwright MCP integration for UI testing
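The structured grading criteria described above could be sketched roughly as follows. This is a minimal illustration using stdlib dataclasses (the actual spec may use Pydantic-style schemas); the field names beyond `EvaluationCriterion` and `EvaluatorConfig` — `weight`, `threshold`, `evaluator_assistant_id`, `max_iterations`, and the `passes` helper — are assumptions, not the committed schema:

```python
from dataclasses import dataclass, field

@dataclass
class EvaluationCriterion:
    """One named grading dimension, e.g. 'accuracy' or 'tone'."""
    name: str
    weight: float = 1.0     # relative importance in the weighted overall score (assumed default)
    threshold: float = 0.7  # minimum passing score in [0, 1] for this dimension (assumed default)

@dataclass
class EvaluatorConfig:
    """Attached to an Assistant; points at a dedicated evaluator assistant."""
    evaluator_assistant_id: str  # hypothetical field name for the paired evaluator
    criteria: list[EvaluationCriterion] = field(default_factory=list)
    max_iterations: int = 3      # cap on generate-evaluate-critique-regenerate cycles

    def passes(self, scores: dict[str, float]) -> bool:
        """A draft passes only if every dimension meets its threshold."""
        return all(scores.get(c.name, 0.0) >= c.threshold for c in self.criteria)

cfg = EvaluatorConfig(
    "asst_eval_1",  # placeholder id
    criteria=[EvaluationCriterion("accuracy", weight=2.0, threshold=0.8),
              EvaluationCriterion("tone")],
)
print(cfg.passes({"accuracy": 0.9, "tone": 0.75}))  # → True
```

Per-dimension thresholds (rather than a single aggregate cutoff) let the evaluator reject a draft that scores well overall but fails one critical dimension.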

Closes #899

Spec & Plan

Implementation Phases

  1. Schema & Data Layer (evaluation models + migration)
  2. Orchestration Loop (generate-evaluate-critique-regenerate in LLMController)
  3. Default Evaluator Prompt (skeptical-posture system prompt)
  4. Frontend Display (EvaluationBadge component)
  5. Testing & Documentation (unit/integration tests + example notebook)
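The phase-2 orchestration loop could look roughly like this. It is a sketch only: `generate` and `evaluate` stand in for the real LLMController calls to the generator and evaluator assistants, and the `(passed, critique)` return shape is an assumption:

```python
def run_with_evaluation(generate, evaluate, prompt, max_iterations=3):
    """GAN-inspired generate-evaluate-critique-regenerate loop:
    the generator drafts a response, the evaluator grades it, and the
    critique is fed back into regeneration until the draft passes or
    the iteration cap is reached."""
    critique = None
    for iteration in range(1, max_iterations + 1):
        draft = generate(prompt, critique)  # regenerate, conditioned on prior critique
        passed, critique = evaluate(draft)  # grade the draft against the criteria
        if passed:
            return draft, iteration
    return draft, max_iterations            # best-effort draft at the cap

# Toy stand-ins for the generator/evaluator pair:
def toy_generate(prompt, critique):
    return prompt.upper() if critique else prompt

def toy_evaluate(draft):
    return (draft.isupper(), "shout it")    # pass only once the draft is uppercase

print(run_with_evaluation(toy_generate, toy_evaluate, "hello"))  # → ('HELLO', 2)
```

Returning the final draft even on failure (rather than raising) matches the max-iterations termination behavior called out in the test plan below.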

Test plan

  • Unit tests for EvaluationCriterion, EvaluatorConfig, EvaluationResult schema validation
  • Unit tests for loop termination at max iterations
  • Integration test for end-to-end evaluator rejection and generator retry
  • Integration test for Playwright MCP evaluator interaction
  • Frontend tests for EvaluationBadge rendering states
  • Manual test: create generator + evaluator pair, run conversation, verify scores display
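The max-iterations termination test from the plan could be sketched like this, against a self-contained stub of the loop (the stub and all names here are hypothetical, not the real LLMController interface):

```python
def run_loop(evaluate, max_iterations):
    """Minimal stub of the orchestration loop: stops on a passing
    evaluation or at the iteration cap, whichever comes first."""
    for i in range(1, max_iterations + 1):
        if evaluate(f"draft-{i}"):
            return i
    return max_iterations

def test_terminates_at_max_iterations():
    calls = []
    def always_reject(draft):
        calls.append(draft)   # record each evaluator invocation
        return False
    assert run_loop(always_reject, max_iterations=3) == 3
    assert len(calls) == 3    # evaluator invoked exactly once per iteration

test_terminates_at_max_iterations()
print("ok")
```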

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
@coderabbitai
Contributor

coderabbitai Bot commented Mar 25, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50e2859e-02ab-4ff6-82a7-19bd16f9ef70

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

