Summary
Implement a Sprint Contract negotiation protocol between generator and evaluator agents. Before each sprint (work chunk), the generator proposes a SprintContract containing features to build and testable acceptance criteria. The evaluator reviews, can counter-propose, and once both sides agree, the contract is persisted and used as the grading rubric.
This mirrors Anthropic's harness pattern where sprint contracts contained 27+ testable criteria per feature chunk, bridging the gap between high-level product specs and testable implementation.
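
The propose/review/counter-propose loop described above might be sketched as follows. This is a minimal illustration, not the planned implementation: `Contract`, `Reviewer`, and `negotiate` are hypothetical names, the evaluator is modeled as a plain callable that returns `None` to accept or a counter-proposal otherwise, and the generator is assumed to adopt each counter-proposal as its next draft.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Contract:
    """Hypothetical, stripped-down contract used only for this sketch."""
    sprint: int
    features: List[str]
    criteria: List[str]
    status: str = "proposed"


# Assumed evaluator interface: None means "accept", a Contract means "counter-propose".
Reviewer = Callable[[Contract], Optional[Contract]]


def negotiate(proposal: Contract, review: Reviewer, max_rounds: int = 3) -> Contract:
    """Run up to max_rounds review rounds and return the latest draft.

    The caller checks `status == "agreed"`; if the rounds run out first,
    the draft is returned still marked "negotiating".
    """
    current = proposal
    for _ in range(max_rounds):
        counter = review(current)
        if counter is None:
            current.status = "agreed"
            return current
        counter.status = "negotiating"
        current = counter  # assumption: generator adopts the counter-proposal
    return current


def demo_review(contract: Contract) -> Optional[Contract]:
    """Toy evaluator: insists on at least two criteria."""
    if len(contract.criteria) < 2:
        extra = contract.criteria + ["returns 401 without a token"]
        return Contract(contract.sprint, contract.features, extra)
    return None


result = negotiate(Contract(1, ["login"], ["returns 200 with a valid token"]), demo_review)
print(result.status, len(result.criteria))  # agreed 2
```

What should happen when the round limit is reached without agreement is not stated here, so the sketch leaves that decision to the caller.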
Key Components
- SprintContract schema with TestCriterion entries and a status lifecycle (proposed -> negotiating -> agreed -> in_progress -> completed/failed)
- Negotiation flow in LLM controller: generator proposes, evaluator reviews, configurable max rounds (default 3)
- File-based communication via StateBackend (/contracts/sprint-N.md)
- Contract-aware evaluation: evaluator grades each criterion individually, reports "17/27 criteria passed" style results
- Frontend panels: ContractPanel.tsx for viewing contracts, CriteriaChecklist.tsx for pass/fail indicators
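
As a rough sketch of how the schema, lifecycle, file path, and per-criterion summary listed above could fit together (the full design is in the spec): only the names SprintContract and TestCriterion and the status values come from this description. The field names, the `advance`, `contract_path`, and `summary` methods, and the strictly linear transition table are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ContractStatus(str, Enum):
    PROPOSED = "proposed"
    NEGOTIATING = "negotiating"
    AGREED = "agreed"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table: follows the listed lifecycle strictly, in order.
_TRANSITIONS = {
    ContractStatus.PROPOSED: {ContractStatus.NEGOTIATING},
    ContractStatus.NEGOTIATING: {ContractStatus.AGREED},
    ContractStatus.AGREED: {ContractStatus.IN_PROGRESS},
    ContractStatus.IN_PROGRESS: {ContractStatus.COMPLETED, ContractStatus.FAILED},
    ContractStatus.COMPLETED: set(),
    ContractStatus.FAILED: set(),
}


@dataclass
class TestCriterion:
    description: str
    passed: Optional[bool] = None  # None means "not graded yet"


@dataclass
class SprintContract:
    sprint: int
    features: List[str]
    criteria: List[TestCriterion] = field(default_factory=list)
    status: ContractStatus = ContractStatus.PROPOSED

    def advance(self, new_status: ContractStatus) -> None:
        """Move to new_status, rejecting transitions outside the lifecycle."""
        if new_status not in _TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status

    def contract_path(self) -> str:
        """Path following the /contracts/sprint-N.md convention."""
        return f"/contracts/sprint-{self.sprint}.md"

    def summary(self) -> str:
        """Per-criterion result in the '17/27 criteria passed' style."""
        passed = sum(1 for c in self.criteria if c.passed is True)
        return f"{passed}/{len(self.criteria)} criteria passed"


contract = SprintContract(
    sprint=4,
    features=["login"],
    criteria=[
        TestCriterion("returns 200 with a valid token", passed=True),
        TestCriterion("returns 401 without a token", passed=False),
        TestCriterion("locks account after 5 failures", passed=True),
    ],
)
for step in (ContractStatus.NEGOTIATING, ContractStatus.AGREED, ContractStatus.IN_PROGRESS):
    contract.advance(step)

print(contract.contract_path())  # /contracts/sprint-4.md
print(contract.summary())        # 2/3 criteria passed
```

Keeping the transitions in a single table makes an out-of-order status change fail loudly instead of silently corrupting the contract state.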
Dependencies
- Requires evaluator-agent (hard dependency for negotiation and grading)
- Best used with planner-agent (soft dependency for generating high-level specs)
Spec
See .claude/specs/sprint-contracts.md for full implementation details.
File Changes
| File | Action |
|---|---|
| backend/src/schemas/entities/sprint_contract.py | New |
| backend/src/controllers/llm.py | Modify |
| frontend/src/components/panels/ContractPanel.tsx | New |
| frontend/src/components/lists/CriteriaChecklist.tsx | New |
| backend/tests/unit/test_sprint_contract.py | New |
| backend/tests/integration/test_sprint_contract_flow.py | New |