Goal
- Explore and prototype approaches to a self-improving coding agent across the stack (model, harness, tools, IDE/CLI), with initial focus on interface/data layer and benchmarks
Context
- Near-term priority: define the interface with Platform (data structures, caching, agent output schema)
- Research threads: open-endedness, multi-agent, tool discovery, and evals (SWE-bench, Toolformer, others)
- Team owns full stack (model → infra), enabling deeper optimization than competitors
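To make the near-term interface work concrete, here is a minimal sketch of what an agent output record and a cache key might look like. Nothing here is a decided design: the names `AgentOutput`, `ToolCall`, `cache_key`, the field list, and the status values are all hypothetical placeholders for whatever the interface spec settles on.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation made by the agent (hypothetical shape)."""
    name: str
    arguments: dict
    result: str


@dataclass(frozen=True)
class AgentOutput:
    """Hypothetical record an agent hands back to Platform."""
    task_id: str
    status: str              # assumed values: "success", "failure", "timeout"
    patch: str               # diff produced by the agent; may be empty
    tool_calls: tuple = ()   # ordered ToolCall records
    schema_version: str = "0.1"

    def to_json(self) -> str:
        # sort_keys keeps the serialized form deterministic across runs.
        return json.dumps(asdict(self), sort_keys=True)


def cache_key(task_id: str, model: str, prompt: str) -> str:
    """Derive a content-addressed key from the inputs that determine a run."""
    payload = json.dumps(
        {"task_id": task_id, "model": model, "prompt": prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Carrying an explicit `schema_version` and hashing only the inputs that determine a run are assumptions meant to keep cached outputs safe to reuse as the schema evolves.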
Scope
- Specify and implement the interface layer between agents and Platform (schemas, caching, contracts)
- Baseline coding agent with harness + tool calls on foundation models
- Set up internal eval harness and initial benchmarks (e.g., SWE-bench, Toolformer)
- Literature review and hypotheses list for agent improvement (training-free and training-in-loop paths)
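As a rough illustration of the "harness + tool calls" item above, the loop below shows one possible shape for a baseline agent. It is a sketch under stated assumptions: the action dictionary format, `run_agent`, and `scripted_model` are hypothetical, and the model is a scripted stand-in rather than a real foundation-model call.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]
Model = Callable[[List[str]], dict]


def run_agent(model: Model, tools: Dict[str, Tool], task: str, max_steps: int = 5) -> str:
    """Alternate between model decisions and tool calls until a final answer."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action["type"] == "final":
            return action["content"]
        tool = tools.get(action["tool"])
        if tool is None:
            observation = f"error: unknown tool {action['tool']}"
        else:
            observation = tool(action["input"])
        # Feed the observation back so the next model call can see it.
        transcript.append(f"{action['tool']}({action['input']}) -> {observation}")
    return "error: step limit reached"


def scripted_model(transcript: List[str]) -> dict:
    """Stand-in for a foundation model: one tool call, then a final answer."""
    if len(transcript) == 1:
        return {"type": "tool", "tool": "echo", "input": "hello"}
    return {"type": "final", "content": transcript[-1]}
```

For example, `run_agent(scripted_model, {"echo": str.upper}, "demo")` calls the `echo` tool once and then returns the last transcript entry. The `max_steps` cap is an assumed safeguard against a model that never emits a final action.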
Out of Scope
- Full productized IDE beyond internal needs
- Commitment to a single research approach (e.g., only self-play/open-endedness)
- Training a bespoke model as the only path; keep options open
- Public release/marketing features
Deliverables
- Interface spec: data schemas, API contracts, caching strategy, agent output structure
- Minimal agent prototype integrated with CLI/IDE for internal use
- Eval setup: runnable benchmarks + reporting (QF, SWE-bench or equivalent)
- Literature review doc and shared reading list; summarized insights and next-step experiments
- Week-by-week plan to split work across IDE, harness, Platform interface, and research
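The "runnable benchmarks + reporting" deliverable could start from something as small as the sketch below. It is deliberately simplified and assumes each case is a prompt plus a boolean check. Real benchmarks such as SWE-bench validate patches by running repository test suites, which this does not attempt. `EvalCase`, `run_eval`, and `report` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """A single benchmark item: a prompt and a pass/fail check on the output."""
    case_id: str
    prompt: str
    check: Callable[[str], bool]


def run_eval(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case and record whether it passed; crashes count as failures."""
    results: Dict[str, bool] = {}
    for case in cases:
        try:
            results[case.case_id] = bool(case.check(agent(case.prompt)))
        except Exception:
            results[case.case_id] = False
    return results


def report(results: Dict[str, bool]) -> str:
    """Summarize results as a one-line pass-rate string."""
    passed = sum(results.values())
    total = len(results)
    rate = passed / total if total else 0.0
    return f"{passed}/{total} passed ({rate:.1%})"
```

Counting agent exceptions as failures rather than aborting the run is an assumption that keeps a single crash from hiding the rest of the results.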