
goal: (research) open-endedness, dq & self-play for coding agent #211

@hermit46

🎯 Goal

  • Explore and prototype approaches to a self-improving coding agent across the stack (model, harness, tools, IDE/CLI), with initial focus on interface/data layer and benchmarks

📖 Context

  • Near-term priority: define the interface with Platform (data structures, caching, agent output schema)
  • Research threads: open-endedness, multi-agent systems, tool discovery, and evals (SWE-bench, Toolformer-style tool-use tasks, others)
  • Team owns the full stack (model → infra), enabling deeper optimization than competitors

✅ Scope

  • Specify and implement the interface layer between agents and Platform (schemas, caching, contracts)
  • Build a baseline coding agent (harness + tool calls) on top of foundation models
  • Set up an internal eval harness and initial benchmarks (e.g., SWE-bench, Toolformer-style tool-use tasks)
  • Write a literature review and a list of hypotheses for agent improvement (both training-free and training-in-the-loop paths)
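
To make the interface-layer item above concrete, here is a minimal sketch of what an agent output schema and a cache key could look like. Everything in it is an assumption for discussion, not an agreed contract: the `ToolCall` and `AgentOutput` types, their field names, the `schema_version` value, and the `cache_key` helper are all hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation made by the agent (hypothetical shape)."""
    name: str
    arguments: dict
    output: str


@dataclass(frozen=True)
class AgentOutput:
    """Hypothetical record the agent hands to Platform after a run."""
    task_id: str
    model: str
    patch: str  # assumed to be a unified diff proposed by the agent
    tool_calls: tuple[ToolCall, ...] = ()
    schema_version: str = "0.1"  # placeholder version string

    def to_json(self) -> str:
        # sort_keys yields a stable serialization, which helps diffing and caching.
        return json.dumps(asdict(self), sort_keys=True)


def cache_key(task_id: str, model: str, prompt: str) -> str:
    """Derive a deterministic cache key from the inputs that define a run."""
    payload = json.dumps(
        {"task_id": task_id, "model": model, "prompt": prompt},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Keying the cache on a hash of the canonicalized inputs is one possible strategy; whether the prompt, tool versions, or repository state should also feed the key is left open for the interface spec.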

❌ Out of Scope

  • Full productized IDE beyond internal needs
  • Commitment to a single research approach (e.g., only self-play/open-endedness)
  • Training a bespoke model as the only path; keep options open
  • Public release/marketing features

🛠 Deliverables

  • Interface spec: data schemas, API contracts, caching strategy, agent output structure
  • Minimal agent prototype integrated with CLI/IDE for internal use
  • Eval setup: runnable benchmarks + reporting (QF, SWE-bench or equivalent)
  • Literature review doc and shared reading list; summarized insights and next-step experiments
  • Week-by-week plan to split work across IDE, harness, Platform interface, and research
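
For the eval setup deliverable, the reporting step could start as small as the sketch below. It assumes each benchmark task yields a single pass/fail outcome; the `EvalResult` type, the `resolve_rates` and `format_report` helpers, and the benchmark label used in the usage example are hypothetical and not tied to any existing harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Outcome of one benchmark task (hypothetical shape)."""
    benchmark: str
    task_id: str
    resolved: bool


def resolve_rates(results):
    """Return the fraction of resolved tasks per benchmark."""
    totals = {}  # benchmark name -> (resolved count, total count)
    for r in results:
        passed, seen = totals.get(r.benchmark, (0, 0))
        totals[r.benchmark] = (passed + int(r.resolved), seen + 1)
    return {name: passed / seen for name, (passed, seen) in totals.items()}


def format_report(rates):
    """Render one line per benchmark, sorted by name."""
    return "\n".join(f"{name}: {rate:.1%}" for name, rate in sorted(rates.items()))
```

A run over two tasks, one of them resolved, would be reported like this:

```python
results = [
    EvalResult("swe-bench", "task-1", True),
    EvalResult("swe-bench", "task-2", False),
]
print(format_report(resolve_rates(results)))  # swe-bench: 50.0%
```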
