This document tracks the key engineering and research problems that must be solved to make the Agent Chronos architecture practical.
This repository is not only about implementation. It is also about identifying, refining, and solving the hardest problems behind the architecture.
You can contribute by:
- proposing a new problem
- refining an existing problem
- suggesting solution directions
- building a small prototype or experiment
- documenting trade-offs or implementation constraints
When opening an issue, please reference the relevant problem ID when possible.
- OPEN: important and unresolved
- REFINING: problem is known, but needs clearer framing
- EXPERIMENTING: candidate solutions are being explored
- PARTIALLY SOLVED: a partial solution exists, but is incomplete
- SOLVED: sufficiently resolved for the current architecture stage
Status: OPEN
How should the 8-phase architecture be mapped to a real workflow runtime?
The architecture is phase-driven, but phase definitions alone are not enough. A real system needs executable control flow, transitions, rollback rules, and state persistence.
- Should the runtime be graph-based, event-driven, or queue-based?
- Should phases be first-class runtime states?
- How should side-flows interrupt and re-enter the main flow?
- orchestration design proposals
- runtime state diagrams
- prototype implementations in an existing workflow framework
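
To make the questions above easier to discuss, here is a minimal sketch that treats phases as first-class runtime states. Everything in it is an assumption for illustration: the placeholder phase names (`PHASE_1` to `PHASE_8`), the `Runtime` class, the backwards-only rollback rule, and the rule that a side-flow resumes at the phase it interrupted unless told otherwise. It is not a proposed design, only a starting point for comparing graph-based, event-driven, and queue-based options.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional

# Placeholder names only; the real phase names are not assumed here.
Phase = IntEnum("Phase", [f"PHASE_{i}" for i in range(1, 9)])


@dataclass
class Runtime:
    """Tiny state machine: a linear main flow plus interruptible side-flows."""
    current: Phase = Phase.PHASE_1
    history: list = field(default_factory=list)    # transition log to persist
    suspended: list = field(default_factory=list)  # phases paused by side-flows

    def _goto(self, target: Phase, reason: str) -> Phase:
        self.history.append((reason, self.current.name, target.name))
        self.current = target
        return target

    def advance(self) -> Phase:
        """Move to the next phase of the main flow."""
        if self.suspended:
            raise RuntimeError("cannot advance while a side-flow is active")
        if self.current is Phase.PHASE_8:
            raise RuntimeError("already in the final phase")
        return self._goto(Phase(self.current + 1), "advance")

    def rollback(self, target: Phase) -> Phase:
        """Return to an earlier phase (assumed rule: backwards only)."""
        if target >= self.current:
            raise ValueError("rollback target must be an earlier phase")
        return self._goto(target, "rollback")

    def enter_side_flow(self, name: str) -> None:
        """Pause the main flow and remember where it was interrupted."""
        self.suspended.append(self.current)
        self.history.append(("side-flow-start", name, self.current.name))

    def exit_side_flow(self, name: str, resume_at: Optional[Phase] = None) -> Phase:
        """Re-enter the main flow at the interrupted phase or at `resume_at`."""
        interrupted = self.suspended.pop()
        target = interrupted if resume_at is None else resume_at
        self.history.append(("side-flow-end", name, target.name))
        self.current = target
        return target
```

A caller would create `Runtime()`, call `advance()` as phases complete, and wrap a side-flow in `enter_side_flow(...)` / `exit_side_flow(...)`. The `history` list is the piece that a real runtime would need to persist.
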
Status: OPEN
Can Git serve as the primary state carrier for the system, or is an additional metadata store required?
Git is central to this architecture, but some runtime information may not fit cleanly into commits and branches alone.
- What belongs in Git, and what belongs outside Git?
- Should logs, checkpoints, and experience entries live in the repository?
- How do we prevent Git history from becoming noisy or unmanageable?
- Git-only vs hybrid-state comparisons
- repository structure proposals
- minimal prototypes using Git branches / worktrees as runtime state
Status: OPEN
What is the correct isolation unit for a clone during Phase 7?
The architecture depends on safe parallel work. Weak isolation causes interference; overly strong isolation increases complexity and cost.
- Should clones use branches, worktrees, containers, or temporary sandboxes?
- What is the minimum isolation needed for safe parallel function implementation?
- How should clones be created and cleaned up over their lifecycle?
- experiments comparing branch/worktree/container approaches
- cost and complexity analysis
- cleanup and workspace management designs
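
As one input to the branch/worktree/container comparison, the sketch below shows what a worktree-per-clone lifecycle could look like. It only builds the command lines and does not run them. The `clone/<id>` branch naming, the `.chronos/clones` directory, and the function names are hypothetical; `git worktree add -b`, `git worktree remove --force`, and `git branch -D` are standard Git commands.

```python
from pathlib import PurePosixPath

CLONE_ROOT = ".chronos/clones"  # hypothetical location for clone workspaces


def _branch(clone_id: str) -> str:
    return f"clone/{clone_id}"


def _path(clone_id: str, root: str = CLONE_ROOT) -> str:
    return str(PurePosixPath(root) / clone_id)


def worktree_add_cmd(clone_id: str, base_ref: str, root: str = CLONE_ROOT) -> list:
    """Command that creates a new branch and a separate working directory."""
    return ["git", "worktree", "add", "-b", _branch(clone_id),
            _path(clone_id, root), base_ref]


def cleanup_cmds(clone_id: str, root: str = CLONE_ROOT) -> list:
    """Commands that remove the workspace and then delete the clone branch.

    `git branch -D` discards unmerged commits, so this should only run
    after the clone's work has been merged or deliberately rejected.
    """
    return [
        ["git", "worktree", "remove", "--force", _path(clone_id, root)],
        ["git", "branch", "-D", _branch(clone_id)],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from inside the repository. Note that a worktree isolates files and branches but not processes, ports, or installed dependencies, which is part of what a container comparison would need to cover.
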
Status: OPEN
How can the system enforce that a clone only edits its allowed area?
The architecture assumes strict local modification boundaries. Without enforcement, the system collapses back into uncontrolled editing.
- How should allowed edit regions be declared?
- Can AST-based checks enforce function-level edit boundaries?
- How should legitimate local helper additions be handled?
- static analysis prototypes
- whitelist / patch validation designs
- examples of acceptable vs invalid changes
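
A small experiment suggests that AST-based checks are at least feasible for Python sources. The sketch below compares two versions of a file and reports every top-level function that was changed, added, or removed outside an allowed set. The function names and the "allowed set of function names" declaration format are assumptions; the check ignores comments and formatting because those do not appear in the AST, and it does not yet cover classes, methods, or module-level statements.

```python
import ast


def function_fingerprints(source: str) -> dict:
    """Map each top-level function name to a structural dump of its AST."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # line numbers are excluded by default
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def boundary_violations(before: str, after: str, allowed: set) -> list:
    """Return names of functions touched outside the allowed edit region."""
    old = function_fingerprints(before)
    new = function_fingerprints(after)
    touched = {
        name
        for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)  # changed, added, or removed
    }
    return sorted(touched - allowed)
```

With this shape, a newly added helper shows up as a violation unless its name is in `allowed`. One possible policy for legitimate helpers is to allow names matching a declared prefix, but that is a design choice this sketch leaves open.
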
Status: OPEN
What format should contracts use so they are both readable by humans and actionable by tools?
Contract-first development is one of the architecture’s core ideas. If contracts are too vague, they cannot support safe parallelization.
- Should contracts be Markdown, JSON, YAML, OpenAPI-like specs, or a mixed format?
- How should function signatures, inputs, outputs, errors, and invariants be represented?
- How should front-end/back-end alignment be encoded?
- contract schema proposals
- example contract documents
- contract parsers or validators
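
To ground the format discussion, here is one hypothetical shape for a function-level contract, written as JSON and validated in Python. The field names (`name`, `inputs`, `output`, `errors`, `invariants`) and the choice of JSON are assumptions made for this example, not a decision about the contract format.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = {"name", "inputs", "output"}


@dataclass
class FunctionContract:
    name: str
    inputs: dict           # parameter name -> type name
    output: str            # return type name
    errors: tuple = ()     # error names the function may raise
    invariants: tuple = () # human-readable conditions, not yet machine-checked


def parse_contract(text: str) -> FunctionContract:
    """Parse and minimally validate a JSON contract document."""
    raw = json.loads(text)
    missing = REQUIRED_KEYS - raw.keys()
    if missing:
        raise ValueError(f"contract is missing keys: {sorted(missing)}")
    if not str(raw["name"]).isidentifier():
        raise ValueError(f"not a valid function name: {raw['name']!r}")
    return FunctionContract(
        name=raw["name"],
        inputs=dict(raw["inputs"]),
        output=raw["output"],
        errors=tuple(raw.get("errors", [])),
        invariants=tuple(raw.get("invariants", [])),
    )


# Illustrative contract; the function and type names are made up.
EXAMPLE = """
{
  "name": "create_user",
  "inputs": {"email": "str", "age": "int"},
  "output": "UserId",
  "errors": ["DuplicateEmail"],
  "invariants": ["age >= 0"]
}
"""
```

Calling `parse_contract(EXAMPLE)` yields a `FunctionContract` that tools can consume, while the JSON stays readable enough to embed in a Markdown document. A mixed format (prose plus an embedded machine-readable block) is one way to serve both audiences.
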
Status: OPEN
How much implementation scaffolding should be generated automatically from contracts?
If contracts are precise enough, they may be able to generate framework code, stubs, interfaces, and test skeletons.
- What artifacts can be safely generated?
- How do we keep generated code aligned with updated contracts?
- Should generated code be overwritten, patched, or versioned separately?
- code generation experiments
- stub generation tools
- regeneration / diff strategies
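
One possible answer to the alignment question is to stamp each generated file with a digest of the contract it came from, so stale stubs can be detected mechanically. The sketch below assumes a contract is a plain dict with `name`, `inputs`, and `output` keys; the `# contract-digest:` header and all function names are hypothetical.

```python
import hashlib
import json


def contract_digest(contract: dict) -> str:
    """Short, stable digest of a contract's canonical JSON form."""
    canonical = json.dumps(contract, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def render_stub(contract: dict) -> str:
    """Generate a Python stub whose first line records the contract digest."""
    name = contract["name"]
    params = ", ".join(f"{p}: {t}" for p, t in contract["inputs"].items())
    return (
        f"# contract-digest: {contract_digest(contract)}\n"
        f"def {name}({params}) -> {contract['output']}:\n"
        f'    """Generated from a contract; regenerate instead of editing."""\n'
        f"    raise NotImplementedError({name!r})\n"
    )


def is_stale(stub_text: str, contract: dict) -> bool:
    """True if the stub was generated from a different contract version."""
    lines = stub_text.splitlines()
    first_line = lines[0] if lines else ""
    return first_line != f"# contract-digest: {contract_digest(contract)}"
```

This only covers the "overwrite" strategy: when `is_stale` returns true, the stub is regenerated. It says nothing about how to preserve hand-written implementation code, which is where patching or separate versioning would come in.
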
Status: OPEN
How should function and module dependencies be identified and represented?
Batch planning and checkpoint design depend on a reliable dependency graph.
- Can dependency graphs be inferred from contracts automatically?
- How should shared utilities and implicit dependencies be handled?
- What granularity is appropriate: module-level, function-level, or both?
- dependency extraction proposals
- visualization prototypes
- graph generation from contract documents
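
If contracts declared their dependencies explicitly, a function-level graph could be derived with very little code. The sketch below assumes each contract is a dict with a `name` and an optional `depends_on` list (both field names are hypothetical) and uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to produce a valid implementation order. It does not address implicit dependencies, which would need a separate detection step.

```python
from graphlib import TopologicalSorter


def build_graph(contracts: list) -> dict:
    """Map each function name to the set of names it depends on."""
    known = {c["name"] for c in contracts}
    graph = {}
    for contract in contracts:
        deps = set(contract.get("depends_on", []))
        unknown = deps - known
        if unknown:
            raise ValueError(
                f"{contract['name']} depends on undeclared functions: {sorted(unknown)}"
            )
        graph[contract["name"]] = deps
    return graph


def implementation_order(graph: dict) -> list:
    """Order in which every function appears after its dependencies.

    Raises graphlib.CycleError if the contracts contain a dependency cycle.
    """
    return list(TopologicalSorter(graph).static_order())
```

Rejecting undeclared names early is deliberate: a dependency on something that has no contract is exactly the kind of implicit coupling this problem is trying to surface.
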
Status: OPEN
How should the system split implementation work into batches?
Chronos depends on batch-based parallelization. Poor batch design leads to idle time, conflicts, or fragile integration points.
- Should batches be manually designed, automatically derived, or hybrid?
- What optimization target matters most: speed, safety, reviewability, or integration risk?
- How large should a batch be?
- batch scheduling heuristics
- dependency-aware partitioning strategies
- case studies on batch sizing
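
As a baseline for automatically derived batches, the sketch below groups functions into dependency levels: every function in a batch depends only on functions from earlier batches, so the members of one batch can be implemented in parallel. The input is assumed to be a dict mapping each function name to the set of names it depends on, and the optional `max_size` cap is a hypothetical knob for reviewability. This is one simple heuristic, not a claim about the best batching strategy.

```python
from graphlib import TopologicalSorter


def plan_batches(graph: dict, max_size: int = 0) -> list:
    """Split a dependency graph into ordered batches of independent functions.

    graph: name -> set of names it depends on.
    max_size: if positive, split each level into chunks of at most this size.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on cyclic input
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        sorter.done(*ready)
        step = max_size if max_size > 0 else len(ready)
        for start in range(0, len(ready), step):
            batches.append(ready[start:start + step])
    return batches
```

For a graph where `c` depends on `a` and `b`, and `d` depends on `c`, this yields three batches: `a` and `b` together, then `c`, then `d`. Level-based batching optimizes for safety and parallelism but ignores cost per function, so it can still leave clones idle when one function in a level is much larger than the rest.
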
Status: OPEN
What makes a checkpoint useful rather than bureaucratic?
Checkpoints are supposed to reduce rework cost. Too many checkpoints slow the system down; too few allow defects to accumulate.
- What kinds of checkpoints are essential?
- Which checkpoints should be automated?
- What objective criteria should decide checkpoint pass/fail?
- checkpoint taxonomy
- checkpoint evaluation criteria
- lightweight checkpoint process proposals
Status: OPEN
How should batch experience be represented, stored, and reused?
The architecture relies on batch-to-batch learning without real-time contamination. The experience pool is central to that design.
- Should experience entries be free-form summaries or structured records?
- How should entries be tagged by scope, validity, and confidence?
- How do we prevent noisy or low-quality experience from propagating?
- experience entry schema
- scoring / ranking mechanisms
- retrieval strategies for batch-relevant experience
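
To make the structured-record option concrete, here is a hypothetical entry schema and a simple retrieval function. The fields (`summary`, `tags`, `confidence`, `source_batch`, `valid`), the tag-overlap ranking, and the default thresholds are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExperienceEntry:
    summary: str          # short free-form lesson
    tags: frozenset       # scope tags, e.g. module or contract names
    confidence: float     # 0.0 to 1.0, assigned when the entry is stored
    source_batch: int     # batch that produced the entry
    valid: bool = True    # set to False if later outcomes invalidate it


def retrieve(pool: list, batch_tags: frozenset, limit: int = 3,
             min_confidence: float = 0.5) -> list:
    """Return the entries most relevant to an upcoming batch.

    Entries are ranked by tag overlap with the batch, then by confidence.
    Invalid, low-confidence, and non-overlapping entries are skipped.
    """
    scored = []
    for entry in pool:
        overlap = len(entry.tags & batch_tags)
        if entry.valid and overlap > 0 and entry.confidence >= min_confidence:
            scored.append((overlap, entry.confidence, entry))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [entry for _, _, entry in scored[:limit]]
```

Because retrieval happens once per batch rather than continuously, a design like this keeps the batch-to-batch learning described above without letting one clone's in-progress observations leak into another's work.
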
Status: OPEN
How can the system distinguish useful experience from misleading experience?
A bad experience pool may amplify wrong assumptions across later batches.
- Should experience entries require review before being stored?
- Should later outcomes be allowed to invalidate earlier experience?
- How should contradictory experience be handled?
- quality filters
- review workflows
- confidence and expiry models for experience entries
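
One candidate confidence and expiry model, sketched here purely as a discussion aid: confidence decays with the number of batches since the entry was recorded, and later outcomes raise or lower it. The half-life of four batches, the step sizes, the asymmetric penalty for contradictions, and the expiry threshold are arbitrary placeholder values, not recommendations.

```python
def decayed_confidence(initial: float, batches_elapsed: int,
                       half_life: float = 4.0) -> float:
    """Exponential decay: confidence halves every `half_life` batches."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return initial * 0.5 ** (batches_elapsed / half_life)


def apply_outcome(confidence: float, confirmed: bool, step: float = 0.2) -> float:
    """Adjust confidence after a later batch confirms or contradicts an entry.

    Contradictions are penalized twice as heavily as confirmations are
    rewarded (an assumed policy that favors caution).
    """
    adjusted = confidence + step if confirmed else confidence - 2 * step
    return min(1.0, max(0.0, adjusted))


def is_expired(confidence: float, threshold: float = 0.3) -> bool:
    """Entries below the threshold would be dropped from retrieval."""
    return confidence < threshold
```

Under these placeholder values, an entry stored at 0.8 falls to 0.4 after four batches and expires after eight unless something confirms it in between.
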
Status: OPEN
How can G / M / S change severity be made more objective?
The architecture treats change handling as a first-class side-flow. If severity classification is vague, change handling becomes inconsistent.
- What measurable criteria define G, M, and S?
- Can affected artifact count or dependency impact help classify changes?
- How should user insistence interact with technical risk?
- classification rubrics
- example change scenarios
- decision trees for change routing
Status: OPEN
How can L1 / L2 / L3 defect classification be made reliable and actionable?
Rollback policy depends on distinguishing implementation defects from contract or architecture defects.
- What signals indicate a contract defect rather than an implementation defect?
- When should a defect force rework of completed functions?
- How should rollback scope be computed?
- defect classification guidelines
- defect examples with routing decisions
- rollback scope algorithms
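
For the rollback scope question, a natural starting point is reverse reachability in the dependency graph: if a function's contract is found to be defective, everything that depends on it, directly or transitively, is a candidate for rework. The sketch below assumes a graph that maps each function name to the set of names it depends on. It deliberately does not assume how L1 / L2 / L3 map onto this; a pure implementation defect might need only the defective function itself, while a contract defect might need the full scope computed here.

```python
from collections import deque


def rollback_scope(graph: dict, defective: str) -> set:
    """Return the defective function plus all of its transitive dependents.

    graph: name -> set of names it depends on.
    """
    # Invert the graph: name -> set of functions that depend on it.
    dependents = {name: set() for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk over the inverted edges.
    scope = {defective}
    queue = deque([defective])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in scope:
                scope.add(dependent)
                queue.append(dependent)
    return scope
```

This gives an upper bound rather than a final answer. A dependent whose behavior is unaffected by the specific contract change could be excluded, but deciding that safely requires information this sketch does not have.
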
Status: OPEN
How can review agents avoid becoming superficial or rubber-stamping reviewers?
The architecture assumes review is meaningful. Weak review undermines all later governance mechanisms.
- What should review agents check automatically?
- What should require human review?
- How can review quality be measured?
- review checklists
- review rubric design
- reviewer calibration experiments
Status: OPEN
Where should human intervention be optional, recommended, or mandatory?
Full autonomy may be unsafe or unrealistic, but excessive human intervention erodes the value of automation.
- Which phases should always allow human override?
- Which side-flow decisions should require human approval?
- How should human feedback be stored and propagated?
- HITL policy proposals
- governance boundary recommendations
- practical examples from existing agent systems
Status: OPEN
Which existing workflow or agent frameworks are the best substrate for implementing Chronos?
This architecture does not need to be implemented from scratch. Reusing an existing framework may dramatically reduce MVP cost.
- Is LangGraph the best fit for phase/state control?
- Is CrewAI better for role-based collaboration?
- Is a visual orchestration layer like n8n or Langflow useful for early prototyping?
- comparison documents
- proof-of-concept adaptations
- framework selection criteria
Status: OPEN
What is the smallest version of Chronos that is still meaningfully Chronos?
A full implementation is too large for an early-stage project. A clear MVP boundary is necessary to attract contributors and validate ideas.
- Which phases are essential for a first MVP?
- Which mechanisms can be simulated manually at first?
- What should be excluded from v0.1?
- MVP proposals
- phased rollout plans
- “core vs optional” architecture breakdowns
Status: OPEN
How should the architecture be evaluated fairly?
Claims about cost reduction, traceability, and reduced rework need evidence.
- What baseline should Chronos be compared against?
- What metrics matter most: token cost, elapsed time, defect rate, rollback rate, integration failures?
- What tasks should be used for benchmarking?
- benchmark design
- evaluation metrics
- experiment plans comparing Chronos-style workflows with simpler agent workflows
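
To make the metrics question more tangible, the sketch below shows how per-run records could be aggregated into comparable figures for each workflow. The `RunRecord` fields and the three derived metrics are assumptions chosen from the list above; normalizing by delivered functions is one option among several, and the sketch contains no real measurements.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    workflow: str              # e.g. a Chronos-style run or a baseline run
    tokens: int                # total tokens consumed
    seconds: float             # elapsed wall-clock time
    functions_delivered: int   # functions that passed integration
    defects: int               # defects found after delivery
    rollbacks: int             # rollbacks triggered during the run


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN instead of raising when nothing was delivered."""
    return numerator / denominator if denominator else float("nan")


def summarize(records: list) -> dict:
    """Aggregate run records into per-workflow metrics."""
    summary = {}
    for name in sorted({r.workflow for r in records}):
        runs = [r for r in records if r.workflow == name]
        delivered = sum(r.functions_delivered for r in runs)
        summary[name] = {
            "runs": len(runs),
            "tokens_per_function": _ratio(sum(r.tokens for r in runs), delivered),
            "defect_rate": _ratio(sum(r.defects for r in runs), delivered),
            "rollbacks_per_run": _ratio(sum(r.rollbacks for r in runs), len(runs)),
        }
    return summary
```

Normalizing by delivered functions keeps a workflow from looking cheap simply because it delivered less. A fair comparison would also need both workflows to run the same tasks, which is what the benchmark design contribution would define.
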
Status: OPEN
What repository layout best supports phase artifacts, generated outputs, logs, contracts, and code?
The architecture depends on artifact clarity. A poor repository structure will make governance harder, not easier.
- Should each phase own its own directory?
- How should temp files be stored and cleaned?
- How should generated artifacts be separated from canonical artifacts?
- repository layout proposals
- sample project structures
- file lifecycle rules
Status: OPEN
How can the project remain open to low-barrier contributors while still maintaining rigor?
This project should allow contribution through problem discovery, design analysis, and validation work, not only code.
- What issue templates best support high-quality problem proposals?
- How should accepted problems be tracked?
- How should non-code contributors be credited?
- contribution process design
- issue template suggestions
- contributor credit guidelines
If you think an important problem is missing, open an issue with:
- a clear title
- the problem statement
- why it matters
- where it appears in the architecture
- possible solution directions (optional)
High-quality problem discovery is a real contribution to this project.