Open Problems

This document tracks the key engineering and research problems that must be solved to make the Agent Chronos architecture practical.

This repository is not only about implementation. It is also about identifying, refining, and solving the hardest problems behind the architecture.

How to use this file

You can contribute by:

proposing a new problem
refining an existing problem
suggesting solution directions
building a small prototype or experiment
documenting trade-offs or implementation constraints

When opening an issue, please reference the relevant problem ID when possible.

Problem status

OPEN — important and unresolved
REFINING — problem is known, but needs clearer framing
EXPERIMENTING — candidate solutions are being explored
PARTIALLY SOLVED — a partial solution exists, but is incomplete
SOLVED — sufficiently resolved for the current architecture stage

P1. Workflow runtime mapping

Status: OPEN

How should the 8-phase architecture be mapped to a real workflow runtime?

Why it matters

The architecture is phase-driven, but phase definitions alone are not enough. A real system needs executable control flow, transitions, rollback rules, and state persistence.

Open questions

Should the runtime be graph-based, event-driven, or queue-based?
Should phases be first-class runtime states?
How should side-flows interrupt and re-enter the main flow?

Useful contributions

orchestration design proposals
runtime state diagrams
prototype implementations in an existing workflow framework
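
One possible answer to "should phases be first-class runtime states?" is a small in-memory state machine. The Python sketch below rests on several assumptions. Phases are numbered 1 to 8, and their names are not assumed. A side-flow suspends the main flow. Re-entry is allowed only at the suspended phase or an earlier one, which acts as a rollback. The class `PhaseRuntime` and its methods are hypothetical, and persistence and concurrency are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LAST_PHASE = 8  # the architecture has 8 phases; only their numbers are used here


@dataclass
class PhaseRuntime:
    """Hypothetical runtime that treats phases as first-class states."""

    phase: int = 1
    side_flow: str | None = None      # name of the active side-flow, if any
    suspended_at: int | None = None   # main-flow phase paused by the side-flow
    history: list[str] = field(default_factory=list)  # audit trail of transitions

    def advance(self) -> None:
        if self.side_flow is not None:
            raise RuntimeError("finish the active side-flow before advancing")
        if self.phase == LAST_PHASE:
            raise RuntimeError("already in the final phase")
        self.phase += 1
        self.history.append(f"advance -> {self.phase}")

    def interrupt(self, side_flow: str) -> None:
        if self.side_flow is not None:
            raise RuntimeError("nested side-flows are not modelled in this sketch")
        self.side_flow, self.suspended_at = side_flow, self.phase
        self.history.append(f"interrupt {side_flow} at {self.phase}")

    def resume(self, rollback_to: int | None = None) -> None:
        """Re-enter the main flow at the suspended phase or at an earlier one."""
        if self.side_flow is None or self.suspended_at is None:
            raise RuntimeError("no active side-flow")
        target = self.suspended_at if rollback_to is None else rollback_to
        if not 1 <= target <= self.suspended_at:
            raise ValueError("re-entry must be at or before the suspended phase")
        self.phase, self.side_flow, self.suspended_at = target, None, None
        self.history.append(f"resume at {target}")
```

Example: calling `advance()` six times reaches phase 7. `interrupt("change-request")` then blocks further advancement, and `resume(rollback_to=5)` re-enters the main flow at phase 5. The `history` list is the kind of record that could later be persisted, for example as commit messages.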

P2. Git as state source of truth

Status: OPEN

Can Git serve as the primary state carrier for the system, or is an additional metadata store required?

Why it matters

Git is central to this architecture, but some runtime information may not fit cleanly into commits and branches alone.

Open questions

What belongs in Git, and what belongs outside Git?
Should logs, checkpoints, and experience entries live in the repository?
How do we prevent Git history from becoming noisy or unmanageable?

Useful contributions

Git-only vs hybrid-state comparisons
repository structure proposals
minimal prototypes using Git branches / worktrees as runtime state

P3. Clone isolation model

Status: OPEN

What is the correct isolation unit for a clone during Phase 7?

Why it matters

The architecture depends on safe parallel work. Weak isolation causes interference; overly strong isolation increases complexity and cost.

Open questions

Should clones use branches, worktrees, containers, or temporary sandboxes?
What is the minimum isolation needed for safe parallel function implementation?
How should the clone lifecycle, from creation to cleanup, work?

Useful contributions

experiments comparing branch/worktree/container approaches
cost and complexity analysis
cleanup and workspace management designs
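
To compare the branch, worktree, and container options, here is a minimal Python sketch of the worktree option. Each clone gets its own branch and working directory through `git worktree add -b`. A context manager guarantees `git worktree remove` runs even if the clone fails. The branch and directory naming scheme and the function `clone_worktree` are assumptions. The `run` parameter exists only so the command sequence can be tested without a real repository.

```python
import contextlib
import subprocess
from pathlib import Path


@contextlib.contextmanager
def clone_worktree(repo, clone_id: str, base: str = "HEAD", run=subprocess.run):
    """Give one clone its own branch and working directory, then clean up.

    Hypothetical conventions: branch "clone/<id>", directory "<repo>-clones/<id>".
    The branch is kept after cleanup so its commits can still be integrated.
    """
    repo = Path(repo)
    branch = f"clone/{clone_id}"
    path = repo.parent / f"{repo.name}-clones" / clone_id
    run(["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base], check=True)
    try:
        yield path, branch
    finally:
        # --force also removes a worktree that a failed clone left unclean
        run(["git", "-C", str(repo), "worktree", "remove", "--force", str(path)], check=True)
```

A worktree isolates files and Git state but not processes, installed dependencies, or network access. Where those matter, a container or sandbox would be the heavier alternative.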

P4. Boundary enforcement for clone edits

Status: OPEN

How can the system enforce that a clone only edits its allowed area?

Why it matters

The architecture assumes strict local modification boundaries. Without enforcement, the system collapses back into uncontrolled editing.

Open questions

How should allowed edit regions be declared?
Can AST-based checks enforce function-level edit boundaries?
How should legitimate local helper additions be handled?

Useful contributions

static analysis prototypes
whitelist / patch validation designs
examples of acceptable vs invalid changes
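
A minimal AST-based boundary check for Python source assumes a hypothetical policy. A clone may change exactly one top-level function. It may also add new private helpers whose names start with an underscore. The check compares `ast.dump` output for every top-level definition before and after the edit. As a result, line shifts and comment changes are ignored. The same property means comment-only edits outside the boundary go undetected. Methods inside classes would need a finer-grained walk. `boundary_violations` and `top_level_definitions` are illustrative names.

```python
import ast


def top_level_definitions(source: str):
    """Split a module into named top-level definitions and everything else."""
    named, other = {}, []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            named[node.name] = ast.dump(node)  # ast.dump omits line numbers by default
        else:
            other.append(ast.dump(node))
    return named, other


def boundary_violations(before: str, after: str, allowed: str, helper_prefix: str = "_") -> list:
    """List edits that fall outside the single function a clone may change."""
    old, old_other = top_level_definitions(before)
    new, new_other = top_level_definitions(after)
    problems = []
    if old_other != new_other:
        problems.append("module-level statements changed")
    for name in old.keys() - new.keys():
        problems.append(f"removed: {name}")
    for name in old.keys() & new.keys():
        if name != allowed and old[name] != new[name]:
            problems.append(f"edited outside boundary: {name}")
    for name in new.keys() - old.keys():
        # Hypothetical policy: new private helpers are allowed, new public names are not.
        if not name.startswith(helper_prefix):
            problems.append(f"new public definition: {name}")
    return sorted(problems)
```

An empty result means the patch stays inside the declared boundary. Each string in a non-empty result names one violation that could be reported back to the clone.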

P5. Machine-readable contract format

Status: OPEN

What format should contracts use so they are both readable by humans and actionable by tools?

Why it matters

Contract-first development is one of the architecture’s core ideas. If contracts are too vague, they cannot support safe parallelization.

Open questions

Should contracts be Markdown, JSON, YAML, OpenAPI-like specs, or a mixed format?
How should function signatures, inputs, outputs, errors, and invariants be represented?
How should front-end/back-end alignment be encoded?

Useful contributions

contract schema proposals
example contract documents
contract parsers or validators
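
One possible machine-readable shape uses JSON as the wire format. Each function contract declares its typed inputs, output, errors, invariants, and dependencies. The Python sketch below parses and minimally validates such a document. The field names and the `parse_contract` helper are hypothetical. Invariants are kept as human-readable strings, which keeps the format readable but leaves them unenforced by tools.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    type: str  # type written in the target language, e.g. "str"


@dataclass(frozen=True)
class FunctionContract:
    name: str
    inputs: tuple[Param, ...]
    output: str
    errors: tuple[str, ...] = ()      # named error conditions the function may raise
    invariants: tuple[str, ...] = ()  # human-readable, not machine-checked here
    depends_on: tuple[str, ...] = ()  # other contracts this function calls


REQUIRED_FIELDS = {"name", "inputs", "output"}


def parse_contract(text: str) -> FunctionContract:
    """Parse and minimally validate one JSON contract document."""
    raw = json.loads(text)
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"contract is missing fields: {sorted(missing)}")
    return FunctionContract(
        name=raw["name"],
        inputs=tuple(Param(p["name"], p["type"]) for p in raw["inputs"]),
        output=raw["output"],
        errors=tuple(raw.get("errors", ())),
        invariants=tuple(raw.get("invariants", ())),
        depends_on=tuple(raw.get("depends_on", ())),
    )


example = """{
  "name": "register_user",
  "inputs": [{"name": "email", "type": "str"}],
  "output": "UserId",
  "errors": ["DuplicateEmail"],
  "invariants": ["email is stored lowercased"],
  "depends_on": ["normalize_email"]
}"""
contract = parse_contract(example)
```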

P6. Contract-to-scaffold generation

Status: OPEN

How much implementation scaffolding should be generated automatically from contracts?

Why it matters

If contracts are precise enough, they may be able to generate framework code, stubs, interfaces, and test skeletons.

Open questions

What artifacts can be safely generated?
How do we keep generated code aligned with updated contracts?
Should generated code be overwritten, patched, or versioned separately?

Useful contributions

code generation experiments
stub generation tools
regeneration / diff strategies
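
A function stub is probably the lowest-risk generated artifact. The Python sketch below assumes contracts are plain dictionaries with `name`, `inputs`, and `output` keys, plus optional `errors` and `invariants` keys. This format and the name `render_stub` are hypothetical. Only the signature and docstring come from the contract. That split suggests one regeneration strategy: the generator owns the signature, and a clone owns the body.

```python
import ast


def render_stub(contract: dict) -> str:
    """Render a Python stub whose signature and docstring come from a contract.

    The body only raises NotImplementedError; a clone later replaces it.
    """
    params = ", ".join(f"{p['name']}: {p['type']}" for p in contract["inputs"])
    doc = [f"Generated from contract '{contract['name']}'."]
    doc += [f"Raises: {e}" for e in contract.get("errors", [])]
    doc += [f"Invariant: {i}" for i in contract.get("invariants", [])]
    lines = [f"def {contract['name']}({params}) -> {contract['output']}:"]
    lines.append('    """' + doc[0])
    lines += ["    " + d for d in doc[1:]]
    lines += ['    """', "    raise NotImplementedError", ""]
    stub = "\n".join(lines)
    ast.parse(stub)  # fail fast if the contract produced invalid Python
    return stub
```

For a contract named `add` with two `int` inputs and an `int` output, the first generated line is `def add(a: int, b: int) -> int:`.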

P7. Dependency graph generation

Status: OPEN

How should function and module dependencies be identified and represented?

Why it matters

Batch planning and checkpoint design depend on a reliable dependency graph.

Open questions

Can dependency graphs be inferred from contracts automatically?
How should shared utilities and implicit dependencies be handled?
What granularity is appropriate: module-level, function-level, or both?

Useful contributions

dependency extraction proposals
visualization prototypes
graph generation from contract documents
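
The Python sketch below generates a function-level graph from contract documents. It assumes each contract lists its direct dependencies in a `depends_on` field, which is a hypothetical format. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) detects cycles. Any dependency that has no contract of its own is treated as an error, so implicit dependencies surface early rather than during integration.

```python
from graphlib import CycleError, TopologicalSorter


def build_dependency_graph(contracts: list) -> dict:
    """Map each contract name to the set of names it depends on, then validate."""
    graph = {c["name"]: set(c.get("depends_on", [])) for c in contracts}
    # A dependency that has no contract of its own is an implicit dependency.
    undeclared = {d for deps in graph.values() for d in deps} - set(graph)
    if undeclared:
        raise ValueError(f"dependencies without a contract: {sorted(undeclared)}")
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        # err.args[1] holds the nodes of one detected cycle
        raise ValueError(f"dependency cycle: {err.args[1]}") from err
    return graph


graph = build_dependency_graph([
    {"name": "normalize_email"},
    {"name": "register_user", "depends_on": ["normalize_email"]},
])
# graph == {"normalize_email": set(), "register_user": {"normalize_email"}}
```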

P8. Batch partitioning strategy

Status: OPEN

How should the system split implementation work into batches?

Why it matters

Chronos depends on batch-based parallelization. Poor batch design leads to idle time, conflicts, or fragile integration points.

Open questions

Should batches be manually designed, automatically derived, or hybrid?
What optimization target matters most: speed, safety, reviewability, or integration risk?
How large should a batch be?

Useful contributions

batch scheduling heuristics
dependency-aware partitioning strategies
case studies on batch sizing
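
One automatic derivation layers the dependency graph topologically, in Kahn-style levels. Each batch then contains only functions whose dependencies were implemented in earlier batches. An optional size cap keeps each batch reviewable. The Python sketch below favors safety and simplicity over speed. `partition_batches` and its parameters are hypothetical, and the sketch ignores per-function cost, which a real heuristic would likely weigh.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def partition_batches(graph: dict[str, set[str]], max_batch_size: int | None = None) -> list[list[str]]:
    """Split functions into batches whose dependencies all sit in earlier batches.

    `graph` maps each function to the functions it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches: list[list[str]] = []
    while sorter.is_active():
        layer = sorted(sorter.get_ready())  # every function whose dependencies are done
        step = max_batch_size or len(layer)
        batches += [layer[i:i + step] for i in range(0, len(layer), step)]
        sorter.done(*layer)  # the next layer unlocks only after this one completes
    return batches
```

For the graph `{"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": {"c", "d"}}` the result is `[["a", "b"], ["c", "d"], ["e"]]`. The cap splits a layer into consecutive batches for review purposes only. Functions from the same layer remain independent of each other.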

P9. Checkpoint design

Status: OPEN

What makes a checkpoint useful rather than bureaucratic?

Why it matters

Checkpoints are supposed to reduce rework cost. Too many checkpoints slow the system down; too few allow defects to accumulate.

Open questions

What kinds of checkpoints are essential?
Which checkpoints should be automated?
What objective criteria should decide checkpoint pass/fail?

Useful contributions

checkpoint taxonomy
checkpoint evaluation criteria
lightweight checkpoint process proposals

P10. Experience pool structure

Status: OPEN

How should batch experience be represented, stored, and reused?

Why it matters

The architecture relies on batch-to-batch learning without real-time contamination. The experience pool is central to that design.

Open questions

Should experience entries be free-form summaries or structured records?
How should entries be tagged by scope, validity, and confidence?
How do we prevent noisy or low-quality experience from propagating?

Useful contributions

experience entry schema
scoring / ranking mechanisms
retrieval strategies for batch-relevant experience
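
In this structured-record sketch, each experience entry carries scope tags, the batch that produced it, a confidence score, and an optional expiry batch. All of these fields are hypothetical. Retrieval only considers entries from batches that have already finished. This is one way to encode batch-to-batch learning without real-time contamination.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExperienceEntry:
    summary: str
    scope: frozenset[str]                   # tags such as module, layer, or concern
    source_batch: int                       # batch whose outcome produced the entry
    confidence: float                       # 0.0 to 1.0
    expires_after_batch: int | None = None  # None means no fixed expiry


def retrieve(pool, batch_scope: frozenset[str], batch: int,
             min_confidence: float = 0.5, limit: int = 5) -> list[ExperienceEntry]:
    """Return the entries most relevant to an upcoming batch."""
    candidates = [
        e for e in pool
        # Only finished earlier batches contribute: no real-time contamination.
        if e.source_batch < batch
        and (e.expires_after_batch is None or batch <= e.expires_after_batch)
        and e.confidence >= min_confidence
        and e.scope & batch_scope
    ]
    # Prefer entries sharing more tags with the batch, then higher confidence.
    candidates.sort(key=lambda e: (len(e.scope & batch_scope), e.confidence), reverse=True)
    return candidates[:limit]
```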

P11. Experience quality control

Status: OPEN

How can the system distinguish useful experience from misleading experience?

Why it matters

A bad experience pool may amplify wrong assumptions across later batches.

Open questions

Should experience entries require review before being stored?
Should later outcomes be allowed to invalidate earlier experience?
How should contradictory experience be handled?

Useful contributions

quality filters
review workflows
confidence and expiry models for experience entries
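
In this confidence-and-expiry model, each entry counts the later outcomes that confirmed or contradicted it. Confidence is the Laplace-smoothed confirmation rate, which starts at 0.5 with no evidence. An entry is retired once enough evidence pushes it below a threshold. The thresholds and the `EntryStats` name are assumptions. Contradictory entries are not merged; they simply compete on confidence.

```python
from dataclasses import dataclass


@dataclass
class EntryStats:
    """Outcome counts for one experience entry (hypothetical model)."""

    confirmations: int = 0
    contradictions: int = 0

    @property
    def confidence(self) -> float:
        # Laplace-smoothed confirmation rate: 0.5 before any evidence.
        return (self.confirmations + 1) / (self.confirmations + self.contradictions + 2)

    def record(self, confirmed: bool) -> None:
        """Record whether a later batch outcome confirmed or contradicted the entry."""
        if confirmed:
            self.confirmations += 1
        else:
            self.contradictions += 1

    def is_retired(self, threshold: float = 0.3, min_observations: int = 3) -> bool:
        # Require some evidence so a single bad outcome cannot erase an entry.
        observations = self.confirmations + self.contradictions
        return observations >= min_observations and self.confidence < threshold
```

After three contradictions and no confirmations, confidence is 1/5 = 0.2, and the entry is retired under the default threshold.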

P12. Change request classification

Status: OPEN

How can the G / M / S change-severity classification be made more objective?

Why it matters

The architecture treats change handling as a first-class side-flow. If severity classification is vague, change handling becomes inconsistent.

Open questions

What measurable criteria define G, M, and S?
Can affected artifact count or dependency impact help classify changes?
How should user insistence interact with technical risk?

Useful contributions

classification rubrics
example change scenarios
decision trees for change routing
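
A measurable rubric could combine three signals: the number of affected artifacts, the share of the dependency graph downstream of the change, and whether a public contract changes. The Python sketch below returns an ordinal impact tier from 1 to 3. Which of G, M, and S each tier corresponds to is intentionally not assumed. The thresholds, tier descriptions, and names are placeholders to be calibrated against example change scenarios. User insistence is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeImpact:
    affected_artifacts: int        # contracts, documents, and code files touched
    touches_public_contract: bool  # a signature, input, output, or error changes
    dependency_fraction: float     # share of the dependency graph downstream of the change


def impact_tier(impact: ChangeImpact, artifact_limit: int = 3, fraction_limit: float = 0.25) -> int:
    """Return 1 (local), 2 (contract-level) or 3 (architecture-level).

    Thresholds are placeholders; mapping tiers onto G / M / S is left open.
    """
    if impact.dependency_fraction > fraction_limit:
        return 3
    if impact.touches_public_contract or impact.affected_artifacts > artifact_limit:
        return 2
    return 1
```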

P13. Defect classification and rollback policy

Status: OPEN

How should L1 / L2 / L3 defect classification be made reliable and actionable?

Why it matters

Rollback policy depends on distinguishing implementation defects from contract or architecture defects.

Open questions

What signals indicate a contract defect rather than an implementation defect?
When should a defect force rework of completed functions?
How should rollback scope be computed?

Useful contributions

defect classification guidelines
defect examples with routing decisions
rollback scope algorithms
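
When a defect is traced to a contract rather than to an implementation, one candidate rollback-scope algorithm is a reverse reachability search. Every already-completed function that transitively depends on the defective one becomes a rework candidate. The Python sketch below assumes the dependency graph maps each function to the functions it depends on. `rollback_scope` is hypothetical, and the sketch does not decide which of L1 / L2 / L3 applies. An implementation defect that leaves the contract intact arguably needs no dependent rework, so this search matters mainly when the contract itself is wrong.

```python
from collections import deque


def rollback_scope(graph: dict, defective: str, completed: set) -> list:
    """Completed functions that transitively depend on `defective`.

    `graph` maps each function to the set of functions it depends on.
    """
    dependents: dict = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)  # invert the edges
    seen = {defective}
    queue = deque([defective])
    while queue:  # breadth-first search over the reversed edges
        for nxt in dependents.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted((seen - {defective}) & completed)
```

With `c` and `d` depending on `a`, and `e` depending on both, a contract defect in `a` puts `c`, `d`, and `e` in scope. The scope is limited to those functions that are already completed.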

P14. Review agent reliability

Status: OPEN

How can review agents avoid becoming superficial or rubber-stamping reviewers?

Why it matters

The architecture assumes review is meaningful. Weak review undermines all later governance mechanisms.

Open questions

What should review agents check automatically?
What should require human review?
How can review quality be measured?

Useful contributions

review checklists
review rubric design
reviewer calibration experiments

P15. Human-in-the-loop policy

Status: OPEN

Where should human intervention be optional, recommended, or mandatory?

Why it matters

A fully autonomous reading of the architecture may be unsafe or unrealistic, but excessive human intervention erodes the value of automation.

Open questions

Which phases should always allow human override?
Which side-flow decisions should require human approval?
How should human feedback be stored and propagated?

Useful contributions

HITL policy proposals
governance boundary recommendations
practical examples from existing agent systems

P16. Architecture-to-framework adaptation

Status: OPEN

Which existing workflow or agent frameworks are the best substrate for implementing Chronos?

Why it matters

This architecture does not need to be implemented from scratch. Reusing an existing framework may dramatically reduce MVP cost.

Open questions

Is LangGraph the best fit for phase/state control?
Is CrewAI better for role-based collaboration?
Is a visual orchestration layer like n8n or Langflow useful for early prototyping?

Useful contributions

comparison documents
proof-of-concept adaptations
framework selection criteria

P17. MVP scope definition

Status: OPEN

What is the smallest version of Chronos that is still meaningfully Chronos?

Why it matters

A full implementation is too large for an early-stage project. A clear MVP boundary is necessary to attract contributors and validate ideas.

Open questions

Which phases are essential for a first MVP?
Which mechanisms can be simulated manually at first?
What should be excluded from v0.1?

Useful contributions

MVP proposals
phased rollout plans
“core vs optional” architecture breakdowns

P18. Evaluation methodology

Status: OPEN

How should the architecture be evaluated fairly?

Why it matters

Claims about cost reduction, traceability, and reduced rework need evidence.

Open questions

What baseline should Chronos be compared against?
What metrics matter most: token cost, elapsed time, defect rate, rollback rate, integration failures?
What tasks should be used for benchmarking?

Useful contributions

benchmark design
evaluation metrics
experiment plans comparing Chronos-style workflows with simpler agent workflows

P19. Repository and artifact structure

Status: OPEN

What repository layout best supports phase artifacts, generated outputs, logs, contracts, and code?

Why it matters

The architecture depends on artifact clarity. A poor repository structure will make governance harder, not easier.

Open questions

Should each phase own its own directory?
How should temporary files be stored and cleaned up?
How should generated artifacts be separated from canonical artifacts?

Useful contributions

repository layout proposals
sample project structures
file lifecycle rules

P20. Contributor-friendly research process

Status: OPEN

How can the project remain open to low-barrier contributors while still maintaining rigor?

Why it matters

This project should allow contribution through problem discovery, design analysis, and validation work, not only code.

Open questions

What issue templates best support high-quality problem proposals?
How should accepted problems be tracked?
How should non-code contributors be credited?

Useful contributions

contribution process design
issue template suggestions
contributor credit guidelines

How to propose a new problem

If you think an important problem is missing, open an issue with:

a clear title
the problem statement
why it matters
where it appears in the architecture
possible solution directions (optional)

High-quality problem discovery is a real contribution to this project.
