# AI-First Software Engineering

This repository is both a technical book and a working research artifact. The book is developed by an autonomous edit loop operating inside a Git repository under explicit governance and evaluation rules. The state runner's objective is to use the Copilot Python SDK as the primary LLM integration while preserving deterministic kernel behavior, governance constraints, and ledger/resource accounting.

## What “AI-first software engineering” means (in this book)

AI-first software engineering treats the harness as the primary design surface.

- Model: the reasoning component that proposes plans and edits.
- Harness: the engineered environment that makes work predictable (tools, constraints, evaluation gates, traces, state).

The working hypothesis is that most reliability gains in real repositories come from harness design: better tool contracts, tighter constraints, stronger verification, and better observability.
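A tool contract can be made concrete with a small sketch. All names here are illustrative, not taken from this repository: each tool declares where it may write, how it runs, and how its result is verified, so the harness can gate every invocation mechanically.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolContract:
    """Illustrative harness-side contract for a single tool."""
    name: str
    allowed_paths: tuple[str, ...]      # constraint: where the tool may write
    run: Callable[[dict], dict]         # the tool itself
    verify: Callable[[dict], bool]      # evaluation gate on the result

def invoke(contract: ToolContract, args: dict) -> dict:
    """Run a tool and refuse to return a result that fails its own gate."""
    result = contract.run(args)
    if not contract.verify(result):
        raise RuntimeError(f"tool {contract.name!r} failed verification")
    return result

# Example: a trivial 'echo' tool whose gate requires a non-empty payload.
echo = ToolContract(
    name="echo",
    allowed_paths=(),
    run=lambda a: {"payload": a.get("text", "")},
    verify=lambda r: bool(r["payload"]),
)
```

The point of the sketch is that verification lives beside the tool, not inside the model's prompt: a stronger `verify` tightens the harness without touching the model.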

## Repository layout

- `book/chapters/`: chapter drafts (structured skeletons).
- `book/patterns/`: reusable engineering patterns.
- `book/glossary.md`: operational definitions used throughout.
- `evals/`: declarative evaluation rules (quality, drift, and style guardrails).
- `state/`: iteration state (ledger, version map, metrics).
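The state artifacts are plain JSON, so ledger updates can stay trivially auditable. A minimal append helper might look like this — the field names in the example entry are assumptions, not the repository's actual schema:

```python
import json
from pathlib import Path

def append_ledger_entry(ledger_path: Path, entry: dict) -> None:
    """Append one iteration record to a JSON-array ledger file."""
    if ledger_path.exists():
        entries = json.loads(ledger_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(entry)
    ledger_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

Keeping the ledger append-only is what makes resource accounting and drift detection cheap to check after the fact.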

Governance is defined in:

- `CONSTITUTION.md` (immutable principles)
- `AGENTS.md` (operational rules for autonomous work)

## Agent loop (operational)

Work is performed in bounded iterations:

  1. Plan (what will change; how it will be evaluated)
  2. Generate artifact (apply minimal diffs to targeted files)
  3. Self-evaluate (check against explicit criteria; record evidence)
  4. Refine (fix gaps; reduce ambiguity)
  5. Commit (record the iteration as a Git commit)
  6. Log trace (write/update state artifacts)
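The six steps above can be sketched as one bounded loop. This is a sketch only — the actual kernel lives in `state/kernel.py` — but it shows where the refinement budget bounds the iteration:

```python
def run_iteration(plan, generate, evaluate, refine, commit, log_trace,
                  max_refinements=3):
    """One bounded iteration: plan, generate, evaluate, refine, commit, trace."""
    spec = plan()                       # 1. what will change, how it is judged
    artifact = generate(spec)           # 2. apply minimal diffs
    report = evaluate(artifact, spec)   # 3. check against explicit criteria
    attempts = 0
    while not report["passed"]:         # 4. refine until gates pass...
        if attempts >= max_refinements: # ...or the budget runs out
            raise RuntimeError("iteration did not converge within budget")
        artifact = refine(artifact, report)
        report = evaluate(artifact, spec)
        attempts += 1
    commit(artifact)                    # 5. commit the iteration
    log_trace(spec, artifact, report)   # 6. write/update state artifacts
    return artifact
```

Bounding the refine loop is what keeps an iteration "bounded" in the sense used above: a non-converging chapter fails loudly instead of burning budget silently.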

## How evaluation is run

Evaluations are defined as contracts in `evals/*.yaml`. A harness can implement them mechanically.
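To illustrate what "mechanically" means, here is a hedged sketch of one rule as the Python dict a YAML contract might parse into — the field names are assumptions, not the repository's actual schema — together with the checker a harness would run against it:

```python
# A hypothetical rule, as parsed from something like evals/chapter-quality.yaml.
rule = {
    "id": "chapter-quality.headings",
    "target_glob": "book/chapters/*.md",
    "require_substrings": ["## Thesis", "## Trade-offs", "## Failure Modes"],
}

def check(text: str, rule: dict) -> list[str]:
    """Return the required substrings missing from one document (empty = pass)."""
    return [s for s in rule["require_substrings"] if s not in text]
```

Because the contract is data, the same rule can gate the agent loop, run in CI, and be diffed like any other file.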

For a minimal local check today (structure only), run:

```bash
python - <<'PY'
from pathlib import Path

required = [
    '## Thesis',
    '## Why This Matters',
    '## System Breakdown',
    '## Concrete Example 1',
    '## Concrete Example 2',
    '## Trade-offs',
    '## Failure Modes',
    '## Research Directions',
]

bad = []
for p in sorted(Path('book/chapters').glob('*.md')):
    text = p.read_text(encoding='utf-8')
    missing = [h for h in required if h not in text]
    if missing:
        bad.append((p.as_posix(), missing))

if bad:
    for path, missing in bad:
        print('FAIL', path)
        for h in missing:
            print('  missing:', h)
    raise SystemExit(1)

print('OK: chapter skeleton headings present')
PY
```

This check is intentionally narrow; the stricter requirements (tone, drift, evidence, and “no vague claims”) are captured in the YAML rules and are expected to be enforced by the harness.

## Deterministic kernel (Planner → Writer → Critic)

The repository includes a deterministic refinement kernel at `state/kernel.py`.

By default, the kernel is file-driven: it prepares inputs under `state/role_io/<chapter-id>/iter_XX/in/` and expects role outputs under `.../out/`. You can develop new iterations by following the deterministic loop:

1. Scaffold the next iteration I/O layout for the target chapter:

   ```bash
   python state/role_io_templates.py --chapter-id 01-paradigm-shift
   ```

2. Populate the role outputs (planner, writer, and critic), either manually or with your own tooling. The kernel requires:

   - `state/role_io/01-paradigm-shift/iter_XX/out/planner.json`
   - `state/role_io/01-paradigm-shift/iter_XX/out/writer.md`
   - `state/role_io/01-paradigm-shift/iter_XX/out/critic.json`

3. Execute the kernel, which validates the provided outputs and enforces the evaluation contracts:

   ```bash
   python state/kernel.py --chapter-id 01-paradigm-shift
   ```
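Before invoking the kernel, you can sanity-check that an iteration's role outputs are in place. A minimal pre-flight check — a sketch, not part of the repository — might be:

```python
import json
from pathlib import Path

def check_role_outputs(out_dir: Path) -> list[str]:
    """Report problems with the three required role outputs in one iter_XX/out/."""
    problems = []
    for name in ("planner.json", "writer.md", "critic.json"):
        path = out_dir / name
        if not path.exists():
            problems.append(f"missing: {name}")
        elif name.endswith(".json"):
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(f"invalid JSON: {name}")
    return problems
```

Running a check like this before the kernel turns a mid-run validation failure into an immediate, legible error.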

### Optional: LLM-powered role outputs

Passing `--llm` lets the kernel auto-generate any missing planner, writer, or critic outputs by calling the configured LLM provider. Each run still enforces the same deterministic eval gates and records raw prompt/response traces under `out/_llm_trace/`.
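Recording raw prompt/response pairs next to the outputs is what keeps LLM-generated artifacts auditable. A sketch of that bookkeeping — the file layout here is assumed, not taken from the repository's implementation:

```python
import json
import time
from pathlib import Path

def record_llm_trace(trace_dir: Path, role: str, prompt: str, response: str) -> Path:
    """Write one raw prompt/response pair as a timestamped JSON file."""
    trace_dir.mkdir(parents=True, exist_ok=True)
    path = trace_dir / f"{role}-{int(time.time() * 1000)}.json"
    path.write_text(
        json.dumps({"role": role, "prompt": prompt, "response": response}, indent=2),
        encoding="utf-8",
    )
    return path
```

With raw traces on disk, a failed gate can always be traced back to the exact prompt that produced the rejected output.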

For example, with the Copilot SDK:

```bash
export COPILOT_API_KEY=...              # optional BYOK (or set KERNEL_LLM_API_KEY_ENV)
export KERNEL_LLM_PROVIDER=copilot
export KERNEL_LLM_MODEL=YOUR_MODEL_NAME

python state/kernel.py --chapter-id 01-paradigm-shift --llm
```

## Diagrams

### Agent loop

```mermaid
flowchart TD
  A[Plan] --> B[Generate Artifact]
  B --> C[Self-evaluate]
  C --> D{Meets gates?}
  D -- No --> E[Refine]
  E --> C
  D -- Yes --> F[Commit]
  F --> G[Log trace / update state]
```

### Book system architecture

```mermaid
flowchart LR
  subgraph Governance
    CON[CONSTITUTION.md]
    AG[AGENTS.md]
  end

  subgraph Content
    CH[book/chapters/*]
    PAT[book/patterns/*]
    GLO[book/glossary.md]
  end

  subgraph Evaluation
    E1[evals/chapter-quality.yaml]
    E2[evals/style-guard.yaml]
    E3[evals/drift-detection.yaml]
  end

  subgraph State
    L[state/ledger.json]
    V[state/version_map.json]
    M[state/metrics.json]
  end

  CON --> Content
  AG --> Content
  Content --> Evaluation
  Evaluation --> State
```

## Copilot autopilot execution

- The Copilot autopilot mode reads the `prompts/chapter-revision/execute.md` playbook and dispatches the iterative kernel run for each chapter.
- Each pass instructs `kernel.py` to execute the deterministic planner → writer → critic loop, logging artifacts under `state/role_io/<chapter-id>/iter_XX`.
- If the kernel or prompt feedback identifies a code issue, the parent Copilot orchestrator spawns a dedicated subagent (e.g., code-reviewer, test) to investigate, fix, and re-validate before the next pass.
- Once a pass finishes without blockers, the orchestrator records the commit, updates the ledger, and moves to the next chapter.
```mermaid
flowchart TD
    A[Copilot autopilot]
    B[Load prompts/chapter-revision/execute.md]
    C[Invoke python state/kernel.py --llm --chapter-id]
    D[Planner → Writer → Critic loop]
    E{Code issue reported?}
    F["Launch subagent to fix (code-reviewer, test, etc.)"]
    G[Record commit + ledger update]

    A --> B --> C --> D --> E
    E -- Yes --> F --> D
    E -- No --> G
```
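The orchestration described above reduces to a per-chapter loop with a bounded number of fix rounds. A hedged sketch — the function names are placeholders, not the Copilot SDK's API:

```python
def autopilot(chapters, run_kernel, fix_with_subagent, record, max_fix_rounds=2):
    """Run the kernel per chapter, dispatching subagent fixes until a clean pass."""
    for chapter_id in chapters:
        result = run_kernel(chapter_id)
        rounds = 0
        while result.get("code_issue") and rounds < max_fix_rounds:
            fix_with_subagent(result["code_issue"])   # investigate, fix, re-validate
            result = run_kernel(chapter_id)           # re-run the full pass
            rounds += 1
        if result.get("code_issue"):
            raise RuntimeError(f"unresolved blocker in {chapter_id}")
        record(chapter_id, result)                    # commit + ledger update
```

As in the kernel loop, the bound matters: an issue a subagent cannot fix surfaces as a hard failure rather than an infinite fix/re-run cycle.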
