57 commits
7dfd6b2
Implement autonomous-coding V2 planner-builder-evaluator harness
xerexesx Mar 25, 2026
14fa83a
Merge pull request #1 from xerexesx/codex/upgrade-autonomous-coding-t…
xerexesx Mar 25, 2026
5ecd081
Add files via upload
xerexesx Mar 25, 2026
e346125
docs(v3.1): finalize traceability matrix with evidence and statuses
xerexesx Mar 25, 2026
c262070
Merge pull request #2 from xerexesx/codex/migrate-v2-to-v3.1-producti…
xerexesx Mar 25, 2026
de0fef7
Update orchestrator.py
xerexesx Mar 25, 2026
c6e2377
Add files via upload
xerexesx Mar 26, 2026
efb87e4
autonomous-coding: deliver v3.2 robustness and traceability updates
xerexesx Mar 26, 2026
ea2c4a4
Merge pull request #3 from xerexesx/codex/fix-issues-in-review_v3_1_a…
xerexesx Mar 26, 2026
6e7bc2f
Add files via upload
xerexesx Mar 26, 2026
d1196fa
autonomous-coding: release v3.3 and close v3.2 review findings
xerexesx Mar 26, 2026
ef52866
Merge pull request #4 from xerexesx/codex/fix-issues-from-review_v3_2
xerexesx Mar 26, 2026
ea02e79
Add V3.3 production review report for autonomous coding harness
xerexesx Mar 26, 2026
ddb3991
Merge pull request #5 from xerexesx/codex/review-upgrade-to-version-3.3
xerexesx Mar 26, 2026
4de0643
autonomous-coding: release v3.4 with contract negotiation hardening
xerexesx Mar 26, 2026
b2eae72
Merge pull request #6 from xerexesx/codex/fix-issues-from-review_v3_3
xerexesx Mar 26, 2026
0772039
feat(v3.5): add best-effort token/cost telemetry with resume-safe run…
xerexesx Mar 26, 2026
ce3ac5d
Merge pull request #7 from xerexesx/codex/add-cost-observability-feature
xerexesx Mar 26, 2026
51717f1
Update traceability matrix version to 3.5.1
xerexesx Mar 26, 2026
c3671dc
Merge pull request #8 from xerexesx/codex/restructure-traceability_ma…
xerexesx Mar 26, 2026
4ac2942
autonomous-coding: ship v3.5.2 negotiation enrichment
xerexesx Mar 26, 2026
5c28fe0
Merge pull request #9 from xerexesx/codex/add-enhanced-negotiation-fe…
xerexesx Mar 26, 2026
3a11970
docs(autonomous-coding): rewrite README and add dedicated changelog
xerexesx Mar 26, 2026
11c214d
Merge pull request #10 from xerexesx/codex/refactor-readme-and-create…
xerexesx Mar 26, 2026
c653b78
feat(autonomous-coding): add selectable auth modes (api_key/cli/auto)
xerexesx Mar 27, 2026
8d81cc4
Merge pull request #11 from xerexesx/codex/add-claude-cli-authenticat…
xerexesx Mar 27, 2026
bcc7025
autonomous-coding: default Playwright MCP to headless and align docs
xerexesx Mar 27, 2026
eb987eb
Merge pull request #12 from xerexesx/codex/validate-headless-mode-usa…
xerexesx Mar 27, 2026
b613570
feat(autonomous-coding): add --target-tests flag with default warning
xerexesx Mar 27, 2026
d4c78e7
Merge pull request #13 from xerexesx/codex/add-flag-argument-for-test…
xerexesx Mar 27, 2026
85e67e5
fix(autonomous-coding): apply --target-tests to all modes
xerexesx Mar 27, 2026
11ee985
Merge branch 'main' into codex/add-flag-argument-for-test-count-e9qou9
xerexesx Mar 27, 2026
7d082cb
Merge pull request #14 from xerexesx/codex/add-flag-argument-for-test…
xerexesx Mar 27, 2026
fb5c501
Update autonomous_agent_demo.py
xerexesx Mar 27, 2026
9ff0322
Update CHANGELOG.md
xerexesx Mar 27, 2026
3557293
Update autonomous_agent_demo.py
xerexesx Mar 27, 2026
e4744eb
Update README.md
xerexesx Mar 27, 2026
281bfd2
Update README.md
xerexesx Mar 27, 2026
d4b68cb
Create autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
7d0f894
Update autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
41edfae
Update test_cli.py
xerexesx Mar 27, 2026
32cbb0d
Update README.md
xerexesx Mar 27, 2026
848f6fa
Update autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
431c6ab
Delete .github/workflows/autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
31ae5a7
Create autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
d8a854f
Update autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
c5d4a9f
Update autonomous-coding-audit-tests.yml
xerexesx Mar 27, 2026
bb45ec9
release autonomous-coding 3.6.3
xerexesx Mar 28, 2026
7905447
keep 3.6.3 traceability consolidated
xerexesx Mar 28, 2026
9858262
rename autonomous-coding CLI modes
xerexesx Mar 28, 2026
7226dcb
fix(autonomous-coding): harden dry-run coverage
xerexesx Mar 28, 2026
55ba733
ci: avoid unrelated repo checks on autonomous-coding PRs
xerexesx Mar 28, 2026
6d2a6a1
autonomous-coding: add provider selection for 3.7
xerexesx Mar 28, 2026
0a253a1
autonomous-coding: update 3.7 traceability
xerexesx Mar 28, 2026
9b5273a
autonomous-coding: fix retarget merge regression
xerexesx Mar 28, 2026
1c44dd8
autonomous-coding: make Codex CLI test portable
xerexesx Mar 28, 2026
37154e3
Merge pull request #19 from xerexesx/codex/version-3.7
xerexesx Mar 28, 2026
123 changes: 123 additions & 0 deletions .github/workflows/autonomous-coding-audit-tests.yml
@@ -0,0 +1,123 @@
name: autonomous-coding-audit-tests

on:
  pull_request:
    paths:
      - .github/workflows/autonomous-coding-audit-tests.yml
      - .github/workflows/autonomous-coding-live-smoke.yml
      - autonomous-coding/**
  push:
    branches:
      - main
    paths:
      - .github/workflows/autonomous-coding-audit-tests.yml
      - .github/workflows/autonomous-coding-live-smoke.yml
      - autonomous-coding/**

jobs:
  audit-and-unit:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: autonomous-coding
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"
          cache-dependency-path: autonomous-coding/requirements.txt

      - name: Install dependencies and audit tooling
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt pip-audit

      - name: Dependency audit
        run: |
          # Temporary ignore for known issue with no published fix yet.
          pip-audit -r requirements.txt --ignore-vuln CVE-2026-4539

      - name: Run unit and integration tests
        run: pytest tests test_security.py tests/test_security_hook.py -q -m "not live"

  offline-cli-smoke:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: autonomous-coding
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"
          cache-dependency-path: autonomous-coding/requirements.txt

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run orchestrated dry-run smoke
        run: |
          python autonomous_agent_demo.py \
            --mode orchestrated \
            --project-dir ./ci_dry_run_smoke \
            --dry-run \
            --max-rounds 2 \
            --llm-contract-review

      - name: Verify orchestrated dry-run artifacts
        shell: bash
        run: |
          set -euo pipefail
          python - <<'PY'
          import json
          from pathlib import Path

          project_dir = Path("generations/ci_dry_run_smoke")
          acceptance = json.loads((project_dir / "planning" / "acceptance_criteria.json").read_text())
          negotiation = json.loads((project_dir / "planning" / "sprint_contract_negotiation_round_02.json").read_text())
          run_state = json.loads((project_dir / "state" / "run_state.json").read_text())
          qa_report = json.loads((project_dir / "qa" / "qa_report_round_01.json").read_text())

          assert acceptance["criteria"][0]["id"] == "AC-DRYRUN-001"
          assert negotiation["review_mode"] == "llm_assisted"
          assert "LLM_RESPONSE_INVALID" in negotiation["reason_codes"]
          assert run_state["status"] == "blocked"
          assert qa_report["result"] == "blocked"
          PY

  legacy-dry-run-contract:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: autonomous-coding
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"
          cache-dependency-path: autonomous-coding/requirements.txt

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Verify legacy dry-run fails fast with stable contract
        shell: bash
        run: |
          set -euo pipefail
          set +e
          output=$(python autonomous_agent_demo.py --mode legacy --project-dir ./ci_legacy_dry_run --max-iterations 0 --dry-run 2>&1)
          status=$?
          set -e
          echo "$output"
          test "$status" -eq 2
          echo "$output" | grep --fixed-strings -- "--dry-run is not supported with --mode legacy"
47 changes: 47 additions & 0 deletions .github/workflows/autonomous-coding-live-smoke.yml
@@ -0,0 +1,47 @@
name: autonomous-coding-live-smoke

on:
  workflow_dispatch:

jobs:
  live-phase-smoke:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      AUTONOMOUS_CODING_ENABLE_LIVE_TESTS: "1"
      AUTONOMOUS_CODING_LIVE_MODEL: claude-opus-4-6
      AUTONOMOUS_CODING_LIVE_ARTIFACT_ROOT: ${{ github.workspace }}/autonomous-coding/live-smoke-artifacts
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"
          cache-dependency-path: autonomous-coding/requirements.txt

      - uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Install Python dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r autonomous-coding/requirements.txt

      - name: Run manual live smoke suite
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p autonomous-coding/live-smoke-artifacts
          pytest autonomous-coding/tests/test_live_phase_smoke.py -q -m live | tee autonomous-coding/live-smoke-artifacts/pytest-live.log

      - name: Upload live smoke artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: autonomous-coding-live-smoke-artifacts
          path: autonomous-coding/live-smoke-artifacts
          if-no-files-found: warn
4 changes: 2 additions & 2 deletions .github/workflows/build.yaml
@@ -4,13 +4,13 @@ name: build
 on:
   pull_request:
     paths:
-      - .github/**
+      - .github/workflows/reusable_build_step.yaml
       - computer-use-demo/**
   push:
     branches:
       - main
     paths:
-      - .github/**
+      - .github/workflows/reusable_build_step.yaml
       - computer-use-demo/**
 jobs:
   build-amd64:
2 changes: 0 additions & 2 deletions .github/workflows/tests.yaml
@@ -2,13 +2,11 @@ name: tests
 on:
   pull_request:
     paths:
-      - .github/**
       - computer-use-demo/**
   push:
     branches:
       - main
     paths:
-      - .github/**
       - computer-use-demo/**
 jobs:
   ruff:
2 changes: 1 addition & 1 deletion agents/README.md
@@ -45,4 +45,4 @@ From this foundation, you can add domain-specific tools, optimize performance, o
 - Python 3.8+
 - Claude API key (set as `ANTHROPIC_API_KEY` environment variable)
 - `anthropic` Python library
-- `mcp` Python library
+- `mcp` Python library
12 changes: 12 additions & 0 deletions autonomous-coding/AGENTS.md
@@ -0,0 +1,12 @@
# Autonomous Coding Quickstart Notes

Scope: `autonomous-coding/`

- `orchestrated` is the primary harness path (`autonomous_agent_demo.py` default mode).
- Preserve resumability semantics of `state/run_state.json`; do not silently reinterpret completed state.
- Preserve security boundaries in `client.py` + `security.py` (sandbox, allowlist, hook checks).
- Planner/Builder/Evaluator artifact contracts are schema-backed in `schemas/`; keep them deterministic.
- Sprint contracts are required artifacts (`planning/sprint_contract_round_XX.json`) and must remain schema-backed.
- If adding new JSON artifacts, add a schema + unit tests.
- Keep `feature_list.json` as backlog/verification ledger; evaluator is authority for pass/fail outcomes.
- Prefer Playwright MCP for evaluator browser QA; Puppeteer remains fallback only.
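
The round-numbered artifact naming above can be sketched as a small path helper. This is an illustration only: the function name `sprint_contract_path` is hypothetical, and the 1-based, two-digit zero-padded round number is an assumption inferred from artifact names such as `qa_report_round_01.json`.

```python
from pathlib import Path


def sprint_contract_path(project_dir: Path, round_number: int) -> Path:
    """Return the sprint contract path for a round (hypothetical helper)."""
    # Assumption: rounds are numbered from 1.
    if round_number < 1:
        raise ValueError("round_number must be >= 1")
    # Assumption: two-digit zero padding, as in qa_report_round_01.json.
    filename = f"sprint_contract_round_{round_number:02d}.json"
    return project_dir / "planning" / filename


print(sprint_contract_path(Path("generations/demo"), 2))
```

Keeping the path construction in one place is one way to keep artifact locations deterministic across planner, builder, and evaluator code.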
122 changes: 122 additions & 0 deletions autonomous-coding/CHANGELOG.md
@@ -0,0 +1,122 @@
# Changelog

All notable changes to the `autonomous-coding` module are listed here.

## [3.7.0] - 2026-03-28

### Added
- New CLI flag `--provider {claude,openai}` to explicitly select the execution provider.
- Dedicated provider/auth resolver to distinguish the Claude runtime from the OpenAI/Codex runtime.
- OpenAI backend based on the non-interactive Codex CLI, with a credentials preflight and Git verification.
- Dedicated test coverage for provider dispatch, the OpenAI preflight, and the absence of a shared session for backends that do not support it.

### Changed
- The authentication preflight now takes the selected provider into account.
- Session sharing is now disabled based on backend capability rather than solely on model equality.
- The CLI/README documentation specifies that current OpenAI support goes through the Codex CLI and not through a direct OpenAI API integration.
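
The capability-based session-sharing rule in this release could look roughly like the sketch below. Everything named here (`BackendCapabilities`, `BACKENDS`, `can_share_session`) is hypothetical, and the capability values in the table are assumptions, not the harness's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCapabilities:
    """Hypothetical capability record for one execution backend."""
    name: str
    supports_shared_session: bool


# Assumed capability table; the real values live in the harness.
BACKENDS = {
    "claude": BackendCapabilities("claude", supports_shared_session=True),
    "openai": BackendCapabilities("openai", supports_shared_session=False),
}


def can_share_session(provider: str, model_a: str, model_b: str) -> bool:
    """Share a session only if the backend allows it AND the models match."""
    backend = BACKENDS[provider]
    return backend.supports_shared_session and model_a == model_b
```

Checking the backend capability first means that identical model names alone are no longer enough to reuse a session.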

## [3.6.3] - 2026-03-28

### Added

- Dedicated regression coverage for the Bash barrier, explicit planner failure, `--project-dir` traversal, and the robustness of `progress.py`.
- Local `pyrightconfig.json` configuration so that type-checking of the `autonomous-coding` module can run in the current development environment.
- New hybrid coverage of the `--dry-run` contract: dedicated offline tests, an orchestrated CLI smoke test, and a manual live workflow for the real SDK/LLM paths.

### Changed

- Real hardening of `security.py` without changing the current concept: paths outside the project are blocked, only `./init.sh` is allowed, `sleep` is bounded, and explicit package installations are refused.
- `PlannerPhase` now fails explicitly if mandatory artifacts are missing, invalid, or still placeholders; the dry-run generates valid planning artifacts.
- Relative `--project-dir` paths are still normalized under `generations/` but now refuse any escape via `..`.
- `progress.py` falls back cleanly to a safe default when `feature_list.json` or `run_state.json` has an unexpected structure.
- Runtime version markers, active prompts, and user documentation are aligned on `V3.6.3`.
- CLI mode contract clarified: `v2` is removed, `v1` becomes `legacy`, `v3_1` becomes `orchestrated`, and `orchestrated` becomes the default mode. The `v1`/`v3_1` aliases are still accepted temporarily, with a warning.
- `legacy --dry-run` is now explicitly rejected with a stable exit code; only `orchestrated --dry-run` is still advertised as the offline smoke test.
- `create_client()` no longer allows browser tools in the planner/contract-reviewer contract; they remain limited to builder/evaluator/orchestrator.
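
The `--project-dir` rule in this release (normalize relative paths under `generations/`, reject `..`) can be sketched as follows. The function name `normalize_project_dir` is hypothetical, and returning absolute paths unchanged is an assumption; the changelog only describes relative paths.

```python
from pathlib import Path

GENERATIONS_ROOT = Path("generations")


def normalize_project_dir(raw: str) -> Path:
    """Place a relative project dir under generations/, refusing '..'."""
    candidate = Path(raw)
    if candidate.is_absolute():
        # Assumption: absolute paths are used as given.
        return candidate
    if ".." in candidate.parts:
        raise ValueError(f"--project-dir must not contain '..': {raw}")
    # pathlib drops a leading './', so './demo' becomes 'generations/demo'.
    return GENERATIONS_ROOT / candidate


print(normalize_project_dir("./ci_dry_run_smoke"))
```

This matches the CI smoke job, which passes `./ci_dry_run_smoke` and then reads artifacts from `generations/ci_dry_run_smoke`.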

## [3.6.1] - 2026-03-27

### Added
- New CLI flag `--target-tests` to explicitly set the target number of tests in the planning/initialization prompts (all modes).

### Changed
- Stronger priority on Playwright MCP in **headless by default** mode in the client configuration (`client.py`).
- Clarified the QA/prompt instructions to require headless Playwright as the primary path, with Puppeteer only as a fallback.
- Updated the versioned documentation (README + consolidated traceability) to reflect this browser policy.
- If `--target-tests` is not provided, the runtime prints an explicit warning and applies the default value `200`.
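
A minimal sketch of the `--target-tests` default behavior, assuming the flag takes an integer. The helper name `parse_target_tests` and the use of the `warnings` module are assumptions; the runtime may well emit its warning differently.

```python
import argparse
import warnings

DEFAULT_TARGET_TESTS = 200  # default value stated in the changelog


def parse_target_tests(argv: list[str]) -> int:
    """Parse --target-tests, warning when the default is applied."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not provided" apart from an explicit 200.
    parser.add_argument("--target-tests", type=int, default=None)
    args = parser.parse_args(argv)
    if args.target_tests is None:
        warnings.warn(
            f"--target-tests not provided; using default {DEFAULT_TARGET_TESTS}"
        )
        return DEFAULT_TARGET_TESTS
    return args.target_tests
```

Using `None` as the parser default is what makes the warning possible: a hard-coded `default=200` would hide whether the user passed the flag.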

## [3.6.0] - 2026-03-27

### Added
- New CLI flag `--auth-mode {api_key,cli,auto}` to explicitly select the authentication method.
- Best-effort detection of Claude CLI credentials (session/token/credentials file) for an explicit preflight.
- Dedicated test coverage for auth validation (`tests/test_client_auth.py`) and for CLI parsing/errors.

### Changed
- Authentication preflight centralized in `client.py` (`validate_auth_configuration`) with actionable error messages.
- Auth mode integrated into the V3.1 (orchestrator) and V1 compat (`run_autonomous_agent`) paths.
- Documentation updated to include the `cli` and `auto` modes.
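
An auth-mode preflight of this kind might be sketched as below. This is not the body of `validate_auth_configuration`; the function `resolve_auth_mode` is hypothetical, and the `auto` precedence (API key first, then CLI credentials) is an assumption.

```python
VALID_AUTH_MODES = ("api_key", "cli", "auto")


def resolve_auth_mode(requested: str, has_api_key: bool, has_cli_creds: bool) -> str:
    """Return the effective auth mode or raise an actionable error."""
    if requested not in VALID_AUTH_MODES:
        raise ValueError(f"unknown auth mode: {requested}")
    if requested == "api_key":
        if not has_api_key:
            raise RuntimeError("Set ANTHROPIC_API_KEY or use --auth-mode cli.")
        return "api_key"
    if requested == "cli":
        if not has_cli_creds:
            raise RuntimeError("No CLI credentials found; use --auth-mode api_key.")
        return "cli"
    # "auto": assumed precedence, API key first, then CLI credentials.
    if has_api_key:
        return "api_key"
    if has_cli_creds:
        return "cli"
    raise RuntimeError("No credentials found: set ANTHROPIC_API_KEY or log in via the CLI.")
```

Resolving the mode once, before any phase starts, is one way to fail early with a message that tells the user what to change.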

## [3.5.2] - 2026-03-26

### Added
- Enriched negotiation artifact (`reason_codes`, `confidence_score`, `actionable_suggestions`, `review_mode`).
- `--llm-contract-review` option for model-assisted contract arbitration.

### Changed
- Runtime/telemetry version markers aligned on V3.5.2.
- Compatibility with the historical statuses (`approved|changes_requested`) is preserved.

## [3.5.0] - 2026-03-26

### Added
- Best-effort token/cost telemetry consolidated in `state/run_state.json`.
- LLM usage summary at the call and phase level.

### Changed
- Resume (`--resume`) is compatible with accumulating onto existing metrics.
- Schema `schemas/run_state.schema.json` extended to validate `llm_usage`.
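
Resume-safe accumulation of usage metrics can be illustrated with a small merge function. The field names (`input_tokens`, `output_tokens`, `cost_usd`) and the helper `merge_llm_usage` are hypothetical; the actual `llm_usage` layout is defined by `schemas/run_state.schema.json`.

```python
USAGE_FIELDS = ("input_tokens", "output_tokens", "cost_usd")  # assumed names


def merge_llm_usage(existing: dict, new_call: dict) -> dict:
    """Add one call's usage onto totals loaded from a previous run."""
    merged = dict(existing)  # keep any other keys untouched
    for field in USAGE_FIELDS:
        # Best-effort: missing or null values count as zero.
        merged[field] = (existing.get(field) or 0) + (new_call.get(field) or 0)
    return merged


previous = {"input_tokens": 1000, "output_tokens": 200, "cost_usd": 0.5}
after_resume = merge_llm_usage(previous, {"input_tokens": 300, "cost_usd": 0.25})
print(after_resume)
```

Starting from the persisted totals rather than from zero is what keeps a resumed run from overwriting the usage recorded before the interruption.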

## [3.4.0] - 2026-03-26

### Added
- Sprint negotiation artifact `planning/sprint_contract_negotiation_round_XX.json`.

### Changed
- Hardened acceptance-test progression (normalization, filtering, deduplication).
- Stricter validation of malformed proposals.
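
The normalize/filter/deduplicate sequence mentioned for acceptance tests could be sketched like this. The specific rules (collapse whitespace, drop non-strings and empty entries, case-insensitive deduplication) and the name `normalize_acceptance_tests` are assumptions chosen for illustration.

```python
def normalize_acceptance_tests(raw_tests: list) -> list[str]:
    """Normalize, filter, and deduplicate acceptance test descriptions."""
    seen: set[str] = set()
    result: list[str] = []
    for item in raw_tests:
        if not isinstance(item, str):
            continue  # filter: malformed entries are dropped
        text = " ".join(item.split())  # normalize: collapse whitespace
        if not text:
            continue  # filter: empty after normalization
        key = text.casefold()
        if key in seen:
            continue  # deduplicate, ignoring case
        seen.add(key)
        result.append(text)  # first occurrence wins, order preserved
    return result
```

Preserving input order and keeping the first occurrence makes the output deterministic for a given input list.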

## [3.3.0] - 2026-03-26

### Added
- Traceability matrix dedicated to version 3.3.
- Path contract for sprint proposal artifacts.

### Changed
- Filtering of already-attempted features during round progression.
- Prompt/typing fixes and a more robust proposal parser.

## [3.2.0] - 2026-03-26

### Added
- Per-round sprint contracts with configurable cap/filtering.
- Explicit integration of builder proposals into the preparation of the next round.

### Changed
- Deterministic `blocked` fallback if the QA report does not conform to the schema.
- Explicit warning for the deprecated `--mode v2`.
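
A deterministic `blocked` fallback for a nonconforming QA report might look like the sketch below. The allowed result values other than `blocked`, the `reason` key, and the function `load_qa_report` are assumptions; the real check is schema-based rather than this hand-written test.

```python
import json

ALLOWED_RESULTS = {"pass", "fail", "blocked"}  # assumed result vocabulary


def blocked(reason: str) -> dict:
    """Build the fallback report; 'reason' is a hypothetical field."""
    return {"result": "blocked", "reason": reason}


def load_qa_report(raw_text: str) -> dict:
    """Parse a QA report, falling back to 'blocked' if it is malformed."""
    try:
        report = json.loads(raw_text)
    except json.JSONDecodeError:
        return blocked("QA report is not valid JSON")
    if not isinstance(report, dict):
        return blocked("QA report is not a JSON object")
    if report.get("result") not in ALLOWED_RESULTS:
        return blocked("QA report has a missing or unknown result")
    return report
```

Because the fallback depends only on the input text, the same malformed report always produces the same `blocked` outcome.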

## [3.1.0] - 2026-03-26

### Added
- Main path runs as a continuous session (planner/builder/evaluator).
- Mandatory sprint contract artifacts.

### Changed
- Removed the silent empty-builder fallback (now an explicit error).
- Hardened resume (no implicit restart of a finished run).
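
The "no implicit restart of a finished run" rule can be sketched as a guard evaluated before a run starts. The status value `completed`, the returned action strings, and the function `plan_start` are hypothetical; only the `status` key of `run_state.json` appears in the CI checks.

```python
from typing import Optional

TERMINAL_STATUSES = {"completed"}  # assumption: which statuses are final


def plan_start(run_state: Optional[dict]) -> str:
    """Decide how to start, refusing to silently restart a finished run."""
    if run_state is None:
        return "start_fresh"  # no previous state on disk
    status = run_state.get("status")
    if status in TERMINAL_STATUSES:
        raise RuntimeError(
            f"Run already finished with status '{status}'; refusing implicit restart."
        )
    return "resume"
```

Raising instead of restarting forces the caller to make an explicit choice, which is the behavior the changelog entry describes.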

---

## Notes
- The exact dates of earlier historical entries can be refined if you want a precise "calendar" changelog based on the Git history.