chore: preflight gates + skills refresh #146

mikeyobrien · 2026-02-02T02:33:05Z

Summary

Introduces preflight/backpressure improvements (auto-preflight, new gates for coverage, cargo audit, performance regression, verifier quality) plus spec completeness/acceptance criteria checks.
Updates orchestration/robot flows and skills/presets (Self-Healer hat, decision confidence protocol, skill consolidation, SOP alignment, preset trimming, human-in-the-loop refinements).
Expands CLI/tooling & TUI behavior (doctor command, tutorial onboarding, skill discovery, token/tsx parsing fixes, task status normalization, TUI width/iteration buffer fixes).
Broadens test coverage across CLI/core/adapters and adds/refreshes docs and templates.
Adds --yolo to Codex CLI invocation and adjusts tests.

Testing

cargo test

Commits (last 71)

299d45b feat(adapters): add --yolo to codex cli
758cce2 feat(robot): human-in-the-loop improvements - separate queue, TUI metadata, event naming
7c99179 refactor: decouple ralph-telegram from ralph-core via RobotService trait (KISS item #9)
bb2129f refactor: remove unused ralph-adapters dependency from ralph-tui (KISS item #13)
720cdf6 refactor: unify TDD skills into single test-driven-development skill (KISS item #2)
a6c2588 fix: resolve 26 clippy warnings across workspace (unblock backpressure)
eb622e1 refactor: reduce PDD and code-task-generator constraint density (KISS item #11)
cf79fc3 refactor: reduce code-assist constraint density from 111 to 32 MUSTs (KISS item #11)
877989c refactor: consolidate ralph-loop and ralph-diagnostics into ralph-operations skill (KISS item #10)
794d142 chore: remove standalone codebase-summary skill (KISS item #12)
7216417 feat(config): add extra_instructions field for shared hat instruction fragments
95eb069 feat: gate session recorder/player behind recording feature flag (KISS #7)
23ecdbb refactor: remove chaos_mode dead code (KISS item #6)
8f57421 fix: handle wide character display widths in TUI ContentPane
0fba205 fix: resolve events path from current-events marker for interact.human
09a54d8 refactor(presets): remove 13 redundant/experimental presets, slim to core 15
0096624 fix: preserve TUI iteration buffers across task.start reset
16442fd feat(code-task-generator): align CTG SOP with Level 5 spec-driven pipeline
43a58b0 feat(pdd): align PDD SOP with Level 5 spec-driven pipeline
655cdca feat: add spec acceptance criteria verification gate to backpressure
0b136fc feat: add spec-to-test stub generation skill and acceptance criteria parser
3a1819d feat: add spec completeness validator to preflight checks
b8082e9 feat: add Self-Healer hat, improve tests, fix UTF-8 boundary, make preflight opt-in
5c801a3 feat(presets): add confidence scoring thresholds
ad2d0c0 docs: define decision journal format
1f92695 fix(core): resolve parent skills dirs
2304aa2 fix(cli): resolve configured skill dirs in parent workspace
2ed48c0 fix(cli): discover parent skills dir for nested roots
f38f3fa fix(cli): discover nested skills dirs
5797294 test(cli): cover user skill discovery
32ff3b0 feat(cli): add skill list and improve discovery
d98b221 feat(cli): add tutorial onboarding command
1320131 test(cli): cover run_command dry-run and continue error
45475eb test(cli): expand loops command coverage
eddd321 fix(cli): allow partial merge queue id resolution
eaadda4 test(core): cover event loop helper branches
8bc8569 test: cover termination reasons and merge queue resolution
013af96 test: expand event loop and loops coverage
9f706cb test: add preset run coverage and shared cwd guard
147a2a9 test(cli): expand coverage for bot/web/loop runner
7d9fa33 test(cli,adapters): expand coverage for config and pty
a4a84c9 test: expand pty streaming and planning session coverage
7d19b5c test(cli): add loops and sop runner coverage
d6e6f2a test(cli): add memory formatting tests
08c3987 fix(cli): normalize task status filtering
e78b5c5 test(cli): cover bot token resolution
fde3383 fix(cli): normalize token and tsx version parsing
76ebd53 test(adapters,cli): cover pty executor paths
1eb3275 test(cli): expand coverage for bot/web/loop_runner
7f80ffb test: add cli run/web integration coverage
35267f4 test(cli,adapters): expand unit coverage for helpers
3261dbb test(cli): expand web and bot coverage
8157eb7 test(cli,core): stabilize cwd handling and completion tests
8998d62 test: extend loop runner and pty executor coverage
f85ce55 test: extend coverage for cli helpers
0df20f4 Add troubleshooting links to common errors
a77c0cd docs(api): add Rust examples for agents/config/metrics/security
b0de08d docs: detail backend configuration
fa6bcf7 Update getting started quick start tutorial
44c618c feat(cli): add doctor diagnostics command
40116f2 Add GitHub issue templates
0e3d503 Fix installation guide for Rust CLI
f1003f7 Add performance regression backpressure gate
25168b7 Add verifier quality report backpressure
f4635f0 feat: add persistent mode to suppress LOOP_COMPLETE termination
737cf79 Add self-healing hat to confession-loop preset
e0d3374 chore(presets): add decision confidence protocol
b6e9794 feat: add coverage backpressure gate
637fd7c feat: add cargo audit backpressure gate
49b1079 feat: auto-preflight before ralph run
d4c8410 feat: add ralph preflight command

Integrates the existing `ralph preflight` checks into the loop start sequence so issues are caught before orchestration begins.
- Run preflight checks after loop context setup, before loop execution
- Configurable via features.preflight (enabled by default)
- --skip-preflight CLI flag overrides config
- Critical failures block the loop; warnings are logged but allowed
- Strict mode treats warnings as failures
- Worktree cleanup on preflight failure for non-primary loops
- Deferred worktree registry registration until preflight passes
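The start-up gating rules above reduce to a small decision function. This is a minimal sketch, not ralph's loop_runner code: `Severity`, `PreflightDecision` and `decide` are hypothetical names, and the three booleans stand in for `features.preflight`, `--skip-preflight` and strict mode.

```rust
/// Severity of one preflight check result (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Pass,
    Warning,
    Critical,
}

/// What the loop does after preflight (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PreflightDecision {
    Skipped,
    Proceed { warnings: usize },
    Block { failures: usize },
}

/// `config_enabled` mirrors features.preflight, `skip_flag` mirrors --skip-preflight.
pub fn decide(
    config_enabled: bool,
    skip_flag: bool,
    strict: bool,
    results: &[Severity],
) -> PreflightDecision {
    // The CLI flag wins over the config value.
    if skip_flag || !config_enabled {
        return PreflightDecision::Skipped;
    }
    let critical = results.iter().filter(|s| **s == Severity::Critical).count();
    let warnings = results.iter().filter(|s| **s == Severity::Warning).count();
    // Strict mode promotes warnings to blocking failures.
    let failures = if strict { critical + warnings } else { critical };
    if failures > 0 {
        PreflightDecision::Block { failures }
    } else {
        PreflightDecision::Proceed { warnings }
    }
}
```

In this sketch the caller would clean up a non-primary worktree on `Block` and register the worktree only on `Proceed`.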

When `persistent: true` is set in event_loop config, the loop stays alive after LOOP_COMPLETE instead of terminating. A task.resume event is injected to keep the loop idling until new work arrives. Hard limits (max_iterations, max_runtime, max_cost) still terminate normally. Closes task-1769829997-dba7
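A sketch of the persistent-mode decision, assuming the loop keeps iteration, runtime and cost counters to compare against its limits. `LoopState`, `NextStep` and `on_loop_complete` are hypothetical; only the `task.resume` topic and the three limit names come from the commit.

```rust
/// Hypothetical snapshot of the counters behind the hard limits.
pub struct LoopState {
    pub iterations: u32,
    pub max_iterations: u32,
    pub runtime_secs: u64,
    pub max_runtime_secs: u64,
    pub cost_usd: f64,
    pub max_cost_usd: f64,
}

impl LoopState {
    /// max_iterations, max_runtime and max_cost always terminate.
    fn hard_limit_hit(&self) -> bool {
        self.iterations >= self.max_iterations
            || self.runtime_secs >= self.max_runtime_secs
            || self.cost_usd >= self.max_cost_usd
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    Terminate,
    /// Keep the loop idling by publishing this event topic.
    Inject(&'static str),
}

/// Called when LOOP_COMPLETE is observed.
pub fn on_loop_complete(persistent: bool, state: &LoopState) -> NextStep {
    if persistent && !state.hard_limit_hit() {
        NextStep::Inject("task.resume")
    } else {
        NextStep::Terminate
    }
}
```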

…eflight opt-in
- Add 🩹 Self-Healer hat with automated recovery strategies (rollback, skip, reduce scope, fallback, escalate)
- Fix doctor.rs to strip Windows .exe/.cmd/.bat/.com extensions from backend names
- Add comprehensive tests for loops, preflight, and task CLI
- Make preflight checks opt-in (enabled: false by default) and skip outside git repos
- Fix UTF-8 boundary issue in event loop content truncation
- Require completion event to be last in JSONL batch
- Update backpressure docs to include mutation testing (warning-only)
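Two of the fixes above are small enough to sketch. Both functions are hypothetical stand-ins for the changes in the event loop and doctor.rs: truncation walks back to the nearest `str::is_char_boundary` so slicing never panics mid-character, and extension stripping is assumed here to be ASCII case-insensitive.

```rust
/// Truncate to at most `max_bytes` bytes without splitting a UTF-8 character.
pub fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Report "claude.exe" as "claude" (case-insensitive match is an assumption).
pub fn strip_windows_ext(name: &str) -> &str {
    for ext in [".exe", ".cmd", ".bat", ".com"] {
        if name.len() > ext.len() {
            let split = name.len() - ext.len();
            // `get` returns None if `split` is not a char boundary.
            if let Some(tail) = name.get(split..) {
                if tail.eq_ignore_ascii_case(ext) {
                    return &name[..split];
                }
            }
        }
    }
    name
}
```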

Adds a new 'specs' preflight check that validates .spec.md files have Given/When/Then acceptance criteria, a prerequisite for the Level 5 spec-driven pipeline where specs automatically generate acceptance tests.

The check:
- Recursively scans the specs directory for .spec.md files
- Skips specs marked as 'status: implemented' (already reviewed)
- Detects acceptance criteria in bold, plain text, and list formats
- Warns (non-blocking) when specs lack testable criteria
- Integrates into existing preflight infrastructure (ralph preflight --check specs)

Includes 13 unit tests covering all paths: empty dirs, complete specs, incomplete specs, implemented specs, subdirectories, and all three Given/When/Then format variants.
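A minimal sketch of the per-file part of the 'specs' check, covering the skip rule and the three criteria formats. `check_spec` and `SpecStatus` are hypothetical names; the recursive directory walk is omitted, and the keyword match is deliberately naive (no word-boundary check).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SpecStatus {
    /// Marked `status: implemented`, not checked.
    Skipped,
    /// Has Given, When and Then clauses.
    Complete,
    /// Would produce a non-blocking warning.
    MissingCriteria,
}

/// True if the line opens a clause as `**Given**`, `Given` or `- Given`.
fn starts_with_keyword(line: &str, keyword: &str) -> bool {
    let mut rest = line.trim_start();
    // List format: drop a leading bullet.
    if rest.starts_with("- ") || rest.starts_with("* ") {
        rest = rest[2..].trim_start();
    }
    // Bold format: drop the opening `**`.
    if rest.starts_with("**") {
        rest = &rest[2..];
    }
    rest.starts_with(keyword)
}

pub fn check_spec(content: &str) -> SpecStatus {
    if content.lines().any(|l| l.trim() == "status: implemented") {
        return SpecStatus::Skipped;
    }
    let has = |kw: &str| content.lines().any(|l| starts_with_keyword(l, kw));
    if has("Given") && has("When") && has("Then") {
        SpecStatus::Complete
    } else {
        SpecStatus::MissingCriteria
    }
}
```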

…parser
Add structured Given/When/Then parser (extract_acceptance_criteria) to ralph-core that returns AcceptanceCriterion triples from spec content. Create spec-to-test skill that teaches AI agents to generate test stubs with 1:1 mapping to spec criteria. Wire into spec-driven.yml implementer hat as step 0 (red phase of TDD). 13 new tests for the parser.
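A sketch of how a Given/When/Then parser can group lines into triples. The field names on `AcceptanceCriterion` and the function name `parse_gwt` are assumptions, not the actual signature of `extract_acceptance_criteria` in ralph-core; `And` clauses and multi-line clauses are ignored for brevity.

```rust
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct AcceptanceCriterion {
    pub given: String,
    pub when: String,
    pub then: String,
}

/// Return the clause text if the line starts with `keyword`,
/// accepting plain, bold (`**Given**`) and list (`- Given`) forms.
fn clause<'a>(line: &'a str, keyword: &str) -> Option<&'a str> {
    let mut rest = line.trim_start();
    if rest.starts_with("- ") || rest.starts_with("* ") {
        rest = rest[2..].trim_start();
    }
    if rest.starts_with("**") {
        rest = &rest[2..];
    }
    if !rest.starts_with(keyword) {
        return None;
    }
    // Drop the keyword, a closing `**` or `:`, and surrounding whitespace.
    let text = rest[keyword.len()..].trim_start_matches(|c: char| c == '*' || c == ':');
    Some(text.trim())
}

pub fn parse_gwt(content: &str) -> Vec<AcceptanceCriterion> {
    let mut out = Vec::new();
    let mut current = AcceptanceCriterion::default();
    for line in content.lines() {
        if let Some(text) = clause(line, "Given") {
            // A new Given starts a fresh criterion.
            current = AcceptanceCriterion { given: text.to_string(), ..Default::default() };
        } else if let Some(text) = clause(line, "When") {
            current.when = text.to_string();
        } else if let Some(text) = clause(line, "Then") {
            current.then = text.to_string();
            // Emit only complete triples.
            if !current.given.is_empty() && !current.when.is_empty() {
                out.push(current.clone());
            }
            current = AcceptanceCriterion::default();
        }
    }
    out
}
```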

Add `specs: pass/fail` as a new optional backpressure dimension that verifies spec acceptance criteria are satisfied by passing tests. When reported as `specs: fail`, it blocks build.done events (like performance regression). When omitted, it does not block (backwards compatible).

Changes:
- BackpressureEvidence: new `specs_verified: Option<bool>` field
- QualityReport: new `specs_verified: Option<bool>` field with failed_dimensions
- EventParser: parse `specs: pass/fail` from build.done and `quality.specs` from verify payloads
- Event loop: include specs status in backpressure rejection logs
- Instructions: mention specs in backpressure check list
- 9 new tests covering all spec evidence parsing paths
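The three states (pass, fail, omitted) map naturally onto `Option<bool>`. A sketch assuming a line-oriented `key: value` payload; `parse_specs` and `specs_block` are hypothetical, and the real build.done payload format and EventParser logic may differ.

```rust
/// Parse `specs: pass` / `specs: fail`; None when the field is absent.
pub fn parse_specs(payload: &str) -> Option<bool> {
    payload.lines().find_map(|line| {
        let (key, value) = line.split_once(':')?;
        if key.trim() != "specs" {
            return None;
        }
        match value.trim() {
            "pass" => Some(true),
            "fail" => Some(false),
            _ => None,
        }
    })
}

/// Only an explicit fail blocks; an omitted field keeps old payloads working.
pub fn specs_block(specs_verified: Option<bool>) -> bool {
    specs_verified == Some(false)
}
```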

Update `ralph plan` (PDD SOP) to output artifacts in the directory structure expected by spec-driven and pdd-to-code-assist presets:
- Default output dir: specs/{task_name}/ (was .sop/planning/)
- Flat layout: design.md, plan.md, requirements.md (was nested design/, implementation/)
- Renamed idea-honing.md → requirements.md (matches preset expectations)
- Added Given-When-Then acceptance criteria section to design template
- Updated Ralph Integration step to suggest spec-driven presets
- Synced .claude/skills/pdd/SKILL.md with bundled SOP

…eline
Update `ralph task` (Code Task Generator SOP) to output artifacts in the directory structure expected by spec-driven presets:
- Default output dir: specs/{task_name}/tasks/ (was .ralph/tasks/)
- Reference design.md (was design/detailed-design.md) matching flat layout
- Updated Ralph Integration step to suggest spec-driven presets
- Updated examples to show specs/ directory paths
- Synced .claude/skills/code-task-generator/SKILL.md with bundled SOP

The task.start event handler did `*self = Self::new()`, which wiped iteration buffers, current_view, and following_latest state. This caused the header to show "iter 1/0" and all previous iteration output to disappear (garbled display). Now preserves iterations, current_view, following_latest, and new_iteration_alert across the reset, matching the existing pattern for hat_map and loop_started. Adds a regression test to prevent recurrence.
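The fix follows a take, reset, restore pattern. A minimal sketch with a hypothetical `TuiState` holding only the fields named in the commit (hat_map and loop_started omitted) plus an illustrative `scratch` field that is meant to reset:

```rust
#[derive(Debug, Default)]
pub struct TuiState {
    pub iterations: Vec<Vec<String>>,
    pub current_view: usize,
    pub following_latest: bool,
    pub new_iteration_alert: bool,
    /// Hypothetical per-task state that should be cleared on task.start.
    pub scratch: Option<String>,
}

impl TuiState {
    pub fn new() -> Self {
        TuiState { following_latest: true, ..Default::default() }
    }

    /// Handle task.start without losing iteration history.
    pub fn on_task_start(&mut self) {
        // Pull out the fields that must survive the reset.
        let iterations = std::mem::take(&mut self.iterations);
        let current_view = self.current_view;
        let following_latest = self.following_latest;
        let new_iteration_alert = self.new_iteration_alert;

        *self = Self::new(); // reset everything else

        self.iterations = iterations;
        self.current_view = current_view;
        self.following_latest = following_latest;
        self.new_iteration_alert = new_iteration_alert;
    }
}
```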

…core 15
Audit of 27 presets identified 13 as redundant, experimental, or aspirational. Removed to reduce user confusion and present a clear, opinionated set.

Removed (with rationale):
- feature-minimal: stripped version of feature.yml, confusing duplication
- tdd-red-green: code-assist.yml covers TDD with more flexibility
- adversarial-review: niche security review, review.yml suffices
- socratic-learning: teaching experiment, not a real workflow
- mob-programming: interesting concept but untested/unused
- scientific-method: overlaps with debug.yml hypothesis-driven approach
- code-archaeology: research.yml covers legacy code exploration
- performance-optimization: niche, debug + profiling covers it
- api-design: feature.yml or spec-driven.yml covers API work
- documentation-first: docs.yml covers documentation-driven work
- incident-response: aspirational, no production monitoring integration
- migration-safety: aspirational, very niche
- confession-loop: experimental quality pattern, code-assist has scoring
- planning.yml: web UI specific, not embedded

Remaining 15 presets: bugfix, code-assist, debug, deploy, docs, feature, gap-analysis, hatless-baseline, merge-loop, pdd-to-code-assist, pr-review, refactor, research, review, spec-driven

Updated: presets.rs, sync-embedded-files.sh, docs/guide/presets.md, presets/index.json, and all tests referencing removed presets.

wait_for_response() and the Telegram message handler both used the default events.jsonl path instead of reading the current-events marker to find the active timestamped events file. This caused interact.human to send questions via Telegram but never receive responses — it was watching the wrong file, and responses were written to the wrong file.
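A sketch of marker-based path resolution, assuming the marker is a file named `current-events` in the events directory whose content is the active events file path (relative to that directory, or absolute). `resolve_events_path` is hypothetical, and ralph's actual marker location and contents may differ.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Prefer the file named by the current-events marker; fall back to events.jsonl.
pub fn resolve_events_path(dir: &Path) -> PathBuf {
    let marker = dir.join("current-events");
    match fs::read_to_string(&marker) {
        Ok(contents) if !contents.trim().is_empty() => {
            let named = PathBuf::from(contents.trim());
            if named.is_absolute() {
                named
            } else {
                dir.join(named)
            }
        }
        // Missing or empty marker: use the default file.
        _ => dir.join("events.jsonl"),
    }
}
```

Both the question sender and the response watcher would call the same resolver, so they agree on which file is active.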

ContentPane::render() always advanced x by 1 per character, but Unicode wide characters (emoji, CJK, etc.) occupy 2 terminal columns. This caused cascading misalignment where text after any wide character appeared garbled with dropped/shifted characters. Uses unicode-width to determine actual display width, resets trailing cells for wide characters, and wraps before the edge when a wide character would straddle the right boundary.
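The column-advance rule can be sketched independently of the TUI library. `layout` is hypothetical and takes the width function as a parameter so the example has no dependencies; in the fix that role is played by the unicode-width crate (`UnicodeWidthChar::width`). Resetting the trailing cell of a wide character is not shown.

```rust
/// Assign (row, col) to each char in a pane `max_cols` columns wide.
pub fn layout<F: Fn(char) -> usize>(text: &str, max_cols: usize, width: F) -> Vec<(usize, usize)> {
    let mut positions = Vec::new();
    let (mut row, mut col) = (0, 0);
    for ch in text.chars() {
        let w = width(ch);
        // Wrap early if a wide char would straddle the right edge.
        if col + w > max_cols {
            row += 1;
            col = 0;
        }
        positions.push((row, col));
        // Advance by display width, not by 1.
        col += w;
    }
    positions
}
```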

Chaos mode was an experimental feature that was never fully implemented: the loop_runner only logged a TODO and immediately returned ChaosModeComplete.

Removes ~560 lines of unused code across 10 files:
- Delete chaos_mode.rs (254 LOC)
- Remove ChaosModeConfig, ResearchFocus, ChaosOutput from config.rs
- Remove ChaosModeComplete/ChaosModeMaxIterations TerminationReason variants
- Remove triggers_chaos_mode() method
- Remove --chaos and --chaos-max-iterations CLI args
- Clean up match arms in display, summary_writer, loop_runner, bench
- Update tests to remove chaos-related assertions

…ISS #7)
Session recording modules (cli_capture, session_recorder, session_player) and their dependents (replay_backend, smoke_runner) are now conditionally compiled behind `#[cfg(feature = "recording")]`. This keeps ~1,147 LOC out of the default build, shrinking the binary when recording is not needed. Workspace-internal crates enable the feature explicitly, so all existing functionality and tests continue to work unchanged.
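A sketch of the gating pattern. The `recording` feature name and the `session_recorder` module name come from the commit; the module body and the `recording_enabled` helper are hypothetical. The matching Cargo.toml entry is shown as a comment.

```rust
// Cargo.toml (sketch):
// [features]
// recording = []

/// Compiled only when the crate is built with `--features recording`.
#[cfg(feature = "recording")]
pub mod session_recorder {
    pub fn start() {
        // Recording logic elided in this sketch.
    }
}

/// Reports whether the recording modules were compiled in.
pub fn recording_enabled() -> bool {
    cfg!(feature = "recording")
}
```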

… fragments
Enable YAML anchors to de-duplicate instruction blocks across hats. HatConfig.extra_instructions is a Vec<String> that gets merged into instructions during config normalization.

Also: derive Default for PreflightConfig (clippy derivable_impls), remove unused default_false helper. KISS item #3: prep for hat config de-duplication.
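A sketch of the normalization step, assuming fragments are appended after the hat's own instructions and separated by blank lines; the commit does not state the join order or separator. `HatSketch` is a hypothetical stand-in for `HatConfig`.

```rust
#[derive(Debug, Default)]
pub struct HatSketch {
    pub instructions: String,
    pub extra_instructions: Vec<String>,
}

impl HatSketch {
    /// Fold shared fragments into `instructions`. Draining keeps the
    /// step idempotent if normalization runs twice (an assumption).
    pub fn normalize(&mut self) {
        for fragment in self.extra_instructions.drain(..) {
            if !self.instructions.is_empty() {
                self.instructions.push_str("\n\n");
            }
            self.instructions.push_str(fragment.trim());
        }
    }
}
```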

Redundant with code-assist phase 2.2 which already performs project analysis during implementation. Most projects already have AGENTS.md and README.md files, making standalone documentation generation unnecessary. Removes 314 lines of SOP guidance.

…rations skill (KISS item #10)
Merged two overlapping skills (531 lines) into a single "ralph-operations" skill (213 lines), eliminating ~318 lines of duplication. One reference point for loop lifecycle management, diagnostics analysis, and troubleshooting.

…(KISS item #11)
111 "You MUST" directives created constraint overload, reducing LLM compliance. Consolidated to 32 focused constraints by:
- Extracting repeated rules to Important Notes (doc/code separation, snippet labeling)
- Lifting shared Code Phase constraints to phase-wide section
- Removing obvious/implied behaviors (mkdir before use, handle errors)
- Condensing verbose Troubleshooting into concise paragraphs
- Trusting agent judgment for non-critical style decisions

All critical invariants preserved: TDD cycle, no broken commits, no push, convention compliance, CODEASSIST.md integration, separation of concerns.

Net reduction: 463 → 214 lines (-54%), 111 → 32 MUSTs (-71%)

… item #11)
PDD: 85 → 19 MUSTs (-78%), 298 → 147 lines (-51%)
Code-Task-Generator: 57 → 8 MUSTs (-86%), 349 → 159 lines (-54%)

Applied same simplification pattern as code-assist (cf79fc3):
- Extracted repeated cross-step rules to Important Notes section
- Removed obvious/implied behaviors (create dirs, use tools)
- Condensed verbose troubleshooting into concise paragraphs
- Converted "because this could..." rationale into short descriptions
- Trusted agent judgment for non-critical decisions

All critical invariants preserved: user-driven flow, one-question-at-a-time requirements, user approval before generation, Given-When-Then acceptance criteria, code task format spec, Ralph integration offering.

Combined Item #11 totals: 253 → 59 MUSTs across all 3 SOPs (-77%).

Fixes accumulated clippy pedantic warnings that fail under -D warnings:
- preflight.rs: unnecessary raw string hashes (r#"" → r"") in 16 test literals
- event_loop/tests.rs: bool_assert_comparison, unnecessary_map_or, cloned_ref_to_slice_refs
- bot.rs: manual_string_new ("".to_string() → String::new())
- memory.rs: useless_format and manual_div_ceil
- content.rs: single-char string pattern (.contains("x") → .contains('x'))
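Most of the lints named above have a one-line idiomatic replacement. These functions are illustrative only, not the code that changed:

```rust
/// clippy::manual_string_new: `String::new()` instead of `"".to_string()`.
pub fn empty_label() -> String {
    String::new()
}

/// clippy::useless_format: `.to_string()` instead of `format!("ready")`.
pub fn status_text() -> String {
    "ready".to_string()
}

/// clippy::manual_div_ceil: `div_ceil` instead of `(items + per_page - 1) / per_page`.
pub fn page_count(items: usize, per_page: usize) -> usize {
    items.div_ceil(per_page)
}

/// clippy::single_char_pattern: a `char` pattern instead of a one-char `&str`.
pub fn has_separator(s: &str) -> bool {
    s.contains('/')
}

/// clippy::needless_raw_string_hashes: `r"..."` instead of `r#"..."#`
/// when the literal contains no double quote.
pub const FIXTURE_PATH: &str = r"C:\ralph\events.jsonl";
```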

…(KISS item #2)
Consolidate spec-to-test (247 lines) and test-generation (127 lines) into one test-driven-development skill (134 lines) with three input modes:
- Mode A: From Spec (.spec.md); replaces spec-to-test
- Mode B: From Task (.code-task.md); replaces code-assist phase 4 guidance
- Mode C: From Description; replaces test-generation

Updated references in spec-driven.yml presets, integration tests, and skill_registry.rs test fixtures. Net reduction: ~240 lines.

…S item #13)
ralph-tui declared ralph-adapters as a dependency but never imported or used any types from it. Removing this dead dependency cleans up the crate dependency graph and avoids pulling in ~6K LOC of backend adapter code (plus transitive deps like portable-pty, vt100, termimad) when building the TUI.

…ait (KISS item #9)
Introduce a RobotService trait in ralph-proto that abstracts the human-in-the-loop communication surface (send_question, wait_for_response, send_checkin, shutdown_flag, stop). The EventLoop now holds an Option<Box<dyn RobotService>> instead of a concrete TelegramService, and the CLI layer (loop_runner.rs) creates and injects the service. This removes ralph-telegram from ralph-core's dependency graph entirely, keeping the core event loop decoupled from any specific communication platform.
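A sketch of the dependency inversion with hypothetical signatures: only the trait name, the five method names and the `Option<Box<dyn RobotService>>` field come from the commit. `EchoRobot` is an in-memory test double, not part of ralph.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Method names follow the commit; the signatures are assumptions.
pub trait RobotService {
    fn send_question(&self, question: &str) -> Result<(), String>;
    fn wait_for_response(&self, timeout: Duration) -> Option<String>;
    fn send_checkin(&self, status: &str) -> Result<(), String>;
    fn shutdown_flag(&self) -> Arc<AtomicBool>;
    fn stop(&self);
}

/// The core loop only sees the trait object; no Telegram types here.
pub struct EventLoop {
    robot: Option<Box<dyn RobotService>>,
}

impl EventLoop {
    pub fn new(robot: Option<Box<dyn RobotService>>) -> Self {
        EventLoop { robot }
    }

    /// Ask a human if a service was injected; otherwise return None.
    pub fn ask_human(&self, question: &str) -> Option<String> {
        let robot = self.robot.as_ref()?;
        robot.send_question(question).ok()?;
        robot.wait_for_response(Duration::from_secs(300))
    }
}

/// In-memory stand-in a test could inject instead of Telegram.
pub struct EchoRobot {
    last_question: RefCell<Option<String>>,
    shutdown: Arc<AtomicBool>,
}

impl EchoRobot {
    pub fn new() -> Self {
        EchoRobot {
            last_question: RefCell::new(None),
            shutdown: Arc::new(AtomicBool::new(false)),
        }
    }
}

impl RobotService for EchoRobot {
    fn send_question(&self, question: &str) -> Result<(), String> {
        *self.last_question.borrow_mut() = Some(question.to_string());
        Ok(())
    }
    fn wait_for_response(&self, _timeout: Duration) -> Option<String> {
        // Answer immediately by echoing the last question.
        self.last_question.borrow().as_ref().map(|q| format!("ack: {}", q))
    }
    fn send_checkin(&self, _status: &str) -> Result<(), String> {
        Ok(())
    }
    fn shutdown_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.shutdown)
    }
    fn stop(&self) {
        self.shutdown.store(true, Ordering::SeqCst);
    }
}
```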

…adata, event naming
Core event handling:
- Add separate human_pending queue in EventBus for human.* events
- Rename interact.human → human.interact for consistency with human.response/human.guidance
- Route human events to the Ralph hat when no other events are pending
- Update RobotService trait docs to reflect new event naming

TUI improvements:
- Track per-iteration hat/backend metadata for accurate review display
- Show max_iterations in header (e.g., [iter 3/50])
- Add human interaction state tracking to TuiState
- Update header widget to use iteration metadata when reviewing past iterations
- Add prepare_tui_iteration helper in loop_runner

Documentation:
- Update AGENTS.md with corrected event names (human.interact, human.response)
- Update ralph-telegram README and robot-interaction-skill.md
- Update presets (bugfix.yml, code-assist.yml) and ralph.bot.yml config
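A sketch of the separate human queue and its routing rule. `EventBusSketch` and `Delivery` are hypothetical; the commit only states that human.* events go to a `human_pending` queue and reach the Ralph hat when nothing else is pending.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    /// Normal event, routed by the usual hat subscriptions.
    Regular(String),
    /// human.* event, routed to the Ralph hat.
    ToRalph(String),
}

#[derive(Debug, Default)]
pub struct EventBusSketch {
    pending: VecDeque<String>,
    human_pending: VecDeque<String>,
}

impl EventBusSketch {
    pub fn publish(&mut self, topic: &str) {
        if topic.starts_with("human.") {
            self.human_pending.push_back(topic.to_string());
        } else {
            self.pending.push_back(topic.to_string());
        }
    }

    /// Regular events drain first; human events only when nothing else waits.
    pub fn next_event(&mut self) -> Option<Delivery> {
        if let Some(topic) = self.pending.pop_front() {
            return Some(Delivery::Regular(topic));
        }
        self.human_pending.pop_front().map(Delivery::ToRalph)
    }
}
```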

claude · 2026-02-02T03:02:21Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

mikeyobrien added 30 commits January 31, 2026 11:06

bddcae0 Add playwriter and tmux terminal skills
d4c8410 feat: add ralph preflight command
637fd7c feat: add cargo audit backpressure gate
b6e9794 feat: add coverage backpressure gate
e0d3374 chore(presets): add decision confidence protocol
737cf79 Add self-healing hat to confession-loop preset
25168b7 Add verifier quality report backpressure
f1003f7 Add performance regression backpressure gate
0e3d503 Fix installation guide for Rust CLI
40116f2 Add GitHub issue templates
44c618c feat(cli): add doctor diagnostics command
fa6bcf7 Update getting started quick start tutorial
b0de08d docs: detail backend configuration
a77c0cd docs(api): add Rust examples for agents/config/metrics/security
0df20f4 Add troubleshooting links to common errors
f85ce55 test: extend coverage for cli helpers
8998d62 test: extend loop runner and pty executor coverage
8157eb7 test(cli,core): stabilize cwd handling and completion tests
3261dbb test(cli): expand web and bot coverage
35267f4 test(cli,adapters): expand unit coverage for helpers
7f80ffb test: add cli run/web integration coverage
1eb3275 test(cli): expand coverage for bot/web/loop_runner
76ebd53 test(adapters,cli): cover pty executor paths
fde3383 fix(cli): normalize token and tsx version parsing
e78b5c5 test(cli): cover bot token resolution
08c3987 fix(cli): normalize task status filtering
d6e6f2a test(cli): add memory formatting tests
7d19b5c test(cli): add loops and sop runner coverage

mikeyobrien added 24 commits February 1, 2026 08:38

5c801a3 feat(presets): add confidence scoring thresholds
299d45b feat(adapters): add --yolo to codex cli

mikeyobrien changed the title from "feat: preflight/backpressure gates, skills updates, and CLI improvements" to "chore: preflight gates + skills refresh" on Feb 2, 2026

bf3617e fix(ci): satisfy rustfmt and clippy

mikeyobrien added 2 commits February 1, 2026 21:10

d2aceaa fix: drain pty output after exit in tui mode
17f972e fix: appease clippy in pty drain loop

mikeyobrien merged commit 92be62f into main Feb 2, 2026
17 checks passed
