chore: preflight gates + skills refresh #146
Merged
Integrates the existing `ralph preflight` checks into the loop start sequence so issues are caught before orchestration begins.
- Run preflight checks after loop context setup, before loop execution
- Configurable via features.preflight (enabled by default)
- --skip-preflight CLI flag overrides config
- Critical failures block the loop; warnings are logged but allowed
- Strict mode treats warnings as failures
- Worktree cleanup on preflight failure for non-primary loops
- Deferred worktree registry registration until preflight passes
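The gating rules in this commit can be sketched as a single decision function. This is a minimal illustration, not the code from the PR: `Severity`, `PreflightOptions`, and `may_start_loop` are hypothetical names, and the real check results presumably carry more detail than a bare severity.

```rust
/// Severity of a single preflight finding (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Warning,
    Critical,
}

/// Hypothetical options mirroring the settings named in the commit message.
#[derive(Debug, Clone, Copy)]
struct PreflightOptions {
    enabled: bool,   // features.preflight
    skip_flag: bool, // --skip-preflight
    strict: bool,    // strict mode
}

/// Returns true when the loop is allowed to start.
fn may_start_loop(opts: PreflightOptions, findings: &[Severity]) -> bool {
    // The CLI flag overrides config, and a disabled feature skips the gate.
    if opts.skip_flag || !opts.enabled {
        return true;
    }
    findings.iter().all(|finding| match finding {
        // Critical failures always block the loop.
        Severity::Critical => false,
        // Warnings only block when strict mode is on.
        Severity::Warning => !opts.strict,
    })
}
```

Keeping the decision in one pure function makes the override order (flag, then config, then strictness) easy to test in isolation.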
When `persistent: true` is set in event_loop config, the loop stays alive after LOOP_COMPLETE instead of terminating. A task.resume event is injected to keep the loop idling until new work arrives. Hard limits (max_iterations, max_runtime, max_cost) still terminate normally. Closes task-1769829997-dba7
…eflight opt-in
- Add 🩹 Self-Healer hat with automated recovery strategies (rollback, skip, reduce scope, fallback, escalate)
- Fix doctor.rs to strip Windows .exe/.cmd/.bat/.com extensions from backend names
- Add comprehensive tests for loops, preflight, and task CLI
- Make preflight checks opt-in (enabled: false by default) and skip outside git repos
- Fix UTF-8 boundary issue in event loop content truncation
- Require completion event to be last in JSONL batch
- Update backpressure docs to include mutation testing (warning-only)
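For context on the UTF-8 boundary item: slicing a Rust `&str` at a byte index that falls inside a multi-byte character panics. One common way to truncate safely is sketched below; `truncate_at_char_boundary` is a hypothetical helper and not necessarily how the event loop fix is written.

```rust
/// Truncate `content` to at most `max_bytes` bytes without splitting a
/// UTF-8 character (hypothetical helper for illustration).
fn truncate_at_char_boundary(content: &str, max_bytes: usize) -> &str {
    if content.len() <= max_bytes {
        return content;
    }
    let mut end = max_bytes;
    // Step back until `end` lands on a character boundary.
    // Index 0 is always a boundary, so this loop terminates.
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}
```

The result may be shorter than `max_bytes` by up to three bytes, which is usually acceptable for log or prompt truncation.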
Adds a new 'specs' preflight check that validates .spec.md files have Given/When/Then acceptance criteria, a prerequisite for the Level 5 spec-driven pipeline where specs automatically generate acceptance tests.
The check:
- Recursively scans the specs directory for .spec.md files
- Skips specs marked as 'status: implemented' (already reviewed)
- Detects acceptance criteria in bold, plain text, and list formats
- Warns (non-blocking) when specs lack testable criteria
- Integrates into existing preflight infrastructure (ralph preflight --check specs)
Includes 13 unit tests covering all paths: empty dirs, complete specs, incomplete specs, implemented specs, subdirectories, and all three Given/When/Then format variants.
…parser
Add a structured Given/When/Then parser (extract_acceptance_criteria) to ralph-core that returns AcceptanceCriterion triples from spec content. Create a spec-to-test skill that teaches AI agents to generate test stubs with a 1:1 mapping to spec criteria. Wire it into the spec-driven.yml implementer hat as step 0 (red phase of TDD). 13 new tests for the parser.
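To make the idea of Given/When/Then triples concrete, here is a rough line-oriented sketch. It reuses the names `extract_acceptance_criteria` and `AcceptanceCriterion` from the commit message, but the field names, the `clause` helper, and the parsing rules are assumptions for illustration; the parser in ralph-core may accept other formats and behave differently.

```rust
/// One Given/When/Then triple (field names are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
struct AcceptanceCriterion {
    given: String,
    when: String,
    then: String,
}

/// If `line` starts with `keyword` (after list or bold markers), return the
/// clause text that follows it. Hypothetical helper.
fn clause<'a>(line: &'a str, keyword: &str) -> Option<&'a str> {
    let is_marker = |c: char| c == '-' || c == '*' || c.is_whitespace();
    let rest = line.trim_start_matches(is_marker).strip_prefix(keyword)?;
    // Reject words that merely begin with the keyword, e.g. "Thenceforth".
    if rest.chars().next().map_or(false, |c| c.is_alphanumeric()) {
        return None;
    }
    let is_separator = |c: char| c == '*' || c == ':' || c.is_whitespace();
    Some(rest.trim_start_matches(is_separator).trim_end())
}

/// Collect complete triples; a `Then` without a preceding `Given` and `When`
/// is ignored.
fn extract_acceptance_criteria(content: &str) -> Vec<AcceptanceCriterion> {
    let mut criteria = Vec::new();
    let mut given: Option<String> = None;
    let mut when: Option<String> = None;
    for line in content.lines() {
        if let Some(text) = clause(line, "Given") {
            given = Some(text.to_string());
            when = None;
        } else if let Some(text) = clause(line, "When") {
            when = Some(text.to_string());
        } else if let Some(text) = clause(line, "Then") {
            if let (Some(g), Some(w)) = (given.take(), when.take()) {
                criteria.push(AcceptanceCriterion {
                    given: g,
                    when: w,
                    then: text.to_string(),
                });
            }
        }
    }
    criteria
}
```

This sketch only handles one clause per line; a single-sentence "Given X, when Y, then Z" criterion would need extra handling.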
Add `specs: pass/fail` as a new optional backpressure dimension that verifies spec acceptance criteria are satisfied by passing tests. When reported as `specs: fail`, it blocks build.done events (like performance regression). When omitted, it does not block (backwards compatible).
Changes:
- BackpressureEvidence: new `specs_verified: Option<bool>` field
- QualityReport: new `specs_verified: Option<bool>` field with failed_dimensions
- EventParser: parse `specs: pass/fail` from build.done and `quality.specs` from verify payloads
- Event loop: include specs status in backpressure rejection logs
- Instructions: mention specs in backpressure check list
- 9 new tests covering all spec evidence parsing paths
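The tri-state behaviour (pass, fail, omitted) maps naturally onto `Option<bool>`. The sketch below assumes a payload with one `key: value` pair per line; that layout and the helper names `parse_specs` and `specs_block_build_done` are illustrative assumptions, not the EventParser's actual format or API.

```rust
/// Parse an optional `specs: pass|fail` entry from a payload
/// (assumed layout: one `key: value` pair per line).
fn parse_specs(payload: &str) -> Option<bool> {
    payload.lines().find_map(|line| {
        let (key, value) = line.split_once(':')?;
        if key.trim() != "specs" {
            return None;
        }
        match value.trim() {
            "pass" => Some(true),
            "fail" => Some(false),
            // Unknown values are treated like an omitted dimension.
            _ => None,
        }
    })
}

/// Only an explicit failure blocks; `None` (omitted) stays backwards compatible.
fn specs_block_build_done(specs_verified: Option<bool>) -> bool {
    specs_verified == Some(false)
}
```

Using `Option<bool>` rather than `bool` is what keeps older agents, which never report `specs`, from being rejected.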
Update `ralph plan` (PDD SOP) to output artifacts in the directory
structure expected by spec-driven and pdd-to-code-assist presets:
- Default output dir: specs/{task_name}/ (was .sop/planning/)
- Flat layout: design.md, plan.md, requirements.md (was nested design/, implementation/)
- Renamed idea-honing.md → requirements.md (matches preset expectations)
- Added Given-When-Then acceptance criteria section to design template
- Updated Ralph Integration step to suggest spec-driven presets
- Synced .claude/skills/pdd/SKILL.md with bundled SOP
…eline
Update `ralph task` (Code Task Generator SOP) to output artifacts in
the directory structure expected by spec-driven presets:
- Default output dir: specs/{task_name}/tasks/ (was .ralph/tasks/)
- Reference design.md (was design/detailed-design.md) matching flat layout
- Updated Ralph Integration step to suggest spec-driven presets
- Updated examples to show specs/ directory paths
- Synced .claude/skills/code-task-generator/SKILL.md with bundled SOP
The task.start event handler did *self = Self::new() which wiped iteration buffers, current_view, and following_latest state. This caused the header to show "iter 1/0" and all previous iteration output to disappear (garbled display). Now preserves iterations, current_view, following_latest, and new_iteration_alert across the reset, matching the existing pattern for hat_map and loop_started. Adds regression test to prevent reoccurrence.
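The preserve-across-reset pattern can be sketched as follows. The struct is a simplified stand-in: the four preserved field names come from the commit message, but their types and the `pending_status` field are assumptions made for the example.

```rust
/// Simplified stand-in for the TUI state (field types are assumed).
#[derive(Debug, Default)]
struct TuiState {
    iterations: Vec<String>,
    current_view: usize,
    following_latest: bool,
    new_iteration_alert: Option<usize>,
    /// Hypothetical per-task field that should be cleared on task.start.
    pending_status: Option<String>,
}

impl TuiState {
    /// Reset per-task state while keeping the iteration history and view position.
    fn on_task_start(&mut self) {
        // Move the preserved values out before overwriting `self`.
        let iterations = std::mem::take(&mut self.iterations);
        let current_view = self.current_view;
        let following_latest = self.following_latest;
        let new_iteration_alert = self.new_iteration_alert.take();
        *self = Self {
            iterations,
            current_view,
            following_latest,
            new_iteration_alert,
            // Everything not listed above falls back to its default.
            ..Self::default()
        };
    }
}
```

Listing the preserved fields explicitly means any field added later is reset by default, so forgetting to list a new field errs toward clearing it, not keeping stale data.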
…core 15
Audit of 27 presets identified 13 as redundant, experimental, or aspirational. Removed to reduce user confusion and present a clear, opinionated set.
Removed (with rationale):
- feature-minimal: stripped version of feature.yml, confusing duplication
- tdd-red-green: code-assist.yml covers TDD with more flexibility
- adversarial-review: niche security review, review.yml suffices
- socratic-learning: teaching experiment, not a real workflow
- mob-programming: interesting concept but untested/unused
- scientific-method: overlaps with debug.yml hypothesis-driven approach
- code-archaeology: research.yml covers legacy code exploration
- performance-optimization: niche, debug + profiling covers it
- api-design: feature.yml or spec-driven.yml covers API work
- documentation-first: docs.yml covers documentation-driven work
- incident-response: aspirational, no production monitoring integration
- migration-safety: aspirational, very niche
- confession-loop: experimental quality pattern, code-assist has scoring
- planning.yml: web UI specific, not embedded
Remaining 15 presets: bugfix, code-assist, debug, deploy, docs, feature, gap-analysis, hatless-baseline, merge-loop, pdd-to-code-assist, pr-review, refactor, research, review, spec-driven
Updated: presets.rs, sync-embedded-files.sh, docs/guide/presets.md, presets/index.json, and all tests referencing removed presets.
wait_for_response() and the Telegram message handler both used the default events.jsonl path instead of reading the current-events marker to find the active timestamped events file. This caused interact.human to send questions via Telegram but never receive responses — it was watching the wrong file, and responses were written to the wrong file.
ContentPane::render() always advanced x by 1 per character, but Unicode wide characters (emoji, CJK, etc.) occupy 2 terminal columns. This caused cascading misalignment where text after any wide character appeared garbled with dropped/shifted characters. Uses unicode-width to determine actual display width, resets trailing cells for wide characters, and wraps before the edge when a wide character would straddle the right boundary.
Chaos mode was an experimental feature that was never fully implemented: the loop_runner only logged a TODO and immediately returned ChaosModeComplete.
Removes ~560 lines of unused code across 10 files:
- Delete chaos_mode.rs (254 LOC)
- Remove ChaosModeConfig, ResearchFocus, ChaosOutput from config.rs
- Remove ChaosModeComplete/ChaosModeMaxIterations TerminationReason variants
- Remove triggers_chaos_mode() method
- Remove --chaos and --chaos-max-iterations CLI args
- Clean up match arms in display, summary_writer, loop_runner, bench
- Update tests to remove chaos-related assertions
…ISS #7)
Session recording modules (cli_capture, session_recorder, session_player) and their dependents (replay_backend, smoke_runner) are now conditionally compiled behind `#[cfg(feature = "recording")]`. This reduces the default binary size by ~1,147 LOC when recording is not needed. Workspace-internal crates enable the feature explicitly, so all existing functionality and tests continue to work unchanged.
… fragments
Enable YAML anchors to de-duplicate instruction blocks across hats. HatConfig.extra_instructions is a Vec<String> that gets merged into instructions during config normalization.
Also: derive Default for PreflightConfig (clippy derivable_impls), remove unused default_false helper.
KISS item #3: prep for hat config de-duplication.
Redundant with code-assist phase 2.2 which already performs project analysis during implementation. Most projects already have AGENTS.md and README.md files, making standalone documentation generation unnecessary. Removes 314 lines of SOP guidance.
…rations skill (KISS item #10)
Merged two overlapping skills (531 lines) into a single "ralph-operations" skill (213 lines), eliminating ~318 lines of duplication. One reference point for loop lifecycle management, diagnostics analysis, and troubleshooting.
…(KISS item #11)
111 "You MUST" directives created constraint overload, reducing LLM compliance. Consolidated to 32 focused constraints by:
- Extracting repeated rules to Important Notes (doc/code separation, snippet labeling)
- Lifting shared Code Phase constraints to phase-wide section
- Removing obvious/implied behaviors (mkdir before use, handle errors)
- Condensing verbose Troubleshooting into concise paragraphs
- Trusting agent judgment for non-critical style decisions
All critical invariants preserved: TDD cycle, no broken commits, no push, convention compliance, CODEASSIST.md integration, separation of concerns.
Net reduction: 463 → 214 lines (-54%), 111 → 32 MUSTs (-71%)
… item #11)
PDD: 85 → 19 MUSTs (-78%), 298 → 147 lines (-51%)
Code-Task-Generator: 57 → 8 MUSTs (-86%), 349 → 159 lines (-54%)
Applied same simplification pattern as code-assist (cf79fc3):
- Extracted repeated cross-step rules to Important Notes section
- Removed obvious/implied behaviors (create dirs, use tools)
- Condensed verbose troubleshooting into concise paragraphs
- Converted "because this could..." rationale into short descriptions
- Trusted agent judgment for non-critical decisions
All critical invariants preserved: user-driven flow, one-question-at-a-time requirements, user approval before generation, Given-When-Then acceptance criteria, code task format spec, Ralph integration offering.
Combined Item #11 totals: 253 → 59 MUSTs across all 3 SOPs (-77%).
Fixes accumulated clippy pedantic warnings that fail under -D warnings:
- preflight.rs: unnecessary raw string hashes (r#"" → r"") in 16 test literals
- event_loop/tests.rs: bool_assert_comparison, unnecessary_map_or, cloned_ref_to_slice_refs
- bot.rs: manual_string_new ("".to_string() → String::new())
- memory.rs: useless_format and manual_div_ceil
- content.rs: single-char string pattern (.contains("x") → .contains('x'))
…(KISS item #2)
Consolidate spec-to-test (247 lines) and test-generation (127 lines) into one test-driven-development skill (134 lines) with three input modes:
- Mode A: From Spec (.spec.md), replaces spec-to-test
- Mode B: From Task (.code-task.md), replaces code-assist phase 4 guidance
- Mode C: From Description, replaces test-generation
Updated references in spec-driven.yml presets, integration tests, and skill_registry.rs test fixtures. Net reduction: ~240 lines.
…S item #13)
ralph-tui declared ralph-adapters as a dependency but never imported or used any types from it. Removing this dead dependency cleans up the crate dependency graph and avoids pulling in ~6K LOC of backend adapter code (plus transitive deps like portable-pty, vt100, termimad) when building the TUI.
…ait (KISS item #9)
Introduce a RobotService trait in ralph-proto that abstracts the human-in-the-loop communication surface (send_question, wait_for_response, send_checkin, shutdown_flag, stop). The EventLoop now holds an Option<Box<dyn RobotService>> instead of a concrete TelegramService, and the CLI layer (loop_runner.rs) creates and injects the service. This removes ralph-telegram from ralph-core's dependency graph entirely, keeping the core event loop decoupled from any specific communication platform.
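The dependency-inversion shape described here can be sketched as below. Only the trait and method names come from the commit message; every signature, the `CannedService` test double, and the `ask_human` helper are assumptions for illustration, and `send_checkin` is left out to keep the sketch short.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Subset of the trait named in the commit message; signatures are assumed.
trait RobotService {
    fn send_question(&self, question: &str) -> Result<(), String>;
    fn wait_for_response(&self) -> Option<String>;
    fn shutdown_flag(&self) -> Arc<AtomicBool>;
    fn stop(&self);
}

/// Hypothetical in-memory implementation, e.g. for tests.
struct CannedService {
    reply: String,
    shutdown: Arc<AtomicBool>,
}

impl RobotService for CannedService {
    fn send_question(&self, _question: &str) -> Result<(), String> {
        Ok(())
    }
    fn wait_for_response(&self) -> Option<String> {
        Some(self.reply.clone())
    }
    fn shutdown_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.shutdown)
    }
    fn stop(&self) {
        self.shutdown.store(true, Ordering::SeqCst);
    }
}

/// The core loop depends only on the trait object, not on a concrete service.
struct EventLoop {
    robot: Option<Box<dyn RobotService>>,
}

impl EventLoop {
    /// Hypothetical helper: ask a question if a service was injected.
    fn ask_human(&self, question: &str) -> Option<String> {
        let robot = self.robot.as_ref()?;
        robot.send_question(question).ok()?;
        robot.wait_for_response()
    }
}
```

Because the loop holds an `Option<Box<dyn RobotService>>`, the caller decides which implementation (or none) to inject, which is what lets the core crate drop its dependency on the Telegram crate.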
…adata, event naming
Core event handling:
- Add separate human_pending queue in EventBus for human.* events
- Rename interact.human → human.interact for consistency with human.response/human.guidance
- Route human events to Ralph hat when no other pending events
- Update RobotService trait docs to reflect new event naming
TUI improvements:
- Track per-iteration hat/backend metadata for accurate review display
- Show max_iterations in header (e.g., [iter 3/50])
- Add human interaction state tracking to TuiState
- Update header widget to use iteration metadata when reviewing past iterations
- Add prepare_tui_iteration helper in loop_runner
Documentation:
- Update AGENTS.md with corrected event names (human.interact, human.response)
- Update ralph-telegram README and robot-interaction-skill.md
- Update presets (bugfix.yml, code-assist.yml) and ralph.bot.yml config
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
Summary
… `--yolo` to Codex CLI invocation and adjusts tests.
Testing
`cargo test`
Commits (last 71)
… `recording` feature flag (KISS #7)