From 4d83a99ad50158d880e8afb40de72a199d3dd97a Mon Sep 17 00:00:00 2001
From: Paul Duvall
Date: Sun, 11 Jan 2026 10:34:55 -0500
Subject: [PATCH 1/4] docs: add Agentic Loops and CheckPoint patterns to experiments
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add two new experimental pattern entries to NOTES.md:
- Agentic Loops: autonomous agent loops using Claude Code Ralph Wiggum plugin
- CheckPoint: systematic validation gates for quality checks after tasks

Also fix broken links across experiments documentation:
- Update WisprFlow URL (whisperflow.com → wisprflow.ai)
- Fix internal README.md anchor references

Co-Authored-By: Claude
---
 experiments/NOTES.md  | 100 ++++++++++++++++++++++++++++++++++++++++--
 experiments/README.md |  90 +++++++++++++++++++++++--------------
 2 files changed, 153 insertions(+), 37 deletions(-)

diff --git a/experiments/NOTES.md b/experiments/NOTES.md
index ef1064c..7f22355 100644
--- a/experiments/NOTES.md
+++ b/experiments/NOTES.md
@@ -19,7 +19,7 @@ This file tracks patterns under exploration that may eventually be formalized in
 - Hands-free code review and exploration

 **Tools to Evaluate**:
-- [WisprFlow](https://whisperflow.com/) - Voice-to-text for coding
+- [WisprFlow](https://wisprflow.ai/) - Voice-to-text for coding
 - [Talon Voice](https://talonvoice.com/) - Voice control for development
 - [Voice Control for VSCode](https://marketplace.visualstudio.com/items?itemName=pokey.cursorless) - VSCode voice extensions
 - Native OS voice control (macOS Voice Control, Windows Speech Recognition)
@@ -40,8 +40,8 @@ This file tracks patterns under exploration that may eventually be formalized in

 **Related Patterns**:
 - [Tool Integration](../README.md#tool-integration) - Voice as input tool for AI
-- [Custom Commands](#custom-commands) - Voice-triggered slash commands
-- [Event Automation](#event-automation) - Voice input as lifecycle event
+- [Developer Lifecycle](../README.md#developer-lifecycle) - 
Voice-triggered workflow commands
+- [Context Persistence](../README.md#context-persistence) - Voice input as context source

 **Anti-patterns to Avoid**:
 - Over-reliance on voice for precise code editing (better for high-level commands)
@@ -50,6 +50,100 @@ This file tracks patterns under exploration that may eventually be formalized in

 ---

+### Agentic Loops
+
+**Status**: Early exploration
+**Date Added**: 2025-01-11
+
+**Description**: Autonomous agent loops that continuously execute tasks, evaluate results, and iterate until completion or user intervention. Examples include the [Claude Code Ralph Wiggum plugin](https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md).
+
+**Potential Use Cases**:
+- Autonomous code generation with self-correction
+- Continuous refactoring until quality thresholds are met
+- Iterative test writing until coverage goals are achieved
+- Self-healing pipelines that retry with different approaches
+- Long-running research tasks with progressive refinement
+
+**Tools to Evaluate**:
+- [Ralph Wiggum Plugin](https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md) - Claude Code agentic loop implementation
+- Claude Code Task tool with background agents
+- Custom loop implementations with exit conditions
+
+**Research Questions**:
+1. What are effective exit conditions to prevent infinite loops?
+2. How do you balance autonomy vs. user oversight in agentic loops?
+3. What metrics indicate loop progress vs. thrashing?
+4. How should loops handle conflicting or contradictory results?
+5. What's the optimal checkpoint frequency for long-running loops?
+
+**Next Steps**:
+- [ ] Document /ralph-loop behavior and configuration options
+- [ ] Identify common loop patterns (retry, refinement, exploration)
+- [ ] Define safety guardrails (max iterations, timeout, resource limits)
+- [ ] Test loop effectiveness for different task types
+- [ ] Measure token usage and cost implications of loops
+
+**Related Patterns**:
+- [Parallel Agents](../README.md#parallel-agents) - Multiple loops running concurrently
+- [Developer Lifecycle](../README.md#developer-lifecycle) - Triggering loops on events
+- [CheckPoint](#checkpoint) - Validation gates within loops
+
+**Anti-patterns to Avoid**:
+- Unbounded loops without termination conditions (runaway costs)
+- Loops that ignore previous iteration context (repeated failures)
+- Over-automation without human checkpoints for critical decisions
+- Single-threaded loops for parallelizable tasks
+
+---
+
+### CheckPoint
+
+**Status**: Early exploration
+**Date Added**: 2025-01-11
+
+**Description**: A systematic validation gate that runs a series of quality checks (refactoring, security, code quality, performance, architecture, documentation) after each development task to ensure continuous quality.
+
+**Potential Use Cases**:
+- Post-commit quality validation before pushing
+- Pre-merge checks in pull request workflows
+- Continuous compliance verification during development
+- Architecture drift detection after feature additions
+- Documentation freshness validation
+
+**Tools to Evaluate**:
+- Claude Code slash commands (/xsecurity, /xquality, /xrefactor, etc.)
+- Pre-commit hooks with multi-check orchestration
+- Custom checkpoint scripts with configurable check suites
+- CI/CD pipeline quality gates
+
+**Research Questions**:
+1. What's the optimal set of checks to run after each task?
+2. How do you balance thoroughness vs. developer velocity?
+3. Should checkpoints be blocking or advisory?
+4. How do you handle check failures mid-workflow?
+5. 
Can AI assistants auto-remediate checkpoint failures?
+
+**Next Steps**:
+- [ ] Define standard checkpoint check categories
+- [ ] Create configurable checkpoint profiles (quick, standard, thorough)
+- [ ] Implement checkpoint as Claude Code custom command
+- [ ] Measure impact on code quality metrics over time
+- [ ] Document checkpoint integration with CI/CD pipelines
+
+**Related Patterns**:
+- [Code Quality Prerequisites](../README.md#code-quality-prerequisites) - CI/CD quality enforcement
+- [Security Sandbox](../README.md#security-sandbox) - Security-focused checks
+- [Agentic Loops](#agentic-loops) - Checkpoints as loop exit conditions
+- [Guided Refactoring](../README.md#guided-refactoring) - Code improvement checks
+
+**Anti-patterns to Avoid**:
+- Running all checks on every minor change (developer fatigue)
+- Checkpoint failures without actionable remediation guidance
+- Skipping checkpoints under time pressure (quality debt)
+- One-size-fits-all checks regardless of change scope
+
+---
+
 ## Notes Template

 When adding new pattern explorations, copy this template:
diff --git a/experiments/README.md b/experiments/README.md
index d3fcb6d..4d2b5b7 100644
--- a/experiments/README.md
+++ b/experiments/README.md
@@ -1474,52 +1474,72 @@ Simon's caveat: *"They can't prove something is impossible—just because the co
 ### Centralized Rules

 **Maturity**: Advanced
-**Description**: Enforce organization-wide AI rules through a central gateway service or shared SDK library rather than distributing configuration files to each repository.
+**Description**: Enforce organization-wide AI rules through a central Git repository that syncs to standard AI assistant configuration files (CLAUDE.md, AGENTS.md, .cursorrules) with automatic language and framework detection.

-**Related Patterns**: [Codified Rules](../README.md#codified-rules), [Policy Generation](../README.md#policy-generation), [Security Orchestration](../README.md#security-orchestration)
+**Related Patterns**: [Codified Rules](../README.md#codified-rules), [Progressive Disclosure](#progressive-disclosure), [Security Orchestration](../README.md#security-orchestration)

 #### Core Implementation

-Centralize AI rules in a three-layer architecture:
+**Sync-based Architecture** (Recommended):

-1. **Gateway service** - Internal service that owns all org rules, calls AI providers
-2. **Wrapper library** - Shared SDK package that embeds org rules in system prompts
-3. **CLI/editor layer** - Developer tools that call gateway or wrapper, never AI providers directly
-
-**Gateway pattern**:
 ```
-Developer tool → Internal gateway → AI provider
-                      ↓
-              Org rules applied
-              Input/output filtered
-              Usage logged
+Central Rules Repository (Git)
+  ├── base/universal-rules.md
+  ├── languages/ (python.md, typescript.md, go.md)
+  └── frameworks/ (react.md, django.md, fastapi.md)
+         ↓
+  [sync-ai-rules.sh]
+         ↓
+  Project Repository
+  ├── CLAUDE.md (auto-generated)
+  ├── AGENTS.md (auto-generated)
+  └── .cursorrules (auto-generated)
 ```

-**Wrapper library pattern**:
-```
-Developer tool → @yourorg/ai-client → AI provider
-                      ↓
-              Org rules in system prompt
-              Consistent across all repos
+**How it works**:
+
+1. **Central repository** stores organization rules organized by language/framework
+2. **Sync script** detects project language (Python, TypeScript, Go) and framework (React, Django, FastAPI)
+3. **Auto-generates** standard config files (CLAUDE.md, .cursorrules, etc.) with relevant rules
+4. 
**Works offline** - no API calls, no internet dependencies after initial sync
+
+**Example sync**:
+```bash
+# One-time setup per project
+curl -O https://yourorg.com/sync-ai-rules.sh
+chmod +x sync-ai-rules.sh
+
+# Run sync (manual or via pre-commit hook)
+./sync-ai-rules.sh
+
+# Generates CLAUDE.md with:
+# - Universal org rules
+# - Python-specific rules (auto-detected from pyproject.toml)
+# - FastAPI rules (auto-detected from dependencies)
 ```

-**Governance capabilities**:
-- Input filters (block secrets, enforce read-only paths)
-- Output filters (scan for banned APIs, license violations)
-- Policy-as-code integration (OPA rules before/after AI calls)
-- Centralized audit logging (repo, task type, tokens, files touched)
+**Key benefits**:
+- ✅ **Works with existing AI tools** - Claude Code, Cursor, Gemini all read standard config files
+- ✅ **Offline-friendly** - No API gateway, no internet dependencies
+- ✅ **Simple** - Single bash script, no Node.js services to deploy
+- ✅ **Language-aware** - Auto-detects Python/TypeScript/Go and pulls relevant rules
+- ✅ **Version-controlled** - Rules in Git, changes are auditable

-**Benefits over distributed config**:
-- Change rules once, all tools updated
-- Enforceable guardrails (not just suggestions)
-- Aggregate metrics across teams
-- Model switching without repo changes
+**Alternative: Gateway Pattern** (for advanced use cases):

-Complete Example: See [examples/centralized-rules/](examples/centralized-rules/) for working gateway, wrapper library, and CLI implementations.
+For organizations needing input/output filtering, policy enforcement, or usage logging, see [examples/centralized-rules/gateway-strategy/](examples/centralized-rules/gateway-strategy/) for API gateway approach with:
+- Request/response filtering
+- Policy-as-code integration (OPA/Cedar)
+- Centralized audit logging
+- Usage metrics aggregation
+
+Complete Examples:
+- **[Sync Strategy](examples/centralized-rules/sync-strategy/)** - Simple Git-based sync (recommended)
+- **[Gateway Strategy](examples/centralized-rules/gateway-strategy/)** - Advanced API gateway pattern

 #### Anti-pattern: Scattered Configuration

-Copying AI rule files into every repository:
+Copying AI rule files into every repository without central source:

 ```
 repo-a/.cursorrules # v1.2 of org rules
@@ -1529,9 +1549,11 @@ repo-c/.ai/rules/ # Custom fork, diverged

 **Problems**:
 - Rules drift across repositories
-- No enforcement (developers can ignore or modify)
-- No visibility into AI usage patterns
-- Model/rule changes require updating every repo
+- Manual updates required for every repo
+- No consistency enforcement
+- Difficult to track which repos have current rules
+
+**Solution**: Use centralized sync approach where rules are maintained in one place and automatically distributed to projects.

 ---

From ccdd8d4b93e7cbc6d7aa8d9447c6404449f84a45 Mon Sep 17 00:00:00 2001
From: Paul Duvall
Date: Sun, 11 Jan 2026 10:37:02 -0500
Subject: [PATCH 2/4] docs: fix Security Sandbox description in CheckPoint pattern

Correct the related pattern description from "Security-focused checks"
to "Running agents in isolated environments" to accurately reflect what
the Security Sandbox pattern is about.

Co-Authored-By: Claude
---
 experiments/NOTES.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/experiments/NOTES.md b/experiments/NOTES.md
index 7f22355..c94c3b9 100644
--- a/experiments/NOTES.md
+++ b/experiments/NOTES.md
@@ -132,7 +132,7 @@ This file tracks patterns under exploration that may eventually be formalized in

 **Related Patterns**:
 - [Code Quality Prerequisites](../README.md#code-quality-prerequisites) - CI/CD quality enforcement
-- [Security Sandbox](../README.md#security-sandbox) - Security-focused checks
+- [Security Sandbox](../README.md#security-sandbox) - Running agents in isolated environments
 - [Agentic Loops](#agentic-loops) - Checkpoints as loop exit conditions
 - [Guided Refactoring](../README.md#guided-refactoring) - Code improvement checks

From 9c9a04f23d95743c084d6e9fa6356231f1a0c461 Mon Sep 17 00:00:00 2001
From: Paul Duvall
Date: Sun, 11 Jan 2026 10:40:05 -0500
Subject: [PATCH 3/4] docs: refine Agentic Loops pattern based on Ralph Wiggum plugin

Update pattern to accurately describe long autonomous coding sessions:
- Add Core Mechanics section (stop hook, file persistence, completion promise)
- Refine use cases to match actual capabilities (greenfield, TDD, auto-verify)
- Update research questions around completion promises and iteration limits
- Improve anti-patterns with specific failure modes
- Fix CheckPoint's related pattern description

Co-Authored-By: Claude
---
 experiments/NOTES.md | 55 ++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/experiments/NOTES.md b/experiments/NOTES.md
index c94c3b9..5ad40d9 100644
--- a/experiments/NOTES.md
+++ b/experiments/NOTES.md
@@ -55,44 +55,49 @@ This file tracks patterns under exploration that may eventually be formalized in
 **Status**: Early exploration
 **Date Added**: 2025-01-11

-**Description**: Autonomous agent loops that continuously execute tasks, evaluate results, and iterate until completion or user intervention. 
Examples include the [Claude Code Ralph Wiggum plugin](https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md).
+**Description**: Enable long autonomous coding sessions where AI iteratively improves work until explicit completion criteria are met. Uses a stop hook to intercept exit attempts and feed the same prompt back, allowing Claude to self-correct through test failures, error messages, and its own code. See the [Claude Code Ralph Wiggum plugin](https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md).
+
+**Core Mechanics**:
+- **Stop hook** intercepts exit attempts and re-injects the original prompt
+- **File persistence** allows each iteration to see previous work
+- **Completion promise** (e.g., `COMPLETE`) signals success
+- **Iteration limits** provide safety bounds (e.g., `--max-iterations 50`)

 **Potential Use Cases**:
-- Autonomous code generation with self-correction
-- Continuous refactoring until quality thresholds are met
-- Iterative test writing until coverage goals are achieved
-- Self-healing pipelines that retry with different approaches
-- Long-running research tasks with progressive refinement
+- Greenfield projects you can start and walk away from
+- TDD workflows: write failing tests → implement → run tests → fix → repeat
+- Multi-phase feature builds with clear success criteria
+- Tasks with automatic verification (tests, linters, type checkers)

 **Tools to Evaluate**:
-- [Ralph Wiggum Plugin](https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md) - Claude Code agentic loop implementation
-- Claude Code Task tool with background agents
-- Custom loop implementations with exit conditions
+- [Ralph Wiggum Plugin](https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md) - Official Claude Code agentic loop implementation
+- Custom stop hooks with iteration tracking
+- Prompt templates with completion promises

 **Research Questions**:
-1. 
What are effective exit conditions to prevent infinite loops?
-2. How do you balance autonomy vs. user oversight in agentic loops?
-3. What metrics indicate loop progress vs. thrashing?
-4. How should loops handle conflicting or contradictory results?
-5. What's the optimal checkpoint frequency for long-running loops?
+1. How do you craft effective completion promises that prevent false positives?
+2. What iteration limits balance thoroughness vs. cost for different task types?
+3. How should prompts structure incremental goals for multi-phase work?
+4. When should loops include explicit fallback/escape instructions?
+5. What metrics distinguish productive iteration from thrashing?

 **Next Steps**:
-- [ ] Document /ralph-loop behavior and configuration options
-- [ ] Identify common loop patterns (retry, refinement, exploration)
-- [ ] Define safety guardrails (max iterations, timeout, resource limits)
-- [ ] Test loop effectiveness for different task types
-- [ ] Measure token usage and cost implications of loops
+- [ ] Test /ralph-loop with various task types (API builds, test suites, refactoring)
+- [ ] Document effective prompt templates with completion promises
+- [ ] Measure iteration counts and API costs for common workflows
+- [ ] Define prompt patterns for self-correction (TDD cycles, debug loops)
+- [ ] Identify tasks unsuitable for agentic loops (design decisions, unclear criteria)

 **Related Patterns**:
 - [Parallel Agents](../README.md#parallel-agents) - Multiple loops running concurrently
 - [Developer Lifecycle](../README.md#developer-lifecycle) - Triggering loops on events
-- [CheckPoint](#checkpoint) - Validation gates within loops
+- [CheckPoint](#checkpoint) - Validation criteria within loop iterations

 **Anti-patterns to Avoid**:
-- Unbounded loops without termination conditions (runaway costs)
-- Loops that ignore previous iteration context (repeated failures)
-- Over-automation without human checkpoints for critical decisions
-- Single-threaded 
loops for parallelizable tasks
+- Missing iteration limits (runaway costs, infinite loops)
+- Vague completion criteria ("make it good" vs. explicit success metrics)
+- Tasks requiring human judgment or design decisions
+- Prompts without self-correction guidance (test → fix → retry cycles)

 ---

@@ -133,7 +138,7 @@ This file tracks patterns under exploration that may eventually be formalized in
 **Related Patterns**:
 - [Code Quality Prerequisites](../README.md#code-quality-prerequisites) - CI/CD quality enforcement
 - [Security Sandbox](../README.md#security-sandbox) - Running agents in isolated environments
-- [Agentic Loops](#agentic-loops) - Checkpoints as loop exit conditions
+- [Agentic Loops](#agentic-loops) - Long autonomous coding sessions with self-correction
 - [Guided Refactoring](../README.md#guided-refactoring) - Code improvement checks

 **Anti-patterns to Avoid**:
From b42e90b10786fcfbf6ee7731c49f68a49c365f6e Mon Sep 17 00:00:00 2001
From: Paul Duvall
Date: Sun, 11 Jan 2026 10:44:06 -0500
Subject: [PATCH 4/4] docs: add maintainability anti-pattern to Agentic Loops

Warn against generating large codebases you don't understand or know how
to maintain - a key risk of long autonomous coding sessions.

Co-Authored-By: Claude
---
 experiments/NOTES.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/experiments/NOTES.md b/experiments/NOTES.md
index 5ad40d9..5f1ed72 100644
--- a/experiments/NOTES.md
+++ b/experiments/NOTES.md
@@ -98,6 +98,7 @@ This file tracks patterns under exploration that may eventually be formalized in
 - Vague completion criteria ("make it good" vs. explicit success metrics)
 - Tasks requiring human judgment or design decisions
 - Prompts without self-correction guidance (test → fix → retry cycles)
+- Generating large codebases you don't understand or know how to maintain

 ---
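
Illustrative sketch for the Agentic Loops entry: the Core Mechanics bullets (completion promise, iteration limit) and the "missing iteration limits" anti-pattern can be pictured as a bounded loop that stops either when the promise string appears or when the limit is hit. This is a minimal shell sketch under stated assumptions, not how the Ralph Wiggum plugin is implemented: `run_iteration`, `MAX_ITERATIONS`, `status`, and `iterations` are hypothetical names, and the stand-in function simply pretends the work succeeds on the third attempt.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a bounded loop with a completion promise.
# Nothing here is taken from the Ralph Wiggum plugin; all names are illustrative.

MAX_ITERATIONS=5     # safety bound (assumed value)
PROMISE="COMPLETE"   # completion promise the loop watches for

# Stand-in for one agent iteration: pretends the tests pass on attempt 3.
run_iteration() {
  if [ "$1" -ge 3 ]; then
    echo "tests pass: $PROMISE"
  else
    echo "iteration $1: tests failing"
  fi
}

iterations=0
status="limit-reached"   # default outcome if the promise never appears
while [ "$iterations" -lt "$MAX_ITERATIONS" ]; do
  iterations=$((iterations + 1))
  output="$(run_iteration "$iterations")"
  case "$output" in
    *"$PROMISE"*) status="complete"; break ;;   # promise seen: stop looping
  esac
done

echo "status=$status iterations=$iterations"
```

With the stand-in above, the script prints `status=complete iterations=3`. Lowering `MAX_ITERATIONS` to 2 would instead end with `status=limit-reached`, which is the outcome the iteration limit exists to guarantee when the completion criteria are never satisfied.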