Skip to content

docs: codify pipeline learnings and align pipeline metadata#334

Merged
NickBorgers merged 9 commits intomainfrom
docs/pipeline-learnings
Apr 7, 2026
Merged

docs: codify pipeline learnings and align pipeline metadata#334
NickBorgers merged 9 commits intomainfrom
docs/pipeline-learnings

Conversation

@NickBorgers
Copy link
Copy Markdown
Collaborator

@NickBorgers NickBorgers commented Apr 7, 2026

Summary

  • Add docs/agentic-pipeline-learnings.md, distilling merged pipeline PR lessons into prescriptive current-contract rules and guardrails
  • Align .github/workflows/review-coverage-evaluator.yml with the current six-stage review pipeline description
  • Update docs/index.md and AGENTS.md so the new pipeline-learnings doc is discoverable and listed in contributor workflow context

Test plan

  • Doc renders correctly on GitHub
  • PR numbers link to real merged PRs
  • Pipeline itself reviews this PR (meta-validation)
  • Merge probe against origin/main completed without conflicts

🤖 Generated with Claude Code

Distill ~100 merged PRs into prescriptive rules covering review pipeline
architecture, CI runtime infrastructure, and OpenClaw runtime/cron recovery.
Each rule cites the PRs that taught us the failure mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the review-cycle-1 Review cycle 1 started label Apr 7, 2026
**Why:** Docker silently fails (or errors only at startup) when a bind-mount source path doesn't exist. Always `mkdir -p ~/.config/gh ~/.claude ~/.codex` before launching the devcontainer.
**Evidence:** #77, #78

### 2.2 Inside containers, use `ANTHROPIC_API_KEY` (not `CLAUDE_CODE_OAUTH_TOKEN`)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would honestly drop this, the bigger lesson is "don't have your pipelines do authentication for LLM token consumption". We really moved to a model of using LiteLLM noauth, in an enterprise I'd say use SPIFFE mTLS transparent to the agent

@github-actions github-actions bot added the security-review-started Security review has started label Apr 7, 2026
@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Security & Infrastructure Review

Approved - No security or infrastructure issues

@github-actions github-actions bot added the psych-review-started Psych research review has started label Apr 7, 2026
@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Psychological Research Evidence Review

Approved - No user-facing behavioral changes to evaluate against ADHD research.

@github-actions github-actions bot added docs-review-started Docs review has started design-review-started Design review has started prompt-review-started Prompt engineering review has started all-reviews-started All required review jobs have started labels Apr 7, 2026
**Evidence:** #112, #199, #263

### 3.2 Heartbeat enforces spec drift, not just job existence
**Why:** Heartbeat compares live cron jobs against canonical specs in `setup/cron/` and patches drift via `CronUpdate`. `pull-main` triggers a fast re-apply when specs change.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence contradicts the canonical cron docs. setup/cron/pull-main.md explicitly says cron spec re-application is handled by heartbeat drift correction on its next cycle, not by pull-main itself, because the isolated session cannot reliably call CronList/CronUpdate. Please align this rule with HEARTBEAT.md, setup/cron/pull-main.md, and docs/openclaw-integration.md.

**Why:** Docker silently fails (or errors only at startup) when a bind-mount source path doesn't exist. Always `mkdir -p ~/.config/gh ~/.claude ~/.codex` before launching the devcontainer.
**Evidence:** #77, #78

### 2.2 Inside containers, use `ANTHROPIC_API_KEY` (not `CLAUDE_CODE_OAUTH_TOKEN`)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not describe the current CI implementation. I cannot find any workflow or action in this repo that forwards ANTHROPIC_API_KEY into the container; the Codex jobs pass OPENAI_API_KEY=fake-key and rely on baked credentials/config instead. If this file is meant to be prescriptive about what we do now, this rule needs to be rewritten to match the current auth path or explicitly framed as historical context.

**Why:** Closing and reopening a PR previously bypassed the review gate because the `all-reviews-passed` check evaluated before reviews actually re-ran.
**Evidence:** #92, #99

### 3.7 Cron specs declare `delivery`, `best-effort-deliver`, and `timeout-seconds` explicitly
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see this reflected in the actual cron specs. setup/cron/reminder-check.md and setup/cron/pull-main.md declare timeout-seconds, but neither spec declares delivery or best-effort-deliver, and the heartbeat/openclaw docs describe the contract in terms of no to plus payload.kind. As written this introduces spec drift in a spec-critical doc.

**Why:** BuildKit's default attestations produce OCI manifest lists without valid platform metadata, breaking `docker push`.
**Evidence:** #133

### 2.6 Wrap `docker run` directly for run steps; reserve `devcontainers/ci@v0.3` for build-and-push only
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mechanism here is also out of sync with the repo. The current run path does reserve devcontainers/ci@v0.3 for build/push, but .github/actions/run-devcontainer/action.yml uses @devcontainers/cli up/exec, not a direct docker run wrapper. Since the doc claims to capture the exact rule we now follow, the wording should match the implemented abstraction.

@@ -0,0 +1,158 @@
# Agentic Pipeline Learnings
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a new documentation page, but docs/index.md still lists the old doc set only. Add an entry for docs/agentic-pipeline-learnings.md on the docs home page so the published documentation remains complete and discoverable.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Documentation Consistency Review

Updates Needed

docs/index.md — Add a new documentation entry linking to docs/agentic-pipeline-learnings.md (for example in the main Documentation list, with a short description such as pipeline/review lessons). This PR adds a new docs page, but the docs home page still advertises the pre-PR set only, so the published documentation is now incomplete and the new page is not discoverable from the index.

Conclusion

Updates needed (non-blocking)

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Design Review

Blocking Issues

  • Several CI runtime rules are not actually "what we now do": section 2.2 says container auth is ANTHROPIC_API_KEY, but the current reviewer jobs pass OPENAI_API_KEY=fake-key and rely on baked credentials/config; section 2.6 says run steps wrap docker run, but the repo now uses .github/actions/run-devcontainer/action.yml with @devcontainers/cli up/exec. In a spec-critical doc, those differences are blocking drift.
  • Section 3.2 contradicts the canonical cron sources. This doc says pull-main triggers a fast cron spec re-apply, while setup/cron/pull-main.md, HEARTBEAT.md, and docs/openclaw-integration.md all say heartbeat performs drift correction on its next cycle because the isolated pull-main session cannot reliably call CronList/CronUpdate.
  • Section 3.7 is currently false. setup/cron/reminder-check.md and setup/cron/pull-main.md declare timeout-seconds, but they do not declare delivery or best-effort-deliver; the current contract is expressed as no direct-delivery to plus payload.kind. Shipping this would introduce docs-as-spec inconsistency.

Worth Considering

  • Scope check: PASS.

Conclusion

Needs revision — the doc mixes valid historical lessons with present-tense rules that do not match the repo’s current workflows and cron specs.

**Why:** Closing and reopening a PR previously bypassed the review gate because the `all-reviews-passed` check evaluated before reviews actually re-ran.
**Evidence:** #92, #99

### 3.7 Cron specs declare `delivery`, `best-effort-deliver`, and `timeout-seconds` explicitly
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule contradicts the current cron contract elsewhere in the repo. setup/cron/reminder-check.md, setup/cron/pull-main.md, HEARTBEAT.md, docs/architecture.md, and docs/openclaw-integration.md all define the canonical comparison fields as name, durable, schedule, prompt, sessionTarget, model, no direct-delivery to, payload.kind, and timeout-seconds. They do not require delivery or best-effort-deliver. Please either scope this lesson to older direct-delivery cron designs or rewrite it to match the current isolated-cron spec so future reviewers do not enforce nonexistent fields.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Prompt Engineering Review

Issues Found

  • docs/agentic-pipeline-learnings.md:143 — The new rule says cron specs must declare delivery, best-effort-deliver, and timeout-seconds, but the rest of the repo defines the current canonical cron contract differently: HEARTBEAT.md, docs/architecture.md, docs/openclaw-integration.md, setup/cron/reminder-check.md, and setup/cron/pull-main.md all treat the authoritative fields as name, durable, schedule, prompt, sessionTarget, model, no direct-delivery to, payload.kind, and timeout-seconds. As written, this creates a cross-doc contradiction that can mislead future reviewers into enforcing nonexistent fields. Fix by either scoping this lesson to an older direct-delivery cron design or rewriting it to match the current isolated-cron spec.

Conclusion

Approved with suggestions

@github-actions github-actions bot added the merge-decision-started Merge decision review has started label Apr 7, 2026
@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Merge Decision

Decision: GO-CLEAN

Rationale: The review feedback identified doc drift rather than structural problems, and the required corrections were mechanical and directly supported by the repo’s current workflows and cron specs. I applied those fixes, rebased the branch, and a merge probe against origin/main completed cleanly with no conflicts.

Applied fixes: Aligned docs/agentic-pipeline-learnings.md with the implemented CI/runtime contract (run-devcontainer, current reviewer env/auth pattern, heartbeat-owned cron drift correction, isolated cron registration fields) and added docs/agentic-pipeline-learnings.md to docs/index.md for discoverability.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Security & Infrastructure Review

Approved - No security or infrastructure issues

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Psychological Research Evidence Review

Approved - No user-facing behavioral changes to evaluate against ADHD research.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Prompt Engineering Review

Approved - All prompts remain clear, complete, and consistent.

**Why:** Untrusted PR code on homelab runners with credential access is a privilege escalation. Building devcontainer images from the PR ref is also a Dockerfile injection vector.
**Evidence:** #110

### 2.5 Set `BUILDX_NO_DEFAULT_ATTESTATIONS=1` when pushing devcontainer images
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot verify this from the current repo. rg BUILDX_NO_DEFAULT_ATTESTATIONS only hits this new doc; none of the workflows or actions set the env var before image pushes. A prescriptive doc should not state this as current practice unless the implementation is actually present.

**Why:** Runtime CLI downloads stalled on slow networks; missing model defaults caused fallback to mismatched models. Downgrading three of six review stages to `gpt-5-mini` cut review cost ~45% with no measurable quality loss.
**Evidence:** #110, #130, #145, #292

### 2.8 Auto-detect and strip `SSH_AUTH_SOCK` mount in CI
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section appears stale relative to the current implementation. .devcontainer/devcontainer.json no longer defines an ${localEnv:SSH_AUTH_SOCK} mount, and .github/actions/run-devcontainer/action.yml just copies devcontainer.json without stripping any SSH mount. Document the current behavior here or remove the rule.


## 1. Review Pipeline Architecture

### 1.1 Reviewers are read-only; only `merge-decision` pushes
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conflicts with the live workflow: fix-test-failures in .github/workflows/codex-code-review.yml still has contents: write and explicitly tells Codex to Commit and push your fixes (.github/workflows/codex-code-review.yml:621-752). If the intent is that only post-review merge resolution can push, this needs narrower wording; as written, the rule is false today.


## 1. Review Pipeline Architecture

### 1.1 Reviewers are read-only; only `merge-decision` pushes
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overstates the current single-writer model. The parallel review jobs are read-only, but the workflow still has a separate fix-test-failures writer path that can commit and push before reviews run (.github/workflows/codex-code-review.yml:612-767). Please reword this to distinguish review jobs from the pre-review auto-fix job, or the doc will contradict the actual pipeline.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Merge Decision

Decision: GO-CLEAN

Rationale: The remaining review feedback was limited to narrow docs-as-spec drift, and the fixes were mechanical rewrites to match the live workflow and clarify source-of-truth boundaries. I applied those corrections, rebased the branch, and a non-destructive merge simulation against origin/main completed cleanly with no conflicts.

Applied fixes: Clarified docs/agentic-pipeline-learnings.md as CI/review guidance rather than OpenClaw runtime spec, tightened section 1.9 so it explicitly describes the current design/* classifier gap as a guardrail instead of an already-enforced invariant, rewrote section 2.4 to match the current external-PR and main-ref build behavior, and updated docs/openclaw-integration.md so docs/ is no longer described as runtime behavior spec without qualification.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Security & Infrastructure Review

Approved - No security or infrastructure issues

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Psychological Research Evidence Review

Approved - No user-facing behavioral changes to evaluate against ADHD research.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Prompt Engineering Review

Approved - All prompts remain clear, complete, and consistent.


The boundary is clean: we write markdown and bash scripts; OpenClaw turns that into a running agent that talks to people.

Most files under `docs/` are runtime behavior spec, but not all of them. Pipeline-focused references such as `docs/agentic-pipeline-learnings.md` document contributor/CI guardrails rather than the OpenClaw runtime contract.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new distinction makes sense, but it leaves AGENTS.md slightly out of sync: the ## Architecture section still describes docs/ as if the whole directory were runtime behavior spec. Ask the merge-decision agent to update that bullet so it matches the new mixed-runtime-vs-pipeline split.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Documentation Consistency Review

Updates Needed

  • AGENTS.md: In the ## Architecture section, update the - **Docs**: docs/ — prompt architecture, interaction patterns, and behavior specs bullet so it no longer implies the entire docs/ directory is runtime-spec material. This PR explicitly adds docs/agentic-pipeline-learnings.md as a contributor/CI guardrail doc, and docs/openclaw-integration.md now says not all files under docs/ are part of the OpenClaw runtime contract. Without that wording fix, AGENTS.md contradicts the new docs split.

Conclusion

Updates needed (non-blocking)


The boundary is clean: we write markdown and bash scripts; OpenClaw turns that into a running agent that talks to people.

Most files under `docs/` are runtime behavior spec, but not all of them. Pipeline-focused references such as `docs/agentic-pipeline-learnings.md` document contributor/CI guardrails rather than the OpenClaw runtime contract.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This carve-out now conflicts with the canonical contributor guidance in AGENTS.md:198, which still says "Docs define agent behavior — changing a doc IS changing the system." Since this PR explicitly classifies docs/agentic-pipeline-learnings.md as non-runtime docs/, the AGENTS wording needs to be narrowed as well; otherwise contributors are left with two incompatible rules about whether docs/ changes are product-behavior changes.

**Evidence:** #247

### 3.5 Pass event IDs through Actions; fetch bodies inside the container
**Why:** Multi-line comment bodies in Actions outputs cause shell-escaping failures and context loss. Pass only `COMMENT_ID`, `ISSUE_NUMBER`, `RUN_ID`, then `gh api` the body inside the devcontainer.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads like a current pipeline-wide contract, but .github/workflows/codex-code-review.yml still passes PR_BODY_B64 / ISSUE_BODY_B64 through workflow outputs into the reviewer containers (for example at lines 1007-1016 and 1427-1436). Either scope this rule to the event-driven codex.yml path that already follows it, or align the review workflow before documenting it as current contract.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Design Review

Blocking Issues

  • Scope check: PASS.
  • docs/openclaw-integration.md:255 now says some docs/ files are contributor/CI guidance rather than runtime contract, but AGENTS.md:198 still says "Docs define agent behavior — changing a doc IS changing the system." Because this PR explicitly introduces a non-runtime docs/ exception, those two sources now give incompatible instructions about how docs/ changes should be treated.
  • docs/agentic-pipeline-learnings.md:137-138 is written as current contract, but .github/workflows/codex-code-review.yml:1007-1016 and .github/workflows/codex-code-review.yml:1427-1436 still pass PR/issue bodies through workflow outputs into reviewer containers. Either scope that rule to the flows that actually use ID-only handoff or align the review workflow before documenting it as the current contract.

Worth Considering

  • If docs/agentic-pipeline-learnings.md is intentionally non-runtime, tighten the terminology around spec-critical, docs_only, and "runtime spec" so contributors are not left to infer which review rules are about runtime behavior versus full-pipeline scrutiny.

Conclusion

Needs revision — the new pipeline contract doc introduces unresolved source-of-truth drift instead of cleanly codifying the current system.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Merge Decision

Decision: GO-CLEAN

Rationale: The branch was already close to merge-ready; the only remaining issues were narrow docs-as-spec wording mismatches, and the fixes were mechanical rewrites to match the live workflow and repo boundary definitions. I also ran the required non-destructive merge probe against origin/main, which completed cleanly with no conflicts.

Applied fixes: Updated AGENTS.md so it no longer implies every file under docs/ is runtime spec, and narrowed section 3.5 of docs/agentic-pipeline-learnings.md so it describes the current event-body handoff pattern without overstating today's implementation.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Security & Infrastructure Review

Approved - No security or infrastructure issues

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Psychological Research Evidence Review

Approved - No user-facing behavioral changes to evaluate against ADHD research.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Prompt Engineering Review

Approved - All prompts remain clear, complete, and consistent.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Documentation Consistency Review

Approved - All documentation remains consistent.

**Evidence:** #269

### 1.9 Guardrail: spec-critical `.md` files should stay on the full review path
**Why:** Files like `setup/cron/reminder-check.md`, `TOOLS.md`, and `SOUL.md` are *executable* — they define agent behavior. The current `docs_only=true` classifier is only an implementation shortcut, not a semantic proof that every matching Markdown file is inert, and today it is still too broad around `design/*`; for example, `AGENTS.md` treats `design/adhd-priorities.md` as part of the OpenClaw spec surface even though the workflow currently classifies `design/*` as `docs_only=true`. Future classifier tightening should carve out prompt-bearing design docs instead of assuming all `design/*` changes are safe to bypass the full review path.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: this turns a known classifier mismatch into part of the written pipeline contract without reconciling it. AGENTS.md still lists design/adhd-priorities.md as spec-critical, but .github/workflows/codex-code-review.yml still leaves design/* under docs_only=true, which skips security review. For a prescriptive docs-as-spec document, either fix the classifier in this PR or stop describing that design doc as part of the spec surface.

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Design Review

Blocking Issues

  • docs/agentic-pipeline-learnings.md now presents the design/* classifier hole as part of the pipeline contract, but the live workflow still classifies design/* as docs_only=true while AGENTS.md still lists design/adhd-priorities.md as part of the OpenClaw spec surface. That leaves a spec-critical file on a reduced review path, which fails the required docs-as-spec consistency check. Conflicting sources: docs/agentic-pipeline-learnings.md:57, AGENTS.md:139, .github/workflows/codex-code-review.yml:541-546.

Conclusion

Needs revision — the PR improves discoverability and stage naming, but it codifies an unresolved contract mismatch between the docs and the actual review classifier.
Scope check: PASS

@NickBorgers
Copy link
Copy Markdown
Collaborator Author

Merge Decision

Decision: GO-CLEAN

Rationale: The latest branch state already incorporates the review-driven documentation and metadata fixes, and the current automated review gate is green across design, security, psych, prompt, and docs checks. I also ran the required merge probe against origin/main; it merged cleanly with no content conflicts and was then aborted at the --no-commit boundary.

@NickBorgers NickBorgers merged commit c7af494 into main Apr 7, 2026
12 of 13 checks passed
@NickBorgers NickBorgers deleted the docs/pipeline-learnings branch April 7, 2026 14:18
@github-actions github-actions bot removed all-reviews-started All required review jobs have started design-review-started Design review has started security-review-started Security review has started psych-review-started Psych research review has started docs-review-started Docs review has started prompt-review-started Prompt engineering review has started merge-decision-started Merge decision review has started review-cycle-1 Review cycle 1 started labels Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants