docs: codify pipeline learnings and align pipeline metadata #334
NickBorgers merged 9 commits into main
Distill ~100 merged PRs into prescriptive rules covering review pipeline architecture, CI runtime infrastructure, and OpenClaw runtime/cron recovery. Each rule cites the PRs that taught us the failure mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/agentic-pipeline-learnings.md
Outdated
**Why:** Docker silently fails (or errors only at startup) when a bind-mount source path doesn't exist. Always `mkdir -p ~/.config/gh ~/.claude ~/.codex` before launching the devcontainer.
**Evidence:** #77, #78

### 2.2 Inside containers, use `ANTHROPIC_API_KEY` (not `CLAUDE_CODE_OAUTH_TOKEN`)
I would honestly drop this; the bigger lesson is "don't have your pipelines do authentication for LLM token consumption". We really moved to a model of using LiteLLM noauth; in an enterprise I'd say use SPIFFE mTLS, transparent to the agent.
Security & Infrastructure Review
Approved - No security or infrastructure issues
Psychological Research Evidence Review
Approved - No user-facing behavioral changes to evaluate against ADHD research.
docs/agentic-pipeline-learnings.md
Outdated
**Evidence:** #112, #199, #263

### 3.2 Heartbeat enforces spec drift, not just job existence
**Why:** Heartbeat compares live cron jobs against canonical specs in `setup/cron/` and patches drift via `CronUpdate`. `pull-main` triggers a fast re-apply when specs change.
This sentence contradicts the canonical cron docs. setup/cron/pull-main.md explicitly says cron spec re-application is handled by heartbeat drift correction on its next cycle, not by pull-main itself, because the isolated session cannot reliably call CronList/CronUpdate. Please align this rule with HEARTBEAT.md, setup/cron/pull-main.md, and docs/openclaw-integration.md.
docs/agentic-pipeline-learnings.md
Outdated
**Why:** Docker silently fails (or errors only at startup) when a bind-mount source path doesn't exist. Always `mkdir -p ~/.config/gh ~/.claude ~/.codex` before launching the devcontainer.
**Evidence:** #77, #78

### 2.2 Inside containers, use `ANTHROPIC_API_KEY` (not `CLAUDE_CODE_OAUTH_TOKEN`)
This does not describe the current CI implementation. I cannot find any workflow or action in this repo that forwards ANTHROPIC_API_KEY into the container; the Codex jobs pass OPENAI_API_KEY=fake-key and rely on baked credentials/config instead. If this file is meant to be prescriptive about what we do now, this rule needs to be rewritten to match the current auth path or explicitly framed as historical context.
docs/agentic-pipeline-learnings.md
Outdated
**Why:** Closing and reopening a PR previously bypassed the review gate because the `all-reviews-passed` check evaluated before reviews actually re-ran.
**Evidence:** #92, #99

### 3.7 Cron specs declare `delivery`, `best-effort-deliver`, and `timeout-seconds` explicitly
I do not see this reflected in the actual cron specs. `setup/cron/reminder-check.md` and `setup/cron/pull-main.md` declare `timeout-seconds`, but neither spec declares `delivery` or `best-effort-deliver`, and the heartbeat/openclaw docs describe the contract in terms of no `to` plus `payload.kind`. As written this introduces spec drift in a spec-critical doc.
docs/agentic-pipeline-learnings.md
Outdated
**Why:** BuildKit's default attestations produce OCI manifest lists without valid platform metadata, breaking `docker push`.
**Evidence:** #133

### 2.6 Wrap `docker run` directly for run steps; reserve `devcontainers/ci@v0.3` for build-and-push only
The mechanism here is also out of sync with the repo. The current run path does reserve devcontainers/ci@v0.3 for build/push, but .github/actions/run-devcontainer/action.yml uses @devcontainers/cli up/exec, not a direct docker run wrapper. Since the doc claims to capture the exact rule we now follow, the wording should match the implemented abstraction.
@@ -0,0 +1,158 @@
# Agentic Pipeline Learnings
This PR adds a new documentation page, but docs/index.md still lists the old doc set only. Add an entry for docs/agentic-pipeline-learnings.md on the docs home page so the published documentation remains complete and discoverable.
Documentation Consistency Review
Updates Needed
Conclusion: Updates needed (non-blocking)
Design Review
Blocking Issues
Worth Considering
Conclusion: Needs revision. The doc mixes valid historical lessons with present-tense rules that do not match the repo’s current workflows and cron specs.
docs/agentic-pipeline-learnings.md
Outdated
**Why:** Closing and reopening a PR previously bypassed the review gate because the `all-reviews-passed` check evaluated before reviews actually re-ran.
**Evidence:** #92, #99

### 3.7 Cron specs declare `delivery`, `best-effort-deliver`, and `timeout-seconds` explicitly
This rule contradicts the current cron contract elsewhere in the repo. `setup/cron/reminder-check.md`, `setup/cron/pull-main.md`, `HEARTBEAT.md`, `docs/architecture.md`, and `docs/openclaw-integration.md` all define the canonical comparison fields as `name`, `durable`, `schedule`, `prompt`, `sessionTarget`, `model`, no direct-delivery `to`, `payload.kind`, and `timeout-seconds`. They do not require `delivery` or `best-effort-deliver`. Please either scope this lesson to older direct-delivery cron designs or rewrite it to match the current isolated-cron spec so future reviewers do not enforce nonexistent fields.
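To make the comparison contract in this comment concrete, here is a minimal sketch of a drift check over exactly the fields the reviewer lists. It is not the heartbeat's implementation: the dict shape of a job, the helper names `find_drift` and `_get`, and every example value (schedule, model, `payload.kind` value, `to` target) are assumptions made for illustration.

```python
from typing import Any

# Comparison fields named in the review comment. "payload.kind" is treated
# as a dotted path into a nested dict (an assumption about the job shape).
CANONICAL_FIELDS = (
    "name", "durable", "schedule", "prompt",
    "sessionTarget", "model", "payload.kind", "timeout-seconds",
)

_MISSING = object()  # sentinel so "absent" differs from an explicit None


def _get(job: dict, dotted: str) -> Any:
    """Resolve a dotted path such as 'payload.kind'; _MISSING if absent."""
    node: Any = job
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def find_drift(live: dict, spec: dict) -> dict:
    """Map each drifted field to a (live_value, spec_value) pair."""
    drift = {}
    for field in CANONICAL_FIELDS:
        live_val, spec_val = _get(live, field), _get(spec, field)
        if live_val != spec_val:
            drift[field] = (
                None if live_val is _MISSING else live_val,
                None if spec_val is _MISSING else spec_val,
            )
    # "No direct-delivery `to`": a live job carrying `to` counts as drift.
    if "to" in live:
        drift["to"] = (live["to"], None)
    return drift


# Hypothetical canonical spec and a live job that has drifted from it.
spec = {
    "name": "reminder-check",
    "durable": True,
    "schedule": "*/15 * * * *",
    "prompt": "Check reminders.",
    "sessionTarget": "isolated",
    "model": "example-model",
    "payload": {"kind": "agentTurn"},
    "timeout-seconds": 120,
}
live = dict(spec)
live["timeout-seconds"] = 60
live["to"] = "channel:example"

print(find_drift(live, spec))
```

Because `delivery` and `best-effort-deliver` are not in the field list, a checker built this way would neither require nor enforce them, which is the mismatch the comment points out.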
Prompt Engineering Review
Issues Found
Conclusion: Approved with suggestions
Merge Decision
Decision: GO-CLEAN
Rationale: The review feedback identified doc drift rather than structural problems, and the required corrections were mechanical and directly supported by the repo’s current workflows and cron specs. I applied those fixes, rebased the branch, and a merge probe against
Applied fixes: Aligned
Security & Infrastructure Review
Approved - No security or infrastructure issues
Psychological Research Evidence Review
Approved - No user-facing behavioral changes to evaluate against ADHD research.
Prompt Engineering Review
Approved - All prompts remain clear, complete, and consistent.
docs/agentic-pipeline-learnings.md
Outdated
**Why:** Untrusted PR code on homelab runners with credential access is a privilege escalation. Building devcontainer images from the PR ref is also a Dockerfile injection vector.
**Evidence:** #110

### 2.5 Set `BUILDX_NO_DEFAULT_ATTESTATIONS=1` when pushing devcontainer images
I cannot verify this from the current repo. rg BUILDX_NO_DEFAULT_ATTESTATIONS only hits this new doc; none of the workflows or actions set the env var before image pushes. A prescriptive doc should not state this as current practice unless the implementation is actually present.
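The check described here (a repo-wide search whose only hit is the new doc) can be approximated in a few lines. This is a sketch under stated assumptions, not the reviewer's tooling: `files_mentioning` is a hypothetical helper doing a plain substring search, and the temporary file layout, including the workflow file name `build.yml`, is invented to stand in for the real repo.

```python
import os
import tempfile


def files_mentioning(root: str, needle: str, skip: tuple = ()) -> list:
    """Return repo-relative paths under root whose text contains needle."""
    hits = []
    for dirpath, _dirs, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            if rel in skip:
                continue
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if needle in fh.read():
                        hits.append(rel)
            except OSError:
                continue  # unreadable file: ignore for this sketch
    return sorted(hits)


# Hypothetical miniature repo: the doc states the rule, the workflow does not.
DOC = "docs/agentic-pipeline-learnings.md"
NEEDLE = "BUILDX_NO_DEFAULT_ATTESTATIONS"

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs"))
    os.makedirs(os.path.join(root, ".github", "workflows"))
    with open(os.path.join(root, "docs", "agentic-pipeline-learnings.md"), "w") as fh:
        fh.write("### 2.5 Set `BUILDX_NO_DEFAULT_ATTESTATIONS=1` when pushing\n")
    with open(os.path.join(root, ".github", "workflows", "build.yml"), "w") as fh:
        fh.write("jobs: {}\n")

    all_hits = files_mentioning(root, NEEDLE)
    impl_hits = files_mentioning(root, NEEDLE, skip=(DOC,))

print(all_hits, impl_hits)
```

An empty `impl_hits` is the signal the comment relies on: the doc states a practice that no workflow or action implements. In a real checkout the walk would also descend into `.git`, so a practical version would exclude it.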
docs/agentic-pipeline-learnings.md
Outdated
**Why:** Runtime CLI downloads stalled on slow networks; missing model defaults caused fallback to mismatched models. Downgrading three of six review stages to `gpt-5-mini` cut review cost ~45% with no measurable quality loss.
**Evidence:** #110, #130, #145, #292

### 2.8 Auto-detect and strip `SSH_AUTH_SOCK` mount in CI
This section appears stale relative to the current implementation. .devcontainer/devcontainer.json no longer defines an ${localEnv:SSH_AUTH_SOCK} mount, and .github/actions/run-devcontainer/action.yml just copies devcontainer.json without stripping any SSH mount. Document the current behavior here or remove the rule.
docs/agentic-pipeline-learnings.md
Outdated
## 1. Review Pipeline Architecture

### 1.1 Reviewers are read-only; only `merge-decision` pushes
This conflicts with the live workflow: `fix-test-failures` in `.github/workflows/codex-code-review.yml` still has `contents: write` and explicitly tells Codex to "Commit and push your fixes" (`.github/workflows/codex-code-review.yml:621-752`). If the intent is that only post-review merge resolution can push, this needs narrower wording; as written, the rule is false today.
docs/agentic-pipeline-learnings.md
Outdated
## 1. Review Pipeline Architecture

### 1.1 Reviewers are read-only; only `merge-decision` pushes
This overstates the current single-writer model. The parallel review jobs are read-only, but the workflow still has a separate fix-test-failures writer path that can commit and push before reviews run (.github/workflows/codex-code-review.yml:612-767). Please reword this to distinguish review jobs from the pre-review auto-fix job, or the doc will contradict the actual pipeline.
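One way to keep a "who can push" rule honest is to derive the writer list from the workflow itself. The sketch below assumes the workflow YAML has already been parsed into a dict; the `writer_jobs` helper, the `design-review` job name, and the permission values shown are hypothetical and are not read from `.github/workflows/codex-code-review.yml`.

```python
def writer_jobs(workflow: dict) -> list:
    """Return the names of jobs whose token can write repository contents.

    Job-level `permissions` replace the workflow-level block entirely, so the
    workflow default is consulted only when a job declares none. If neither
    level declares permissions, the effective default depends on repository
    settings, which this sketch does not model (such jobs are not reported).
    """
    default = workflow.get("permissions", {})
    writers = []
    for name, job in workflow.get("jobs", {}).items():
        perms = job.get("permissions", default)
        if perms == "write-all":
            writers.append(name)
        elif isinstance(perms, dict) and perms.get("contents") == "write":
            writers.append(name)
    return sorted(writers)


# Hypothetical, already-parsed workflow for illustration only.
workflow = {
    "permissions": {"contents": "read"},
    "jobs": {
        "fix-test-failures": {"permissions": {"contents": "write"}},
        "design-review": {},  # inherits the read-only workflow default
        "merge-decision": {"permissions": {"contents": "write", "pull-requests": "write"}},
    },
}

print(writer_jobs(workflow))
```

With two names in the result, a rule worded as "only `merge-decision` pushes" would be flagged, while "review jobs are read-only" would still hold.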
Merge Decision
Decision: GO-CLEAN
Rationale: The remaining review feedback was limited to narrow docs-as-spec drift, and the fixes were mechanical rewrites to match the live workflow and clarify source-of-truth boundaries. I applied those corrections, rebased the branch, and a non-destructive merge simulation against
Applied fixes: Clarified
Security & Infrastructure Review
Approved - No security or infrastructure issues
Psychological Research Evidence Review
Approved - No user-facing behavioral changes to evaluate against ADHD research.
Prompt Engineering Review
Approved - All prompts remain clear, complete, and consistent.
The boundary is clean: we write markdown and bash scripts; OpenClaw turns that into a running agent that talks to people.

Most files under `docs/` are runtime behavior spec, but not all of them. Pipeline-focused references such as `docs/agentic-pipeline-learnings.md` document contributor/CI guardrails rather than the OpenClaw runtime contract.
This new distinction makes sense, but it leaves AGENTS.md slightly out of sync: the ## Architecture section still describes docs/ as if the whole directory were runtime behavior spec. Ask the merge-decision agent to update that bullet so it matches the new mixed-runtime-vs-pipeline split.
Documentation Consistency Review
Updates Needed
Conclusion: Updates needed (non-blocking)
The boundary is clean: we write markdown and bash scripts; OpenClaw turns that into a running agent that talks to people.

Most files under `docs/` are runtime behavior spec, but not all of them. Pipeline-focused references such as `docs/agentic-pipeline-learnings.md` document contributor/CI guardrails rather than the OpenClaw runtime contract.
This carve-out now conflicts with the canonical contributor guidance in AGENTS.md:198, which still says "Docs define agent behavior — changing a doc IS changing the system." Since this PR explicitly classifies docs/agentic-pipeline-learnings.md as non-runtime docs/, the AGENTS wording needs to be narrowed as well; otherwise contributors are left with two incompatible rules about whether docs/ changes are product-behavior changes.
docs/agentic-pipeline-learnings.md
Outdated
**Evidence:** #247

### 3.5 Pass event IDs through Actions; fetch bodies inside the container
**Why:** Multi-line comment bodies in Actions outputs cause shell-escaping failures and context loss. Pass only `COMMENT_ID`, `ISSUE_NUMBER`, `RUN_ID`, then `gh api` the body inside the devcontainer.
This reads like a current pipeline-wide contract, but .github/workflows/codex-code-review.yml still passes PR_BODY_B64 / ISSUE_BODY_B64 through workflow outputs into the reviewer containers (for example at lines 1007-1016 and 1427-1436). Either scope this rule to the event-driven codex.yml path that already follows it, or align the review workflow before documenting it as current contract.
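The two transport styles contrasted here can be illustrated briefly. The sketch below assumes the `_B64` suffix on `PR_BODY_B64` / `ISSUE_BODY_B64` means base64 encoding, which the thread does not confirm; the helper names and the sample body are hypothetical. It shows only the encode-and-decode path. The ID-based path from rule 3.5 would pass just an identifier and fetch the body with `gh api` inside the container.

```python
import base64


def encode_body(body: str) -> str:
    """Encode a possibly multi-line body as one line of ASCII base64."""
    return base64.b64encode(body.encode("utf-8")).decode("ascii")


def decode_body(encoded: str) -> str:
    """Recover the original body inside the consumer (e.g. the container)."""
    return base64.b64decode(encoded).decode("utf-8")


# A body that is awkward to pass through a shell-interpolated output:
# multiple lines, quotes, and a command-substitution look-alike.
body = "First line\nsecond line with $(subshell) and 'quotes'\n"

encoded = encode_body(body)

print(encoded)
print(decode_body(encoded) == body)
```

Encoding removes the escaping hazard but still ships the full body through workflow outputs, so it is a different mechanism from passing `COMMENT_ID`, `ISSUE_NUMBER`, or `RUN_ID` alone. That difference is why the comment asks for the rule to be scoped.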
Design Review
Blocking Issues
Worth Considering
Conclusion: Needs revision. The new pipeline contract doc introduces unresolved source-of-truth drift instead of cleanly codifying the current system.
Merge Decision
Decision: GO-CLEAN
Rationale: The branch was already close to merge-ready; the only remaining issues were narrow docs-as-spec wording mismatches, and the fixes were mechanical rewrites to match the live workflow and repo boundary definitions. I also ran the required non-destructive merge probe against
Applied fixes: Updated
Security & Infrastructure Review
Approved - No security or infrastructure issues
Psychological Research Evidence Review
Approved - No user-facing behavioral changes to evaluate against ADHD research.
Prompt Engineering Review
Approved - All prompts remain clear, complete, and consistent.
Documentation Consistency Review
Approved - All documentation remains consistent.
**Evidence:** #269

### 1.9 Guardrail: spec-critical `.md` files should stay on the full review path
**Why:** Files like `setup/cron/reminder-check.md`, `TOOLS.md`, and `SOUL.md` are *executable* — they define agent behavior. The current `docs_only=true` classifier is only an implementation shortcut, not a semantic proof that every matching Markdown file is inert, and today it is still too broad around `design/*`; for example, `AGENTS.md` treats `design/adhd-priorities.md` as part of the OpenClaw spec surface even though the workflow currently classifies `design/*` as `docs_only=true`. Future classifier tightening should carve out prompt-bearing design docs instead of assuming all `design/*` changes are safe to bypass the full review path.
Blocking: this turns a known classifier mismatch into part of the written pipeline contract without reconciling it. AGENTS.md still lists design/adhd-priorities.md as spec-critical, but .github/workflows/codex-code-review.yml still leaves design/* under docs_only=true, which skips security review. For a prescriptive docs-as-spec document, either fix the classifier in this PR or stop describing that design doc as part of the spec surface.
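The carve-out under discussion can be sketched as a classifier that checks a spec-critical list before falling back to the broad docs-only patterns. This is an illustration, not the logic in `.github/workflows/codex-code-review.yml`: the function names and both pattern lists are assumptions, seeded only with the paths named in this thread.

```python
from fnmatch import fnmatchcase

# Broad patterns treated as docs-only. `design/*` comes from the thread;
# the other entries are hypothetical.
DOCS_ONLY_PATTERNS = ("docs/*", "design/*", "*.md")

# Files the thread describes as defining agent behavior. The glob over
# `setup/cron/` is an assumption generalizing `setup/cron/reminder-check.md`.
SPEC_CRITICAL_PATTERNS = (
    "setup/cron/*.md",
    "TOOLS.md",
    "SOUL.md",
    "design/adhd-priorities.md",
)


def is_spec_critical(path: str) -> bool:
    """True if the path matches any spec-critical pattern."""
    return any(fnmatchcase(path, pat) for pat in SPEC_CRITICAL_PATTERNS)


def is_docs_only(changed_files: list) -> bool:
    """True only if every changed file is inert documentation.

    A single spec-critical or non-doc file forces the full review path.
    An empty change list is treated as not docs-only, to fail safe.
    """
    if not changed_files:
        return False
    for path in changed_files:
        if is_spec_critical(path):
            return False
        if not any(fnmatchcase(path, pat) for pat in DOCS_ONLY_PATTERNS):
            return False
    return True


print(is_docs_only(["docs/index.md"]))
print(is_docs_only(["design/adhd-priorities.md"]))
```

Checking the spec-critical list first is the design choice that matters: a file such as `design/adhd-priorities.md` matches `design/*`, so an allow-by-pattern classifier without the carve-out would skip the security review that the comment is concerned about.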
Design Review
Blocking Issues
Conclusion: Needs revision. The PR improves discoverability and stage naming, but it codifies an unresolved contract mismatch between the docs and the actual review classifier.
Merge Decision
Decision: GO-CLEAN
Rationale: The latest branch state already incorporates the review-driven documentation and metadata fixes, and the current automated review gate is green across design, security, psych, prompt, and docs checks. I also ran the required merge probe against
Summary
- `docs/agentic-pipeline-learnings.md`, distilling merged pipeline PR lessons into prescriptive current-contract rules and guardrails
- `.github/workflows/review-coverage-evaluator.yml` with the current six-stage review pipeline description
- `docs/index.md` and `AGENTS.md` so the new pipeline-learnings doc is discoverable and listed in contributor workflow context

Test plan
- `origin/main` completed without conflicts

🤖 Generated with Claude Code