Outbound staging-ff: classify workflow-scope push rejection distinctly (task-35e74651) #557
…task-35e74651)

GitHub rejects the outbound staging→upstream FF push when a commit in the range edits .github/workflows/ and the push token lacks the `workflow` scope. This was misclassified as `other` → `error` → touchSentinel, hiding the real cause behind a 24h throttle for a fix that takes seconds.

- classifyPushFailure: new "workflow-scope" class, matched BEFORE credentials (a 403 can partially match credentials; the more specific/actionable class must win). Tolerates ASCII-quote/backtick `workflow` and the OAuth-App `create or update workflow` anchor; the bare word `workflow` is not matched.
- syncUpstreamMainFromStaging step (E): new branch → outcome skipped-no-workflow-scope + staging_outbound_workflow_scope_missing event whose message names the cause + remedy (gh auth refresh -s workflow), and does NOT touch the sentinel (next tick retries, stale signal stays armed).
- runStagingOutboundPushTick: log the new outcome to stderr; persist the structured `project` on outbound events (and, symmetrically, inbound) so the annotation lookup matches by field, not message-prefix parsing.
- briefing-lag: stale outbound-sentinel note now appends cause + remedy read from the latest workflow-scope/credentials outbound event (new shared src/staging-event-meta.ts map + reader). Covers the symmetric credentials gap too (proposal Q2).

AC4 health-check finding annotation is scoped to the briefing-lag surface; the health-check finding is inline skill bash with no src precompute to carry the data, so skills/ludics-health-check.md stays unchanged (proposal Scope lines 209-223). Noted as a documented gap, not closed this round.
Tests: classifier quote/phrasing variants + 403-ordering + non-ff control; e2e positive (skipped-no-workflow-scope, event, sentinel NOT touched) + negative control (non-ff → error, sentinel touched); reader unit (ordering/project-filter/legacy-prefix/control); briefing-lag annotation (workflow/credentials/control); mag e2e project-field persistence + stderr.

scope-expansion: new src/staging-event-meta.ts module (shared cause/remedy map + reader). The proposal floated events.ts; a neutral module avoids the staging-ff↔briefing-lag import cycle and preserves staging-ff's events.ts decoupling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
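To make the matching order in the commit message concrete, here is a minimal sketch of a classifier that checks the workflow-scope patterns before the credentials patterns. It is not the PR's source: the regular expressions, the `non-ff` class name, and the sample stderr strings are assumptions chosen for illustration; only the `"workflow-scope"`, `credentials` and `other` class names and the `create or update workflow` anchor come from the text above.

```typescript
// Hypothetical sketch: patterns are illustrative, not the PR's actual matcher.
type PushFailureClass = "workflow-scope" | "credentials" | "non-ff" | "other";

// A quoted `workflow` (ASCII single quote, double quote, or backtick) followed
// by "scope", or the OAuth-App "create or update workflow" phrasing.
// The bare word "workflow" on its own is deliberately not enough.
const WORKFLOW_SCOPE_PATTERNS: RegExp[] = [
  /['"`]workflow['"`]\s+scope/i,
  /create or update workflow/i,
];

// Assumed credentials signals; a plain 403 lands here.
const CREDENTIALS_PATTERNS: RegExp[] = [
  /authentication failed/i,
  /could not read username/i,
  /\b403\b/,
];

// Assumed non-fast-forward signals.
const NON_FF_PATTERNS: RegExp[] = [/non-fast-forward/i, /fetch first/i];

function classifyPushFailure(stderr: string): PushFailureClass {
  const matches = (patterns: RegExp[]): boolean =>
    patterns.some((re) => re.test(stderr));

  // Order matters: a workflow-scope rejection can also carry a 403, so the
  // more specific class is tested first.
  if (matches(WORKFLOW_SCOPE_PATTERNS)) return "workflow-scope";
  if (matches(CREDENTIALS_PATTERNS)) return "credentials";
  if (matches(NON_FF_PATTERNS)) return "non-ff";
  return "other";
}
```

With this ordering, a rejection that mentions both a 403 and the missing scope is reported as `workflow-scope`, while a 403 with no scope hint still falls through to `credentials`.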
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01f1f8692c
    // remedy (workflow-scope / credentials), append it so the operator sees
    // the copy-pasteable fix next to the stale-sentinel warning.
    if (eventsFile) {
      const annotation = latestOutboundCauseRemedy(eventsFile, project);
Only annotate with failures newer than the sentinel
When a workflow-scope/credentials failure is later fixed, the next successful outbound push or no-op touches this sentinel but leaves the old auth event in journal/events.jsonl. If the sentinel later becomes stale for an unrelated reason, such as missed keepalive ticks, this lookup still appends the obsolete cause/remedy, because it selects the latest matching event without checking that it occurred after the sentinel mtime. The briefing then directs operators to refresh credentials even though the recorded auth failure predates the last successful tick. The reader should filter by the sentinel timestamp or a later success boundary.
…ilures newer than the sentinel (PR #557 P2)

Codex P2: after a workflow-scope/credentials failure is fixed, a later successful push (or no-op) touches the sentinel but the old auth event stays in events.jsonl. If the sentinel later goes stale for an unrelated reason (missed keepalive ticks), the lookup would still append the obsolete cause/remedy, misdirecting the operator to refresh credentials for a failure that predates the last success.

latestOutboundCauseRemedy gains an optional `sinceEpoch`; outboundSentinelStaleNote passes the sentinel mtime so only failures that occurred after the last successful/error tick are surfaced. Events without a usable epoch are dropped under a positive boundary (can't be proven newer).

Tests: reader sinceEpoch filtering (drop-below / keep-at-or-above + newer-wins); briefing-lag obsolete-event control (event ~100h old, sentinel ~50h old → not annotated); existing positive annotation fixtures updated to recent epochs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
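The `sinceEpoch` boundary described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in src/staging-event-meta.ts: the event shape (`type`, `epoch` in seconds), the credentials event name and its remedy string are hypothetical, and the reader takes the JSONL text directly instead of a file path so the example stays self-contained.

```typescript
// Hypothetical event shape; only `project` and the workflow-scope event name
// are taken from the PR text.
interface StagingEvent {
  type: string;
  project?: string;
  epoch?: number; // assumed: seconds since the Unix epoch
}

interface CauseRemedy {
  cause: string;
  remedy: string;
}

const CAUSE_REMEDY = new Map<string, CauseRemedy>([
  ["staging_outbound_workflow_scope_missing", {
    cause: "push token lacks the workflow scope",
    remedy: "gh auth refresh -h github.com -s workflow",
  }],
  // Hypothetical event name and remedy for the credentials case.
  ["staging_outbound_credentials_missing", {
    cause: "push credentials missing or rejected",
    remedy: "gh auth login -h github.com",
  }],
]);

function latestOutboundCauseRemedy(
  jsonl: string,
  project: string,
  sinceEpoch?: number,
): CauseRemedy | null {
  let best: { rank: number; meta: CauseRemedy } | null = null;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed journal lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ev = parsed as Partial<StagingEvent>;
    if (typeof ev.type !== "string" || ev.project !== project) continue;
    const meta = CAUSE_REMEDY.get(ev.type);
    if (meta === undefined) continue;

    const epoch =
      typeof ev.epoch === "number" && Number.isFinite(ev.epoch) ? ev.epoch : null;
    // Under a positive boundary, an event without a usable epoch cannot be
    // proven newer than the sentinel, so it is dropped along with older ones.
    if (sinceEpoch !== undefined && sinceEpoch > 0) {
      if (epoch === null || epoch < sinceEpoch) continue;
    }
    // Newest epoch wins; on ties the later journal line wins.
    const rank = epoch === null ? -Infinity : epoch;
    if (best === null || rank >= best.rank) best = { rank, meta };
  }
  return best === null ? null : best.meta;
}
```

A caller in the position of outboundSentinelStaleNote would pass the sentinel mtime as `sinceEpoch`, so an auth failure recorded before the last sentinel touch yields `null` and no annotation.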
note: branch state has drifted since this body was written (baseline: 1 commit at 2026-06-05T13:09:18Z, current: 2 commits). consider …

note: branch state has drifted since this body was written (baseline: 1 commit at 2026-06-05T13:09:21Z, current: 2 commits). consider …
Suggest Refactor: task-35e74651 (PR #557)

Retrospective on the workflow-scope push-rejection classification change. What I'd do differently, and follow-ups worth a separate task.

1. The annotation should have been freshness-bounded from the first commit

The Codex P2 (annotate only failures newer than the sentinel) was a real correctness bug I shipped in round 1 and fixed in round 2 (…

2. …
From task-35e74651 retrospective (PR #557): implement the hooked surface, leave the forbidden prose surface untouched, cite the proposal's authorization for the narrower scope, file the rest as follow-up. Captured via /ludics-process-suggestions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up task created from retrospective:
This closes the deferred second surface of AC4 (the briefing-lag arm shipped in this PR; the …

Auto-generated by process-suggestions.
Summary
When the outbound staging→upstream fast-forward push (`syncUpstreamMainFromStaging` step E) is rejected by GitHub because the push token lacks the `workflow` OAuth/PAT scope (a commit in the FF range edits `.github/workflows/`), it was misclassified as the catch-all `"other"` → outcome `error`, which touches the sentinel (24h throttle) and emits a generic `staging_outbound_error` event, hiding a fix that takes seconds (`gh auth refresh -s workflow`).

This mirrors the existing credentials "don't-throttle + surface" path:

- `classifyPushFailure` gains a `"workflow-scope"` class, matched before `credentials` (a 403 can partially match credentials; the more specific/actionable class must win). Quote-tolerant matcher ('workflow' / `workflow`) + quote-agnostic `create or update workflow` anchor; the bare word `workflow` is deliberately not matched.
- … `skipped-no-workflow-scope` + `staging_outbound_workflow_scope_missing` event whose message names the cause + remedy (`gh auth refresh -h github.com -s workflow`), and does not touch the sentinel (next tick retries, stale-sentinel signal stays armed).
- `runStagingOutboundPushTick` logs the new outcome to stderr and persists a structured `project` field on outbound (and symmetrically inbound) events.
- `src/briefing-lag.ts` stale outbound-sentinel note now appends the cause + remedy read from the latest matching outbound event, covering the symmetric credentials gap too (proposal Q2). Shared map + reader live in new `src/staging-event-meta.ts`. The annotation only fires for an auth failure newer than the sentinel mtime, so an obsolete (already-fixed) failure on a sentinel that went stale for an unrelated reason is not surfaced (Codex P2, commit cc2fb31).

AC4 health-check surface: documented gap
The `outbound-staging-ff-stale:<project>` finding is computed inline in `skills/ludics-health-check.md` bash with no `src/` precompute to attach data to. Per the proposal's flagged ambiguity (Scope lines 209–223), the annotation is scoped to the briefing-lag programmatic surface and the skill markdown stays byte-unchanged. The health-check-side wiring is an accepted-risk follow-up.

Scope expansion
`src/staging-event-meta.ts` (NEW): the proposal floated `events.ts`; a neutral module avoids the `staging-ff ↔ briefing-lag` import cycle and preserves staging-ff's `events.ts` decoupling.

Tests
- … `other`.
- … `skipped-no-workflow-scope`, event with remedy, sentinel NOT touched) + negative control (non-ff → `error`, sentinel touched).
- … `sinceEpoch` boundary (drop-below / keep-at-or-above / newer-wins), no-match → null.
- … `journal/events.jsonl` with structured `project` + remedy; stderr logging.

Full suite 2845 pass / 0 fail (baseline 2821; +24). typecheck, build, and all 17 CI lints green.
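The step (E) behaviour from the Summary (surface the cause, do not throttle) can be sketched as a small decision function. This is a hedged illustration only: `planPushFailure`, `FailurePlan`, and the message wording are hypothetical names introduced here, and the real step presumably handles more failure classes; the outcome names, event names, and remedy command are the ones quoted in this PR.

```typescript
// Hypothetical sketch: reduced to the two classes discussed in the Summary.
type PushFailureClass = "workflow-scope" | "other";

interface FailurePlan {
  outcome: "skipped-no-workflow-scope" | "error";
  eventType: "staging_outbound_workflow_scope_missing" | "staging_outbound_error";
  message: string;
  touchSentinel: boolean; // true means the 24h throttle applies
}

const WORKFLOW_REMEDY = "gh auth refresh -h github.com -s workflow";

function planPushFailure(
  cls: PushFailureClass,
  project: string,
  stderr: string,
): FailurePlan {
  if (cls === "workflow-scope") {
    // Actionable in seconds: name cause + remedy and leave the sentinel alone,
    // so the next tick retries and the stale-sentinel signal stays armed.
    return {
      outcome: "skipped-no-workflow-scope",
      eventType: "staging_outbound_workflow_scope_missing",
      message: `${project}: push token lacks the workflow scope; run: ${WORKFLOW_REMEDY}`,
      touchSentinel: false,
    };
  }
  // Catch-all: generic error event, sentinel touched.
  return {
    outcome: "error",
    eventType: "staging_outbound_error",
    message: `${project}: outbound push failed: ${stderr.trim()}`,
    touchSentinel: true,
  };
}
```

Keeping the decision in a pure function like this would make the "sentinel NOT touched" assertion from the e2e tests checkable without any git or filesystem setup, though the PR itself does not say the code is structured this way.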