Skip to content

Outbound staging-ff: classify workflow-scope push rejection distinctly (task-35e74651)#557

Merged
lukstafi merged 2 commits into
mainfrom
ludics/task-35e74651-s2/root
Jun 5, 2026
Merged

Outbound staging-ff: classify workflow-scope push rejection distinctly (task-35e74651)#557
lukstafi merged 2 commits into
mainfrom
ludics/task-35e74651-s2/root

Conversation

@lukstafi

@lukstafi lukstafi commented Jun 5, 2026

Copy link
Copy Markdown
Owner

Summary

When the outbound staging→upstream fast-forward push (syncUpstreamMainFromStaging step E) is rejected by GitHub because the push token lacks the workflow OAuth/PAT scope (a commit in the FF range edits .github/workflows/), it was misclassified as the catch-all "other" → outcome error, which touches the sentinel (24h throttle) and emits a generic staging_outbound_error event — hiding a fix that takes seconds (gh auth refresh -s workflow).

This mirrors the existing credentials "don't-throttle + surface" path:

  1. classifyPushFailure gains a "workflow-scope" class, matched before credentials (a 403 can partially match credentials; the more specific/actionable class must win). Quote-tolerant matcher ('workflow' / `workflow`) + quote-agnostic create or update workflow anchor; the bare word workflow is deliberately not matched.
  2. Step (E) new branch → outcome skipped-no-workflow-scope + staging_outbound_workflow_scope_missing event whose message names the cause + remedy (gh auth refresh -h github.com -s workflow), and does not touch the sentinel (next tick retries, stale-sentinel signal stays armed).
  3. runStagingOutboundPushTick logs the new outcome to stderr and persists a structured project field on outbound (and symmetrically inbound) events.
  4. src/briefing-lag.ts stale outbound-sentinel note now appends the cause + remedy read from the latest matching outbound event — covering the symmetric credentials gap too (proposal Q2). Shared map + reader live in new src/staging-event-meta.ts. The annotation only fires for an auth failure newer than the sentinel mtime, so an obsolete (already-fixed) failure on a sentinel that went stale for an unrelated reason is not surfaced (Codex P2, commit cc2fb31).

AC4 health-check surface — documented gap

The outbound-staging-ff-stale:<project> finding is computed inline in skills/ludics-health-check.md bash with no src/ precompute to attach data to. Per the proposal's flagged ambiguity (Scope lines 209–223), the annotation is scoped to the briefing-lag programmatic surface and the skill markdown stays byte-unchanged. The health-check-side wiring is an accepted-risk follow-up.

Scope expansion

src/staging-event-meta.ts (NEW) — the proposal floated events.ts; a neutral module avoids the staging-ff ↔ briefing-lag import cycle and preserves staging-ff's events.ts decoupling.

Tests

  • Classifier: OAuth-App / ASCII-quote / backtick variants, 403-ordering (workflow-scope wins), non-ff control stays other.
  • E2E: positive (skipped-no-workflow-scope, event with remedy, sentinel NOT touched) + negative control (non-ff → error, sentinel touched).
  • Reader: ordering (newest epoch), project filter, legacy message-prefix fallback, sinceEpoch boundary (drop-below / keep-at-or-above / newer-wins), no-match → null.
  • Annotation: workflow + credentials → cause/remedy; dual control (no event / no eventsFile) → unannotated; obsolete-event-predating-sentinel control → unannotated.
  • Mag: workflow-scope event reaches journal/events.jsonl with structured project + remedy; stderr logging.

Full suite 2845 pass / 0 fail (baseline 2821; +24). typecheck, build, and all 17 CI lints green.

…task-35e74651)

GitHub rejects the outbound staging→upstream FF push when a commit in the
range edits .github/workflows/ and the push token lacks the `workflow`
scope. This was misclassified as `other` → `error` → touchSentinel, hiding
the real cause behind a 24h throttle for a fix that takes seconds.

- classifyPushFailure: new "workflow-scope" class, matched BEFORE credentials
  (a 403 can partially match credentials; the more specific/actionable class
  must win). Tolerates ASCII-quote/backtick `workflow` and the OAuth-App
  `create or update workflow` anchor; the bare word `workflow` is not matched.
- syncUpstreamMainFromStaging step (E): new branch → outcome
  skipped-no-workflow-scope + staging_outbound_workflow_scope_missing event
  whose message names the cause + remedy (gh auth refresh -s workflow), and
  does NOT touch the sentinel (next tick retries, stale signal stays armed).
- runStagingOutboundPushTick: log the new outcome to stderr; persist the
  structured `project` on outbound events (and, symmetrically, inbound) so
  the annotation lookup matches by field, not message-prefix parsing.
- briefing-lag: stale outbound-sentinel note now appends cause + remedy read
  from the latest workflow-scope/credentials outbound event (new shared
  src/staging-event-meta.ts map + reader). Covers the symmetric credentials
  gap too (proposal Q2).

AC4 health-check finding annotation is scoped to the briefing-lag surface;
the health-check finding is inline skill bash with no src precompute to carry
the data, so skills/ludics-health-check.md stays unchanged (proposal Scope
lines 209-223). Noted as a documented gap, not closed this round.

Tests: classifier quote/phrasing variants + 403-ordering + non-ff control;
e2e positive (skipped-no-workflow-scope, event, sentinel NOT touched) +
negative control (non-ff → error, sentinel touched); reader unit
(ordering/project-filter/legacy-prefix/control); briefing-lag annotation
(workflow/credentials/control); mag e2e project-field persistence + stderr.

scope-expansion: new src/staging-event-meta.ts module (shared cause/remedy
map + reader) — proposal floated events.ts; neutral module avoids the
staging-ff↔briefing-lag import cycle and preserves staging-ff's events.ts
decoupling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 01f1f8692c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/briefing-lag.ts Outdated
// remedy (workflow-scope / credentials), append it so the operator sees
// the copy-pasteable fix next to the stale-sentinel warning.
if (eventsFile) {
const annotation = latestOutboundCauseRemedy(eventsFile, project);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Only annotate with failures newer than the sentinel

When a workflow-scope/credentials failure is later fixed, the next successful outbound push or no-op touches this sentinel but leaves the old auth event in journal/events.jsonl; if the sentinel later becomes stale for an unrelated reason such as missed keepalive ticks, this lookup still appends the obsolete cause/remedy because it selects the latest matching event without checking that it occurred after the sentinel mtime. That makes the briefing direct operators to refresh credentials even though the recorded auth failure predates the last successful tick, so the reader should filter by the sentinel timestamp or a later success boundary.

Useful? React with 👍 / 👎.

…ilures newer than the sentinel (PR #557 P2)

Codex P2: after a workflow-scope/credentials failure is fixed, a later
successful push (or no-op) touches the sentinel but the old auth event stays
in events.jsonl. If the sentinel later goes stale for an unrelated reason
(missed keepalive ticks), the lookup would still append the obsolete
cause/remedy — misdirecting the operator to refresh credentials for a failure
that predates the last success.

latestOutboundCauseRemedy gains an optional `sinceEpoch`; outboundSentinelStaleNote
passes the sentinel mtime so only failures that occurred after the last
successful/error tick are surfaced. Events without a usable epoch are dropped
under a positive boundary (can't be proven newer).

Tests: reader sinceEpoch filtering (drop-below / keep-at-or-above + newer-wins);
briefing-lag obsolete-event control (event ~100h old, sentinel ~50h old → not
annotated); existing positive annotation fixtures updated to recent epochs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lukstafi

lukstafi commented Jun 5, 2026

Copy link
Copy Markdown
Owner Author

note: branch state has drifted since this body was written (baseline: 1 commit at 2026-06-05T13:09:18Z, current: 2 commits). consider gh pr edit https://github.com/lukstafi/ludics/pull/557 to refresh.

@lukstafi

lukstafi commented Jun 5, 2026

Copy link
Copy Markdown
Owner Author

note: branch state has drifted since this body was written (baseline: 1 commit at 2026-06-05T13:09:21Z, current: 2 commits). consider gh pr edit https://github.com/lukstafi/ludics/pull/557 to refresh.

@lukstafi lukstafi merged commit 70ad964 into main Jun 5, 2026
1 check passed
@lukstafi lukstafi deleted the ludics/task-35e74651-s2/root branch June 5, 2026 13:18
@lukstafi

lukstafi commented Jun 5, 2026

Copy link
Copy Markdown
Owner Author

Suggest Refactor — task-35e74651 (PR #557)

Retrospective on the workflow-scope push-rejection classification change. What I'd do differently, and follow-ups worth a separate task.

1. The annotation should have been freshness-bounded from the first commit

The Codex P2 (annotate only failures newer than the sentinel) was a real correctness bug I shipped in round 1 and fixed in round 2 (cc2fb31). The tell was there in the plan: the briefing-lag note keys off sentinel mtime while the annotation keys off latest event — two independently-aged clocks that I wired together without a relation between them. Next time: when joining two time-ordered sources into one surface, state the ordering invariant between them before writing code. Here the invariant is "the failure must postdate the last successful/error tick (the sentinel touch)"; once named, sinceEpoch is obvious. The original positive tests used epoch: 100 (1970), which passed precisely because they never exercised the boundary — a fixture that can't distinguish the bug from the fix. Boundary-relevant fixtures (event-vs-sentinel relative age) would have caught it pre-review.

2. classifyPushFailure is becoming an ordered-regex ladder — extract a table

The classifier is now four ordered checks where order is load-bearing (workflow-scope must precede credentials must precede network). That ordering constraint is encoded only as statement sequence + comments. It works, but each new class re-litigates "where in the ladder does this go, and what does it shadow?" A small ordered table — [{class, test}] evaluated top-to-bottom — would make the precedence explicit and testable as data (e.g. assert the workflow entry sits above the credentials entry). Not worth doing for 4 entries today; worth it at ~6. Flagging as a watch-item, not a now-fix.

3. The two-surface annotation gap (AC4) deserves a real follow-up task

I scoped the cause/remedy annotation to the briefing-lag surface and left the outbound-staging-ff-stale:<project> health-check finding unannotated, because that finding is computed inline in skill bash with no src/ precompute to attach data to. The proposal authorized this, but the underlying asymmetry is a smell: two surfaces ("briefing-lag" and "health-check") read the same sentinel and should render the same diagnostic, yet only one has a programmatic seam. The durable fix is to give the health-check finding a src/ precompute (like the briefing-lag note has) so both consume latestOutboundCauseRemedy. That's a genuine architecture improvement beyond this task's scope — should be its own task, not silently carried. (I recorded the gap in the AC self-check and PR body; converting it to a tracked task would be better than relying on prose.)

4. Structured-project-on-events is a latent broader cleanup

I added project: ev.project to the staging event adapters because the annotation lookup needed to match by field instead of parsing ${project}: from the message. But many project-scoped events across src/mag.ts still encode project only in the message string (the existing AC4 divergence test even comments on this). A repo-wide "promote project to a structured event field" pass would let any future per-project event query stop string-parsing. Out of scope here; a good standalone hygiene task.

5. Module placement worked, but the import-cycle constraint should be documented

src/staging-ff.ts imports parseLeftRightCount from src/briefing-lag.ts, which silently forecloses putting shared helpers in briefing-lag (cycle) and pushed the new staging-event-meta.ts into existence. That cycle constraint is invisible until you trip it. A one-line comment at the staging-ff → briefing-lag import noting "staging-ff depends on briefing-lag; shared code goes in a neutral module, not briefing-lag" would save the next person the discovery cost.

What went well (keep doing)

  • Mirroring the existing credentials path made the diff small and the review predictable; the e2e tests reused the credentials outboundFakeGit template verbatim.
  • The mutation-oriented tests (403-ordering, sentinel-NOT-touched, project-field persistence, obsolete-event control) each pin a specific invariant and fail under a specific mutation — that paid off when the reviewer probed the exact freshness edge.

lukstafi added a commit that referenced this pull request Jun 5, 2026
From task-35e74651 retrospective (PR #557): implement the hooked surface,
leave the forbidden prose surface untouched, cite the proposal's authorization
for the narrower scope, file the rest as follow-up. Captured via
/ludics-process-suggestions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lukstafi

lukstafi commented Jun 5, 2026

Copy link
Copy Markdown
Owner Author

Follow-up task created from retrospective:

  • task-c4aedd6b: Close AC4 health-check arm — surface outbound-staging-ff cause+remedy in the stale-sentinel health-check finding (needs-confirmation)

This closes the deferred second surface of AC4 (the briefing-lag arm shipped in this PR; the outbound-staging-ff-stale:<project> health-check finding needs a programmatic seam to carry the same annotation). Reuses latestOutboundCauseRemedy from src/staging-event-meta.ts.

Auto-generated by process-suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant