Harness should fail fast on authentication errors instead of retrying

## What happened

On [PR #2052](https://github.com/fullsend-ai/fullsend/pull/2052), the review agent was dispatched ([run 27188931660](https://github.com/fullsend-ai/.fullsend/actions/runs/27188931660)). Claude Code CLI exited immediately (81ms) with `authentication_failed` and `apiKeySource: "none"` — zero tokens consumed. The harness treated this as a normal agent failure and retried (iteration 2), which failed identically (58ms, same auth error). After 2 failed iterations, the workflow exited with `validation failed after 2 iteration(s)`. Total wall time wasted on sandbox setup and teardown across both iterations was ~2 minutes, plus the cognitive cost of a human seeing a 'Failure' status on the PR.

## What could go better

Authentication failures are infrastructure issues (credential misconfiguration, OIDC token not forwarded into sandbox, Vertex AI setup problems). They will never self-resolve by retrying within the same workflow run. The harness currently treats all agent failures uniformly and retries up to the configured iteration limit. This wastes runner time and produces misleading failure statuses. I am confident this is a systemic issue — any transient credential delivery failure would trigger the same pointless retry. Issue #1032 proposes a pre-flight auth check which would help catch some cases, but even with pre-flight checks, auth can fail at runtime (e.g., token expiry mid-run per #750). A complementary post-failure classification is needed.

## Proposed change

In the harness retry loop (the iteration logic that re-runs the agent on validation failure), add exit-reason classification before retrying. When the agent's JSONL transcript or exit output contains `authentication_failed` or `apiKeySource: "none"`, classify it as a non-retryable infrastructure failure and exit immediately with a clear error message (e.g., `Error: agent authentication failed — not retrying. Check credential configuration.`). This should be implemented in the harness runner code that handles iteration/retry decisions. The classification could be extended over time to cover other non-retryable failures (e.g., sandbox creation failures, network policy blocks).

## Validation criteria

On the next occurrence of an authentication failure in any agent run, the harness should: (1) not retry the agent, (2) exit within seconds of the first failure, and (3) include 'authentication' or 'credential' in the error message posted to the PR. Verify by searching workflow logs for runs where both iterations failed with identical auth errors — these should no longer occur.

---
_Generated by retro agent from https://github.com/fullsend-ai/fullsend/pull/2052_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Harness should fail fast on authentication errors instead of retrying #2058

What happened

What could go better

Proposed change

Validation criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Harness should fail fast on authentication errors instead of retrying #2058

Description

What happened

What could go better

Proposed change

Validation criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions