relayfile: long-lived operator session tokens (1-hour TTL too short for unattended mount + ad-hoc CLI writebacks) #284


## Problem

The relayfile delegated-token mint flow (`POST /api/v1/workspaces/<id>/relayfile/delegated-token` → `/v1/tokens/workspace-path`) issues credentials with a **1-hour `expiresIn`** (`MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS = 60 * 60` in `cloud/packages/web/lib/relayfile-delegated-token-contract.ts:29`). Every operator-facing flow that uses these credentials hits friction:

- **The mount daemon's stored creds.json** is exactly that 1-hour token. Verified 2026-06-15 by decoding the JWT in `~/.agentworkforce/pear/relayfile/workspaces/.../linear/issues/.relay/creds.json` — `iat: 10:56 UTC, exp: 11:56 UTC`. If the daemon isn't running continuously through its sync cycle, the token expires before any one-shot CLI invocation (`relayfile-mount -flush-outbox-once`, `relayfile pull`) can use it → 401 wedge.
- **`relayfile writeback push`** auto-refreshes per-invocation, but the underlying mint is the same short TTL. So each push round-trip pays a fresh mint cost (network + RelayAuth API call + cloud-route handling). For an operator iterating on factory-create writebacks this is noticeable.
- **One-shot diagnostic / repair commands** (`relayfile workspace status`, `relayfile writeback list`, `skip-stuck`) all need fresh credentials. If the operator runs the command 90 seconds after their last daemon cycle, they hit 401 and have to re-mint manually.
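The expiry check described in the first bullet can be reproduced by decoding the JWT payload and reading `iat` / `exp`. The following is a minimal diagnostic sketch, not code from the relayfile repo: `decodeJWTTimes` and `secondsRemaining` are hypothetical names, the token in `main` is a throwaway unsigned example, and the signature is deliberately not verified.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// jwtTimes holds the two registered claims of interest (RFC 7519), in Unix seconds.
type jwtTimes struct {
	IssuedAt  int64 `json:"iat"`
	ExpiresAt int64 `json:"exp"`
}

// decodeJWTTimes (hypothetical helper) extracts iat/exp from a compact JWT
// WITHOUT verifying the signature. Diagnostics only, never a trust decision.
func decodeJWTTimes(token string) (jwtTimes, error) {
	var claims jwtTimes
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return claims, errors.New("not a compact JWT: expected 3 dot-separated segments")
	}
	// JWT segments are base64url without padding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return claims, fmt.Errorf("decode payload: %w", err)
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return claims, fmt.Errorf("parse claims: %w", err)
	}
	return claims, nil
}

// secondsRemaining returns the validity left at nowUnix (negative once expired).
func secondsRemaining(claims jwtTimes, nowUnix int64) int64 {
	return claims.ExpiresAt - nowUnix
}

func main() {
	// Throwaway unsigned token so the example runs without real credentials.
	payload := base64.RawURLEncoding.EncodeToString([]byte(`{"iat":1000,"exp":4600}`))
	claims, err := decodeJWTTimes("e30." + payload + ".sig")
	if err != nil {
		panic(err)
	}
	fmt.Println("lifetime seconds:", claims.ExpiresAt-claims.IssuedAt) // 3600
	fmt.Println("remaining at t=4000:", secondsRemaining(claims, 4000)) // 600
}
```

A lifetime of 3600 seconds is the 1-hour TTL described above; a negative remaining value is the state in which one-shot commands return 401.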

Observed concretely: the operator's local mount on `rw_7ccfea89` had stopped advancing because the daemon exited (separate stale-cursor bug), and every CLI attempt to manually flush the outbox 401'd until a fresh `--write` re-join. The 1-hour TTL means the only way to keep the workspace usable is to leave the daemon running forever; any interruption forces a re-join.

This is friction by design (short-lived credentials are a real security property), but the design assumes the daemon model. For operator-side (CLI) usage — where the human or agent runs ad-hoc commands and may not have a daemon running — the TTL is too tight.

## Goal

Operator-side CLI flows can hold a credential that survives ordinary day-to-day usage gaps (idle terminal, laptop sleep, ~hour-long meetings) without forcing re-joins. The daemon model continues to work as today (the daemon already refreshes — this issue doesn't change that). The bar is "an operator who joined a workspace at 9am and ran a `relayfile writeback push` at 11am should not have to re-join in between," without weakening the security model meaningfully.

## Design

Two complementary mechanisms (Parts 1 and 2), plus a visibility improvement (Part 3). Pick at least the first mechanism; ideally both:

### Part 1: Refresh-token-backed long-lived session

The current mint already returns both an access token AND a refresh token (`TOKEN_PAIR` shape in `cloud/packages/web/app/api/v1/workspaces/[workspaceId]/relayfile/delegated-token/route.test.ts:41-48` — `accessToken`, `refreshToken`, `accessTokenExpiresAt`, `refreshTokenExpiresAt`). Currently the relayfile CLI / daemon don't use the refresh token — they re-mint via the cloud delegated-token route each time.

- Wire the relayfile CLI's credential loader (`loadDelegatedCredentials` in `cmd/relayfile-cli/main.go`) to detect expiry, attempt a refresh-token grant against RelayAuth's standard refresh endpoint (`POST /v1/tokens/refresh`), and only fall back to the full delegated-token re-mint if the refresh fails.
- The refresh token already has a 24-hour TTL (`DEFAULT_RELAYFILE_DELEGATED_TOKEN_DELEGATION_TTL_SECONDS = 24 * 60 * 60`). Using it gets the operator from "1-hour fragile" to "24-hour session" with zero new infra.
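The loader behavior proposed above (use the cached token if still valid, otherwise try a refresh grant, and re-mint only if that fails) could look roughly like the sketch below. It is an assumption-laden illustration rather than the actual `loadDelegatedCredentials` code: `ensureFreshCredentials`, `refreshFunc`, `mintFunc`, `errRefreshRejected`, and the 30-second skew are all hypothetical, and the two hooks stand in for the HTTP calls to `POST /v1/tokens/refresh` and the delegated-token route.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenPair mirrors the TOKEN_PAIR field names quoted above; expiry times are
// simplified to Unix seconds for this sketch.
type tokenPair struct {
	AccessToken           string
	RefreshToken          string
	AccessTokenExpiresAt  int64
	RefreshTokenExpiresAt int64
}

// Hypothetical hooks standing in for the refresh grant and the full re-mint.
type refreshFunc func(refreshToken string) (tokenPair, error)
type mintFunc func() (tokenPair, error)

// errRefreshRejected is a hypothetical sentinel for a refused refresh grant.
var errRefreshRejected = errors.New("refresh token rejected")

// expirySkewSeconds is an assumed safety margin so a token is never used
// in the last moments before it expires.
const expirySkewSeconds = 30

// ensureFreshCredentials returns usable credentials plus how they were obtained:
// "cached", "refreshed", or "minted".
func ensureFreshCredentials(cur tokenPair, nowUnix int64, refresh refreshFunc, mint mintFunc) (tokenPair, string, error) {
	if cur.AccessToken != "" && nowUnix+expirySkewSeconds < cur.AccessTokenExpiresAt {
		return cur, "cached", nil
	}
	if cur.RefreshToken != "" && nowUnix+expirySkewSeconds < cur.RefreshTokenExpiresAt {
		if next, err := refresh(cur.RefreshToken); err == nil {
			return next, "refreshed", nil
		}
		// Refresh failed: fall through to the heavier delegated-token re-mint.
	}
	next, err := mint()
	if err != nil {
		return tokenPair{}, "", fmt.Errorf("re-mint delegated token: %w", err)
	}
	return next, "minted", nil
}

func main() {
	cur := tokenPair{AccessToken: "a1", RefreshToken: "r1", AccessTokenExpiresAt: 3600, RefreshTokenExpiresAt: 86400}
	refresh := func(string) (tokenPair, error) {
		return tokenPair{AccessToken: "a2", RefreshToken: "r2", AccessTokenExpiresAt: 9000, RefreshTokenExpiresAt: 86400}, nil
	}
	mint := func() (tokenPair, error) {
		return tokenPair{AccessToken: "a3", RefreshToken: "r3", AccessTokenExpiresAt: 9000, RefreshTokenExpiresAt: 91800}, nil
	}
	// 90 minutes after issue: access token expired, refresh token still valid.
	next, how, err := ensureFreshCredentials(cur, 5400, refresh, mint)
	if err != nil {
		panic(err)
	}
	fmt.Println(next.AccessToken, how) // a2 refreshed
}
```

Passing the current time and the two network operations in as parameters keeps the decision logic testable without real clocks or servers, which also helps with the clock-manipulation step in the verification plan.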

### Part 2: Operator-session credential class with longer TTL

For ad-hoc CLI usage specifically (not for daemon-issued tokens), extend the contract to allow a `tokenClass: "operator-session"` request with a longer ceiling (e.g. 8 hours — workday).

- Add a `MAX_RELAYFILE_OPERATOR_SESSION_TTL_SECONDS = 8 * 60 * 60` constant.
- Gate this longer TTL on the request being explicitly authenticated via the cloud session (`auth.source === "session"`) — NOT on the `cli:auth`-scoped tokens used by automation. Operator session = human in the loop.
- The CLI's `relayfile workspace join --write` opts into the operator-session class by default when run from an interactive terminal (stdin is a TTY, the equivalent of Node's `process.stdin.isTTY`); non-TTY invocations keep the 1-hour default.
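The gating rule above reduces to choosing a TTL ceiling from the token class and the auth source. A sketch of that decision follows, written in Go only for consistency with the other sketches here; the real contract lives in the TypeScript file cited in the Problem section. `resolveTTLSeconds` is a hypothetical name, `"token"` is a placeholder auth-source value for automation, and both clamping (rather than rejecting) an ungated request and defaulting a missing TTL to the ceiling are assumptions.

```go
package main

import "fmt"

const (
	// Mirrors MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS from the contract.
	maxDelegatedTokenTTLSeconds = 60 * 60
	// Mirrors the proposed MAX_RELAYFILE_OPERATOR_SESSION_TTL_SECONDS.
	maxOperatorSessionTTLSeconds = 8 * 60 * 60
)

// resolveTTLSeconds (hypothetical) picks the TTL to issue for a mint request.
// The 8-hour ceiling applies only when the caller asks for the
// "operator-session" class AND is authenticated via the cloud session.
// Assumption: anything else is clamped to the 1-hour ceiling instead of being
// rejected, and a non-positive request defaults to the applicable ceiling.
func resolveTTLSeconds(tokenClass, authSource string, requestedSeconds int) int {
	ceiling := maxDelegatedTokenTTLSeconds
	if tokenClass == "operator-session" && authSource == "session" {
		ceiling = maxOperatorSessionTTLSeconds
	}
	if requestedSeconds <= 0 || requestedSeconds > ceiling {
		return ceiling
	}
	return requestedSeconds
}

func main() {
	fmt.Println(resolveTTLSeconds("operator-session", "session", 8*60*60)) // 28800
	fmt.Println(resolveTTLSeconds("operator-session", "token", 8*60*60))   // 3600
	fmt.Println(resolveTTLSeconds("", "session", 8*60*60))                 // 3600
}
```

Requiring both conditions means an automation token cannot obtain the longer TTL simply by sending `tokenClass: "operator-session"`.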

### Part 3: Visibility

- `relayfile workspace status` (added in AR-272 / v0.8.30) should display the current credential's TTL remaining + the source (operator-session, daemon-refresh, ad-hoc). Lets the operator see at a glance whether they're about to get 401'd.
- A clear `relayfile workspace refresh` command that explicitly re-mints without re-joining (today the only way to refresh is to re-join, which is a heavier operation that resets some local state).
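As an illustration of the status line described above, the sketch below formats a credential's source and remaining TTL. The function name `describeCredential` and the output wording are assumptions, not the existing `relayfile workspace status` format.

```go
package main

import (
	"fmt"
	"time"
)

// describeCredential (hypothetical) renders the credential source and the time
// left before expiry, or how long ago the credential expired.
func describeCredential(source string, expiresAtUnix, nowUnix int64) string {
	remaining := time.Duration(expiresAtUnix-nowUnix) * time.Second
	if remaining <= 0 {
		return fmt.Sprintf("credential: %s, EXPIRED %s ago", source, -remaining)
	}
	return fmt.Sprintf("credential: %s, expires in %s", source, remaining)
}

func main() {
	fmt.Println(describeCredential("operator-session", 4600, 4000))
	// credential: operator-session, expires in 10m0s
	fmt.Println(describeCredential("daemon-refresh", 4600, 4690))
	// credential: daemon-refresh, EXPIRED 1m30s ago
}
```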

## End-to-end verification — required during development

1. Operator re-joins with `--write`. CLI prints the new TTL (8h if Part 2 lands, 1h otherwise).
2. Wait 90 minutes. Run `relayfile writeback push <factory-create-file>`. **Confirm it succeeds with no 401 + no re-join** — the refresh-token loop is invisible to the operator.
3. Wait 25 hours (or fast-forward by clock manipulation in a test environment). Run another push. Confirm a single re-join is required (refresh token has expired by then; that's correct behavior).
4. With the daemon running in the background, run a one-shot `relayfile pull` from a fresh shell. Confirm it picks up the daemon's currently-valid creds rather than minting new ones (avoid double-mint per invocation).
5. `relayfile workspace status` shows remaining TTL accurately.

## Acceptance criteria

1. CLI uses the refresh token before re-minting. Visible in the cloud route's request count: an 8-hour operator session should hit `/delegated-token` once and `/v1/tokens/refresh` up to 7-8 times instead of `/delegated-token` 8 times.
2. Operator-session class (`tokenClass: "operator-session"`) supported in the contract + gated on session auth + caps at 8h.
3. `relayfile workspace status` shows credential source + TTL.
4. `relayfile workspace refresh` exists.
5. E2E demonstration in the PR: a >2-hour gap between two writeback pushes with no manual re-join.
6. The daemon model continues to work end-to-end (no regression in the long-running mount path).

## Out of scope

- Changing the cloud `MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS` for the existing non-operator class. The 1-hour ceiling is fine for daemon-issued / machine-to-machine tokens; the daemon refreshes anyway.
- Replacing the delegated-token mint chain. This issue improves credential lifecycle on the client side; the mint chain stays as-is.
- The mount cursor wedge / by-state stale event bug (separate issue) — orthogonal to credential TTL.

## Related

- `cloud/packages/web/lib/relayfile-delegated-token-contract.ts:29-31` — TTL constants.
- `cloud/packages/web/app/api/v1/workspaces/[workspaceId]/relayfile/delegated-token/route.ts` — mint route.
- `relayfile/cmd/relayfile-cli/main.go` — credential loader + refresh logic.
- `~/.agentworkforce/pear/relayfile/workspaces/.../linear/issues/.relay/creds.json` — operator-visible artifact of the bug.
- AR-272 (relayfile CLI direct-writeback / `writeback push`) — the work this issue makes ergonomic for operators.
- AR-275 / AR-276 (Linear project create/sync) — same operator-flow class that benefits from longer sessions.

