relayfile: long-lived operator session tokens (1-hour TTL too short for unattended mount + ad-hoc CLI writebacks) #284

@khaliqgant

Problem

The relayfile delegated-token mint flow (POST /api/v1/workspaces/<id>/relayfile/delegated-token/v1/tokens/workspace-path) issues credentials with a 1-hour expiresIn (MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS = 60 * 60 in cloud/packages/web/lib/relayfile-delegated-token-contract.ts:29). Every operator-facing flow that uses these credentials hits friction:

  • The mount daemon's stored creds.json is exactly that 1-hour token. Verified 2026-06-15 by decoding the JWT in ~/.agentworkforce/pear/relayfile/workspaces/.../linear/issues/.relay/creds.json: iat 10:56 UTC, exp 11:56 UTC. If the daemon isn't running continuously through its sync cycle, the token expires before any one-shot CLI invocation (relayfile-mount -flush-outbox-once, relayfile pull) can use it → 401 wedge.
  • relayfile writeback push auto-refreshes per-invocation, but the underlying mint is the same short TTL. So each push round-trip pays a fresh mint cost (network + RelayAuth API call + cloud-route handling). For an operator iterating on factory-create writebacks this is noticeable.
  • One-shot diagnostic / repair commands (relayfile workspace status, relayfile writeback list, skip-stuck) all need fresh credentials. If the operator runs the command 90 seconds after their last daemon cycle, they hit 401 and have to re-mint manually.
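The expiry check described in the first bullet can be reproduced offline. A minimal sketch, assuming the stored credential is a standard three-part JWT whose payload carries numeric iat and exp claims. The helper names (base64UrlDecode, decodeJwtClaims, secondsRemaining) are hypothetical and not part of the relayfile CLI, and the sketch does not verify the signature.

```typescript
// Hypothetical helpers for inspecting a token's lifetime; not part of relayfile.
interface JwtTimeClaims {
  iat: number; // issued-at, seconds since the Unix epoch
  exp: number; // expiry, seconds since the Unix epoch
}

const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decodes base64url into a byte string (sufficient for ASCII-only JSON claims).
function base64UrlDecode(input: string): string {
  let bits = 0;
  let bitCount = 0;
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "=") break; // optional padding
    const value = BASE64URL_ALPHABET.indexOf(ch);
    if (value < 0) throw new Error("invalid base64url character: " + ch);
    bits = ((bits << 6) | value) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
    }
  }
  return out;
}

// Reads iat/exp from the payload segment. The signature is NOT checked.
function decodeJwtClaims(token: string): JwtTimeClaims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected header.payload.signature");
  const claims = JSON.parse(base64UrlDecode(parts[1]));
  if (typeof claims.iat !== "number" || typeof claims.exp !== "number") {
    throw new Error("token is missing numeric iat/exp claims");
  }
  return { iat: claims.iat, exp: claims.exp };
}

// Seconds until expiry, clamped at zero once the token has expired.
function secondsRemaining(claims: JwtTimeClaims, nowSeconds: number): number {
  return Math.max(0, claims.exp - nowSeconds);
}
```

For the token observed above, exp minus iat is 3600 seconds, which matches the 1-hour ceiling in the contract.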

Observed concretely: the operator's local mount on rw_7ccfea89 had stopped advancing because the daemon exited (separate stale-cursor bug), and every CLI attempt to manually flush the outbox 401'd until a fresh --write re-join. The 1-hour TTL means the only way to keep the workspace usable is to leave the daemon running forever; any interruption forces a re-join.

This is friction by design (short-lived credentials are a real security property), but the design assumes the daemon model. For operator-side (CLI) usage — where the human or agent runs ad-hoc commands and may not have a daemon running — the TTL is too tight.

Goal

Operator-side CLI flows can hold a credential that survives ordinary day-to-day usage gaps (idle terminal, laptop sleep, ~hour-long meetings) without forcing re-joins. The daemon model continues to work as today (the daemon already refreshes — this issue doesn't change that). The bar is "an operator who joined a workspace at 9am and ran a relayfile writeback push at 11am should not have to re-join in between," without weakening the security model meaningfully.

Design

Two complementary mechanisms (Parts 1 and 2; pick at least the first, ideally both), plus visibility improvements (Part 3):

Part 1: Refresh-token-backed long-lived session

The current mint already returns both an access token AND a refresh token (TOKEN_PAIR shape in cloud/packages/web/app/api/v1/workspaces/[workspaceId]/relayfile/delegated-token/route.test.ts:41-48: accessToken, refreshToken, accessTokenExpiresAt, refreshTokenExpiresAt). Currently the relayfile CLI / daemon don't use the refresh token — they re-mint via the cloud delegated-token route each time.

  • Wire the relayfile CLI's credential loader (loadDelegatedCredentials in cmd/relayfile-cli/main.go) to detect expiry, attempt a refresh-token grant against RelayAuth's standard refresh endpoint (POST /v1/tokens/refresh), and only fall back to the full delegated-token re-mint if the refresh fails.
  • The refresh token already has a 24-hour TTL (DEFAULT_RELAYFILE_DELEGATED_TOKEN_DELEGATION_TTL_SECONDS = 24 * 60 * 60). Using it gets the operator from "1-hour fragile" to "24-hour session" with zero new infra.
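The decision order in Part 1 (use the access token, else refresh, else re-mint) can be sketched as a pure function. This is an illustration only: the real loader lives in Go (cmd/relayfile-cli/main.go), the epoch-seconds representation of the expiry fields is an assumption, and the names StoredTokenPair, planCredentialAction, fallbackFor, and EXPIRY_SKEW_SECONDS are hypothetical.

```typescript
// Hypothetical shape mirroring the TOKEN_PAIR field names from the issue.
interface StoredTokenPair {
  accessToken: string;
  refreshToken: string;
  accessTokenExpiresAt: number;  // epoch seconds (assumed representation)
  refreshTokenExpiresAt: number; // epoch seconds (assumed representation)
}

type CredentialAction = "use-access-token" | "refresh" | "remint";

// Safety margin so a token is not used moments before it expires (assumed value).
const EXPIRY_SKEW_SECONDS = 30;

// Picks the cheapest action that can still yield a valid access token.
function planCredentialAction(
  pair: StoredTokenPair | null,
  nowSeconds: number
): CredentialAction {
  if (pair === null) return "remint";
  if (pair.accessTokenExpiresAt - nowSeconds > EXPIRY_SKEW_SECONDS) {
    return "use-access-token";
  }
  if (pair.refreshTokenExpiresAt - nowSeconds > EXPIRY_SKEW_SECONDS) {
    return "refresh";
  }
  return "remint";
}

// What to try when an action fails: a failed refresh falls back to the full
// delegated-token re-mint; a failed re-mint has no cheaper fallback.
function fallbackFor(action: CredentialAction): CredentialAction | null {
  if (action === "use-access-token") return "refresh";
  if (action === "refresh") return "remint";
  return null;
}
```

With a 1-hour access token and a 24-hour refresh token issued at time 0, a call at 90 minutes plans a refresh and a call at 25 hours plans a re-mint, which lines up with the expected behavior for an operator who pushes after a long gap.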

Part 2: Operator-session credential class with longer TTL

For ad-hoc CLI usage specifically (not for daemon-issued tokens), extend the contract to allow a tokenClass: "operator-session" request with a longer ceiling (e.g. 8 hours — workday).

  • Add a MAX_RELAYFILE_OPERATOR_SESSION_TTL_SECONDS = 8 * 60 * 60 constant.
  • Gate this longer TTL on the request being explicitly authenticated via the cloud session (auth.source === "session") — NOT on the cli:auth-scoped tokens used by automation. Operator session = human in the loop.
  • The CLI's relayfile workspace join --write opts into the operator-session class by default when run from an interactive terminal (process.stdin.isTTY); non-TTY invocations keep the 1-hour default.
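The gating rule in Part 2 can be sketched as a small TTL resolver. A minimal sketch under stated assumptions: MAX_RELAYFILE_OPERATOR_SESSION_TTL_SECONDS is the constant proposed above and does not exist yet, the "default" class name and the "token" auth source are placeholders (only "session" appears in this issue), and silently downgrading an ineligible request to the 1-hour ceiling is one possible policy; rejecting the request outright would be another.

```typescript
const MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS = 60 * 60;
// Proposed in this issue; not yet present in the contract file.
const MAX_RELAYFILE_OPERATOR_SESSION_TTL_SECONDS = 8 * 60 * 60;

type TokenClass = "default" | "operator-session"; // "default" is an assumed name
type AuthSource = "session" | "token"; // "token" stands in for cli:auth-scoped callers

interface TtlRequest {
  tokenClass: TokenClass;
  authSource: AuthSource;
  requestedTtlSeconds?: number;
}

// Returns the TTL to issue: the longer ceiling applies only when the caller
// asks for the operator-session class AND is authenticated via the cloud session.
function resolveTtlSeconds(req: TtlRequest): number {
  const eligible =
    req.tokenClass === "operator-session" && req.authSource === "session";
  const ceiling = eligible
    ? MAX_RELAYFILE_OPERATOR_SESSION_TTL_SECONDS
    : MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS;
  const requested =
    req.requestedTtlSeconds === undefined ? ceiling : req.requestedTtlSeconds;
  // Clamp to [1, ceiling] so a caller can ask for less but never more.
  return Math.min(Math.max(1, requested), ceiling);
}
```

Keeping the eligibility check in one place makes it harder for an automation token to reach the 8-hour ceiling through a code path that forgets the auth-source test.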

Part 3: Visibility

  • relayfile workspace status (added in AR-272 / v0.8.30) should display the current credential's TTL remaining + the source (operator-session, daemon-refresh, ad-hoc). Lets the operator see at a glance whether they're about to get 401'd.
  • A clear relayfile workspace refresh command that explicitly re-mints without re-joining (today the only way to refresh is to re-join, which is a heavier operation that resets some local state).
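One possible shape for the status output, as a sketch only. The line format, the function names (formatRemaining, formatCredentialStatus), and the epoch-seconds inputs are assumptions for illustration; the actual relayfile workspace status output is not specified in this issue.

```typescript
// Source labels taken from the bullet above.
type CredentialSource = "operator-session" | "daemon-refresh" | "ad-hoc";

// Renders a duration in the two most significant units, or "expired".
function formatRemaining(seconds: number): string {
  if (seconds <= 0) return "expired";
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  const secs = seconds % 60;
  if (hours > 0) return hours + "h " + minutes + "m";
  if (minutes > 0) return minutes + "m " + secs + "s";
  return secs + "s";
}

// Builds a single status line from the credential's source and expiry time.
function formatCredentialStatus(
  source: CredentialSource,
  expiresAtSeconds: number,
  nowSeconds: number
): string {
  const remaining = formatRemaining(expiresAtSeconds - nowSeconds);
  const suffix = remaining === "expired" ? "expired" : remaining + " remaining";
  return "credential: " + source + ", " + suffix;
}
```

Showing "expired" explicitly, rather than a zero or negative duration, tells the operator that the next command will need a refresh or re-mint.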

End-to-end verification — required during development

  1. Operator re-joins with --write. CLI prints the new TTL (8h if Part 2 lands, 1h otherwise).
  2. Wait 90 minutes. Run relayfile writeback push <factory-create-file>. Confirm it succeeds with no 401 + no re-join — the refresh-token loop is invisible to the operator.
  3. Wait 25 hours (or fast-forward by clock manipulation in a test environment). Run another push. Confirm a single re-join is required (refresh token has expired by then; that's correct behavior).
  4. With the daemon running in the background, run a one-shot relayfile pull from a fresh shell. Confirm it picks up the daemon's currently-valid creds rather than minting new ones (avoid double-mint per invocation).
  5. relayfile workspace status shows remaining TTL accurately.

Acceptance criteria

  1. CLI uses the refresh token before re-minting. Visible in the cloud route's request count: an 8-hour operator session should hit /delegated-token once and /v1/tokens/refresh up to 7-8 times instead of /delegated-token 8 times.
  2. Operator-session class (tokenClass: "operator-session") supported in the contract + gated on session auth + caps at 8h.
  3. relayfile workspace status shows credential source + TTL.
  4. relayfile workspace refresh exists.
  5. E2E demonstration in the PR: a >2-hour gap between two writeback pushes with no manual re-join.
  6. The daemon model continues to work end-to-end (no regression in the long-running mount path).
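The request counts in criterion 1 follow from simple arithmetic, sketched below. It assumes the operator uses the credential continuously for the whole session, that each access token is replaced only once it expires, and that the refresh token outlives the session; the function name expectedRequestCounts is hypothetical.

```typescript
interface RequestCounts {
  mints: number;     // calls to the delegated-token route
  refreshes: number; // calls to /v1/tokens/refresh
}

// Estimates how many mint and refresh calls a session of the given length needs.
function expectedRequestCounts(
  sessionSeconds: number,
  accessTtlSeconds: number,
  usesRefreshToken: boolean
): RequestCounts {
  if (sessionSeconds <= 0 || accessTtlSeconds <= 0) {
    throw new Error("durations must be positive");
  }
  // Number of access tokens consumed back to back over the session.
  const tokensNeeded = Math.ceil(sessionSeconds / accessTtlSeconds);
  if (usesRefreshToken) {
    // One initial mint; every later token comes from the refresh endpoint.
    return { mints: 1, refreshes: tokensNeeded - 1 };
  }
  // Without refresh, every token is a full re-mint.
  return { mints: tokensNeeded, refreshes: 0 };
}
```

For an 8-hour session with 1-hour access tokens this gives 1 mint and 7 refreshes with the refresh path, versus 8 mints without it. Refreshing slightly ahead of expiry can add one more refresh, consistent with the "7-8" range in criterion 1.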

Out of scope

  • Changing the cloud MAX_RELAYFILE_DELEGATED_TOKEN_TTL_SECONDS for the existing non-operator class. The 1-hour ceiling is fine for daemon-issued / machine-to-machine tokens; the daemon refreshes anyway.
  • Replacing the delegated-token mint chain. This issue improves credential lifecycle on the client side; the mint chain stays as-is.
  • The mount cursor wedge / by-state stale event bug (separate issue) — orthogonal to credential TTL.

Related

  • cloud/packages/web/lib/relayfile-delegated-token-contract.ts:29-31 — TTL constants.
  • cloud/packages/web/app/api/v1/workspaces/[workspaceId]/relayfile/delegated-token/route.ts — mint route.
  • relayfile/cmd/relayfile-cli/main.go — credential loader + refresh logic.
  • ~/.agentworkforce/pear/relayfile/workspaces/.../linear/issues/.relay/creds.json — operator-visible artifact of the bug.
  • AR-272 (relayfile CLI direct-writeback / writeback push) — the work this issue makes ergonomic for operators.
  • AR-275 / AR-276 (Linear project create/sync) — same operator-flow class that benefits from longer sessions.

Labels: bug (Something isn't working)
