Skip to content

fix(oauth): only re-register on invalid_client, not all 4xx#168

Draft
badjer wants to merge 1 commit intomainfrom
fix/oauth-dcr-only-on-invalid-client
Draft

fix(oauth): only re-register on invalid_client, not all 4xx#168
badjer wants to merge 1 commit intomainfrom
fix/oauth-dcr-only-on-invalid-client

Conversation

@badjer
Copy link
Copy Markdown
Contributor

@badjer badjer commented May 7, 2026

Summary

Stops the SDK from treating every 401/403 from the auth server as "my client credentials are bad" and re-registering. Re-registration is now scoped to the one error code that actually means client creds are bad: `invalid_client` (per RFC 6749 §5.2 / RFC 6750).

This is the dominant driver of the ~1.2M /register/day we see in the `auth` dataset (oauth4webapi user-agent). Each false-positive register rotates the shared `client_secret`, which then makes other in-flight users' tokens fail introspection — triggering more registers. Fix preserves auto-DCR on first use and self-healing on genuine credential failure; only the false-positive path is removed.

Changes

  • `packages/atxp-common/src/oAuthResource.ts`: new `isInvalidClientError(response)` helper. `introspectToken`'s 401/403 retry now only fires when the body or WWW-Authenticate header explicitly identifies the failure as `error=invalid_client`. All other 4xx propagates to the caller unchanged.
  • `packages/atxp-client/src/oAuth.ts`: same gating applied to `exchangeCodeForToken`'s 401/403 retry path.
  • `packages/atxp-common/src/index.ts`: exports `isInvalidClientError`.

Verification

Live auth.atxp.ai contract (curl probes, no DB writes):

  • `POST /introspect` with no auth → 401 `{error: "invalid_client", error_description: "Client authentication required ..."}`
  • `POST /introspect` with bogus Basic auth → 401 `{error: "invalid_client", error_description: "Invalid client credentials"}`
  • `POST /token` with bogus Basic auth → 401 `{error: "invalid_client", error_description: "Invalid client: client is invalid"}`

The AS already returns the right error format. The fix relies on it.

Unit tests (`atxp-common` 113 pass, `atxp-client` 210 pass, `atxp-server` 190 pass):

  • 4 new mocked-fetch tests in `oAuthResource.introspectRetry.test.ts` covering invalid_token (no retry), empty body (no retry), invalid_client (retry), happy path (no retry).
  • 2 new false-positive guards in `oauth.test.ts` for the code-exchange path: invalid_grant returns no /register; empty 401 body returns no /register.
  • Existing "should re-register on code exchange failure" updated to use `{error: invalid_client}` body (was empty body) — matches the live AS contract.

Real-HTTP integration tests (`oAuthResource.introspectRetry.integration.test.ts`):
A local Express AS exercises the full oauth4webapi → fetch → real HTTP path. Same 4 scenarios pass end-to-end.

Shared-client (multi-user atxp-pics-style) scenarios in the same integration file:

  1. Cold start: 10 concurrent first-time users with one shared OAuthDb produce a bounded /register burst (≤10), then 50 sustained requests produce zero additional /register calls.
  2. Steady-state with intermittent 401 `invalid_token`: 50 requests where 10 deliberately return 401 invalid_token. Pre-fix this would have produced ~10 /register calls; post-fix: zero.

The second scenario is the one that maps to the prod volume — that exact pattern (occasional 401 invalid_token amplified into mass DCR via the false-positive path) explains the observed 1.2M/day.

Out of scope (follow-up)

  • Cross-instance single-flight. `registrationLocks` lives on each `OAuthResourceClient` instance, not on the OAuthDb. In multi-user servers (atxp-pics et al), N concurrent first-time requests can each fire a /register before any of them save. The cold-start test above documents this — it asserts `registerCalls ≤ 10`, not `= 1`. Lifting the lock to the OAuthDb (Map for in-memory, SETNX for Redis) is a separate PR.
  • Persistent OAuthDb in atxp-pics / atxp-video / atxp-music. Those repos currently use `MemoryOAuthDb` so each restart re-registers. Smaller secondary issue; separate PRs.

Test plan

  • Unit + integration tests pass: `cd packages/atxp-common && npx vitest run` (113/113); `cd packages/atxp-client && npx vitest run` (210/210); `cd packages/atxp-server && npx vitest run` (190/190).
  • Lint clean: `npm run lint` in atxp-common and atxp-client.
  • Live auth.atxp.ai response format verified via curl (above).
  • Deploy to staging; confirm /register volume on the auth dataset drops materially (expect >90% reduction once SDK roll-out reaches MCP servers).
  • Watch for any unexpected auth failures surfaced to end users (the fix surfaces `invalid_grant` etc. instead of swallowing them in a re-register attempt — this is correct behavior, but worth monitoring).

The introspection and code-exchange retry paths were treating every
401/403 as "client credentials are bad" and re-registering. But 401/403
has many other causes (invalid_token, invalid_grant, AS hiccups, empty
bodies). Each false-positive register rotates the shared client_secret
and amplifies failure across users in multi-user deployments
(atxp-pics, atxp-video, atxp-music — see auth dataset showing ~1.2M
/register/day with oauth4webapi user-agent).

Fix: introduce isInvalidClientError(response) that parses the response
body and WWW-Authenticate header per RFC 6749 §5.2 and RFC 6750.
Re-register only when the failure is explicitly identified as
`error=invalid_client`. Other 4xx errors propagate to the caller as
before. Auto-DCR on first use and self-healing on genuine credential
failure are both preserved.

Tests: 4 unit tests + 4 real-HTTP integration tests + 2 shared-client
multi-user scenarios verifying that 50 requests with 10 false-positive
401s produce zero /register calls (pre-fix would have produced ~10).
2 new false-positive guard tests on the client-side token-exchange
path. Existing "should re-register if code exchange fails" test
updated to use proper invalid_client error body (matching the live
auth.atxp.ai contract, verified via curl probes).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant