feat(auth): remove navigator.locks-based mutex; introduce commit guard + dispose() #2392
…guard + dispose()
Removes the `_acquireLock` mutex that wrapped every auth operation and the underlying `navigator.locks` / `processLock` machinery. Replaces it with two lighter primitives that target the specific synchronization needs each operation actually has:
- `refreshingDeferred` (already existed) continues to single-flight the refresh path within an instance.
- A storage-level commit guard in `_callRefreshToken` re-reads the storage refresh_token between the rotated-tokens response and `_saveSession`. If storage changed under us (e.g. a concurrent `signOut` ran `_removeSession`), the rotated tokens are discarded rather than written back. Returns `AuthRefreshDiscardedError` on the result.
Cross-tab refresh races are handled server-side by GoTrue's v1 parent-of-active mechanism at `internal/tokens/service.go:376-385`, so no client-side coordination is needed.
New `client.dispose()` tears down the auto-refresh interval, the `visibilitychange` listener, and the BroadcastChannel; clears registered `onAuthStateChange` subscribers. Idempotent. Call from cleanup hooks in React Strict Mode / HMR contexts to prevent stale tickers from outliving the client.
The `lock` and `lockAcquireTimeout` constructor options are silently ignored for backwards compatibility; both are marked `@deprecated`. `navigatorLock`, `processLock`, and the `LockAcquireTimeoutError` family remain exported from `./lib/locks` for one major version.
Stale lock references in JSDoc on `getSession`, `onAuthStateChange`, `_challengeAndVerify`, and `_listFactors` updated to match the new model.
Test branch only, not for merge yet. See RFC `lockless_auth_coordination`.
Resolves (test target): supabase#2013, supabase#936, supabase#2111
…ispose()
_useSession: explain that concurrent callers can both reach `__loadSession` because storage reads are idempotent and the only write path (refresh) is single-flighted downstream by `refreshingDeferred` in `_callRefreshToken`. No serialization is needed at this layer.
dispose(): add a lifecycle caveat clarifying that in-flight refreshes are not aborted, so a disposed instance can still persist a rotated session to storage after `dispose()` returns. A subsequent `createClient` against the same `storageKey` will pick that session up. Notes the mitigation (await pending ops before dispose, or use a fresh `storageKey`).
Doc-only changes; no runtime behaviour change.
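The lifecycle caveat can be reproduced in a small self-contained model. This is a sketch under stated assumptions, not the auth-js implementation: `MiniClient`, `sharedStorage`, and the `refresh` signature are hypothetical stand-ins, and the map plays the role of storage shared by clients that use the same `storageKey`.

```typescript
// Hypothetical model, not auth-js code. `sharedStorage` stands in for
// storage shared by every client created with the same storageKey.
const sharedStorage = new Map<string, string>();

class MiniClient {
  private timer: ReturnType<typeof setInterval> | null;
  private subscribers: Array<(event: string) => void> = [];
  private disposed = false;

  constructor(private readonly storageKey: string, tickMs: number, onTick: () => void) {
    // Stand-in for the auto-refresh ticker.
    this.timer = setInterval(onTick, tickMs);
  }

  onAuthStateChange(cb: (event: string) => void): void {
    this.subscribers.push(cb);
  }

  get subscriberCount(): number {
    return this.subscribers.length;
  }

  // An in-flight refresh: awaits the network, then persists the result.
  // Nothing here checks `disposed`, which is the caveat being modelled.
  async refresh(fetchTokens: () => Promise<string>): Promise<void> {
    const rotated = await fetchTokens();
    sharedStorage.set(this.storageKey, rotated);
  }

  // Idempotent teardown: stops the ticker and drops subscribers.
  // It does not abort refreshes that are already in flight.
  dispose(): void {
    if (this.disposed) return;
    this.disposed = true;
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
    this.subscribers = [];
  }
}
```

In this model the mitigation from the commit message corresponds to awaiting the promise returned by `refresh` before calling `dispose()`, or constructing the next client with a different storage key.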
The two tests at `SupabaseAuthClient.test.ts:61-77` were asserting that `lockAcquireTimeout` is stored as a runtime field on the auth client (`expect((authClient as any).lockAcquireTimeout).toBe(30_000)`). That field no longer exists after the lockless refactor: the option is accepted by the type for backwards compatibility but is silently ignored at runtime because the client doesn't acquire a lock around auth operations.
Rewrite both tests to verify the new contract:
- `_initSupabaseAuthClient` accepts the option without throwing.
- `createClient` accepts it through `auth.lockAcquireTimeout`, the auth client is still constructed, but the value is not stored as a runtime field (`toBeUndefined()`).
Fixes the CI failure in `Supabase-JS Integration CI / Unit + Type Check` (2 failed, 102 passed → now 104 passed) on all three platforms.
Do not merge until we ask users to test, and dogfood.
@mandarini Thank you! PR #2392 looks like it should fix the stuff that actually hung the UI for us — mainly the lock deadlocks (#2344-style) and iOS tabs where A few things we'd still like to see addressed (or at least documented) before we'd feel comfortable dropping our own workarounds:
None of this is meant to block the lockless refactor - we think that's the right direction. These are the gaps we'd still be covering in the client. Happy to share more detail from our logs if useful once we have them. :)
@thomaslarsson Thanks for the detailed writeup.
1. Refresh storm / no backoff after fatal
This is a real gap and a good catch. The good news is the server already returns a well-defined set of fatal codes, all HTTP 400 with a structured
And The auto-refresh ticker doesn't key off any of these today; it just retries on the next tick. We'll file a follow-up to add fatal-code recognition + backoff in auth-js. Your "hundreds of
2. Cookie scope on sign-out (
We're looking at landing this in
3. Multiple clients per tab
Fair point. A runtime warning on a second
4. Observability
Two halves:
I am going to release this PR tomorrow as part of our next minor release. Let me know how it works for you if you test it out! Again, thank you so much for all the details!
Wonderful, thanks for this |
This PR updates @supabase/*-js libraries to version 2.107.0.

**Source**: manual

**Changes**:
- Updated @supabase/supabase-js to 2.107.0
- Updated @supabase/auth-js to 2.107.0
- Updated @supabase/realtime-js to 2.107.0
- Updated @supabase/postgrest-js to 2.107.0
- Refreshed pnpm-lock.yaml

---

## Release Notes

## v2.107.0

## 2.107.0 (2026-06-02)

### 🚀 Features

- **auth:** remove navigator.locks-based mutex; introduce commit guard + dispose() ([#2392](supabase/supabase-js#2392))
- **realtime:** allow httpSend to send binary payload ([#2400](supabase/supabase-js#2400))
- **supabase:** update X-Client-Info to structured metadata format ([#2359](supabase/supabase-js#2359))

### 🩹 Fixes

- **auth:** return AuthInvalidJwtError from getClaims for expired JWT ([#2395](supabase/supabase-js#2395))
- **auth:** recognize ?error= redirects in implicit grant gate ([#2407](supabase/supabase-js#2407))
- **auth): revert fix(auth:** encode client-id in oauth requests ([#2383](supabase/supabase-js#2383), [#2417](supabase/supabase-js#2417))
- **postgrest:** return a structured error for non-JSON body on successful responses ([#2398](supabase/supabase-js#2398))
- **release:** pin workspace:* sibling deps before JSR publish ([#2418](supabase/supabase-js#2418))
- **release:** publish gotrue-js legacy mirror via pnpm ([#2419](supabase/supabase-js#2419))

### ❤️ Thank You

- Claude Opus 4.7 (1M context)
- Claude Sonnet 4.6
- Eduardo Gurgel
- Guilherme Souza
- Katerina Skroumpelou @mandarini
- Omar Al Matar @Bewinxed
- youcef zr @youcefzemmar
- youcefzemmar

This PR was created automatically.

Co-authored-by: supabase-workflow-trigger[bot] <266661614+supabase-workflow-trigger[bot]@users.noreply.github.com>
This PR updates `@supabase/supabase-js` to v2.107.0.

**Source**: manual

---

## Release Notes

## v2.107.0

## 2.107.0 (2026-06-02)

### 🚀 Features

- **auth:** remove navigator.locks-based mutex; introduce commit guard + dispose() ([#2392](supabase/supabase-js#2392))
- **realtime:** allow httpSend to send binary payload ([#2400](supabase/supabase-js#2400))
- **supabase:** update X-Client-Info to structured metadata format ([#2359](supabase/supabase-js#2359))

### 🩹 Fixes

- **auth:** return AuthInvalidJwtError from getClaims for expired JWT ([#2395](supabase/supabase-js#2395))
- **auth:** recognize ?error= redirects in implicit grant gate ([#2407](supabase/supabase-js#2407))
- **auth): revert fix(auth:** encode client-id in oauth requests ([#2383](supabase/supabase-js#2383), [#2417](supabase/supabase-js#2417))
- **postgrest:** return a structured error for non-JSON body on successful responses ([#2398](supabase/supabase-js#2398))
- **release:** pin workspace:* sibling deps before JSR publish ([#2418](supabase/supabase-js#2418))
- **release:** publish gotrue-js legacy mirror via pnpm ([#2419](supabase/supabase-js#2419))

### ❤️ Thank You

- Claude Opus 4.7 (1M context)
- Claude Sonnet 4.6
- Eduardo Gurgel
- Guilherme Souza
- Katerina Skroumpelou @mandarini
- Omar Al Matar @Bewinxed
- youcef zr @youcefzemmar
- youcefzemmar

## v2.106.2

## 2.106.2 (2026-05-25)

### 🩹 Fixes

- **auth:** restore signup user response ([#2391](supabase/supabase-js#2391))
- **misc:** add react-native export condition for Hermes-safe resolution ([#2393](supabase/supabase-js#2393))

### ❤️ Thank You

- Myroslav Hryhschenko @BLOCKMATERIAL
- Vaibhav @7ttp

This PR was created automatically.

Co-authored-by: supabase-workflow-trigger[bot] <266661614+supabase-workflow-trigger[bot]@users.noreply.github.com>
## Summary

Fixes a stuck-session bug for users who change their cookie `Domain` in production (typically host-only to `.parent.tld`). After such a migration, `signOut` could not clear the stale host-only cookies, the browser kept returning both copies, and the session resurrected on the next read. Reported on supabase/supabase-js#2392.

No read-side fix is possible: `document.cookie` does not expose `Domain` to JS, so the fix has to be on the write side.

## What changed

- `removeItem`, `setItem` chunk cleanup, and `applyServerStorage`: when `cookieOptions.domain` is set, also emit a `Set-Cookie` clear with no `Domain` attribute. No change when `domain` is not configured.
- New exported helper `clearAuthCookiesAtScopes` for rarer multi-scope migrations (`.parent1` to `.parent2`, path changes). Idempotent; safe to over-call (browser ignores clears at scopes the host does not own).

## Behavior

Not breaking. Zero observable change for clients that do not set `cookieOptions.domain`. For clients that do, the extra `Set-Cookie` is a no-op unless a stale host-only cookie at the same name actually exists. Ship as minor (`feat` for the helper dominates the `fix` for the auto-clear).

## Tests

New `describe` block in `cookies.spec.ts` covering `removeItem`, `setItem` chunk cleanup, and `applyServerStorage` for both with-domain and baseline cases. New `clearAuthCookiesAtScopes.spec.ts` covering the helper.
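The write-side fix described above amounts to emitting one expiring `Set-Cookie` per scope. The following is a minimal sketch under stated assumptions: `buildClearHeaders` and `ClearOptions` are hypothetical names for illustration, not the signature of `clearAuthCookiesAtScopes` or any other library export, and the cookie name in the usage note is made up.

```typescript
// Hypothetical helper, not the library's API: builds the Set-Cookie header
// values that expire a cookie at the configured Domain scope and at the
// host-only scope (no Domain attribute).
interface ClearOptions {
  path?: string;
  domain?: string; // e.g. ".parent.tld"; omit for host-only deployments
}

function buildClearHeaders(name: string, opts: ClearOptions = {}): string[] {
  const path = opts.path ?? "/";
  // An empty value with Max-Age=0 asks the browser to expire the cookie now.
  const base = `${name}=; Path=${path}; Max-Age=0`;
  const headers: string[] = [];
  if (opts.domain) {
    // Clear at the configured Domain scope.
    headers.push(`${base}; Domain=${opts.domain}`);
  }
  // Clear at the host-only scope. When a domain is configured this is the
  // extra clear that reaches a stale host-only copy; without a domain it is
  // the only clear, matching the baseline behaviour.
  headers.push(base);
  return headers;
}
```

Calling `buildClearHeaders("sb-example-auth-token", { domain: ".parent.tld" })` yields two header values, one with `Domain=.parent.tld` and one without any `Domain` attribute; calling it without a domain yields only the host-only clear.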
Description
What changed?
`GoTrueClient` now defaults to lockless coordination. The `navigator.locks`-based mutex no longer runs by default, so most users get the lockless path. Callers who explicitly pass a custom `lock` (typically React Native `processLock` or Node multi-process setups) keep the old behavior on an opt-in legacy path, so the change is backwards-compatible.
The lockless default uses:
- `refreshingDeferred` (already existed) to single-flight in-instance concurrent refreshes.
- A storage-level commit guard in `_callRefreshToken`: (1) snapshots storage before the rotated-token fetch and re-reads after, discarding rotated tokens if a non-null snapshot was cleared between the two reads (typical case: a concurrent `signOut` ran `_removeSession`); (2) captures a session-removal epoch counter before `_saveSession` and re-checks after, so a `signOut` that interleaves inside `_saveSession`'s storage-write awaits is also caught. Either leg returns `{ data: null, error: new AuthRefreshDiscardedError() }`. `_recoverAndRefresh` recognises this error and skips its own `_removeSession()` call so no duplicate `SIGNED_OUT` event fires.
- `AuthRefreshDiscardedError` (with an `isAuthRefreshDiscardedError` type guard) is what the commit guard returns when it throws away a successfully rotated token. Distinct from `AuthRetryableFetchError` (transient network) and `AuthApiError` (server rejection). Surfaces through `refreshSession()` and `getSession()` results.
- `client.auth.dispose()` tears down the auto-refresh interval, the `visibilitychange` listener, the `BroadcastChannel`, and registered `onAuthStateChange` subscribers. Idempotent. Designed for React Strict Mode and HMR cleanup hooks. In-flight `fetch` calls are not aborted; they run to completion.
- Cross-tab refresh races are handled server-side by GoTrue's v1 parent-of-active mechanism at `internal/tokens/service.go:376-385`, so the client doesn't need to coordinate. The current default is v1 (`RefreshTokenAlgorithmVersion = 0`); v2 (counter-based) is opt-in via `RefreshTokenUpgradePercentage` and gated rollout. This PR's reasoning relies on v1 behaviour, which is what customers actually run today.
The legacy opt-in path (`settings.lock != null`):
- `_acquireLock`, `pendingInLock`, `lockAcquired`, `lockAcquireTimeout`, and all 14 call-site wrappers are preserved verbatim from `master` and gated on `this.lock != null`. The default path is untouched by them.
- Every gated site carries a `// TODO(v3): remove legacy lock path` marker so the eventual v3 cleanup is a mechanical search-and-delete. A separate v3 follow-up tracks that work.
- The `lock` and `lockAcquireTimeout` constructor options are accepted and honored when supplied; both are `@deprecated` in JSDoc with "Custom locks still work in v2.x for backwards compatibility. Will be removed in v3."
- `navigatorLock`, `processLock`, `LockAcquireTimeoutError`, `NavigatorLockAcquireTimeoutError`, `ProcessLockAcquireTimeoutError`, and `internals` in `lib/locks.ts` remain exported. Still `@deprecated`.
Stale JSDoc referencing the lock on `getSession`, `onAuthStateChange`, `_useSession`, `_challengeAndVerify`, and `_listFactors` now matches the new default behaviour. The `@deprecated` tag on the async `onAuthStateChange` overload is kept, with its reason updated to point at the one residual reentry hazard (`refreshSession` called from inside a `TOKEN_REFRESHED` handler: `refreshingDeferred.resolve` happens after `_notifyAllSubscribers` returns).
Why was this change needed?
The shared mutex has caused seven failure classes on the issue tracker for 18+ months. Each earlier patch fixed one symptom and created another.
#2235 (primitive swap to `processLock`) was an earlier attempt at moving off `navigator.locks` and is no longer being actively pursued. Community PRs #2016 and #2019 fixed individual symptoms and were closed waiting on a structural fix.
Failure classes resolved by this PR's default (lockless) path:
- `INITIAL_SESSION` / `getSession()` hangs with a persisted session (#936 @bugprone). The default path no longer touches `navigator.locks`.
- `AbortError: Lock broken by another request with the 'steal' option` (#2013 @cpannwitz). The `{ steal: true }` fallback that produced these errors is bypassed on the default path.
- `GoTrueClient.ts:3824`). The cycle needed the lock; a callback that awaits `getUser()` now just runs on the default path.
- `dispose()` cleans them up.
- `signOut` blocked behind in-flight refresh. `signOut` now runs concurrently with the refresh on the default path, and the two-leg commit guard prevents the refresh from writing rotated tokens after `_removeSession()` cleared storage, whether the clear happened before the post-fetch snapshot read or inside `_saveSession`'s storage writes.
Callers on the legacy opt-in path (explicit `settings.lock`) keep the old serialization semantics and the failure modes that come with them. They accepted those when they opted in; they can drop the option to migrate to the lockless default at any time.
Why default lockless: cross-tab races are handled on the server (GoTrue's parent-of-active), in-instance refresh dedup is already handled by `refreshingDeferred`, and the only job the lock did beyond that was serializing subscriber callbacks, which is the deadlock we're fixing.
Closes (test target): #2013, #936, #2111
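Because `signOut` and refresh can now overlap, the two-leg commit guard carries the correctness argument. The sketch below models both legs against an in-memory store. It is an illustration under stated assumptions, not the code in `_callRefreshToken`: `GuardedAuth`, `memoryStore`, `RefreshDiscardedError`, and the `beforeWrite` hook are hypothetical names, and only the "session was cleared" case is modelled (not the different-session cases the PR's tests also cover).

```typescript
// Hypothetical model of a two-leg commit guard; not auth-js internals.
type Session = { access_token: string; refresh_token: string };
type RefreshResult = { data: Session | null; error: Error | null };

interface AsyncStore {
  get(): Promise<Session | null>;
  set(session: Session): Promise<void>;
  remove(): Promise<void>;
}

// In-memory store. `beforeWrite` lets a caller interleave work inside a
// write, the way a real storage adapter yields at its awaits.
function memoryStore(initial: Session | null, beforeWrite?: () => Promise<void>): AsyncStore {
  let value = initial;
  return {
    get: async () => value,
    set: async (session) => {
      if (beforeWrite) await beforeWrite();
      value = session;
    },
    remove: async () => {
      value = null;
    },
  };
}

class RefreshDiscardedError extends Error {
  constructor() {
    super("Rotated tokens discarded: session was removed during refresh");
    this.name = "RefreshDiscardedError";
  }
}

class GuardedAuth {
  private removalEpoch = 0;

  constructor(
    private readonly store: AsyncStore,
    private readonly rotate: (refreshToken: string) => Promise<Session>
  ) {}

  async removeSession(): Promise<void> {
    this.removalEpoch++; // bumped synchronously, before any await
    await this.store.remove();
  }

  async refresh(refreshToken: string): Promise<RefreshResult> {
    const before = await this.store.get();
    const rotated = await this.rotate(refreshToken);

    // Leg 1: storage held a session before the fetch and is empty after it.
    const after = await this.store.get();
    if (before !== null && after === null) {
      return { data: null, error: new RefreshDiscardedError() };
    }

    // Leg 2: a removal that lands while the write itself is awaiting.
    const epoch = this.removalEpoch;
    await this.store.set(rotated);
    if (this.removalEpoch !== epoch) {
      await this.store.remove(); // undo the write directly, no extra event
      return { data: null, error: new RefreshDiscardedError() };
    }
    return { data: rotated, error: null };
  }
}
```

The epoch is incremented before the first `await` in `removeSession` so that a refresh which captured the old value can always detect the removal, even if the storage write and the removal interleave.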
Examples
`dispose()` in a React app
Subscriber callbacks can call other auth methods now (default path)
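A minimal self-contained model of why a subscriber callback can await another auth method on the default path but hangs when every operation is serialized behind one non-reentrant lock. Everything here is a hypothetical stand-in (`MiniAuth`, `useLock`, `settlesWithin`), written to illustrate the cycle rather than to mirror `GoTrueClient`.

```typescript
// Hypothetical model, not auth-js code.
type Listener = (event: string) => Promise<void> | void;

class MiniAuth {
  private listeners: Listener[] = [];
  // Tail of a promise chain acting as a non-reentrant mutex (locked mode only).
  private tail: Promise<unknown> = Promise.resolve();

  constructor(private readonly useLock: boolean) {}

  private run<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.useLock) return fn(); // lockless: start immediately
    const next = this.tail.then(fn); // locked: wait for every earlier operation
    this.tail = next.catch(() => undefined);
    return next;
  }

  onAuthStateChange(cb: Listener): () => void {
    this.listeners.push(cb);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== cb);
    };
  }

  getUser(): Promise<string> {
    return this.run(async () => "user-1");
  }

  // Notifies subscribers as part of the operation, awaiting each callback.
  signIn(): Promise<void> {
    return this.run(async () => {
      for (let i = 0; i < this.listeners.length; i++) {
        await this.listeners[i]("SIGNED_IN");
      }
    });
  }
}

// Resolves to "done" if `p` settles within `ms`, otherwise "timeout".
function settlesWithin(p: Promise<unknown>, ms: number): Promise<"done" | "timeout"> {
  return Promise.race([
    p.then(() => "done" as const),
    new Promise<"timeout">((resolve) => setTimeout(() => resolve("timeout"), ms)),
  ]);
}
```

With `new MiniAuth(false)`, a callback that awaits `getUser()` lets `signIn()` complete. With `new MiniAuth(true)`, `signIn()` waits for the callback, the callback waits for `getUser()`, and `getUser()` is queued behind `signIn()`, so nothing settles.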
One residual hazard remains: calling `refreshSession` from inside a `TOKEN_REFRESHED` handler still deadlocks via `refreshingDeferred`. The async overload of `onAuthStateChange` keeps its `@deprecated` marker for that reason.
`AuthRefreshDiscardedError` for the signOut-during-refresh race
Legacy lock opt-in (existing custom-lock users)
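For callers staying on the legacy path, a custom lock is a function the client wraps around each auth operation. The sketch below assumes the `lock` option accepts a function of the shape `(name, acquireTimeout, fn) => Promise<result>`; `LockLike` and `chainLock` are hypothetical names, and this toy lock ignores `acquireTimeout` and only serializes within a single process.

```typescript
// Assumed shape of the `lock` option; declared locally so the sketch is
// self-contained rather than imported from the library.
type LockLike = <R>(name: string, acquireTimeout: number, fn: () => Promise<R>) => Promise<R>;

// One promise-chain tail per lock name.
const tails = new Map<string, Promise<unknown>>();

// Hypothetical in-process lock: each call runs after every earlier call made
// under the same name has settled. `_acquireTimeout` is ignored here.
const chainLock: LockLike = <R>(
  name: string,
  _acquireTimeout: number,
  fn: () => Promise<R>
): Promise<R> => {
  const previous = tails.get(name) ?? Promise.resolve();
  const current = previous.then(fn);
  // Store a never-rejecting tail so one failed operation does not block the next.
  tails.set(name, current.catch(() => undefined));
  return current;
};

// Assumed wiring, not executed here:
//   createClient(url, anonKey, { auth: { lock: chainLock } })
```

Supplying any `lock` is what selects the legacy path; omitting the option is the migration to the lockless default.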
Breaking changes
The earlier draft of this PR silently dropped custom `lock` functions, which was behaviorally breaking for opt-in users. The current design preserves that behavior on a gated legacy path so the change ships as a v2 minor. Custom-lock users have time to migrate before v3 removes the option entirely.
Behaviour changes worth flagging for review
- A client constructed without the `lock` option no longer acquires `navigator.locks` or any in-process lock. For most users this is invisible (the lock was internal coordination). For users who depended on observing the lock indirectly (rare), the change is observable.
- `_autoRefreshTokenTick` may now run concurrently with `signOut` / `setSession` / `getUser` on the default path. Previously `_acquireLock(0, ...)` made the tick skip whenever any auth op held the lock. The lockless equivalent only skips when `refreshingDeferred` is set. The commit guard keeps storage consistent under the new concurrency. The legacy lock opt-in path retains the old skip-on-any-lock behavior.
- `signOut` no longer waits for an in-flight refresh's HTTP and continuation to finish before its own fetch goes out. Both fetches now run concurrently, and the commit guard keeps storage consistent.
- `@deprecated` warnings on `lock` and `lockAcquireTimeout`. Both options now carry `@deprecated` JSDoc on `@supabase/auth-js` and `@supabase/supabase-js` types. Consumers with strict deprecation lint rules (e.g. `deprecation/deprecation` as error) will see new warnings after upgrading. The options continue to work in v2.x; removal is slated for v3.
Checklist
- `<type>(<scope>): <description>`
- `npx nx format` to ensure consistent code formatting
- `dispose()`, subscriber re-entry no-deadlock, default vs. legacy path)
Additional notes
Alternatives considered
Continuing to patch `navigator.locks` (better steal recovery, lower timeouts, smarter error filtering). Each prior patch in this direction (#1962, #2106, #2178) fixed one symptom and produced another. The Web Locks API has no recovery for orphaned holders other than `{ steal: true }`, which leaves the previous `fn()` running concurrently with the new holder and creates the steal-cascade error storm. Each fix here swaps one failure for another instead of removing the contention that causes them.
Swapping `navigatorLock` for `processLock` as the default browser lock (the direction #2235 explored). Removes the cross-process orphan failure but keeps the same shape of the bug: one primitive serializing operations that don't all need the same synchronization.
Non-blocking subscriber notifications alone (#2016). Fixes subscriber re-entry. The other six failure classes still bite.
Removing the lock entirely (no legacy opt-in). The earlier draft of this PR did this. It silently dropped custom locks supplied via `settings.lock`, which broke React Native `processLock` and Node multi-process setups. The current design preserves those callers on a gated legacy path so the change is non-breaking; the legacy path is slated for removal in v3.
Adding an `AbortController` layer for in-flight operation cancellation. Deferred to a follow-up, not included here. The commit guard catches the races that affect correctness (signOut overwriting cleared storage with rotated tokens, in both the fetch and storage-write phases). AbortController would be a UX improvement on top: cancel the in-flight refresh as soon as signOut runs, instead of letting it finish and discarding the result. Out of scope for this PR.
Deferring the commit guard and accepting eventual consistency. Considered and rejected. Without a guard, a refresh that completes after `_removeSession()` cleared storage will write the rotated tokens back. Subscribers then see `SIGNED_OUT` followed by `TOKEN_REFRESHED`, with stale tokens in storage until the next refresh tick (~30s) fails against the server and clears them. The guard is about 30 lines (both legs combined). Better to land it with this PR than to ship the bug and patch it later.
Server-side context
Why cross-tab is safe on the default path: GoTrue's parent-of-active path at `internal/tokens/service.go:376-385` (the v1 branch, `*models.RefreshToken`). When a request arrives with a revoked refresh token whose child is the currently-active token, the server returns the active token instead of rejecting. Two tabs that POST the same refresh token concurrently both receive the same rotated token under DB row locking. This is the production-default behaviour (`RefreshTokenAlgorithmVersion = 0`, v1). v2 (counter-based, gated on `RefreshTokenUpgradePercentage`) is safe under the same N-tab concurrency, covered by `TestConcurrentReuse`.
Test coverage
Updated tests in `packages/core/auth-js/test/GoTrueClient.test.ts`:
- `'Lockless coordination (default) and legacy lock opt-in'` describe:
  - `lock` option, assert `_acquireLock` is never invoked
  - `lock`, assert `lock` IS invoked (`.toHaveBeenCalled()`)
  - `lockAcquireTimeout` accepted on both paths (lockless ignores it at runtime; legacy uses it)
  - `onAuthStateChange` without deadlock
- `'dispose() lifecycle'` describe: idempotency, subscriber clearing, ticker stopping.
- `'Refresh commit guard (signOut-during-refresh race)'` describe (4 tests): cleared-mid-flight discard, empty-storage acceptance, different-session acceptance, different-session-written-mid-flight discard.
Updated tests in `packages/core/auth-js/test/GoTrueClient.browser.test.ts`:
- `'Lockless coordination: navigator.locks should NOT be invoked'`: default path no longer touches the browser API.
- `'Legacy lock opt-in: custom lock function is invoked when supplied'`: custom lock IS called on the legacy path.
Updated tests in `packages/core/supabase-js/test/unit/SupabaseAuthClient.test.ts`:
- `'should pass through lockAcquireTimeout option'` and `'should accept auth.lockAcquireTimeout and wire it to auth client'`: assert the option flows from `createClient` through to the auth client instance (via a targeted `as unknown as { lockAcquireTimeout: number }` cast). A comment in-source explains the cast is greppable for the v3 cleanup.
AI review responses
Addresses inline AI review feedback on this PR:
- `_saveSession` (MEDIUM): closed via a new `_sessionRemovalEpoch` counter, incremented synchronously at the top of `_removeSession` before any `await`, captured in `_callRefreshToken` immediately before `_saveSession`, and re-checked after. If the epoch advances during the save, the rotated tokens are undone directly with `removeItemAsync` (not via `_removeSession`, which would emit a duplicate `SIGNED_OUT`).
- Removed `substring(0, 5)` of `refresh_token` from all four `_debug()` paths (`_callRefreshToken` / `_refreshAccessToken` `debugName` construction, plus the two fields in the commit-guard discard log). Discard logs now use presence indicators (`'present'` / `'replaced'` / `'cleared'`) instead of credential fragments.
Resolved during review
- `AbortSignal.any` plumbing in `lib/fetch.ts`: not done here and not blocking. Deferred to the AbortController follow-up, which gates on `AbortSignal.any` (Node 20.3+ floor or polyfill). Worth noting the follow-up has rate-limit-posture implications, not just UX: cancelling refreshes that the commit guard is about to discard reduces project-level `/token` volume.
- `@supabase/supabase-js` types deprecation: confirmed. `lock` and `lockAcquireTimeout` are `@deprecated` in both `@supabase/auth-js` and `@supabase/supabase-js` types. The supabase-js surface is the one most users see; deprecating only in auth-js would miss the JSDoc tooltip for the passthrough. Flagged separately under "Behaviour changes worth flagging" so the new lint warnings don't surprise downstream consumers.
v3 cleanup
A separate v3 follow-up tracks the mechanical removal of the legacy lock path. Every gated site in `GoTrueClient.ts` has a `// TODO(v3): remove legacy lock path` marker; the follow-up lists the full file-by-file scope (covering both `@supabase/auth-js` and `@supabase/supabase-js`).
Replaces #2387 (which was opened against `develop` before the v3 branching update).