Less tab storms through smarter failures #15

Merged
VersusFacit merged 5 commits into v1.14.0+dbt from mp/smarter_failures_fewer_tabs on Sep 4, 2025

Conversation

@VersusFacit VersusFacit commented Sep 4, 2025

Description

I made a finite state machine 🫠

I manually ran through the behaviors I changed and can report that the mysterious scenario (why do all these SAML windows keep spamming even though my first auth failed or was cancelled?) now fails fast.

Screenshot 2025-09-03 at 5 49 00 PM

External Browser Authentication Failure Scenarios

| Scenario | Where it fails | Example cause | Behavior (driver) |
| --- | --- | --- | --- |
| SAML-phase failure (no token obtained) | During `authenticateByExternalBrowser` | Browser could not open, user canceled at IdP, mistyped password, timeout waiting for redirect | Retryable immediately. No backoff marker set. |
| Auth-phase failure (no cached ID token) | After SAML, during `authenticate()` | Snowflake rejects token: IP restriction (390422), account misconfiguration, IdP mismatch | Fail fast with backoff. Marker set for ~60s to fail all new connections and prevent tab storms. |
| Auth-phase failure (cached ID token) | Attempting with cached ID token | Cached ID token expired, corrupted, or revoked | Clear token and retry once interactively. If retry succeeds, clear marker. If retry fails, set backoff. |
| Auth-phase failure (OAuth refreshable) | After SAML or token use | Token expired but refreshable | Run refresh logic. Delete bad token, refresh, retry `authenticate()`. Underlying error surfaces if refresh fails. |
| Context canceled / deadline exceeded | Any point | Caller cancels Go context, timeout at client level | Fail fast, no backoff. |
| Success | All phases complete | Valid SAML, token accepted | Clear any stale backoff marker. Cache ID token if configured. |

Manual tests using build in a dbt project

  1. Run project with empty token cache
  2. Run project with valid token cache
  3. Run project with a token cache invalidated by manual corruption
  4. Run project with valid token cache and corruption mid-execution (one browser tab pops up, then auth recovers)
  5. Run project with valid token cache and no VPN: only one tab opens

When a non-retryable failure happens, the driver fails fast. These tests showed the design is still self-healing, either mid-execution (it opens another tab) or on the next invocation (it opens another SAML window).
@VersusFacit VersusFacit self-assigned this Sep 4, 2025
@VersusFacit VersusFacit marked this pull request as ready for review September 4, 2025 01:47
Comment thread auth.go Outdated

// Quality of life features for externalbrowser
var lastFail sync.Map // key -> time.Time (expiry)
const extBrowserBackoffWindow = 10 * time.Second
@VersusFacit VersusFacit Sep 4, 2025

I'm actually thinking I might bump this back up to 30 since we want to fast fail all threads waiting to auth once the first failure is discovered and 10 seconds may not be enough...

@VersusFacit VersusFacit merged commit a5e71f4 into v1.14.0+dbt Sep 4, 2025
1 check failed
@github-actions github-actions Bot locked and limited conversation to collaborators Sep 4, 2025
@felipecrv felipecrv deleted the mp/smarter_failures_fewer_tabs branch September 4, 2025 21:29
3 participants