tokensContext() grabs a mutex:
Lines 75 to 79 in 323cd42:

    func (c *Connection) tokensContext(ctx context.Context) (code int, access, refresh string, err error) {
        // We need to make sure that this method isn't execute concurrently, as we will be updating
        // multiple attributes of the connection:
        c.tokenMutex.Lock()
        defer c.tokenMutex.Unlock()
IIUC this means that while a refresh/re-auth request to SSO is in progress, all concurrent API requests will be blocked.
This is sub-optimal: we usually renew some time before the previous token expires, so that token is still valid and could be used in the meantime without delaying requests.
Even the original request that triggered the renewal could proceed without blocking!
A second question is what happens with retries and concurrent requests. The mutex is only held for a single try, but if a retry is needed it means we still don't have valid auth, so the 2nd request will immediately take the mutex; the first retry loop doesn't know that happened, so we can get several retry loops running in parallel. 🔀
Once one of them succeeds, the following tokensContext() tries will all see valid auth and succeed immediately, and all retry loops will complete.
- So this sounds internally safe to me;
- however, externally we're not keeping the backoff we intended. Are we risking a "thundering herd" problem where we make an SSO outage worse by amplifying the number of requests?
- also (very minor), this may confuse the retry logging/metrics (see "Better logging and metrics when retrying SSO" #231).
I propose we move renewal off the requesting goroutine into a separate dedicated goroutine. One per Connection object. This will simplify management of retries.
We need to decide whether renewal should wait for a request that needs auth (as done now, meaning zero traffic when unused), or be timer-based (minimizes latency).
@igoihman @nimrodshn @tzvatot @vkareh @zgalor @petli-openshift WDYT?