| status | active |
|---|---|
| type | implementation-plan |
| description | Plan to make hoist-core identity/request access safe across async boundaries (post-recycle threads, auto-instrumented spans). |
| created | 2026-05-19 |

Problem: RequestFacade.checkFacade() throws IllegalStateException once Tomcat recycles the
underlying request. Hoist-core hits this when async or post-response code paths dereference
the live request indirectly. The currently observed path is TraceService.createSpan →
hoistTags() → IdentityService.getAuthUsername → request.getSession(). The same shape
applies to TagSpanProcessor.onStart (every auto-instrumented span) and
TrackService.parseSubmittedEntry (userAgent/browser/device).

Root cause: framework accessors re-read identity from the session on every call, and
WebPromises.task propagates a live GrailsWebRequest reference into worker threads that
outlive the original request.
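
To make the failure mode concrete, here is a minimal, self-contained Java model of it. It is plain Java with no Tomcat or Grails dependency. FakeFacade and RecycleDemo are hypothetical names, and FakeFacade only mimics the "usable until recycled, then IllegalStateException" behavior of a recycled request facade. A closure that captured the live facade fails on the worker, while one that captured a value read on the request thread does not.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for Tomcat's RequestFacade: usable until recycled, then throws.
final class FakeFacade {
    private volatile boolean recycled = false;
    private final String sessionUser;

    FakeFacade(String sessionUser) {
        this.sessionUser = sessionUser;
    }

    void recycle() {
        recycled = true;
    }

    // Models a request.getSession() -> username lookup on a possibly recycled request.
    String getSessionUser() {
        if (recycled) {
            throw new IllegalStateException("The request object has been recycled");
        }
        return sessionUser;
    }
}

final class RecycleDemo {
    // Recycles the request (the response is done), then runs 'work' on a worker thread,
    // as async code that outlives the response would. 'work' was built on the request
    // thread, so whatever it captured is what the worker sees. Returns the result, or
    // "IllegalStateException" if the worker hit the recycled facade.
    static String runAfterRecycle(FakeFacade request, Supplier<String> work) {
        request.recycle();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> {
                try {
                    return work.get();
                } catch (IllegalStateException e) {
                    return "IllegalStateException";
                }
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The first case is the shape of the hoistTags() chain: the worker holds the request, not the identity. The second case is the "read once, carry the value" approach that HoistIdentity formalizes.
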

The session is the durable source of truth for identity, but it should be read only at
session-resume points, i.e., when a thread starts processing on behalf of a user. After that,
identity is fixed for the thread and should live in thread-local state. No accessor should hit
the session on every call, and nothing outside the request thread should touch a RequestFacade
at all.

Phase 1 goal: Make identity a thread property sourced from the session once at thread entry, eliminate per-call session reads, and add defensive guards on the remaining request-touching paths.

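
A minimal Java sketch of the thread-scoped identity model, assuming a Map stands in for the HTTP session and that the session keys are "username" and "authUsername" (both assumptions). Only HoistIdentity, getUsername, getAuthUsername, and impersonate come from this plan. IdentitySketch and handleRequest are hypothetical names for the IdentityService and HoistFilter roles, and the real code would be Groovy.

```java
import java.util.Map;
import java.util.Objects;

// Immutable identity snapshot: built once at thread entry, never mutated.
final class HoistIdentity {
    final String username;      // effective user (differs from authUsername while impersonating)
    final String authUsername;  // user who actually authenticated

    HoistIdentity(String username, String authUsername) {
        this.username = username;
        this.authUsername = authUsername;
    }
}

// Hypothetical: plays the IdentityService + HoistFilter roles; not hoist-core code.
final class IdentitySketch {
    private static final ThreadLocal<HoistIdentity> CURRENT = new ThreadLocal<>();

    // Accessors read only the cached snapshot; the session is never consulted here.
    static String getUsername() {
        HoistIdentity id = CURRENT.get();
        return id == null ? null : id.username;
    }

    static String getAuthUsername() {
        HoistIdentity id = CURRENT.get();
        return id == null ? null : id.authUsername;
    }

    // Filter-style entry: one session read on the request thread, cleared in finally.
    static void handleRequest(Map<String, String> session, Runnable chain) {
        String auth = session.get("authUsername");
        if (auth != null) {
            CURRENT.set(new HoistIdentity(session.get("username"), auth));
        }
        try {
            chain.run();
        } finally {
            CURRENT.remove();
        }
    }

    // Mutation: write the session and replace the cached snapshot in lock-step.
    static void impersonate(Map<String, String> session, String targetUser) {
        String auth = Objects.requireNonNull(getAuthUsername(), "not authenticated");
        session.put("username", targetUser);
        CURRENT.set(new HoistIdentity(targetUser, auth));
    }
}
```

In this sketch a session write that bypasses the mutation methods is not seen until the next filter entry. That is the intended "read once at thread entry" behavior, and it is why every identity-mutating operation has to update the cache in lock-step.
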

Phase 1 changes:

- Introduce HoistIdentity: a single immutable POGO holding username and authUsername, living in io.xh.hoist.user. It is constructed once at thread entry and never mutated; all identity reads return fields off this object.
- IdentityService uses a single ThreadLocal<HoistIdentity> as its primary identity source. This replaces the existing threadUsername/threadAuthUsername ThreadLocals (the legacy pair may be kept populated for one release for BC, or removed if no external callers exist). All public accessors (getUsername, getAuthUsername, getUser, getAuthUser, findHoistUsername) read from the cache; the session is no longer touched from accessors.
- Cache populated at request entry. HoistFilter (post-auth) reads the session attributes once, constructs a HoistIdentity, installs it on the ThreadLocal, and clears it in finally when the filter exits. That is one session read per request, on the request thread, while the facade is live.
- Identity-mutating operations replace the cached HoistIdentity in lock-step with the session. login(), logout(), impersonate(), and endImpersonate() already write to the session; they additionally construct and install a fresh HoistIdentity on the ThreadLocal. This keeps the mutation sites to a finite, auditable set.
- New IdentityPropagatingPromiseFactory, modeled on ContextPropagatingPromiseFactory. At task creation it captures the originating thread's HoistIdentity; on the worker it installs that identity on the ThreadLocal before the closure runs and clears it in finally. It is installed at startup and composed with the OTel context-propagating factory.
- ClusterTask constructs and installs a HoistIdentity rather than setting the raw ThreadLocals. Behavior is unchanged; the accessor surface is unified.
- Defensive guards on the remaining request-touching paths, for any code path that bypasses the identity cache:
  - IdentityService.getSessionIfExists: wrap request?.getSession(false) in try/catch (IllegalStateException) → null. A recycled facade is semantically equivalent to "no session," which already has correct fallback behavior.
  - TrackService.parseSubmittedEntry: wrap currentRequest?.getHeader(...), getBrowser(currentRequest), and getDevice(currentRequest) in the same guard. These fields are best-effort observability; null is acceptable.
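
The IdentityPropagatingPromiseFactory item boils down to a task decorator. This sketch is not the Grails promise factory API; it shows only the capture/install/restore pattern in plain Java around an ExecutorService, with identity reduced to a username String. IdentityContext and its methods are hypothetical. Restoring the previous value in finally, rather than unconditionally clearing, is one way to meet the "handles nesting" requirement: on a fresh pool thread the previous value is null, so the slot is cleared anyway.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical decorator sketch; identity reduced to a username String.
final class IdentityContext {
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Captures identity now, on the creating thread, and installs it on whichever thread
    // later runs the task. The previous value is restored afterwards, which clears the slot
    // on a pool thread and preserves the outer identity when the task runs nested/inline.
    static <T> Callable<T> propagating(Callable<T> task) {
        final String captured = CURRENT.get();
        return () -> {
            final String previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    CURRENT.remove();
                } else {
                    CURRENT.set(previous);
                }
            }
        };
    }

    // Runs a Callable, rethrowing checked exceptions unchecked.
    static <T> T callUnchecked(Callable<T> c) {
        try {
            return c.call();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Submits a decorated task to 'pool' and waits for its result.
    static <T> T runOn(ExecutorService pool, Callable<T> task) {
        return callUnchecked(() -> pool.submit(propagating(task)).get());
    }

    // Reads the worker's slot with an undecorated task (used to check for leaks).
    static String peekWorkerSlot(ExecutorService pool) {
        Callable<String> raw = CURRENT::get;
        return callUnchecked(() -> pool.submit(raw).get());
    }

    // Single daemon worker so a failed check cannot keep the JVM alive.
    static ExecutorService newDaemonPool() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "identity-worker");
            t.setDaemon(true);
            return t;
        });
    }
}
```

Capturing inside propagating(), not inside the returned lambda, is what makes the installed identity that of the creating thread rather than whatever the worker happens to hold.
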

Outcome:

- TraceService.hoistTags(), TagSpanProcessor.onStart, and every other identity consumer no longer touch a request or session, so the observed bug is gone.
- Async work spawned via task {} (web or plain) receives the originating user's identity automatically via the decorator.
- Any future or app-side code that bypasses the cache and hits a recycled facade fails cleanly (null fallback) rather than throwing.
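
The null-fallback guard is small enough to show whole. In this hedged Java sketch, StubRequest is a hypothetical stand-in for the request facade and RequestGuards.ifRequestLive is a hypothetical helper; in hoist-core the equivalent would be an inline Groovy try/catch around calls such as request?.getSession(false) or currentRequest?.getHeader(...).

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the servlet request facade; throws once recycled.
final class StubRequest {
    volatile boolean recycled = false;

    String getHeader(String name) {
        if (recycled) {
            throw new IllegalStateException("recycled");
        }
        return "User-Agent".equalsIgnoreCase(name) ? "Mozilla/5.0" : null;
    }
}

final class RequestGuards {
    // Best-effort read from the request: a recycled facade (IllegalStateException) is
    // treated like "no request/session" and yields null. Other exceptions propagate.
    static <T> T ifRequestLive(Supplier<T> read) {
        try {
            return read.get();
        } catch (IllegalStateException recycled) {
            return null;
        }
    }
}
```

One trade-off: any IllegalStateException raised inside the guarded read is mapped to null, not only the recycle case, so each guard should wrap as little as a single facade call.
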

Risk: Low. Identity accessors keep their signatures and return values. The behavioral shift is that the session is read once at request start instead of N times during the request, which is a perf improvement rather than a semantic change. The defensive guards only return null where the existing code already falls back to null.

Release: Minor release.


Phase 2 goal: Defense-in-depth. After Phase 1 the observed bug is fixed, but any future or
app-side code that calls Utils.currentRequest/request.X inside an async block still
holds a live-but-doomed facade. Phase 2 closes that door.

Status: Optional. Evaluate after Phase 1 ships and we see whether real-world reports surface code paths Phase 1 doesn't already cover.

BaseController.runAsync is built on WebPromises.task, which propagates
RequestContextHolder + GORM session binding to the worker. That's exactly what
WebPromises.task is designed for, and it's plausibly used by downstream apps for async
response rendering (start work on a worker, eventually render/renderJSON to the
still-open response). Switching the factory under existing callers would break that use
case. The current runAsync exception handler itself reads actionName (a webRequest
lookup), so even the framework's own code assumes web propagation.

Phase 2 changes:

- New BaseController.runDetached(Closure), built on plain Promises.task plus the identity-propagation decorator from Phase 1. On the worker:
  - Identity is available via identityService.authUsername (from the decorator).
  - RequestContextHolder.resetRequestAttributes() is called so Utils.currentRequest returns null; no live facade reference reaches the worker.
  - The exception handler uses controller state (actionName) snapshotted into the closure on the request thread.
- Documentation:
  - runAsync: "Use for async response handling. Worker thread has access to request/response. Do not use for work that outlives the response."
  - runDetached: "Use for fire-and-forget background work that outlives the response. Identity propagates; request/response are not accessible."
- Optional deprecation pass. If telemetry or code inspection shows that runAsync is overwhelmingly used for fire-and-forget rather than async response handling, consider deprecating runAsync in favor of explicit runAsync/runDetached naming in a future major.
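
A hedged sketch of runDetached's worker-side contract in plain Java. REQUEST is a hypothetical stand-in for RequestContextHolder's binding. It is an InheritableThreadLocal because Spring's RequestContextHolder can optionally bind request attributes inheritably, which is one case where an explicit reset on the worker matters. DetachedSketch, its parameters, and the error-message format are illustrative only; the real method would be Groovy on BaseController, built on Promises.task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class DetachedSketch {
    // Hypothetical stand-in for RequestContextHolder's binding. Inheritable, so a worker
    // thread created from the request thread would otherwise start out holding the request.
    static final InheritableThreadLocal<Object> REQUEST = new InheritableThreadLocal<>();
    // Identity slot, as in the Phase 1 sketches (identity reduced to a username String).
    static final ThreadLocal<String> IDENTITY = new ThreadLocal<>();

    // Runs 'body' on 'pool' with the caller's identity but without any request binding.
    // 'actionName' is passed by value, so it is evaluated on the request thread; the error
    // handler uses only that snapshot and never reads controller/request state on the worker.
    static Future<?> runDetached(ExecutorService pool, String actionName, Runnable body,
                                 Consumer<String> onError) {
        final String identity = IDENTITY.get();      // captured on the request thread
        return pool.submit(() -> {
            REQUEST.remove();                         // analogous to resetRequestAttributes()
            IDENTITY.set(identity);
            try {
                body.run();
            } catch (RuntimeException e) {
                onError.accept("runDetached failed in " + actionName + ": " + e.getMessage());
            } finally {
                IDENTITY.remove();
            }
        });
    }

    // Single daemon worker so a failed check cannot keep the JVM alive.
    static ExecutorService newDaemonPool() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "detached-worker");
            t.setDaemon(true);
            return t;
        });
    }
}
```

In this sketch the worker thread is created lazily from the thread that set REQUEST, so without the remove() call it would inherit and expose the "request".
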

Risk: Low; this is an additive API. The main risk, changing runAsync semantics, is the one this approach explicitly sidesteps.

Release: Separate minor release, only if real-world need emerges.

Non-goals:

- Does not introduce a multi-field request snapshot POGO. Identity is the only state that needs to survive thread transitions; non-identity request fields are either best-effort observability (handled by the Phase 1 defensive guards) or genuinely scoped to the request thread.
- Does not change TraceService.hoistTags or TagSpanProcessor directly. They benefit automatically once identity comes from the cache.
- Does not change WebPromises.task propagation semantics. Phase 2 (if pursued) adds a parallel API rather than mutating the existing one.
- Does not deprecate any current public API. The legacy identity ThreadLocals may be unified internally in Phase 1, but they can stay populated for BC if needed.


Testing for Phase 1:

- Unit: IdentityService accessors read from the thread cache, populate from the session only at filter entry, and update on login/logout/impersonate.
- Unit: IdentityPropagatingPromiseFactory captures on task creation, installs on the worker, clears in finally, and handles nesting.
- Integration: a controller that does runAsync { Thread.sleep(200); identityService.authUsername } returns the correct username, with no IllegalStateException, after the response is rendered.
- Integration: the same scenario calling trackService.track(...) and opening a manual span confirms that tags resolve without an exception.
- Regression: the existing impersonation flow continues to work (session write + cache update on the same thread).

Testing for Phase 2 (if pursued):

- Confirm Utils.currentRequest returns null inside a runDetached body.
- Confirm runAsync still has a live request/response for async response rendering.


Rollout:

| Phase | Ships as | When |
|---|---|---|
| 1 | Minor release | Now |
| 2 | Minor release | Optional. Only if Phase 1 leaves uncovered cases in practice. |