## Summary

`writeCacheEntry` keeps a small `~/.failproofai/cache/codex-session-paths.json` map that lets `findCodexTranscript` skip the multi-level directory scan on subsequent lookups — a nice perf win. The current implementation does a classic read-modify-write cycle (`readCache()` → mutate → `writeFileSync`) without any locking or atomic rename, so when the dashboard kicks off several Codex session lookups in parallel (which it does on the projects page), two writes that race can stomp on each other and one of the entries vanishes from the cache.

This is purely a "miss" — the next lookup just falls back to the directory scan and re-populates — so the user-visible effect is "the page is occasionally a hair slower than it should be." Tightening this is a friendly perf-on-the-margin improvement that aligns Codex with the atomic-write pattern we already use in `hook-activity-store.ts`.
## Where

`lib/codex-sessions.ts:50-59`

For comparison, `src/hooks/hook-activity-store.ts:201-209` already uses the right pattern:
```ts
const tmpPath = join(storeDir, `stats.json.${process.pid}.tmp`);
writeFileSync(tmpPath, JSON.stringify(s), "utf-8");
renameSync(tmpPath, join(storeDir, STATS_FILE)); // atomic on POSIX
```
…and `src/auth/token-store.ts:46-56` does a similar tmp-file-then-rename for `auth.json`.
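For contrast, here is a sketch of what the current `writeCacheEntry` presumably looks like, reconstructed from the description above rather than copied from `lib/codex-sessions.ts:50-59`; `CACHE_PATH` and the exact shape of `readCache` are assumptions:

```ts
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";
import { homedir } from "node:os";

// Assumed location; the real constant lives in lib/codex-sessions.ts.
const CACHE_PATH = join(
  homedir(),
  ".failproofai",
  "cache",
  "codex-session-paths.json"
);

// Assumed shape: parse errors (including half-written JSON) fall back to {}.
function readCache(): Record<string, string> {
  try {
    return JSON.parse(readFileSync(CACHE_PATH, "utf-8"));
  } catch {
    return {}; // treat a missing or corrupt cache as empty
  }
}

// Classic read-modify-write: two concurrent callers read the same snapshot,
// and whichever writeFileSync lands last silently drops the other's entry.
function writeCacheEntry(sessionId: string, path: string): void {
  try {
    mkdirSync(dirname(CACHE_PATH), { recursive: true });
    const cache = readCache(); // read a snapshot
    cache[sessionId] = path;   // mutate in memory
    writeFileSync(CACHE_PATH, JSON.stringify(cache), "utf-8"); // non-atomic overwrite
  } catch {
    // best-effort: a failed cache write only costs a directory scan later
  }
}
```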
## Why this matters
```mermaid
sequenceDiagram
    participant R1 as Request A<br/>(sessionId X)
    participant R2 as Request B<br/>(sessionId Y)
    participant FS as codex-session-paths.json<br/>{}
    R1->>FS: readCache → {}
    R2->>FS: readCache → {}
    R1->>R1: cache[X] = pathX → {X: pathX}
    R2->>R2: cache[Y] = pathY → {Y: pathY}
    R1->>FS: writeFileSync({X: pathX})
    R2->>FS: writeFileSync({Y: pathY})
    Note over FS: Final state = {Y: pathY}<br/>X is silently dropped
```
How this surfaces today:

- The dashboard's projects view calls `getCodexSessionLog(sessionId)` for many sessions in parallel via `batchAll` (`lib/concurrency.ts`). Each one hits `findCodexTranscript`, and the misses each call `writeCacheEntry` (see the sketch after this list).
- A few of those writes will overlap; whichever lands last wins.
- On the next page load, the dropped entries miss the cache and fall back to the full year/month/day scan — visible as a small extra latency for those few sessions.
- Worst case in disk-write timing: `writeFileSync` is non-atomic, so if the process is interrupted mid-write the file can also be left as partial JSON, which `readCache`'s try/catch already converts into "treat as empty" — so we never crash, we just throw away the whole cache. Annoying but not destructive.
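To make the first bullet concrete, the fan-out presumably looks something like this; `batchAll`'s real signature lives in `lib/concurrency.ts`, so both the call shape and the limit below are assumptions for illustration:

```ts
// Hypothetical call shape, not the real projects-page code. Many lookups
// run concurrently, so several cache misses hit writeCacheEntry at once.
const logs = await batchAll(
  sessionIds.map((id) => () => getCodexSessionLog(id)), // one task per session
  10 // assumed concurrency limit
);
```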
There is no security or data-integrity impact — this cache is purely an optimisation, and `findCodexTranscript` always falls back to the on-disk truth when the cache is wrong.
## Suggested approach
The simplest fix mirrors what `hook-activity-store.ts` and `token-store.ts` already do — and it doesn't introduce any new locking primitives:
```ts
function writeCacheEntry(sessionId: string, path: string): void {
  try {
    mkdirSync(dirname(CACHE_PATH), { recursive: true });
    const cache = readCache();
    cache[sessionId] = path;
    const tmp = `${CACHE_PATH}.${process.pid}.tmp`;
    writeFileSync(tmp, JSON.stringify(cache), "utf-8");
    renameSync(tmp, CACHE_PATH); // atomic replace on POSIX
  } catch {
    // best-effort
  }
}
```
That alone removes the partial-write window. To also preserve the other request's entry under the race shown above, batch the read+write under a tiny advisory lock — same shape as `acquireLock`/`releaseLock` at `src/hooks/hook-activity-store.ts:85-110` (sketched below). Optional and only worth it if telemetry shows the cache miss rate is high.
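A minimal sketch of that lock, reusing `CACHE_PATH` and `readCache` from the sketch above; the helper names, lock-file location, and give-up-fast policy here are illustrative, not the actual `hook-activity-store.ts` implementation:

```ts
import { mkdirSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

const LOCK_PATH = `${CACHE_PATH}.lock`; // assumed lock-file location

// Try to create the lock file exclusively; give up immediately on contention,
// since the cache is best-effort and a skipped write only costs one scan.
function acquireLock(): boolean {
  try {
    writeFileSync(LOCK_PATH, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    return false; // another writer holds the lock
  }
}

function releaseLock(): void {
  try {
    unlinkSync(LOCK_PATH);
  } catch {
    // best-effort
  }
}

function writeCacheEntryLocked(sessionId: string, path: string): void {
  if (!acquireLock()) return; // best-effort: dropping one write is acceptable
  try {
    mkdirSync(dirname(CACHE_PATH), { recursive: true });
    const cache = readCache(); // re-read under the lock, so concurrent writers' entries survive
    cache[sessionId] = path;
    const tmp = `${CACHE_PATH}.${process.pid}.tmp`;
    writeFileSync(tmp, JSON.stringify(cache), "utf-8");
    renameSync(tmp, CACHE_PATH);
  } finally {
    releaseLock();
  }
}
```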
A nice cherry on top: bound the cache to, say, 500 entries with a simple FIFO drop, so a long-running dashboard process doesn't grow `codex-session-paths.json` without bound; see the sketch below. (Not strictly required to close this issue.)
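That bound could be as small as the following sketch; the 500-entry limit is illustrative, and it assumes plain objects' string-key insertion order stands in for write order (true for non-integer-like keys such as session UUIDs):

```ts
const MAX_CACHE_ENTRIES = 500; // illustrative bound, not tuned

// Drop the oldest entries first. Plain objects preserve insertion order for
// string keys, so the first keys are the oldest writes (FIFO).
function trimCache(cache: Record<string, string>): Record<string, string> {
  const keys = Object.keys(cache);
  if (keys.length <= MAX_CACHE_ENTRIES) return cache;
  for (const key of keys.slice(0, keys.length - MAX_CACHE_ENTRIES)) {
    delete cache[key];
  }
  return cache;
}
```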
## Customer impact
- **Today:** the dashboard projects page may take a small extra hop on subsequent loads for the small fraction of sessions whose cache entry got clobbered. Imperceptible for ≤ 100 sessions; noticeable for users with thousands of Codex sessions.
- **After fix:** the cache hit rate matches what the code implies it should be; partial-write windows go away; the write path follows the same atomic-write pattern as the rest of the codebase.
- Zero impact on correctness or the hot path — `writeCacheEntry` is already fire-and-forget.
## Acceptance criteria
- [ ] `writeCacheEntry` writes to a tmp path and renames into place.
- [ ] `readCache` never sees a half-written JSON blob.
- [ ] (Stretch) `writeCacheEntry` preserves concurrent writers' entries under load.
- [ ] Optional: small FIFO bound on cache size.
## Related
Same atomic-write pattern as `src/auth/token-store.ts` and `src/hooks/hook-activity-store.ts`.