fix(App): eliminate clientMsgDeviceId=0 race via sessionStorage cache (closes #256, depends on #255)#274
Conversation
Pure function handleDeviceFetchError() detects 400 + err.server.user.device_not_found and invokes injected recovery handlers. No App/WKApp imports — fully unit-testable. Wiring into App.startMain follows in a later commit. Refs Mininglamp-OSS#76
…ininglamp-OSS#76) Regression guard: auth-expired errors must remain on the APIClient's existing auth path, not our stale-device path. Refs Mininglamp-OSS#76
Mininglamp-OSS#76) When the request fails before reaching the server (timeout, DNS, offline), status is undefined and we must not logout. Refs Mininglamp-OSS#76
… handler (Mininglamp-OSS#76) Only err.server.user.device_not_found should trigger the recovery flow. Other validation errors at the same HTTP status are unrelated and must pass through. Refs Mininglamp-OSS#76
…lamp-OSS#76) Wires handleDeviceFetchError into the device fetch in App.startMain. On 400 err.server.user.device_not_found: clear sessionStorage deviceId, reset in-memory deviceId, trigger logout. logout() ends in window.location.reload(), which re-runs App.startup → getDeviceIdFromStorage() generates a fresh UUID; the user's next login re-registers the device record server-side. Without this handler the device fetch silently fails, leaving WKSDK.config.clientMsgDeviceId at its default 0. Outbound messages then build clientMsgNo as "<uuid>_0_3" instead of "<uuid>_<real-device-id>_3", breaking per-device message dedup and cross-device tracking — a degrading symptom, not a visible UI freeze. Trade-off acknowledged: this force-logs the user out, which violates the issue reporter's "no manual intervention" preference. A silent re-register path would require a new server endpoint (POST /v1/user/devices/register or allowing re-login with existing token) — out of scope for this PR. See PR description for forward suggestion. Local logout() is intentionally used over logoutUserInitiated() to avoid triggering OIDC end-session for a system-detected condition (see deviceFailure.ts module doc). Fixes Mininglamp-OSS#76
…nglamp-OSS#76) Browser empirical test revealed err.status === 400 never matches in production. APIClient.normalizeApiError exposes err.status from the response body's err.http_status field (=404), not from the actual HTTP status line (400). The two diverge for this endpoint because the server's response body is internally inconsistent (HTTP 400 Bad Request but body.error.http_status=404). Switch the gate to err.code-only matching. The code string 'err.server.user.device_not_found' is the stable backend contract and unambiguously identifies the stale-device condition. Tests updated: - Test 1: status value changed 400 → 404 (matches production reality) - New test 5: explicit defensive case that asserts the gate fires on the device_not_found code regardless of status field value Refs Mininglamp-OSS#76
Address final code review feedback: replace `normalized: undefined as any`
with a real (minimal) NormalizedApiError stub `{ message: '', raw: undefined }`,
and drop the unnecessary `status: undefined as any` baseline (status is already
optional in APIClientRejectedError; omitting from baseline is equivalent).
Removes the outer `as APIClientRejectedError` cast — the returned object is
now structurally complete enough to satisfy the type without coercion.
If handleDeviceFetchError ever begins to read `normalized.*`, the stub will
need real values rather than failing silently against `undefined`.
Refs Mininglamp-OSS#76
…Mininglamp-OSS#76) Address review feedback (PR Mininglamp-OSS#255): silently ignoring non-stale device fetch errors hides any future regression on /user/devices. Add a single console.warn for any err.code that is NOT err.server.user.device_not_found. The stale-device path still calls handleDeviceFetchError unchanged; this warn is purely additive observability. Status/code/msg are logged in structured form for easy filtering. Refs Mininglamp-OSS#76
Address review feedback (PR Mininglamp-OSS#255): the order removeDeviceId → resetInMemoryDeviceId → triggerLogout is load-bearing — triggerLogout fires window.location.reload(), and the post-reload App.startup → getDeviceIdFromStorage() only regenerates a fresh UUID if sessionStorage.deviceId is empty. Pin the order with an explicit array-tracking test so any future refactor that accidentally reorders the calls fails the test. Refs Mininglamp-OSS#76
First entry: err.server.user.device_not_found. Code-only matching because server returns HTTP 400 with body.http_status=404 — neither status field is the stable contract. Future siblings (e.g., space.not_found) can be added to the set without interceptor changes. Part of Plan F (octo-web#270 / follow-up to Mininglamp-OSS#76). See docs/superpowers/specs/2026-06-05-plan-f-stale-resource-interceptor-design.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the existing logoutCallback/isAuthExpiredApiError pattern. Branch is else-if (mutually exclusive with auth-expired) — a single error response fires at most one recovery callback. Part of Plan F (octo-web#270 / follow-up to Mininglamp-OSS#76). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ity, cousin code, concurrency, code propagation Adds 5 cases to APIClient.staleResource.test.ts: - auth-expired still routes via logoutCallback only - login_device_expired (HTTP 401 cousin) routes via auth path, not stale - concurrent stale responses fire callback per response (re-entry contract) - unrelated 404 codes don't trigger stale callback - code string propagates correctly to callback argument Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan F adds a new callback field on APIClient.shared. The shared singleton means a test that sets a mock callback would leak it to subsequent tests without an explicit reset. Adding this line proactively (rather than after hitting cross-test pollution) is bundled fix Mininglamp-OSS#2 from the spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Registers the Plan F callback next to logoutCallback. Dispatches the device_not_found code to the existing handleDeviceFetchError recovery handler (unchanged). Future sibling codes plug in here by adding else-if branches — no APIClient or interceptor changes needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… interceptor Plan F: stale device recovery now fires in the APIClient response interceptor via staleLocalResourceCallback. startMain's catch only consumes the promise rejection to prevent Uncaught noise and logs non-stale failures for observability. Recovery semantics are unchanged (handleDeviceFetchError still invoked, just from module.tsx now). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multiple concurrent API errors (auth-expired + stale-device, or N parallel stale-device responses) could call logout() rapidly before the page actually unloads, causing clearLocalLoginState() to run N times. Idempotent in practice but wasteful. Guard is one-way per session: only a real page reload resets the flag, which matches production semantics (the flag's lifetime equals the JS context's lifetime). Bundled fix Mininglamp-OSS#1 from Plan F spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan F Task 7's one-way loggingOut guard had a latent wedge: if clearLocalLoginState threw (Safari private mode, QuotaExceededError, SecurityError on raw localStorage/sessionStorage), the guard would be set to true but the reload would never run, and every subsequent logout attempt would short-circuit on the guard. The user would be permanently stuck with no path back to login except a manual browser refresh. Wrap in try/catch and let the reload happen regardless. After reload, the next bootstrap will detect the broken/missing auth state and route to login normally. Surfaced by Plan F final code review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment mistakenly said try/finally; the code uses try/catch. Surfaced by Plan F final review nit. One-word fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… convention Addresses PR Mininglamp-OSS#255 blocking review (code-review bot): although Vitest 4.1.5 ships toHaveBeenCalledExactlyOnceWith as a built-in matcher, no other test in this repo uses it. Substitute the equivalent two-line form to match project convention. Semantics identical (count==1 AND called-with-arg). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-OSS#256 Three pure helpers (loadCachedNumericDeviceId, saveNumericDeviceId, clearCachedNumericDeviceId) wrapping StorageService.shared. Parse-with- validation rejects NaN/0/negative on load. Save silently no-ops on storage write failures (Safari private mode, QuotaExceededError) so a write failure degrades to "no cache" rather than breaking the caller. Used by App.startMain in the next commit to eliminate the clientMsgDeviceId=0 race window (issue Mininglamp-OSS#256). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ininglamp-OSS#256) Cache hit path: load cached numeric id from sessionStorage, set WKSDK.config.clientMsgDeviceId BEFORE connectIM(), then refresh in background. Zero WS connect latency, zero race window for repeat users. Cache miss path (first login / new tab / post-logout-relogin): await the device fetch with a 5s timeout safety net, then connectIM(). ~200ms WS connect delay only for first-login, but zero race. Timeout fallback degrades to pre-Mininglamp-OSS#256 behavior (proceed with clientMsgDeviceId=0); the underlying fetch keeps running in background and will update the cache if it eventually resolves. The fetchPromise has its own .catch to prevent late rejections from becoming Uncaught (in promise) when timeout wins the race. Plan F interceptor still handles stale device errors centrally. Risk A guard: skip cache writes if `loggingOut` is set between dispatch and resolve — prevents the background .then from repopulating a cache that clearLocalLoginState just emptied. startMain becomes async. Both callers (App.tsx inside addOnLogin and inside isLogined()) are intentionally fire-and-forget; the inner .catch handlers handle errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clearLocalLoginState now removes the cached numeric device id. Without this, a same-tab relogin or user-switch would inherit the previous user's cached id, leading to wrong-id messages until the next device fetch updates the cache. With this clear, the post-logout session always re-traverses the first-login path → correct id for the new auth. Also consumes the clearCachedNumericDeviceId import that was staged in the previous commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ninglamp-OSS#256) Regression discovered during E2E: when the cache-miss path's await fetch returns stale device, Plan F's APIClient interceptor fires logout synchronously. After the await resolves, control returns to startMain and continues with connectIM / contactsSync / prohibitWordsSync — but the token/uid are already cleared, so all three fire requests that 401. The 401 chains produce unhandled Uncaught (in promise) noise in the console before the pending reload completes. Fix: check this.loggingOut after the await. If logout was triggered during the await, return early. The pending window.location.reload() will reset the page; no further work needs to happen in this function. The cache-hit branch already returns BEFORE any await, so this same guard is unnecessary there — connectIM fires before any logout signal could land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Jerry-Xin
left a comment
There was a problem hiding this comment.
The PR is relevant to Mininglamp-OSS/octo-web and the implementation is generally sound: it addresses the startup clientMsgDeviceId=0 race with a cache/await split, adds interceptor-based stale-device recovery, and covers the new helper/interceptor behavior with focused tests.
💬 Non-blocking
🟡 Warning — startMain() is now async, but all call sites ignore its returned promise. Most expected failures are handled internally, but any unexpected synchronous throw in loadCachedNumericDeviceId(), connectIM(), contactsSync(), or ProhibitwordsService.shared.sync() will now surface as an unhandled promise rejection rather than a synchronous exception. Consider wrapping the top-level body or changing callers to void this.startMain().catch(...). Reference: packages/dmworkbase/src/App.tsx:854, packages/dmworkbase/src/App.tsx:862, packages/dmworkbase/src/App.tsx:883.
🔵 Suggestion — loadCachedNumericDeviceId() uses parseInt, so corrupted values like "42abc" or "1.5" are accepted as valid numeric IDs. The comment says the cache must be a positive finite integer; Number(raw) plus Number.isInteger() would match that contract more tightly. Reference: packages/dmworkbase/src/Service/clientMsgDeviceIdCache.ts:24.
🔵 Suggestion — clearCachedNumericDeviceId() does not catch storage removal errors, while saveNumericDeviceId() explicitly treats storage as best-effort. logout() catches clearLocalLoginState() failures, so this is not blocking, but making clear symmetric with save would keep the helper contract cleaner. Reference: packages/dmworkbase/src/Service/clientMsgDeviceIdCache.ts:53.
🔵 Suggestion — The APIClient tests mutate axios.defaults.adapter globally and do not restore it. Existing tests already follow this pattern in places, but this new file adds more global mutation. Adding afterEach restoration would reduce order-dependent failures. Reference: packages/dmworkbase/src/Service/__tests__/APIClient.staleResource.test.ts:20.
✅ Highlights
The cache-miss path correctly awaits the device lookup before connectIM() and short-circuits after stale-device logout, which closes the main race without adding a permanent startup delay. Reference: packages/dmworkbase/src/App.tsx:869.
The cache-hit path sets WKSDK.shared().config.clientMsgDeviceId before opening the websocket and refreshes the value in the background with rejection handling. Reference: packages/dmworkbase/src/App.tsx:857.
The interceptor keeps auth-expired and stale-resource handling mutually exclusive, and the added tests cover the important classifier boundaries. Reference: packages/dmworkbase/src/Service/APIClient.ts:96.
I could not run the targeted Vitest suite in this checkout because dependencies are not installed and pnpm exec vitest is unavailable.
lml2468
left a comment
There was a problem hiding this comment.
Review: octo-web #274 — fix(App): eliminate clientMsgDeviceId=0 race
Commit: 4411d64
Verdict: APPROVED
Blocking
- None.
Non-blocking
loadCachedNumericDeviceId()usesparseInt, so malformed stored values with numeric prefixes like42abcor1.5would be accepted as valid ids. Since this key is only written by the new helper, I do not consider this blocking, butNumber(raw)plusNumber.isInteger()would match the helper contract more tightly.- The new test coverage is helper-level. A focused
startMain()ordering test would better lock the actual race invariant: cache hit setsWKSDK.config.clientMsgDeviceIdbeforeconnectIM(), and cache miss awaits the device fetch beforeconnectIM().
Summary
Reviewed only the stacked delta from c0ed0f218a4c to 4411d6498603: App.tsx, clientMsgDeviceIdCache.ts, and the new cache tests. The implementation preserves the intended ordering: cache hits synchronously set the SDK numeric device id before opening the WebSocket, cache misses await /user/devices/{deviceId} before connecting, and the 5s timeout degrades to the prior clientMsgDeviceId=0 behavior rather than wedging startup. Plan F stale-device recovery is also handled correctly: fetch rejections are consumed, loggingOut prevents post-logout connect/sync work on the cache-miss path, and clearLocalLoginState() clears the numeric-id cache.
Verification: pnpm exec vitest run packages/dmworkbase/src/Service/__tests__/clientMsgDeviceIdCache.test.ts passed locally (11 tests). A direct tsc -p packages/dmworkbase/tsconfig.json --noEmit is not a useful signal in this repo state because it fails broadly on pre-existing dependency/storybook/test-global typing issues unrelated to this delta.
REVIEW_STATE: APPROVED
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #274 (octo-web)
Verdict: APPROVED
This change cleanly closes the clientMsgDeviceId = 0 race for the common cache-hit path and meaningfully narrows it for first login, with no server or SDK changes. The new helper is small, well-guarded, and well-tested. No P0/P1 blocking issues were found, including a focused pass on the auth/token boundary (this PR was flagged security-sensitive). The notes below are non-blocking hardening suggestions.
What was reviewed
The 4 #256 commits only (the diff also contains 22 commits belonging to #255 / Plan F, reviewed separately):
packages/dmworkbase/src/Service/clientMsgDeviceIdCache.ts(new helper)packages/dmworkbase/src/Service/__tests__/clientMsgDeviceIdCache.test.ts(new, 11 tests)packages/dmworkbase/src/App.tsx(startMaindual-track,clearLocalLoginStatecache clear)
Verification
- ✅ Cache-hit fast path closes the race.
loadCachedNumericDeviceId()returnsnullfor"0", negatives, NaN and empty (clientMsgDeviceIdCache.ts:28), so a non-null cache value is guaranteed> 0. It is assigned toWKSDK.config.clientMsgDeviceIdbeforeconnectIM()(App.tsx:861-862). WS never opens with0on this path. ✅ - ✅ First-login path awaits before connecting with a 5s
Promise.racetimeout safety net (App.tsx:874,949-963). On success within 5s the race is also closed; on timeout/non-stale-error it gracefully degrades to the pre-#256 behavior (WS opens with0, in-flight fetch updates later). This is documented in the inline comments. ✅ - ✅ Self-handled internal promises. Both
fetchAndCacheDeviceIdandawaitDeviceIdOrTimeoutchain an inline.catch, sofetchPromiseis non-rejecting and only the timeout can rejectPromise.race— noUncaught (in promise)from the device fetch. ✅ - ✅
loggingOutshort-circuit after the await (App.tsx:882) correctly preventsconnectIM/contactsSync/prohibitWordsSyncfrom firing against a just-cleared token when Plan F's stale-device interceptor triggers logout during the await. The.thenhandlers also re-checkloggingOut(902,934). ✅ - ✅ Cache cleared on logout.
clearCachedNumericDeviceId()is wired intoclearLocalLoginState(App.tsx:988), which runs on every logout path (logout,logoutUserInitiated, OIDC post-logout cleanup). A same-tab user-switch therefore always re-traverses the first-login flow → no risk of one user's device id leaking into another's session. ✅ - ✅ Storage scope is correct. The cache key
clientMsgDeviceIdNumericis not inCROSS_TAB_KEYS, so it lives insessionStorageonly (per-tab) — same scope as thedeviceIdUUID, keeping each tab's(UUID, numeric id)pair consistent. ✅ - ✅ Save is defensive.
saveNumericDeviceIdno-ops onn <= 0/ non-finite and swallows storage write failures (Safari private mode / quota), degrading to the first-login path rather than throwing. ✅ - ✅ Unit tests cover the helper's load/save/clear and the important guards (rejecting
"0", negatives, NaN; quota-exceeded swallow; round-trip). 11 tests. ✅
Non-blocking suggestions
P2 — Unguarded fire-and-forget at the startMain call sites. startMain is now async but is called without .catch at App.tsx:697 and 701. A synchronous throw originating in startMain's own body (e.g. from connectIM()) would now surface as an Uncaught (in promise) rather than a synchronous exception. This is a label change, not a regression (the same throw was unhandled before), and realistic throw probability is low — but since the rest of this PR carefully eliminates unhandled rejections, consider void this.startMain().catch(e => console.warn("[startMain] failed", e)); for consistency.
P2 — Verification wording overstates "race eliminated". The header comment / linked issue say the WS is never opened with clientMsgDeviceId=0. That is exactly true only for cache hits; for first login it is closed on success within 5s and falls back to pre-#256 behavior on timeout or non-stale fetch error. The code itself is correct and the comments do concede the fallback — just suggest the summary read "eliminated for cache hits; narrowed (with timeout fallback) for first login."
Nit — if (this.loggingOut) return; is on the first-login branch but not the cache-hit branch (App.tsx:857-866). The cache-hit branch is fully synchronous so the window is essentially unreachable, and a reload() is already pending in any logout, so worst case is a benign 401. Hoisting the guard to the top of startMain would cover both branches uniformly — cheap defense-in-depth.
Nit — if (res?.id) is a truthiness guard. A negative res.id (server contract violation) would be assigned to the SDK while saveNumericDeviceId rejects it. The producing endpoint (GET /user/devices/:id) returns a positive int64, so this is not reachable in practice; validating Number.isFinite(n) && n > 0 once before both the SDK assignment and the cache write would make the two paths share a single validated value.
Nit — test coverage is helper-only. The async orchestration in startMain (timeout race, cache-hit/miss branch selection, loggingOut short-circuit) has no direct tests; the defect-prone pure logic (the validation guards) is fully covered, so this is acceptable, but a couple of startMain tests with a mocked apiClient and fake timers would lock in the timeout/short-circuit behavior.
For human verification (security-sensitive)
- Confirm
GET /user/devices/{deviceId}returnsidas a JSON number (not a quoted string). The backend struct (int64, no,stringtag) indicates a number, which is what makessaveNumericDeviceIdpersist correctly; if that ever changes, the cache would silently stop persisting (theNumber.isFinite(n) && n > 0suggestion above would also future-proof this). - Confirm there is no logout path that bypasses
clearLocalLoginState(all known paths route through it), so the per-tab device-id cache can never survive an auth-session change.
4411d64 to
9039a31
Compare
❌ PR title does not follow Conventional CommitsYour title: Expected format: Allowed types: Examples:
This repository uses squash-and-merge, so the PR title becomes the commit subject on |
mochashanyao
left a comment
There was a problem hiding this comment.
[Octo-Q · automated review]
Verdict: Approve — no blocking findings; notes below (data-flow traced).
octo-web PR#274 Review Report
Reviewer: Octo-Q (automated review)
PR: #274
Head SHA: 9039a31717cebd17bb0ff70f44c5736d71b91d59
Title: fix(App): eliminate clientMsgDeviceId=0 race via sessionStorage cache (closes #256, depends on #255)
Routing: complexity=security_sensitive (automated review)
1. Verification Summary
| Item | Status | Evidence |
|---|---|---|
| Race elimination (cache-hit) | ✅ | App.tsx:861 sets clientMsgDeviceId = cached BEFORE connectIM() at :862 |
| Race elimination (cache-miss) | ✅ | awaitDeviceIdOrTimeout resolves with res.id set at :936 before post-await connectIM() at :884 |
| Timeout safety net | ✅ | Promise.race with 5s timer at :947-949; catch at :951-959 logs and falls back to SDK default |
loggingOut guard (post-await) |
✅ | :882 short-circuits if Plan F fired during await |
loggingOut guard (background refresh) |
✅ | :898 in fetchAndCacheDeviceId().then() skips writes if logout in-flight |
| Cache clear on logout | ✅ | clearLocalLoginState() at :991 calls clearCachedNumericDeviceId() |
| Plan F interceptor mutual exclusion | ✅ | APIClient.ts:92-100 uses else if — at most one callback per error |
| Unhandled rejection prevention | ✅ | Both fetchAndCacheDeviceId and awaitDeviceIdOrTimeout have .catch() chains; fetchPromise is never-rejecting |
startMain() async safety |
✅ | Callers at :697/:701 don't await, but awaitDeviceIdOrTimeout cannot reject (self-handled), so no unhandled promise rejection |
| Test coverage | ✅ | 4 new test files: clientMsgDeviceIdCache.test.ts (11 tests), deviceFailure.test.ts (6 tests), APIClient.staleResource.test.ts (6 tests), apiError.test.ts additions (3 tests) |
2. Findings
No P0 or P1 issues found.
P2-1: { code } as any type cast in staleLocalResourceCallback wiring
File: module.tsx:218
Diff-scope: new (introduced by this PR)
handleDeviceFetchError(
{ code } as any, // ← only `code` populated; rest of APIClientRejectedError is undefined
{ ... }
);handleDeviceFetchError currently only reads err.code, so this works. But the as any cast masks the incomplete construction. If a future refactor adds field access to handleDeviceFetchError (e.g., err.status for logging), it would silently get undefined.
Suggestion: Either construct a fuller APIClientRejectedError stub, or refactor handleDeviceFetchError to accept just { code: string } as its first argument (since that's all it uses).
P2-2: else if ordering in APIClient interceptor creates latent shadowing risk
File: APIClient.ts:92-100
Diff-scope: new (Plan F interceptor addition)
if (isAuthExpiredApiError(normalized) && self.logoutCallback) {
self.logoutCallback()
} else if (
isStaleLocalResourceApiError(normalized) && ...
) {
self.staleLocalResourceCallback(normalized.code)
}Currently safe because:
- Auth codes (
err.shared.auth.*) and stale codes (err.server.user.device_not_found) are disjoint sets. - Server returns
http_status=404in body for device_not_found, soisAuthExpiredApiError(which falls back tohttpStatus === 401) doesn't match.
Latent risk: If the server ever changes body.error.http_status to 401 for device_not_found, isAuthExpiredApiError would match first via the httpStatus fallback, and the stale-device handler would never fire. The stale UUID would persist in storage → post-reload re-use → infinite logout loop.
Suggestion: Consider checking stale-resource codes BEFORE auth-expired (swap the if/else if order), or add a code-based exclusion in isAuthExpiredApiError that skips known stale codes. Low urgency — the current server contract makes this unreachable.
P2-3: startMain() callers don't handle the returned promise
File: App.tsx:697, App.tsx:701
Diff-scope: amplified (function changed from sync to async)
WKApp.endpoints.addOnLogin(() => {
this.startMain(); // ← returned promise is discarded
});
if (WKApp.loginInfo.isLogined()) {
this.startMain(); // ← same
}Currently safe because awaitDeviceIdOrTimeout is designed to never reject (self-handled .catch() chain). However, any future code added after the await that could throw would produce an unhandled rejection.
Suggestion: Add .catch(e => console.error(...)) to the call sites, or add a void prefix (void this.startMain()) to signal intentional discard.
3. Suggestions
- P2-1 fix: Change
handleDeviceFetchErrorsignature to(err: Pick<APIClientRejectedError, 'code'>, handlers: ...)to match actual usage and eliminate theas any. - P2-2 fix: Not urgent. Add a comment in
APIClient.tsnear theelse ifdocumenting the mutual-exclusion assumption on code sets. - P2-3 fix: Add
voidprefix or.catch()at the twostartMain()call sites.
4. Additional Observations
- Double handling of stale-device errors (benign): Both the APIClient interceptor (→
staleLocalResourceCallback→handleDeviceFetchError→logout()) AND the.catch()infetchAndCacheDeviceId/awaitDeviceIdOrTimeoutprocess the same rejection. The.catch()correctly identifiesdevice_not_foundand stays silent. TheloggingOutguard prevents post-logout side effects. No bug, but worth noting for future maintainers. - Concurrent stale-device responses: The test
"concurrent stale device responses each invoke callback"(APIClient.staleResource.test.ts) confirms N callbacks fire for N concurrent stale responses. TheloggingOutone-way guard inlogout()correctly handles this — second+ calls short-circuit atif (this.loggingOut) return. - sessionStorage scope:
clientMsgDeviceIdNumericis stored in sessionStorage (per-tab) viaStorageService. This matches the UUID scope, keeping(UUID, numeric id)pairs consistent per tab. New tabs fall through to the first-login path — correct behavior.
5. Data-Flow Trace
Cache-hit path (refresh / subsequent session)
loadCachedNumericDeviceId()
→ StorageService.shared.getItem("clientMsgDeviceIdNumeric")
→ sessionStorage.getItem("clientMsgDeviceIdNumeric")
→ parseInt(raw, 10) → n > 0 ? n : null
→ returns number | null
[cached !== null]:
WKSDK.shared().config.clientMsgDeviceId = cached ← SET BEFORE CONNECT
this.connectIM() ← WS opens with real id
WKApp.dataSource.contactsSync()
ProhibitwordsService.shared.sync()
this.fetchAndCacheDeviceId() ← background refresh
→ WKApp.apiClient.get(`/user/devices/${deviceId}`)
→ .then: if (!loggingOut && res?.id) → update SDK config + saveNumericDeviceId
→ .catch: log non-stale errors; stale handled by Plan F interceptor
Cache-miss path (first login / new tab)
awaitDeviceIdOrTimeout(5000)
→ fetchPromise = WKApp.apiClient.get(`/user/devices/${deviceId}`)
→ .then: if (!loggingOut && res?.id) → set SDK config + saveNumericDeviceId
→ .catch: log non-stale errors (never rejects — self-handled)
→ timeoutPromise = setTimeout 5s → reject("FIRST_LOGIN_TIMEOUT")
→ Promise.race: try { await race } catch { log timeout warning }
[post-await]:
if (this.loggingOut) return ← Plan F guard
this.connectIM() ← WS opens with real id (or 0 on timeout)
WKApp.dataSource.contactsSync()
ProhibitwordsService.shared.sync()
Plan F stale-device interceptor chain
API response: { code: "err.server.user.device_not_found", http_status: 404 }
→ APIClient response interceptor
→ normalizeApiError → { code: "err.server.user.device_not_found", httpStatus: 404 }
→ isAuthExpiredApiError? NO (code not in authExpiredCodes, httpStatus ≠ 401)
→ isStaleLocalResourceApiError? YES (code in staleLocalResourceCodes)
→ staleLocalResourceCallback("err.server.user.device_not_found")
→ module.tsx callback
→ handleDeviceFetchError({ code })
→ removeDeviceId(): StorageService.shared.removeItem("deviceId")
→ resetInMemoryDeviceId(): WKApp.shared.deviceId = ""
→ triggerLogout(): WKApp.shared.logout()
→ if (loggingOut) return ← idempotent guard
→ loggingOut = true
→ clearLocalLoginState()
→ loginInfo.logout()
→ clearAuthStorage()
→ clearCachedNumericDeviceId() ← clears numeric cache
→ window.location.reload()
Logout cache-clear path
WKApp.shared.logout()
→ loggingOut = true (one-way guard)
→ clearLocalLoginState()
→ WKApp.loginInfo.logout()
→ clearAuthStorage() ← clears token/uid/etc, NOT deviceId
→ clearCachedNumericDeviceId() ← clears "clientMsgDeviceIdNumeric"
→ window.location.reload()
6. R5 Blind-Spot Checklist (security_sensitive)
C1 — Dual-path parity:
- Cache read ↔ write ↔ clear:
loadCachedNumericDeviceId/saveNumericDeviceId/clearCachedNumericDeviceId— all three are present and correctly wired. Cache is cleared inclearLocalLoginState()(logout path). ✅ Clear. fetchAndCacheDeviceId(background) vsawaitDeviceIdOrTimeout(first-login): both have identicalloggingOutguards and.catch()handling. Symmetric. ✅ Clear.- No create/delete asymmetry found.
C2 — Control-flow ordering / nesting reuse:
loggingOutone-way guard: checked at 3 points (post-await:882, background.then():898,logout():1001). All correctly ordered — guard is set BEFOREclearLocalLoginState()and BEFOREwindow.location.reload(). ✅ Clear.else ifin APIClient interceptor: mutual exclusion verified (see P2-2 above for latent risk). ✅ Clear for current server contract.Promise.raceordering:fetchPromiseis never-rejecting (.catch()chained inline), so onlytimeoutPromisecan reject. Thetry/catchcorrectly handles this. ✅ Clear.
C3 — Authorization boundary ≠ capability boundary:
- The cached value is a numeric integer (server-issued device id). No secrets, tokens, or PII stored in the cache. ✅ N/A.
sessionStoragescope is per-tab, matching the UUID scope. No cross-tab leakage. ✅ Clear.- No new endpoints or tools exposed by this PR. ✅ N/A.
Verdict
No P0 or P1 issues found. Three P2 findings (type cast fragility, latent interceptor shadowing, unhandled promise discard) are all maintainability concerns that don't affect current production behavior.
The race elimination design is sound: cache-hit sets the id before connectIM(), cache-miss awaits the fetch with a timeout safety net, and the loggingOut one-way guard correctly prevents post-logout side effects. Data-flow tracing confirms all consumed values (cached id, fetched id, loggingOut flag) are properly populated at their source and correctly flow to their consumption points.
Test coverage is thorough: 26 new tests across 4 test files covering the cache module, device failure handler, APIClient interceptor, and error classification.
[Octo-Q] verdict: APPROVE + 理由:No P0/P1 findings. Three P2 maintainability suggestions (type cast, interceptor ordering comment, promise discard) don't block landing. Race elimination is correctly implemented with proper guards and comprehensive test coverage.
Jerry-Xin
left a comment
There was a problem hiding this comment.
Summary: The PR is relevant to octo-web and the main approach is sound: it sets clientMsgDeviceId before IM connection on cache hits, awaits the first fetch on cache misses, clears cache on logout, and handles stale-device recovery centrally.
💬 Non-blocking
🟡 Warning — Cache identity is only keyed by clientMsgDeviceIdNumeric, not by uid or deviceId.
packages/dmworkbase/src/Service/clientMsgDeviceIdCache.ts:10 and packages/dmworkbase/src/App.tsx:855
If a same-tab account switch happens without the normal logout path, or if deviceId changes independently, the cache-hit path can briefly connect using a numeric ID from a previous identity/device until the background refresh resolves. Consider storing { uid, deviceId, numericId } and validating both before using the fast path.
🔵 Suggestion — Tighten numeric parsing.
packages/dmworkbase/src/Service/clientMsgDeviceIdCache.ts:27
parseInt("123abc", 10) returns 123, despite the comment saying the stored value must be a positive finite integer. Number(raw) plus Number.isInteger() would better match the contract and tests could cover malformed numeric strings.
🔵 Suggestion — Add focused tests for App.startMain.
packages/dmworkbase/src/App.tsx:854
The helper and interceptor tests are useful, but the highest-risk behavior is orchestration: cache hit sets SDK config before connectIM, cache miss awaits before connectIM, timeout falls back, and stale-device logout short-circuits. A small mocked WKApp/SDK test would lock down the actual race fix.
✅ Highlights
The cache-miss path correctly awaits the device fetch before opening IM, with a bounded timeout fallback. The loggingOut guard after the await is a good regression guard against stale-device logout side effects.
The interceptor keeps auth-expired and stale-local-resource recovery mutually exclusive, which avoids double recovery on one response.
I could not run the focused tests in this checkout because the available package scripts/tooling did not expose a working test/vitest command from this environment.
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #274 (octo-web)
Summary
This PR eliminates the clientMsgDeviceId=0 race (issue #256) by caching the server-issued numeric device id in sessionStorage and reading it back before WKSDK.connect(). It also bundles a generic "stale local resource" recovery path (Plan F): an API-client interceptor branch that classifies err.server.user.device_not_found by error code (not HTTP status) and routes it to a central dispatcher that clears the device id and forces logout.
The change is well-structured, defensively coded, and the extracted helpers (clientMsgDeviceIdCache, deviceFailure, apiError.isStaleLocalResourceApiError, and the APIClient interceptor branch) are covered by 40 passing unit tests (all green locally).
Verdict: APPROVE. No P0/P1 (correctness / security / data-loss / build-break) issues found. A handful of P2/maintainability nits are listed below — none block merge.
What I verified (✅)
- Core race fix is correct. In the cache-hit path,
WKSDK.shared().config.clientMsgDeviceId = cachedis assigned beforeconnectIM()(App.tsx:854-861), so the WS is never opened withclientMsgDeviceId=0when a valid cache exists. On cache miss the code awaits the device fetch (with a 5s timeout) before connecting (App.tsx:879-901). - Mutual exclusivity of recovery callbacks. The interceptor uses
else if(APIClient.ts:95-103) so a single error response triggers at most one oflogoutCallback/staleLocalResourceCallback. Confirmed by theauth-expiredandlogin_device_expired(HTTP 401, cousin code) tests inAPIClient.staleResource.test.ts. - Code-only gating for stale device.
isStaleLocalResourceApiErrormatches the string code only (apiError.ts:93-95), which is the stable contract given the backend returns HTTP 400 withbody.http_status=404. The "regardless of status field value" test indeviceFailure.test.tslocks this in. - Cross-user safety on same tab.
clearCachedNumericDeviceId()is called fromclearLocalLoginState()(App.tsx:991), so a relogin / user-switch on the same tab re-traverses the first-login flow and cannot reuse a previous user's numeric id.sessionStorage(per-tab) scope matches the existingdeviceIdUUID scope. - Logout is wedge-proof. The
loggingOutin-flight guard (App.tsx:1013-1014) plus thetry/catcharoundclearLocalLoginState()(App.tsx:1024-1028) guaranteewindow.location.reload()always runs even if storage writes throw (Safari private mode / QuotaExceededError) — preventing a permanent short-circuit wedge. - Task E short-circuit. After the cache-miss await,
if (this.loggingOut) return(App.tsx:892) preventsconnectIM/contactsSync/prohibitWordsSyncfrom firing against a just-cleared token when Plan F's interceptor already triggered logout — avoidingUncaught (in promise)401 noise. - No unhandled rejections from
async startMain. Both call sites (App.tsx:707,711) are fire-and-forget, butstartMainnever rejects: the cache-hit branch returns synchronously, and the cache-miss branch'sawaitDeviceIdOrTimeoutswallows all errors internally (Promise.racewrapped in try/catch; thefetchPromisehas its own inline.catch, so only the timeout can reject the race and that is caught). - No secret/credential exposure. The cached value is a server-issued numeric routing id, not a token or PII. Security-sensitive review of the auth/logout paths surfaced nothing that leaks or weakens auth state.
Findings
P2 — Duplicated fetch/then/catch logic between fetchAndCacheDeviceId and awaitDeviceIdOrTimeout
App.tsx:909-959. The .then body (if (this.loggingOut) return; if (res?.id) { set clientMsgDeviceId; saveNumericDeviceId }) and the .catch body (warn on non-stale code) are duplicated almost verbatim across the two methods. Extracting a single private applyDeviceIdResult(res) / logNonStaleFetchError(err) pair would remove the drift risk (e.g., if the loggingOut guard or the cache-write rule changes in one place but not the other). Not blocking.
P2 — startMain dual-track integration is not unit-tested
The extracted helpers are well covered, but the orchestration in startMain itself — cache-hit fast path, cache-miss await, the 5s timeout fallback, and the if (this.loggingOut) return short-circuit — has no direct test (no App.test additions in this PR). These are the highest-value behaviors of #256. A focused test that stubs loadCachedNumericDeviceId / WKApp.apiClient.get and asserts (a) cache hit sets clientMsgDeviceId before connectIM, (b) the loggingOut short-circuit skips connectIM, would lock in the fix against future regressions. Suggested follow-up, not a merge blocker.
P2 — { code } as any in the dispatcher
module.tsx:218. The Plan F dispatcher constructs a partial APIClientRejectedError as { code } as any. handleDeviceFetchError only reads err.code, so this is harmless today, but it bypasses the type system; if handleDeviceFetchError ever starts reading another field it will silently get undefined. Consider narrowing the handler's parameter type to Pick<APIClientRejectedError, "code"> so the dispatcher can pass { code } without the as any.
Nit — saveNumericDeviceId silently drops non-numeric res.id
clientMsgDeviceIdCache.ts:saveNumericDeviceId guards on Number.isFinite(n) && n > 0, so if the backend ever returns res.id as a numeric string, the cache silently never populates (each session falls back to the first-login fetch). The original code assigned res.id directly to clientMsgDeviceId, implying it is numeric — so this is fine today. Flagging only as a robustness note if the API contract is not strictly typed.
Notes for human verification (security-sensitive)
- Confirm the backend contract:
err.server.user.device_not_foundis returned as HTTP 400 withbody.error.http_status=404. The code-only classifier depends on the code string remaining stable; if the backend renames it, stale-device recovery silently stops working (no test can catch a server-side rename). Worth a contract test or a shared constant if one exists across services. - Confirm that forcing logout on
device_not_foundis the desired UX (vs. silently re-registering the device). Current behavior: clear deviceId UUID + numeric cache, thenwindow.location.reload()→ re-bootstrap. This is reasonable, but it is a user-visible session drop, so it should be an intentional product decision.
Conclusion
Correctness and security checks pass; the fix targets the documented race precisely and degrades gracefully (timeout → current pre-#256 behavior). The findings above are maintainability/test-coverage improvements, not blockers.
lml2468
left a comment
There was a problem hiding this comment.
PR Review: clientMsgDeviceId=0 race fix (closes #256)
Change overview
- 11 files, +649/-7; linked to #256.
- Eliminates the
clientMsgDeviceId=0race by caching the numeric device id insessionStorageand gating first-login on a device fetch (with a 5s timeout fallback), plus a central stale-local-resource interceptor that clears the device id and logs out ondevice_not_found.
Verification
- Tests pass: true; no new failures: true (baseline 57 -> 53).
- Build: true; Lint: true.
Findings
- 2 blocking issues; 9 minor/nit issues.
- 9 initially-suspected findings were dropped after verification as false positives.
The two blocking issues are both about missing test coverage on the highest-risk new logic:
App.startMaindual-track logic (cache-hit / first-login timeout /loggingOutshort-circuit) has zero coverage of the #256 race-fix correctness guarantees.logout()'s concurrent in-flight guard and theclearLocalLoginStatetry/catch (a clearly-described permanent-stuck failure mode) are untested.
The fix itself looks functionally correct and is a real improvement; the blockers are about protecting these invariants with regression tests before merge.
Verdict
Request changes — please add coverage for the two blocking items above. The minor/nit items are non-blocking quality improvements.
| * | ||
| * Issue #256: this read happens BEFORE WKSDK.connect() in App.startMain to | ||
| * eliminate the race window where outbound messages would carry | ||
| * clientMsgNo = <uuid>_0_3 instead of <uuid>_<real-id>_3. |
There was a problem hiding this comment.
[nit] loadCachedNumericDeviceId uses parseInt, accepting trailing garbage.
parseInt(raw, 10) returns 42 for values like "42abc" or "42.9" instead of rejecting them. Currently saveNumericDeviceId only writes String(positiveInt), so the normal path never produces such values; but if sessionStorage is written with dirty data by an external/older version, it would be silently accepted as a valid id. Tests only cover "not-a-number" (pure non-numeric -> NaN -> null), not partially-parseable garbage like "123abc".
Suggestion: use Number(raw) combined with Number.isInteger, or an explicit /^\d+$/ check before parsing, so any invalid input falls back to the first-login path.
| // Cache miss → await device fetch with 5s timeout, then connectIM. | ||
| // Either way, the WS is never opened with clientMsgDeviceId=0 unless the | ||
| // first-login fetch times out (graceful degradation to current behavior). | ||
| async startMain() { |
There was a problem hiding this comment.
[minor] First-login path serializes IM connect / contacts sync / prohibited-words sync behind the device fetch (blocks up to 5s).
The original startMain() (main branch) was synchronous: connectIM(), contactsSync(), and ProhibitwordsService.shared.sync() fired immediately, in parallel with the /user/devices fetch. After this change, the cache-hit path (L850-859) stays parallel and is a net perf win; but the cache-miss (first login) path at L866 runs await this.awaitDeviceIdOrTimeout(5000), deferring connectIM/contactsSync/prohibitWordsSync (L875-877) until the device fetch resolves or the 5s timeout elapses. On first login the WS connection, contacts sync, and prohibited-words sync all wait an extra device-API RTT, and in the worst case (server hangs) IM connect is delayed a full 5s.
This is an intentional trade-off to fix the #256 clientMsgDeviceId=0 race and is functionally correct, but it is a real latency regression on the first-login path.
Suggestion: if IM connect doesn't require clientMsgDeviceId to be in place at the exact moment of connect (the SDK may read config.clientMsgDeviceId after connecting), consider also calling connectIM()/contactsSync()/sync() immediately on the first-login path and gating only the first outbound message on device-id readiness, eliminating this serial wait; or lower the 5s timeout to something commensurate with acceptable first-message latency (e.g. 2-3s) to shrink the worst-case blocking window. If product confirms one extra RTT on first login is acceptable, keep as-is and let this comment record the trade-off.
| this.connectIM(); | ||
| WKApp.dataSource.contactsSync(); // 同步通讯录 | ||
| ProhibitwordsService.shared.sync(); // 同步敏感词 | ||
| // Issue #256: dual-track to eliminate the clientMsgDeviceId=0 race window. |
There was a problem hiding this comment.
[major] The App.startMain dual-track logic (cache-hit / first-login timeout / loggingOut short-circuit) has zero test coverage.
The highest-risk, most-branched new logic in this PR lives in startMain / awaitDeviceIdOrTimeout / fetchAndCacheDeviceId, yet nothing under packages/dmworkbase/src exercises them (grep startMain|awaitDeviceIdOrTimeout|fetchAndCacheDeviceId|loggingOut hits 0 test files). Uncovered critical paths:
- cache hit -> fast path + background refresh;
- cache miss -> after 5s timeout, continue with default
clientMsgDeviceId=0(graceful degradation); - a late fetch after timeout still writes back to cache without an
Uncaught (in promise); if (this.loggingOut) returnshort-circuit -- after the stale-device interceptor triggers logout,startMainno longer callsconnectIM/contactsSync(the comment specifically notes this avoids 401 noise on an already-cleared token).
These are exactly the correctness guarantees of the #256 race-window fix, currently resting on manual reasoning with no regression net.
Suggestion: add integration tests for the WKApp instance methods (mock WKApp.apiClient / WKSDK / loadCachedNumericDeviceId) asserting at least: on cache hit, clientMsgDeviceId is optimistically set and connectIM is called immediately; the timeout branch does not throw an unhandled rejection; and when loggingOut=true, connectIM is not called.
| @@ -883,7 +997,27 @@ export default class WKApp extends ProviderListener { | |||
|
|
|||
| // 登出 | |||
| logout() { | |||
There was a problem hiding this comment.
[major] logout() concurrent in-flight guard and the clearLocalLoginState try/catch are untested.
logout() adds two defensive pieces: if (this.loggingOut) return to prevent concurrent re-entry, and a try/catch wrapping clearLocalLoginState (the comment explicitly notes: if clearLocalLoginState throws in Safari private mode / on QuotaExceeded with no catch, loggingOut stays permanently true and the reload is skipped, leaving the user permanently stuck). This is a clearly described severe failure mode, yet no test verifies 'still reload on throw' or 'concurrent logout only runs clearLocalLoginState once'. deviceFailure.test.ts only tests the three handler calls of the pure handleDeviceFetchError, and does not cover these two new WKApp.logout invariants at all.
Suggestion: add a test mocking clearLocalLoginState to throw, asserting window.location.reload is still called and loggingOut is set; and a test calling logout() twice in a row, asserting clearLocalLoginState runs only once.
| // can be added to the staleLocalResourceCodes set in apiError.ts and | ||
| // routed here without touching the interceptor or business catch sites. | ||
| APIClient.shared.staleLocalResourceCallback = (code: string) => { | ||
| if (code === "err.server.user.device_not_found") { |
There was a problem hiding this comment.
[minor] The staleLocalResourceCallback dispatcher closure in module.tsx is untested; key handler behavior is unverified.
module.tsx assembles the real recovery handlers inside staleLocalResourceCallback: removeDeviceId = StorageService.shared.removeItem('deviceId'), resetInMemoryDeviceId = WKApp.shared.deviceId='', triggerLogout = WKApp.shared.logout(). deviceFailure.test.ts verifies these three handlers are called in the right order (with vi.fn() placeholders), but the wiring -- 'is the key passed to removeItem actually "deviceId"', 'is deviceId actually cleared', 'does a non-matching code avoid triggering' -- is never tested. APIClient.staleResource.test.ts only verifies the interceptor invokes the callback with the correct code, stopping at the callback boundary. Net result: the end-to-end chain from server device_not_found to actually clearing the local deviceId has a blind spot at the module.tsx assembly layer. The order assertion in deviceFailure (its comment stresses removeDeviceId must precede triggerLogout, else a reload race reuses the stale UUID and loops forever) protects placeholder fns, not the real production handlers in module.tsx.
Suggestion: add tests for the callback registered in module.tsx, asserting that on device_not_found StorageService.removeItem is called with 'deviceId', WKApp.shared.deviceId becomes empty, and a different code does not trigger.
| APIClient.shared.staleLocalResourceCallback = (code: string) => { | ||
| if (code === "err.server.user.device_not_found") { | ||
| handleDeviceFetchError( | ||
| { code } as any, |
There was a problem hiding this comment.
[nit] module.tsx fakes an APIClientRejectedError with { code } as any, bypassing the type contract.
handleDeviceFetchError's signature declares its first parameter as a full APIClientRejectedError (deviceFailure.ts:23), but the only call site (module.tsx:218) passes { code } as any -- a code-only object with the type check erased via as any. This shows the function's parameter type (the whole rejected error) is over-engineered: it actually only needs the code string.
This is a sign of a mismatch between the abstraction boundary and real usage: the function claims to consume APIClientRejectedError (and deviceFailure.test.ts's makeRejected builds a pile of error/msg/normalized fields), but the production path has no real rejected-error object to pass (the interceptor only forwards code to the callback). The as any will mask breakage from future APIClientRejectedError shape changes.
Suggestion: if keeping the helper, narrow the signature to handleDeviceFetchError(code: string, handlers) or ({ code }: Pick<APIClientRejectedError,'code'>, handlers) to remove the as any; if the layer is deleted per the related comment, this is resolved too.
| WKApp.dataSource.contactsSync(); | ||
| ProhibitwordsService.shared.sync(); | ||
| this.fetchAndCacheDeviceId(); | ||
| return; |
There was a problem hiding this comment.
[nit] Device-error recovery has two paths; responsibility is split across the interceptor and App.startMain.
The same device_not_found failure is handled both via the central APIClient interceptor -> staleLocalResourceCallback -> logout path, and again explicitly distinguished in the .catch of App.startMain's fetchAndCacheDeviceId / awaitDeviceIdOrTimeout (if (err?.code !== 'err.server.user.device_not_found') then warn, else comment 'handled centrally'). That is, the device-fetch catch sites must know which codes are centrally handled vs. self-logged, leaking the interceptor's classification knowledge back into business catch blocks; the two must stay in sync.
This is a coupling point: ownership of device_not_found (central vs. local) is simultaneously encoded in apiError.ts's staleLocalResourceCodes and in the string comparisons in both App.tsx catches. Adding a new stale code means that if you forget to update the App.tsx catch, that code gets incorrectly warned as 'non-stale'.
Suggestion: have the catch sites stop hardcoding the code string and reuse isStaleLocalResourceApiError(err) to decide whether to stay silent (sharing one classification source with the interceptor), eliminating the string duplication; or document in a comment that this is intentional ('interceptor already handled; this is just a fallback log') and accept the coupling.
| this.connectIM(); | ||
| WKApp.dataSource.contactsSync(); // 同步通讯录 | ||
| ProhibitwordsService.shared.sync(); // 同步敏感词 | ||
| // Issue #256: dual-track to eliminate the clientMsgDeviceId=0 race window. |
There was a problem hiding this comment.
[minor] Comments reference internal planning codenames (Plan F / Task E / Task 9 / Risk A / bundled fix #1) that are meaningless to later readers.
New code uses internal planning jargon as the subject of many comments: // Plan F: ... (App.tsx:849, 888, 919, 1000), // Task E guard: (875), // Plan F Task 9 hardening: (~1008), // Risk A guard: (895, 928), plus // Plan F: blocks in apiError.ts:33, deviceFailure.ts, module.tsx:207. These codenames exist only in this PR/issue discussion; once merged, readers cannot look up what 'Plan F' / 'Task E' mean, so the comments lose explanatory value and add noise.
Suggestion: drop the codename prefixes, keep the substantive 'what/why'. E.g. rewrite // Task E guard: if the await resolved because Plan F's interceptor fired ... as // if the device fetch triggered staleLocalResourceCallback -> logout(), loggingOut is set and a reload is imminent; return here to avoid 401 noise from requests on an already-cleared token. Put issue-linking info in one place (e.g. // see #256 / octo-web#76) instead of repeating it in every block.
| // APIClient.normalizeApiError reads the body's inconsistent `http_status` | ||
| // field (=404) into err.status, not the actual HTTP status code. The code | ||
| // string is the stable backend contract — that's what we match on. | ||
| if (err.code === 'err.server.user.device_not_found') { |
There was a problem hiding this comment.
[minor] The magic string err.server.user.device_not_found is scattered across 5 places with no single source of truth, hurting readability and consistency.
The same backend error-code string is hardcoded in: apiError.ts:41 (the staleLocalResourceCodes set), App.tsx:909 and 941 (two .catch blocks' err?.code !== "err.server.user.device_not_found"), deviceFailure.ts:30, and module.tsx:214 (if (code === "err.server.user.device_not_found")). A reader must cross-check 5 locations to confirm they refer to the same code, and any spelling drift in one place won't be caught by the compiler.
Suggestion: export a named constant in apiError.ts (e.g. export const DEVICE_NOT_FOUND_CODE = "err.server.user.device_not_found") and reference it everywhere; the set and every check then derive from a single source, improving readability and removing the spelling-mismatch risk.
| // | ||
| // Risk A guard: same as fetchAndCacheDeviceId — skip writes if logout | ||
| // started between dispatch and resolve. | ||
| private async awaitDeviceIdOrTimeout(timeoutMs: number): Promise<void> { |
There was a problem hiding this comment.
[minor] fetchAndCacheDeviceId and awaitDeviceIdOrTimeout contain duplicated .then/.catch device-fetch blocks.
App.tsx:898-916 and 931-948 each inline a nearly verbatim WKApp.apiClient.get(/user/devices/${...}).then(res => { if(loggingOut) return; if(res?.id){ set + saveNumericDeviceId } }).catch(err => { if(err?.code !== "...device_not_found") console.warn(...) }) block. Only the log text differs. The duplication forces readers to diff word-by-word to confirm the two behave identically, and future edits risk changing only one copy.
Suggestion: extract a private method (e.g. private fetchDeviceIdAndApply(logLabel: string): Promise<void>) reused by both paths; fetchAndCacheDeviceId and awaitDeviceIdOrTimeout then keep only their own 'fire-and-forget' vs. 'race+timeout' shells.
lml2468
left a comment
There was a problem hiding this comment.
✅ APPROVE — #274 9039a317(校准 workflow 的 request-changes → approve)
跑了 pr-review workflow(自动判 request-changes,2 个 "blocker"),但我独立核实了核心竞态修复与那两条 blocker:两条所谓 blocker 实为 major 级「缺测试」,非功能缺陷;核心 #256 竞态修复正确、CI 全绿、另三位 reviewer 已 approve。据此校准为 approve,缺测试列为强烈建议补的 fast-follow。
本 PR depends on #255 且把 #255 改动卷入自身 diff(#255 未合并)。#255 部分我已单独审过;此处聚焦 #274 增量(
clientMsgDeviceIdCache.ts+startMain双轨)。
核心竞态修复正确(已独立验证)
#256:出站消息在 clientMsgDeviceId=0 时带 <uuid>_0_3 而非真实 id。修复在 WKSDK.connect() 前读 sessionStorage 缓存:
- 缓存命中 → 乐观设值 + 立即 connectIM/sync + 后台刷新;缓存未命中 →
await awaitDeviceIdOrTimeout(5s)再 connect。WS 永不在 id=0 时打开(除非首登超时,优雅降级到旧行为)。✓ loadCachedNumericDeviceId返回null(非 0)区分「无 id / 真 id」,与 SDK 默认 0 语义对齐;sessionStorage per-tab 与 UUID 同 scope,配对一致。✓- 未处理 rejection 防御严密:
awaitDeviceIdOrTimeout的fetchPromise自带 inline.catch使其 never-reject,只有 timeout 能让Promise.racereject——迟到的 fetch rejection 不会变成 Uncaught。✓ - logout 竞态守卫:
.then里if (this.loggingOut) return避免回种已清空缓存;startMain的 Task E guard 避免对已清 token 发 401 噪声。✓
🟡 强烈建议补(should-fix,缺测试;不阻断)
startMain双轨逻辑无测试App.tsx:849— 缓存命中快路径 / 首登 5s 超时降级 / 超时后迟到 fetch 写回不产生 Uncaught /loggingOut短路,这些是 #256 修复的核心正确性保证,目前无回归网。建议 mockapiClient/WKSDK/loadCachedNumericDeviceId补集成测试。logout守卫 + try/catch 无测试App.tsx:999—loggingOut并发守卫、clearLocalLoginState 抛错仍 reload(注释明确「否则用户永久卡死」)这两个不变量未测。建议 mock clearLocalLoginState 抛错断言 reload 仍调用 + 连续两次 logout 只清理一次。- module.tsx 装配层无测试
:216— interceptor→handler 的'deviceId'key /deviceId=''/ 非匹配 code 不触发未测(纯函数测的是vi.fn()桩)。
🟡 Fast-follow(非阻塞)
loggingOut守卫未覆盖 OIDC 主动登出(同我在 #255 提的那条,随 #255 卷入)— OIDC 分支不设loggingOut,并发下双导航 + clearLocalLoginState 两次(幂等、终态都是登出,低危)。建议 OIDC 分支也设守卫。loadCachedNumericDeviceId用parseInt:22— 接受"42abc"→42 尾随垃圾。当前写路径只写正整数,但 sessionStorage 被脏数据污染时会静默接受。建议/^\d+$/校验或Number+Number.isInteger。{ code } as anymodule.tsx:218— handler 只读 code,签名收窄为Pick<…,'code'>去掉 as any。- device_not_found 魔法字符串散落 5 处 — 建议导出
DEVICE_NOT_FOUND_CODE常量;catch 站点改用isStaleLocalResourceApiError复用分类源,消除字符串重复 + 双路径耦合。 - 首登路径串行延迟
:854— connectIM/contactsSync/sync 被推到 device fetch 之后(最坏 5s)。有意取舍修 #256,但确是首登延迟回归;建议评估 connect 是否可不等 id 就绪、仅对首条消息门控,或下调超时到 2-3s。 - nit —
fetchAndCacheDeviceId/awaitDeviceIdOrTimeout两处设备拉取代码块近乎逐字重复,建议抽fetchDeviceIdAndApply(label);Plan F/Task E/Risk A内部代号注释无法溯源,建议改引可定位 issue。
核心竞态修复正确、CI 全绿、零功能阻断 → APPROVE。三条缺测试请尽快补(逻辑复杂、分支多,最值得回归网保护)。
校准说明:workflow 这次又把 major 级「缺测试」判成 blocker 触发 request-changes(同 #365/#368 模式)。我读了 clientMsgDeviceIdCache.ts + startMain/awaitDeviceIdOrTimeout/fetchAndCacheDeviceId 全部源码,确认竞态修复、never-reject 防御、logout 守卫均正确,无功能缺陷,故校准为 approve。缺测试是真实 should-fix,但不应拦一个核心逻辑正确的竞态修复。
|
This pull request has been automatically marked as stale due to inactivity. Please add an update or it will be closed. |
Summary
Eliminates the
clientMsgDeviceId = 0race window inApp.startMain, where outboundchatManager.send()messages getclientMsgNo = <uuid>_0_3instead of<uuid>_<real-id>_3during the ~50-300ms window betweenconnectIM()(synchronous WebSocket open) and/user/devices/{deviceId}(async numeric id fetch).Approach — Option D+ hybrid (cache + first-login await):
/user/devicesresponseWKSDK.config.clientMsgDeviceIdfrom cache beforeconnectIM()— zero latency, zero raceawaitthe device fetch with a 5s timeout safety net beforeconnectIM()— ~200ms WS connect delay only for the first session, zero raceclearLocalLoginStateso user-switch on the same tab always re-traverses the first-login path correctlyNo server changes. No SDK changes.
This branch was developed on top of #255 (Plan F stale-resource interceptor). Of the 26 commits in this PR's diff, 22 belong to #255 and 4 belong to #256.
The 4 new commits:
Recommended merge order: merge #255 first; this PR will then auto-rebase to a clean 4-commit diff. Until then, review either by filtering to those 4 SHAs or by waiting for #255 to land.
Plan F integration
#256 uses Plan F's existing
staleLocalResourceCallbackinterceptor for stale-device error recovery (unchanged). The newawaitpath in cache-miss explicitly short-circuits viathis.loggingOutguard after the await — without it, a stale-device-triggered logout would letconnectIM/contactsSync/prohibitWordsSyncfire against a just-cleared token, causing 401 chains andUncaught (in promise)console noise. This was a real regression caught by end-to-end browser testing during development; the guard prevents it.Files changed (just the 4 #256 commits)
Race elimination truth table
Test plan
cd packages/dmworkbase && pnpm test→ 46 tests pass (35 Plan F baseline + 11 newclientMsgDeviceIdCache.test.ts)octo-serverduring development):/user/devicesreturns (await path proven; observed t=550ms after t=537ms fetch end)/user/devicessettles (cache-hit fast path proven; observed t=550.6ms vs t=551.0ms)sessionStorage.clientMsgDeviceIdNumericcleared (verified directly)Uncaught (in promise)in console (Task E guard verified working — pre-Task-E had 2× unhandled rejections; post-Task-E has 0)Risks & mitigations
Promise.racetimeout falls back to current pre-#256 behaviorloggingOutguard (already in place from #255) + new outer post-await guardif (this.loggingOut) return;after awaitRoll-back plan
Each task is its own commit; revert specific layers if needed:
git revert 4411d649git revert 4411d649 0fdaa32b 1f300888 8f27af44Closes #256
🤖 Generated with Claude Code