You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
-`nitpick` — `users.go:113-135`: user-status fan-out cannot loop/amplify — inbox-worker `handleUserStatusUpdated` writes Mongo (LWW), does not re-publish. Confirmed safe.
129
129
130
130
Verified-correct: `Classify` ordering (poison/skip precede `isFinal`); `IsFinalDelivery` guards `maxDeliver<=0`; empty `parentMessage._id`/`username` + decode failures correctly Term (not Nak-storm); delete events get the shorter `DeleteMaxDeliver` cap; unknown-user `member_added` correctly Naks; no nil-deref/unchecked assertion; `mustMarshal` panics only on fixed-shape structs; no leaked secrets/tokens/bodies.
131
+
132
+
---
133
+
134
+
# Lens: Performance
135
+
136
+
Scope is a one-time migration replay pump (single replica), so avoiding Nak-storms and silent stalls matters more than raw throughput.
137
+
138
+
-`medium` — `users.go:113-135` (`publishUserStatus`): payload marshaled once (good), but `h.nowMillis()` is called per loop iteration for `evt.Timestamp`, and on any per-dest publish error the function returns → Naks the **whole** event → re-fans to *all* sites on redelivery, re-publishing to destinations that already succeeded. With N sites that's N duplicate publishes per transient blip. Idempotent (status LWW), so write-amplification not correctness; hoist `nowMillis()` and consider a per-dest/fan-out metric.
139
+
-`medium` — `threadsubs.go:81,100`: per-event `FindThreadRoom` + `FindUserID` = two uncached Mongo reads per thread-sub event, **every redelivery included**. Both are indexed point-reads (`thread_rooms.parentMessageId`, `users.account`), so cheap individually, but a thread-sub arriving before its room/user re-runs *both* on every Nak cycle → dominant Mongo load under a large replay. The resolver-cache (account→userId, parentMessage→threadRoom) is the documented mitigation and was **explicitly deferred by the user** — flagging as a known opportunity only. The empty-FK poison guards correctly prevent invalid rows from Nak-storming.
140
+
-`medium` — `targetstore.go:64-88`: `FindThreadRoom`/`FindUserID` fetch whole documents (dup of the projection finding) — wasted bytes/decode on the hottest read path.
141
+
-`low` — `main.go:169`: sequential `cons.Consume()` caps throughput at one in-flight event. Acceptable given single-replica intent, ordering simplicity, and the resolver-miss Nak/retry model; flagging the trade-off, not a defect.
142
+
-`low` — `subscriptions.go:308`: `mustMarshal` per event (up to 5 per insert via `publishSubscriptionState`); fixed-shape small structs, negligible at migration scale.
143
+
-`nitpick` — `inbox-worker` legacy index `DropOne` runs every startup (idempotent, one extra admin command per boot); `subscription_deleted``DeleteOne(_id)` is an optimal primary-key delete.
144
+
145
+
No goroutine leaks, no blocking calls in the consume callback beyond intended I/O; shutdown ordering (`cc.Stop()` → drain → Mongo) correct.
146
+
147
+
---
148
+
149
+
# Lens: Observability
150
+
151
+
Strong baseline: slog-JSON everywhere, `request_id` stamped once at `processOne` entry (`natsutil.StampRequestID`) and threaded via ctx into every disposition log + the inbox publish, nil-safe metrics with op+collection dispositions, `TermExhausted` metered instead of silent JetStream drops. No `fmt.Print`/`log.Print`; no err-logged-and-returned double-logs in the new handlers.
152
+
153
+
-`medium` — `subscriptions.go:80-91`: subscription hard-delete is **observability-silent**. The delete branch publishes `subscription_deleted` and returns nil → `onProcessed("delete", subsColl)`, indistinguishable from any other processed delete. A mass-leave storm or a bug turning every delete into a downstream no-op is invisible. Add a reason label or dedicated delete counter.
154
+
-`medium` — `subscriptions.go:98`, `users.go:56`, `threadsubs.go:58`: `onSkipped(ctx, ev.Op+"_skip")` builds Prometheus labels from connector-controlled `ev.Op`. A malformed/garbage `op` becomes a new time series (unbounded-cardinality risk). Map to a fixed enum before labeling. Low real-world risk (fixed connector), but a latent cardinality vector.
155
+
-`low` — `users.go:113-135`: user-status fan-out not separately observable — no per-dest or fan-out-total counter; the Nak log (`main.go:248`) carries only `eventId`, not the failing dest, so a single-site routing gap is hard to localize.
156
+
-`low` — `handler.go:80` + mappers: no OTel child span on the per-collection handlers (only `migration source.findByID` spans). Trace waterfalls can't attribute latency to FK resolution vs publish vs decode. Not a regression; a gap for a multi-hop migration path.
157
+
-`nitpick` — `sourcelookup.go:35`: shared tracer renamed to `"migration"` while `oplog-transformer/historyclient.go:44` still uses `"oplog-transformer"` — two instrumentation-scope names coexist in one binary. Intentional/harmless; note for dashboard authors.
158
+
159
+
PII/secrets: clean. No log emits source bodies, `statusText` content, or tokens; usernames appear only in error strings (`account %q`), acceptable. Request-ID propagation correct across both services, including the new `handleSubscriptionDeleted`. Pre-existing gap: `inbox-worker/handler.go:119``unknown event type` warn omits request_id (not introduced here; the new `subscription_deleted` case sits above it).
0 commit comments