feat(room-worker,inbox-worker): origin-site MV fix per PR #145 spec#158
feat(room-worker,inbox-worker): origin-site MV fix per PR #145 spec#158
Conversation
Implements docs/superpowers/specs/2026-05-01-federated-room-origin-site-mv-fix-design.md
in full. Federated rooms whose owning site adds or removes members now
update that site's user-room and spotlight indexes for every affected
member (same-site + cross-site), so CCS terms-lookup queries from any
site resolve against the origin-site MV.
room-worker (Changes 1-3): three additive log-and-continue publishes to
chat.inbox.{originSiteID}.member_{added,removed}, wrapping the existing
MemberAddEvent / MemberRemoveEvent in OutboxEvent so the wire format
matches the federated `aggregate.>` lane that search-sync-worker's
parseMemberEvent already decodes. Self-leave collapses to
OutboxEvent.Type=member_removed at the wrapper while preserving the
inner MemberRemoveEvent.Type=member_left, matching the cross-site
OUTBOX convention. add-members skips the publish when actualAccounts
is empty; remove-org reuses the existing len(accounts)>0 gate.
inbox-worker (Change 4): scope the durable consumer's FilterSubjects to
chat.inbox.{siteID}.aggregate.> so the new local-lane events reach
search-sync-worker only. Without this, inbox-worker would re-process
every cross-site member_added and emit duplicate-key churn against
subscriptions room-worker already wrote locally.
Tests:
- room-worker/handler_test.go: six new unit tests covering the happy
path for each handler method, the empty-accounts no-publish guards,
and the self-leave wrapper-vs-inner type collapse. Existing
publishedMsg gains an msgID field so dedup-ID assertions can compare
against outboxDedupID directly. Existing length expectations bumped
by one publish where applicable.
- room-worker/integration_test.go: two real-Mongo tests assert the
local-INBOX OutboxEvent + inner-payload structure end-to-end through
processAddMembers and processRemoveMember.
- inbox-worker/integration_test.go: NATS-via-testcontainers test
publishes one local-lane and one aggregate-lane event, then asserts
the consumer's NumPending = 1 — locking in the FilterSubjects
scoping so a future regression that drops it surfaces immediately.
Forward-only rollout per the spec; no backfill for pre-fix federated
rooms.
https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds local INBOX publishes for room membership changes, scopes inbox-worker JetStream consumer to aggregate-lane subjects via FilterSubjects, extends unit and integration tests for deterministic msgIDs, and adds TLS-skip options for Elasticsearch clients and test wiring. ChangesLocal Inbox Publishing with JetStream Filtering
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
inbox-worker/integration_test.go (1)
424-436: ⚡ Quick winExtract NATS testcontainer setup into a
setupNATS(t)helper.This setup is currently inlined; moving it to a helper keeps integration tests consistent and easier to reuse.
♻️ Suggested refactor
- ctx := context.Background() - natsContainer, err := natsmod.Run(ctx, testimages.NATS) - require.NoError(t, err) - t.Cleanup(func() { _ = natsContainer.Terminate(ctx) }) - - natsURL, err := natsContainer.ConnectionString(ctx) - require.NoError(t, err) - - nc, err := nats.Connect(natsURL) - require.NoError(t, err) - t.Cleanup(func() { nc.Close() }) - - js, err := jetstream.New(nc) - require.NoError(t, err) + ctx, js := setupNATS(t)func setupNATS(t *testing.T) (context.Context, jetstream.JetStream) { t.Helper() ctx := context.Background() c, err := natsmod.Run(ctx, testimages.NATS) require.NoError(t, err) t.Cleanup(func() { _ = c.Terminate(ctx) }) url, err := c.ConnectionString(ctx) require.NoError(t, err) nc, err := nats.Connect(url) require.NoError(t, err) t.Cleanup(func() { nc.Close() }) js, err := jetstream.New(nc) require.NoError(t, err) return ctx, js }As per coding guidelines
**/*integration_test.go: "Writesetup<Dep>(t *testing.T)helpers that start a container, registert.Cleanup, and return a connected client".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@inbox-worker/integration_test.go` around lines 424 - 436, Extract the inline NATS testcontainer setup into a helper named setupNATS(t *testing.T) that calls t.Helper(), creates a background context, runs natsmod.Run(ctx, testimages.NATS), registers t.Cleanup to terminate the container, obtains the connection string via ConnectionString, connects with nats.Connect, registers t.Cleanup to close the connection, constructs the JetStream client with jetstream.New, asserts errors using require.NoError, and returns (context.Context, jetstream.JetStream) so tests can replace the inlined block with a call to setupNATS(t).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@inbox-worker/integration_test.go`:
- Around line 424-436: Extract the inline NATS testcontainer setup into a helper
named setupNATS(t *testing.T) that calls t.Helper(), creates a background
context, runs natsmod.Run(ctx, testimages.NATS), registers t.Cleanup to
terminate the container, obtains the connection string via ConnectionString,
connects with nats.Connect, registers t.Cleanup to close the connection,
constructs the JetStream client with jetstream.New, asserts errors using
require.NoError, and returns (context.Context, jetstream.JetStream) so tests can
replace the inlined block with a call to setupNATS(t).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 954e4233-13b3-4879-af3c-6ead631822b7
📒 Files selected for processing (6)
inbox-worker/integration_test.goinbox-worker/main.gopkg/subject/subject.goroom-worker/handler.goroom-worker/handler_test.goroom-worker/integration_test.go
Aligns the new NATS-testcontainer test with the CLAUDE.md convention "Write setup<Dep>(t *testing.T) helpers that start a container, register t.Cleanup, and return a connected client". No behavior change. https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/searchengine/factory.go`:
- Line 23: Replace the insecure TLS config that sets InsecureSkipVerify: true on
httpTransport.TLSClientConfig: instead load the trusted CA PEM into an
x509.CertPool and set TLSClientConfig.RootCAs (and MinVersion >=
tls.VersionTLS12) so certificates are properly verified; if skipping
verification is deliberate, add a clear comment documenting the rationale and
annotate the line with the appropriate linter suppression (e.g., //nolint:gosec
or `#nosec` G402) next to the httpTransport.TLSClientConfig assignment so the
bypass is explicitly intentional.
- Line 22: The code does an unguarded type assertion on http.DefaultTransport
when assigning httpTransport := http.DefaultTransport.(*http.Transport), which
can panic if DefaultTransport has been replaced with a non-*http.Transport;
change this to the safe two-value form (t, ok :=
http.DefaultTransport.(*http.Transport)) and handle the !ok case by returning or
propagating a meaningful error from the surrounding factory creation function
(or falling back to creating a new *http.Transport) instead of panicking so
callers of the code in factory.go get a proper error rather than a runtime
crash.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a91c741e-6ad5-4e5d-b33b-86c92cf2c47c
📒 Files selected for processing (2)
inbox-worker/integration_test.gopkg/searchengine/factory.go
🚧 Files skipped from review as they are similar to previous changes (1)
- inbox-worker/integration_test.go
0eba4db to
5d33954
Compare
Adds an opt-in tlsSkipVerify bool to searchengine.New, plumbed from each service's config: - search-service: SEARCH_TLS_SKIP_VERIFY (default false) - search-sync-worker: SEARCH_TLS_SKIP_VERIFY (default false) Default-off keeps prod safe; ops opts in per environment for self-signed/internal ES clusters. When false, the factory uses the standard ES client transport — same behavior as before this PR. When true, clones http.DefaultTransport (preserving ProxyFromEnvironment, dial/TLS-handshake timeouts, HTTP/2, idle-conn tuning) and overrides only TLSClientConfig with InsecureSkipVerify=true and MinVersion=TLS 1.2, guarding the type assertion on http.DefaultTransport so we error out cleanly if a middleware (e.g. OTel) has replaced it. Also enables gosec G402 narrowly in .golangci.yml so the //nolint:gosec annotation in pkg/oidc and pkg/searchengine actually suppresses a real rule, and any future unannotated InsecureSkipVerify is rejected at lint time. Includes a goimports-only struct alignment tweak in room-worker/integration_test.go picked up while running make fmt — no behavior change. https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
5d33954 to
e0c0a31
Compare
Replaces the single-audience verifier with a multi-audience allow-list
so a shared Keycloak realm that issues tokens for several client
audiences (the common org pattern) can be served by one auth-service.
pkg/oidc:
- Config.Audience (string) → Config.Audiences ([]string).
- Disable go-oidc's built-in single-audience check via
oidc.Config{SkipClientIDCheck: true} and enforce the allow-list in
Validate after cryptographic verification: a token is accepted when
any of its `aud` entries matches any configured audience.
- New ErrNoAudiences and ErrAudienceNotAllowed sentinels for clear
error propagation; NewValidator fails fast on an empty Audiences list.
- Single Verify call per request — no retry-on-mismatch loop.
auth-service:
- OIDC_AUDIENCE → OIDC_AUDIENCES (comma-separated; envSeparator:",").
- Required-config check updated; deploy/.env.example and
deploy/docker-compose.yml renamed to match.
Tests:
- pkg/oidc/oidc_test.go: table-driven tests for containsAudience
(single/multi/empty cases) plus NewValidator empty-audiences guard.
pkg/oidc had no tests before this commit.
https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
8a8ba51 to
4814364
Compare
Summary
Implements
docs/superpowers/specs/2026-05-01-federated-room-origin-site-mv-fix-design.md(merged in #145) in full. Federated rooms whose owning site adds or removes members now update that site'suser-roomandspotlightindexes for every affected member (same-site + cross-site), so CCS terms-lookup queries against the origin site resolve correctly.Changes
room-worker/handler.go — three additive log-and-continue local-INBOX publishes:
processAddMembers(Change 1): publisheschat.inbox.{originSite}.member_addedcarrying the full add set (same-site + cross-site). Skipped whenactualAccountsis empty.processRemoveIndividual(Change 2): publisheschat.inbox.{site}.member_removed. WrapperOutboxEvent.Typecollapses tomember_removedeven for self-leave, while innerMemberRemoveEvent.Type=member_leftis preserved — matches the cross-site OUTBOX convention so search-sync-worker dispatches on a single MV op.processRemoveOrg(Change 3): publishes one local-INBOX event covering every removed account, regardless of site. Reuses the existinglen(accounts)>0gate.All three wrap the existing
MemberAddEvent/MemberRemoveEventinOutboxEventwithSiteID == DestSiteID == originSite, byte-for-byte compatible withsearch-sync-worker/inbox_stream.go::parseMemberEvent.inbox-worker/main.go (Change 4) — scope the durable consumer's
FilterSubjectstochat.inbox.{siteID}.aggregate.>, reserving the local lane for search-sync-worker. Without this, inbox-worker would re-process every cross-site member_added and emit duplicate-key churn against subscriptions room-worker already wrote locally.pkg/subject/subject.go — new
InboxAggregateAll(siteID)builder so the aggregate-prefix subject lives in one place (used by both inbox-worker main and the integration test).Tests
publishedMsggained anmsgIDfield for dedup-ID assertions. Existing length expectations bumped by +1 publish where applicable.OutboxEvent+ inner-payload structure end-to-end throughprocessAddMembersandprocessRemoveMember.NumPending == 1. Locks in Change 4 so a future regression that dropsFilterSubjectssurfaces immediately.Rollout
Forward-only per the spec. No backfill for pre-fix federated rooms — pre-existing memberships on origin-site MVs stay missing until the room experiences add/remove churn or the index is rebuilt out-of-band.
Both services should ship in the same release per site (the spec's risk note: deploying room-worker first would cause inbox-worker to log "user not found" for every cross-site add until the filter scoping lands).
Test plan
make lintcleanmake testclean (entire repo)make test-integration SERVICE=room-worker(requires Docker — run in CI)make test-integration SERVICE=inbox-worker(requires Docker — run in CI)nats consumer info INBOX_{siteID} inbox-workershowsfilter_subjects: ["chat.inbox.{siteID}.aggregate.>"]; add a test member to a federated room owned by the site, query the site'suser-room-{siteID}ES index, confirm the user's doc contains the new room ID.https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
Generated by Claude Code
Summary by CodeRabbit
New Features
Tests
Other