Skip to content

feat(room-worker,inbox-worker): origin-site MV fix per PR #145 spec#158

Merged
mliu33 merged 4 commits intomainfrom
claude/implement-pr-145-jpaiE
May 8, 2026
Merged

feat(room-worker,inbox-worker): origin-site MV fix per PR #145 spec#158
mliu33 merged 4 commits intomainfrom
claude/implement-pr-145-jpaiE

Conversation

@Joey0538
Copy link
Copy Markdown
Collaborator

@Joey0538 Joey0538 commented May 7, 2026

Summary

Implements docs/superpowers/specs/2026-05-01-federated-room-origin-site-mv-fix-design.md (merged in #145) in full. Federated rooms whose owning site adds or removes members now update that site's user-room and spotlight indexes for every affected member (same-site + cross-site), so CCS terms-lookup queries against the origin site resolve correctly.

Note: PR #145 mistakenly listed an "add-members slice" as completed via #148. #148 has since been moved to draft and won't merge — so this PR ships all four changes the spec calls for, from scratch.

Changes

room-worker/handler.go — three additive log-and-continue local-INBOX publishes:

  • processAddMembers (Change 1): publishes chat.inbox.{originSite}.member_added carrying the full add set (same-site + cross-site). Skipped when actualAccounts is empty.
  • processRemoveIndividual (Change 2): publishes chat.inbox.{site}.member_removed. Wrapper OutboxEvent.Type collapses to member_removed even for self-leave, while inner MemberRemoveEvent.Type=member_left is preserved — matches the cross-site OUTBOX convention so search-sync-worker dispatches on a single MV op.
  • processRemoveOrg (Change 3): publishes one local-INBOX event covering every removed account, regardless of site. Reuses the existing len(accounts)>0 gate.

All three wrap the existing MemberAddEvent / MemberRemoveEvent in OutboxEvent with SiteID == DestSiteID == originSite, byte-for-byte compatible with search-sync-worker/inbox_stream.go::parseMemberEvent.

inbox-worker/main.go (Change 4) — scope the durable consumer's FilterSubjects to chat.inbox.{siteID}.aggregate.>, reserving the local lane for search-sync-worker. Without this, inbox-worker would re-process every cross-site member_added and emit duplicate-key churn against subscriptions room-worker already wrote locally.

pkg/subject/subject.go — new InboxAggregateAll(siteID) builder so the aggregate-prefix subject lives in one place (used by both inbox-worker main and the integration test).

Tests

  • room-worker/handler_test.go — six new unit tests: happy path × 3 handlers, self-leave wrapper-vs-inner type collapse, two empty/zero-accounts no-publish guards. publishedMsg gained an msgID field for dedup-ID assertions. Existing length expectations bumped by +1 publish where applicable.
  • room-worker/integration_test.go — two real-Mongo tests assert the local-INBOX OutboxEvent + inner-payload structure end-to-end through processAddMembers and processRemoveMember.
  • inbox-worker/integration_test.go — one NATS-via-testcontainers test publishes one local-lane and one aggregate-lane event, then asserts NumPending == 1. Locks in Change 4 so a future regression that drops FilterSubjects surfaces immediately.

Rollout

Forward-only per the spec. No backfill for pre-fix federated rooms — pre-existing memberships on origin-site MVs stay missing until the room experiences add/remove churn or the index is rebuilt out-of-band.

Both services should ship in the same release per site (the spec's risk note: deploying room-worker first would cause inbox-worker to log "user not found" for every cross-site add until the filter scoping lands).

Test plan

  • make lint clean
  • make test clean (entire repo)
  • make test-integration SERVICE=room-worker (requires Docker — run in CI)
  • make test-integration SERVICE=inbox-worker (requires Docker — run in CI)
  • Per-site post-deploy verification per spec: nats consumer info INBOX_{siteID} inbox-worker shows filter_subjects: ["chat.inbox.{siteID}.aggregate.>"]; add a test member to a federated room owned by the site, query the site's user-room-{siteID} ES index, confirm the user's doc contains the new room ID.

https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp


Generated by Claude Code

Summary by CodeRabbit

  • New Features

    • Room membership changes now publish local inbox events for reliable local delivery.
    • Inbox processing now restricts delivery to aggregated inbox subjects for clearer scoping.
    • Auth service accepts multiple OIDC audiences (comma-separated) for token validation.
  • Tests

    • Added integration and unit tests covering local inbox publishing and inbox filtering.
  • Other

    • Search client TLS handling extended to support configurable skip-verify; linters updated.

Implements docs/superpowers/specs/2026-05-01-federated-room-origin-site-mv-fix-design.md
in full. Federated rooms whose owning site adds or removes members now
update that site's user-room and spotlight indexes for every affected
member (same-site + cross-site), so CCS terms-lookup queries from any
site resolve against the origin-site MV.

room-worker (Changes 1-3): three additive log-and-continue publishes to
chat.inbox.{originSiteID}.member_{added,removed}, wrapping the existing
MemberAddEvent / MemberRemoveEvent in OutboxEvent so the wire format
matches the federated `aggregate.>` lane that search-sync-worker's
parseMemberEvent already decodes. Self-leave collapses to
OutboxEvent.Type=member_removed at the wrapper while preserving the
inner MemberRemoveEvent.Type=member_left, matching the cross-site
OUTBOX convention. add-members skips the publish when actualAccounts
is empty; remove-org reuses the existing len(accounts)>0 gate.

inbox-worker (Change 4): scope the durable consumer's FilterSubjects to
chat.inbox.{siteID}.aggregate.> so the new local-lane events reach
search-sync-worker only. Without this, inbox-worker would re-process
every cross-site member_added and emit duplicate-key churn against
subscriptions room-worker already wrote locally.

Tests:
- room-worker/handler_test.go: six new unit tests covering the happy
  path for each handler method, the empty-accounts no-publish guards,
  and the self-leave wrapper-vs-inner type collapse. Existing
  publishedMsg gains an msgID field so dedup-ID assertions can compare
  against outboxDedupID directly. Existing length expectations bumped
  by one publish where applicable.
- room-worker/integration_test.go: two real-Mongo tests assert the
  local-INBOX OutboxEvent + inner-payload structure end-to-end through
  processAddMembers and processRemoveMember.
- inbox-worker/integration_test.go: NATS-via-testcontainers test
  publishes one local-lane and one aggregate-lane event, then asserts
  the consumer's NumPending = 1 — locking in the FilterSubjects
  scoping so a future regression that drops it surfaces immediately.

Forward-only rollout per the spec; no backfill for pre-fix federated
rooms.

https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

Warning

Rate limit exceeded

@Joey0538 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 26 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d13a909-bf6b-4b43-a574-e77289727fec

📥 Commits

Reviewing files that changed from the base of the PR and between 8a8ba51 and 4814364.

📒 Files selected for processing (5)
  • auth-service/deploy/.env.example
  • auth-service/deploy/docker-compose.yml
  • auth-service/main.go
  • pkg/oidc/oidc.go
  • pkg/oidc/oidc_test.go
📝 Walkthrough

Walkthrough

Adds local INBOX publishes for room membership changes, scopes inbox-worker JetStream consumer to aggregate-lane subjects via FilterSubjects, extends unit and integration tests for deterministic msgIDs, and adds TLS-skip options for Elasticsearch clients and test wiring.

Changes

Local Inbox Publishing with JetStream Filtering

Layer / File(s) Summary
Subject Contract
pkg/subject/subject.go
InboxAggregateAll(siteID) generates the wildcard INBOX filter subject pattern for aggregate-lane events on a site.
Inbox Worker Consumer Configuration
inbox-worker/main.go
Adds subject import and sets FilterSubjects on JetStream consumer to the aggregate-lane inbox pattern for the configured site.
Inbox Worker Integration Testing
inbox-worker/integration_test.go
Adds setupNATS helper and TestInboxWorker_FilterScoping_Integration that starts a JetStream testcontainer, publishes local and aggregate messages, and asserts the consumer only receives aggregate-lane messages.
Room Worker Member Event Publishing
room-worker/handler.go
Publishes local INBOX OutboxEvent-wrapped messages for member adds/removals with deterministic dedup/msgID seeds derived from room/account/org and timestamp.
Room Worker Unit Tests
room-worker/handler_test.go
Expands captured publish struct (msgID), updates expected publish counts, adds findInboxPublish(), and adds tests verifying INBOX payloads and deterministic msgID values.
Room Worker Integration Tests
room-worker/integration_test.go
New integration tests assert end-to-end local INBOX publishes for add/remove flows with correct OutboxEvent wrapping, inner event fields, and deterministic msgID.
OIDC Audience Allow-list
pkg/oidc/oidc.go, pkg/oidc/oidc_test.go
Replaces single-audience config with Audiences []string, adds ErrNoAudiences and ErrAudienceNotAllowed, disables go-oidc single-audience check, enforces allow-list in Validate, and adds unit tests for audience matching.
Search Engine TLS Transport
pkg/searchengine/factory.go
Adds tlsSkipVerify parameter, clones http.DefaultTransport and sets InsecureSkipVerify=true when enabled, and uses the custom transport for Elasticsearch client creation.
Search Client Wiring & Tests
search-service/..., search-sync-worker/..., search-sync-worker/main.go, search-service/integration_test.go, search-sync-worker/*_integration_test.go
Wires the new TLS-skip option through services, workers, and test fixtures by passing an extra boolean to searchengine.New(...).
Linter Config
.golangci.yml
Enables gosec limited to rule G402 and configures gocritic enabled-tags.
Auth Service OIDC Env/Config
auth-service/deploy/.env.example, auth-service/deploy/docker-compose.yml, auth-service/main.go
Replaces OIDC_AUDIENCE with OIDC_AUDIENCES (comma-separated list) and wires multi-audience config into auth-service startup validation and validator construction.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hmchangw/chat#109: Related changes touching INBOX subject helpers and JetStream consumer filtering.
  • hmchangw/chat#145: Implements local INBOX publishes and FilterSubjects scoping in related files.
  • hmchangw/chat#98: Prior modifications to room-worker member add/remove logic and publishes.

Suggested reviewers

  • mliu33
  • hmchangw

Poem

🐇 I hop through subjects, neat and spry,

I seed dedup IDs and watch them fly,
Aggregate lanes carry a steady tune,
Local inboxes hum beneath the moon,
A rabbit cheers — tests pass soon.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically summarizes the main change: implementing an origin-site materialized view fix for federated rooms per a design specification, with affected services (room-worker and inbox-worker) explicitly named.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/implement-pr-145-jpaiE

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
inbox-worker/integration_test.go (1)

424-436: ⚡ Quick win

Extract NATS testcontainer setup into a setupNATS(t) helper.

This setup is currently inlined; moving it to a helper keeps integration tests consistent and easier to reuse.

♻️ Suggested refactor
-	ctx := context.Background()
-	natsContainer, err := natsmod.Run(ctx, testimages.NATS)
-	require.NoError(t, err)
-	t.Cleanup(func() { _ = natsContainer.Terminate(ctx) })
-
-	natsURL, err := natsContainer.ConnectionString(ctx)
-	require.NoError(t, err)
-
-	nc, err := nats.Connect(natsURL)
-	require.NoError(t, err)
-	t.Cleanup(func() { nc.Close() })
-
-	js, err := jetstream.New(nc)
-	require.NoError(t, err)
+	ctx, js := setupNATS(t)
func setupNATS(t *testing.T) (context.Context, jetstream.JetStream) {
	t.Helper()
	ctx := context.Background()

	c, err := natsmod.Run(ctx, testimages.NATS)
	require.NoError(t, err)
	t.Cleanup(func() { _ = c.Terminate(ctx) })

	url, err := c.ConnectionString(ctx)
	require.NoError(t, err)

	nc, err := nats.Connect(url)
	require.NoError(t, err)
	t.Cleanup(func() { nc.Close() })

	js, err := jetstream.New(nc)
	require.NoError(t, err)

	return ctx, js
}

As per coding guidelines **/*integration_test.go: "Write setup<Dep>(t *testing.T) helpers that start a container, register t.Cleanup, and return a connected client".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@inbox-worker/integration_test.go` around lines 424 - 436, Extract the inline
NATS testcontainer setup into a helper named setupNATS(t *testing.T) that calls
t.Helper(), creates a background context, runs natsmod.Run(ctx,
testimages.NATS), registers t.Cleanup to terminate the container, obtains the
connection string via ConnectionString, connects with nats.Connect, registers
t.Cleanup to close the connection, constructs the JetStream client with
jetstream.New, asserts errors using require.NoError, and returns
(context.Context, jetstream.JetStream) so tests can replace the inlined block
with a call to setupNATS(t).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@inbox-worker/integration_test.go`:
- Around line 424-436: Extract the inline NATS testcontainer setup into a helper
named setupNATS(t *testing.T) that calls t.Helper(), creates a background
context, runs natsmod.Run(ctx, testimages.NATS), registers t.Cleanup to
terminate the container, obtains the connection string via ConnectionString,
connects with nats.Connect, registers t.Cleanup to close the connection,
constructs the JetStream client with jetstream.New, asserts errors using
require.NoError, and returns (context.Context, jetstream.JetStream) so tests can
replace the inlined block with a call to setupNATS(t).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 954e4233-13b3-4879-af3c-6ead631822b7

📥 Commits

Reviewing files that changed from the base of the PR and between 61f128a and dc43174.

📒 Files selected for processing (6)
  • inbox-worker/integration_test.go
  • inbox-worker/main.go
  • pkg/subject/subject.go
  • room-worker/handler.go
  • room-worker/handler_test.go
  • room-worker/integration_test.go

Aligns the new NATS-testcontainer test with the CLAUDE.md convention
"Write setup<Dep>(t *testing.T) helpers that start a container, register
t.Cleanup, and return a connected client". No behavior change.

https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/searchengine/factory.go`:
- Line 23: Replace the insecure TLS config that sets InsecureSkipVerify: true on
httpTransport.TLSClientConfig: instead load the trusted CA PEM into an
x509.CertPool and set TLSClientConfig.RootCAs (and MinVersion >=
tls.VersionTLS12) so certificates are properly verified; if skipping
verification is deliberate, add a clear comment documenting the rationale and
annotate the line with the appropriate linter suppression (e.g., //nolint:gosec
or `#nosec` G402) next to the httpTransport.TLSClientConfig assignment so the
bypass is explicitly intentional.
- Line 22: The code does an unguarded type assertion on http.DefaultTransport
when assigning httpTransport := http.DefaultTransport.(*http.Transport), which
can panic if DefaultTransport has been replaced with a non-*http.Transport;
change this to the safe two-value form (t, ok :=
http.DefaultTransport.(*http.Transport)) and handle the !ok case by returning or
propagating a meaningful error from the surrounding factory creation function
(or falling back to creating a new *http.Transport) instead of panicking so
callers of the code in factory.go get a proper error rather than a runtime
crash.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a91c741e-6ad5-4e5d-b33b-86c92cf2c47c

📥 Commits

Reviewing files that changed from the base of the PR and between dc43174 and 2acd14d.

📒 Files selected for processing (2)
  • inbox-worker/integration_test.go
  • pkg/searchengine/factory.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • inbox-worker/integration_test.go

Comment thread pkg/searchengine/factory.go Outdated
Comment thread pkg/searchengine/factory.go Outdated
@Joey0538 Joey0538 force-pushed the claude/implement-pr-145-jpaiE branch from 0eba4db to 5d33954 Compare May 7, 2026 09:09
Adds an opt-in tlsSkipVerify bool to searchengine.New, plumbed from
each service's config:

- search-service:     SEARCH_TLS_SKIP_VERIFY (default false)
- search-sync-worker: SEARCH_TLS_SKIP_VERIFY (default false)

Default-off keeps prod safe; ops opts in per environment for
self-signed/internal ES clusters. When false, the factory uses the
standard ES client transport — same behavior as before this PR.
When true, clones http.DefaultTransport (preserving ProxyFromEnvironment,
dial/TLS-handshake timeouts, HTTP/2, idle-conn tuning) and overrides
only TLSClientConfig with InsecureSkipVerify=true and MinVersion=TLS 1.2,
guarding the type assertion on http.DefaultTransport so we error out
cleanly if a middleware (e.g. OTel) has replaced it.

Also enables gosec G402 narrowly in .golangci.yml so the //nolint:gosec
annotation in pkg/oidc and pkg/searchengine actually suppresses a real
rule, and any future unannotated InsecureSkipVerify is rejected at
lint time.

Includes a goimports-only struct alignment tweak in
room-worker/integration_test.go picked up while running make fmt — no
behavior change.

https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
@Joey0538 Joey0538 force-pushed the claude/implement-pr-145-jpaiE branch from 5d33954 to e0c0a31 Compare May 7, 2026 09:13
Replaces the single-audience verifier with a multi-audience allow-list
so a shared Keycloak realm that issues tokens for several client
audiences (the common org pattern) can be served by one auth-service.

pkg/oidc:
- Config.Audience (string) → Config.Audiences ([]string).
- Disable go-oidc's built-in single-audience check via
  oidc.Config{SkipClientIDCheck: true} and enforce the allow-list in
  Validate after cryptographic verification: a token is accepted when
  any of its `aud` entries matches any configured audience.
- New ErrNoAudiences and ErrAudienceNotAllowed sentinels for clear
  error propagation; NewValidator fails fast on an empty Audiences list.
- Single Verify call per request — no retry-on-mismatch loop.

auth-service:
- OIDC_AUDIENCE → OIDC_AUDIENCES (comma-separated; envSeparator:",").
- Required-config check updated; deploy/.env.example and
  deploy/docker-compose.yml renamed to match.

Tests:
- pkg/oidc/oidc_test.go: table-driven tests for containsAudience
  (single/multi/empty cases) plus NewValidator empty-audiences guard.
  pkg/oidc had no tests before this commit.

https://claude.ai/code/session_01UkLD7hpaypxjeh5zbEWTjp
@Joey0538 Joey0538 force-pushed the claude/implement-pr-145-jpaiE branch from 8a8ba51 to 4814364 Compare May 8, 2026 04:05
Copy link
Copy Markdown
Collaborator

@mliu33 mliu33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super~

@mliu33 mliu33 merged commit c779ede into main May 8, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants