Skip to content

feat(redis): instrument raw redis clients via octoredis chokepoint constructor#495

Open
an9xyz wants to merge 2 commits into
mainfrom
instrument-raw-redis-clients
Open

feat(redis): instrument raw redis clients via octoredis chokepoint constructor#495
an9xyz wants to merge 2 commits into
mainfrom
instrument-raw-redis-clients

Conversation

@an9xyz

@an9xyz an9xyz commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the control-plane coverage gap in the dependency="redis" latency metric. octo-lib#104 (merged) exported redis.Instrument(*rd.Client); this PR wires it in at a single chokepoint so the ~18 raw rd.NewClient(...) instances β€” rate limiters, OIDC locks, auth, health, etc. β€” feed dependency="redis" instead of being blind to it (only pool stats covered them before).

Related Issue

Addresses the primary raw-client follow-up in octo-lib#96. Builds on octo-lib#104 (merged) and octo-server #474.

Linked Spec

octo-lib#104 (redis.Instrument export) + #442 DependencyMetrics. Reviewers on #474 (yujiawei / OctoBoooot / Octo-Q) flagged the control-plane redis clients as an alerting blind spot; this is the fix.

Changes

  • pkg/redis/options.go β€” add NewInstrumentedClient(cfg, overrides...) and InstrumentedClientFromOptions(opts): build a raw *rd.Client and call liboredis.Instrument before returning, so commands feed dependency="redis". Instrumentation at construction satisfies octo-lib's call-before-share contract.
  • pkg/redis/redis.go β€” octoredis's own Conn (NewWithOptions) routes through InstrumentedClientFromOptions too.
  • 18 raw-client sites migrated to the chokepoint constructors: main.go (rlRedis), pkg/wkhttp/ratelimit_helper.go, modules/{oidcΓ—5, userΓ—2, group, space, integration, incomingwebhook, usersecret, opanalytics, bot_api, bot_provision}, and modules/common/health.go (the readinessRedisOptions path).
  • main.go β€” refresh the coverage comment (control-plane clients are now instrumented).
  • go.mod β€” bump octo-lib to the merged feat(auth): add login.local_off switch to disable local-account loginΒ #104 commit.
  • tests β€” pkg/redis/instrument_test.go proves both constructors install the hook (drives a command at a dead address so the hook fires without a live Redis).

Mechanical, uniform change: every site swaps rd.NewClient(octoredis.MustBuildOptions(cfg, fn)) β†’ octoredis.NewInstrumentedClient(cfg, fn). No behavior change beyond the added timing hook; pool params, TLS, and lifecycle are unchanged.

Testing

go build ./...      # ok
go vet ./...        # clean
go test -race ./pkg/redis/...   # ok (new hook-install tests pass without live Redis)
golangci-lint run ./pkg/redis/... ./modules/common/...   # 0 issues
  • Unit tests added/updated
  • Manually verified (covered by hook-install unit tests against the merged octo-lib)

COMPREHENSION

  1. What it does to the load-bearing path β€” routes every raw control-plane redis client through a constructor that installs octo-lib's per-command timing hook at build time. At runtime each command on those clients now also invokes the observer β†’ one Observe(dur) into the existing DependencyMetrics. No change to command results, errors, pooling, or TLS.

  2. What could break β€” the hook runs on every command of the affected clients. It's panic-isolated in octo-lib (defer recover()), does only an in-memory Observe, and Instrument is idempotent + nil-safe (octo-lib#104), so double-counting/leaks are guarded. The one contract to honor β€” instrument before the client is shared β€” is satisfied because the constructor instruments before returning the client. Labels stay low-cardinality (cmd.Name() only; no keys/args/scripts reach the observer). Risk if a future site bypasses the chokepoint and calls rd.NewClient directly β†’ it'd be uninstrumented again (mitigated: no such sites remain; grep confirms).

  3. How I know it works β€” TestNewInstrumentedClientInstruments / TestInstrumentedClientFromOptionsInstruments register an observer and confirm a command on a constructor-built client reaches it (using a dead-address command so the hook fires without a live Redis). go build/vet/-race test/golangci-lint all green; grep confirms zero remaining rd.NewClient(octoredis.MustBuildOptions(...)) sites.

Checklist

  • I have read CONTRIBUTING.md
  • PR description is in English
  • Added tests for my changes
  • Updated documentation (godoc on the new constructors; main.go coverage comment; octo-lib#96)
  • Followed commit message conventions (Conventional Commits)

πŸ€– Generated with Claude Code


Generated by Claude Code

…nstructor

octo-lib#104 exported redis.Instrument so raw *rd.Client instances can opt into
the dependency="redis" latency hook. Wire it in at a single chokepoint rather
than scattering Instrument calls across ~18 sites:

- pkg/redis: add NewInstrumentedClient(cfg, overrides...) and
  InstrumentedClientFromOptions(opts) β€” build a raw client and Instrument it
  before returning, so its commands feed dependency="redis". Instrumentation
  happens at construction, before the client is shared (octo-lib's
  call-before-share contract). NewWithOptions (octoredis.Conn) routes through it
  too.
- Migrate all ~18 raw rd.NewClient(octoredis.MustBuildOptions(...)) sites β€” rate
  limiters, OIDC state/bind/logout/locks, bot registry/provision, user/group/
  space auth, integration/incomingwebhook/usersecret/opanalytics, health probe β€”
  plus the readinessRedisOptions path. These control-plane clients were
  previously blind to dependency="redis" (only pool stats covered them).
- main.go: rlRedis is now instrumented; refresh the coverage comment.
- go.mod: bump octo-lib to the merged #104 commit.

Addresses the primary raw-client follow-up in octo-lib#96.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HHTDHbMPAVkNvYdQkh2ts3
@an9xyz an9xyz requested a review from a team as a code owner June 28, 2026 02:03
@github-actions github-actions Bot added the size/M PR size: M label Jun 28, 2026
@github-actions

Copy link
Copy Markdown

Dependency Changes Detected

This PR modifies dependency files. Please review whether these changes are intentional.

Changed files:

  • go.mod
  • go.sum

Maintainer checklist:

  • Confirm dependency changes are intentional
  • Review package delta if lockfile changed

@github-actions github-actions Bot added the dependencies-changed This PR modifies dependency files label Jun 28, 2026
Jerry-Xin
Jerry-Xin previously approved these changes Jun 28, 2026

@Jerry-Xin Jerry-Xin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: The PR is in scope for octo-server and correctly centralizes raw Redis client instrumentation without changing the migrated clients’ Redis options or call behavior.

πŸ’¬ Non-blocking

  • 🟑 Warning: pkg/redis/options.go uses octo-lib’s exported Instrument for every client created by InstrumentedClientFromOptions. That is fine for the current process-lifetime raw clients, but Instrument retains clients in its idempotence guard, so this helper should not be used for short-lived/request-scoped clients. Consider documenting that constraint more explicitly, especially because pkg/redis/redis.go now routes the general Conn constructor through it.

βœ… Highlights

  • The production raw Redis client sites now consistently go through octoredis.NewInstrumentedClient or InstrumentedClientFromOptions.
  • The hook is installed before the clients are shared, matching the octo-lib contract.
  • The new tests cover both constructor paths and pass without a live Redis.
  • I verified go build ./..., go test ./pkg/redis/..., and go test -race ./pkg/redis/.... A broader go test ./modules/common/... failed on local migration database state, not on this PR’s Redis instrumentation path.

mochashanyao
mochashanyao previously approved these changes Jun 28, 2026

@mochashanyao mochashanyao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Octo-Q Β· automated review]

Verdict: Approve β€” no blocking findings; notes below (data-flow traced).


octo-server PR#495 Review Report

Reviewer: Octo-Q (automated review)
PR: #495
Head SHA: af60fa5
Title: feat(redis): instrument raw redis clients via octoredis chokepoint constructor

1. Verification Summary

Item Status Evidence
Chokepoint constructors correct βœ… pkg/redis/options.go:67-79 β€” NewInstrumentedClient delegates to InstrumentedClientFromOptions; both call rd.NewClient then liboredis.Instrument before returning
All 18 sites migrated uniformly βœ… Every rd.NewClient(octoredis.MustBuildOptions(...)) replaced with octoredis.NewInstrumentedClient(cfg, fn) or InstrumentedClientFromOptions(opts) β€” identical override semantics preserved
Zero remaining production bypass βœ… grep 'rd\.NewClient|redis\.NewClient' finds only _test.go files + options.go:76 (the chokepoint itself)
Instrument-before-share contract honored βœ… InstrumentedClientFromOptions calls Instrument(c) before return c β€” client not shared at instrumentation time
Idempotency guard functional βœ… octo-lib Instrument checks instrumented map under mutex (instrument.go:126-135) β€” repeated calls are no-op
Conn wrapper correctly routed βœ… pkg/redis/redis.go:34-36 β€” NewWithOptions now uses InstrumentedClientFromOptions(opts) instead of raw rd.NewClient(opts)
sync.Once patterns preserved βœ… incomingwebhook/api.go, integration/api.go β€” sync.Once wrapping intact; instrumented client created once
Observer error isolation βœ… octo-lib reportRedis wraps observer call in defer recover() (instrument.go:47) β€” observer panic cannot reach Redis caller
No sensitive data exposure βœ… Observer receives only cmd.Name() (low-cardinality command name); no keys/args/scripts
Tests validate hook installation βœ… instrument_test.go β€” both tests drive a command to dead address; WrapProcess fires regardless; observer receives "get"
go.mod bump correct βœ… octo-lib bumped to 0c34e6f108c4 which includes the exported Instrument from octo-lib#104

2. Findings

No P0/P1 findings.

P2 β€” InstrumentedClientFromOptions does not defensive-copy *rd.Options

Diff-scope: pre-existing (octo-server's NewWithOptions had the same behavior before this PR; PR does not change it).

octo-lib's redis.NewWithOptions takes a shallow copy (local = *opts) before mutating MaxRetries. octo-server's InstrumentedClientFromOptions passes opts directly to rd.NewClient(opts) without copying. If a caller reused the same *rd.Options across multiple constructors, go-redis internal mutations could leak across.

Why P2 not P1: All current callers either use fresh options from MustBuildOptions (which allocates a new struct each call) or readinessRedisOptions (which also returns a fresh struct). No reuse pattern exists in practice. Purely a latent hardening concern.

Suggestion: Consider local := *opts at the top of InstrumentedClientFromOptions to match octo-lib's defensive pattern. Low priority.

P2 β€” Test nil-check asymmetry

Diff-scope: new (introduced by this PR's test file).

TestNewInstrumentedClientInstruments includes if c == nil { t.Fatal(...) } (line 37), but TestInstrumentedClientFromOptionsInstruments does not. Both constructors ultimately call rd.NewClient which never returns nil, so this is cosmetic only. Minor hygiene nit.

3. Data-Flow Traceback

Path A: Raw control-plane clients (~18 sites)

Call site (e.g. main.go:202)
  β†’ octoredis.NewInstrumentedClient(cfg, overrides...)
    β†’ octoredis.MustBuildOptions(cfg, overrides...)  // fresh *rd.Options with TLS
    β†’ InstrumentedClientFromOptions(opts)
      β†’ rd.NewClient(opts)                          // creates *rd.Client
      β†’ liboredis.Instrument(c)                      // installs WrapProcess hook
      β†’ return c                                     // client instrumented before share

At runtime, every command on these clients triggers WrapProcess β†’ reportRedis(cmd.Name(), dur, err) β†’ process-level RedisObserver (set by libredis.SetRedisObserver(metrics.ObserveRedisCmd) in main.go:248). Data flows correctly to dependency="redis" metric. βœ…

Path B: Conn wrapper (pkg/redis/redis.go)

db.NewRedisFromConfig(cfg) or redis.New(addr, pass)
  β†’ octo-server's redis.NewWithOptions(opts)
    β†’ InstrumentedClientFromOptions(opts)
      β†’ rd.NewClient(opts) + Instrument(c)
    β†’ &Conn{client: instrumented_client}

Conn methods (Get/Set/Del/etc.) delegate to rc.client which is instrumented. Data flows correctly. βœ…

Path C: ctx.GetRedisConn() (octo-lib shared Conn)

config.Context.NewRedisCache()
  β†’ octo-lib's redis.NewWithOptions(opts)   // octo-lib internal
    β†’ rd.NewClient(&local) + wrapClient(client)  // octo-lib's internal wrapping

This path is not touched by this PR β€” it was already instrumented by octo-lib's internal wrapClient. No change in behavior. βœ…

Path D: Health probe

newDependencyReadinessChecker(ctx, db)
  β†’ readinessRedisOptions(cfg)             // fresh *rd.Options via MustBuildOptions
  β†’ octoredis.InstrumentedClientFromOptions(opts)
    β†’ rd.NewClient(opts) + Instrument(c)
  β†’ stored in dependencyReadinessChecker.redisClient

Health check's pingRedis calls c.redisClient.WithContext(ctx).Ping(). Since Instrument(c) runs before WithContext clone, the cloned client inherits the process hook. Data flows correctly. βœ…

4. Blindspot Checklist (R5 β€” security_sensitive PR)

C1 β€” Dual-path parity

N/A. This PR adds observability instrumentation; no create/remove, subscribe/unsubscribe, or authorization symmetry paths are involved. The change is purely additive (adding a timing hook) and does not modify any control-flow or permission boundaries.

C2 β€” Control-flow ordering / nested reuse + safety control non-canonical probing

Clear. The only control-flow ordering concern is WrapProcess hook stacking. Verified:

  • liboredis.Instrument is idempotent via instrumented map under mutex β€” double-call is no-op.
  • InstrumentedClientFromOptions calls wrapClient directly (bypasses the map), but each call site creates a fresh client via rd.NewClient, so no double-wrap occurs.
  • WithContext clones inherit the installed hook (clone copies the process field at clone time, after instrumentation).
  • No nested/compound wrapping patterns exist β€” each client is instrumented exactly once.

C3 β€” Authorization boundary β‰  capability boundary

N/A. No authorization, permission, or capability changes. Pure observability.

C4 β€” Authorization lifecycle / container-member state cascade

N/A. No auth changes.

C5 β€” Build/note pass β‰  runtime path correctness

Clear. No build artifacts, browser extensions, CLI tools, or relative URLs involved. This is a Go server-side change. The go build, go vet, go test -race, and golangci-lint results reported in the PR description are consistent with the code changes. Runtime path: WrapProcess installs at construction time, before any commands are issued β€” the hook is present when traffic starts.

C6 β€” Governance / policy / security document consistency

N/A. No documentation, policy, or governance changes.

5. Additional Observations

  • Memory consideration: liboredis.Instrument registers each client in a process-level map[*rd.Client]struct{}. All ~18 control-plane clients are long-lived singletons (created at startup via constructors or sync.Once), so this is not a memory leak. The godoc explicitly warns against using Instrument for short-lived clients.
  • octo-lib Conn already covered: ctx.GetRedisConn() goes through octo-lib's own redis.NewWithOptions which internally calls wrapClient. The shared data-plane Conn was already instrumented before this PR.
  • main.go comment refresh: The updated coverage comment (lines 241-246) accurately reflects the new state β€” control-plane clients are now instrumented via the chokepoint constructors.

6. Verdict

No P0 or P1 findings. Two P2 nits (options defensive copy, test nil-check asymmetry) β€” both are minor hygiene concerns with no production impact.

[Octo-Q] verdict: APPROVE β€” clean, mechanical observability improvement. All production redis clients now feed dependency="redis" through a single chokepoint. Instrument-before-share contract honored. Idempotency guard functional.

@OctoBoooot OctoBoooot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(redis): instrument raw redis clients via octoredis chokepoint constructor (#495)

Verdict: Approve with comments β€” clean, uniform instrumentation refactor; the chokepoint claim and the pool-stats interaction both byte-verify. No blockers, no majors.
CI is red on check-sprint only (admin Sprint-field gate); Build βœ“ and Vet βœ“ pass, Test was still pending at review time. Posting COMMENT per the CI precondition β€” clean approve once Test goes green and the Sprint field is set.

Risk tier: high (pkg/wkhttp/ratelimit_helper.go β€” rate limiter). needs-human-review + size/M + dependencies-changed on the PR. Established author (100 merged, 0 closed-unmerged).

Verified β€” the load-bearing claims

  • βœ… "No raw rd.NewClient redis site remains β€” grep confirms" (the PR's central thesis) β€” independently grepped the whole repo. Zero raw redis-client constructions outside pkg/redis. The remaining NewClient( hits are different libraries (apns2 / elastic / sts / oidc / smtp). The four files that import go-redis but aren't in the diff (bot_api/events.go, robot/api.go, incomingwebhook/ratelimit.go, pkg/metrics/pool.go) all use ctx.GetRedisConn() (already-instrumented lib path) or receive a client by parameter β€” none bypass the chokepoint.
  • βœ… Pool-stats coverage is NOT traded away for command-timing β€” the subtle interaction: rlRedis switched to NewInstrumentedClient (returns *rd.Client), and main.go:249-250 still passes that same instance into RegisterPoolCollectors. So the ratelimit client now feeds both dependency="redis" timing AND pool stats β€” the instrumentation didn't displace the existing collector registration.
  • βœ… go.mod is replace-free (the #463 local-replace trap) β€” clean bump of octo-lib to the merged #104 commit (...cff4d7a48f55 β†’ ...0c34e6f108c4), no replace/file:///../ directives. Tests pass against the bumped lib; go build of the touched packages is clean.
  • βœ… Both constructors instrument before returning β€” InstrumentedClientFromOptions calls liboredis.Instrument(c) before handing the client back, satisfying octo-lib's call-before-share contract; NewWithOptions (the Conn path) now routes through it too, so wrapped and raw clients are consistent.
  • βœ… Test pins real behavior without live Redis β€” drives a GET at 127.0.0.1:1 so WrapProcess fires and the observer sees get, proving the hook is installed (not just that the constructor returns non-nil). Both constructors covered.

Minor

  • pkg/redis/instrument_test.go β€” see inline. The two constructors are tested, but the architectural invariant the PR rests on (no raw rd.NewClient outside the chokepoint) isn't pinned by a source-guard test. The migration is correct today; a guard keeps a future site from silently reopening the blind spot β€” the exact regression Β§2 of the COMPREHENSION names.

Praise

  • The chokepoint is the right shape, and the two-constructor split is deliberate β€” NewInstrumentedClient(cfg, ...) for the common case and InstrumentedClientFromOptions(opts) for the few sites (health probe) that pre-build their own Options. Both funnel through one Instrument call, so there's exactly one place instrumentation can be forgotten, and the Conn path was wired through it too rather than left as a second uninstrumented door.
  • Keeping rlRedis registered with the pool collector after the switch β€” easy to miss that swapping the constructor could have dropped the client out of RegisterPoolCollectors and silently lost pool stats. The same instance feeds both metrics; that's the careful version of this change.
  • The hook's safety envelope is stated and matches the dependency β€” panic-isolated, in-memory Observe only, low-cardinality (cmd.Name() only, no keys/args/scripts), idempotent + nil-safe per octo-lib#104. The cardinality note in particular is the thing that turns a metrics change into an incident if gotten wrong, and it's explicitly bounded.

Out of scope (informational)

  • CI check-sprint β€” same admin Sprint-field gate as the rest of this cycle.
  • The follow-up chain (octo-lib#96 raw-client coverage) is referenced correctly; this PR is the server-side wiring of the merged #104 export, appropriately scoped.

Comment thread pkg/redis/instrument_test.go
yujiawei
yujiawei previously approved these changes Jun 28, 2026

@yujiawei yujiawei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review β€” PR #495 (octo-server)

Scope reviewed: head SHA af60fa5317194cfee86f82fa75477716d5ccc65c, merge-base f438ff42. 22 files, +148/-44. This is a security-sensitive change (touches rate-limiters, OIDC token/state/bind stores, auth paths), so it was reviewed with extra care, including an independent build, race-enabled unit run, and cross-checks of the upstream octo-lib instrumentation contract.

1. Specification compliance

Spec: βœ…

The stated goal β€” route every raw *redis.Client through a single instrumented chokepoint constructor β€” is fully and exactly implemented.

  • Nothing missing. Every prior rd.NewClient(octoredis.MustBuildOptions(...)) production site is migrated to octoredis.NewInstrumentedClient(...): main.go, pkg/wkhttp/ratelimit_helper.go, and modules bot_api, bot_provision, common/health, group, incomingwebhook, integration, oidc (api/bind_store/logout_idtoken/state_store/sync_lock), opanalytics/etl_lock, space, user (Γ—2), usersecret. A whole-repo scan for the old rd.NewClient(...) pattern in non-test code returns zero remaining sites. health.go correctly uses InstrumentedClientFromOptions for its pre-built options, and pkg/redis/redis.go's Conn constructor now also routes through the instrumented path.
  • Nothing extra. No new env vars, flags, config fields, or behavioral switches. Only two helper funcs (NewInstrumentedClient, InstrumentedClientFromOptions), two tests, the call-site migrations, the octo-lib bump, and a corrected coverage comment in main.go.
  • No deviation. Each migrated client preserves its exact options (rate-limiters MaxRetries=1/PoolSize=10; OIDC stores MaxRetries=3 + 3s dial/read/write; bot registry timeouts; health probe pool/timeouts). The octo-lib bump 20260626… β†’ 20260628015025-0c34e6f108c4 is the version that exposes Instrument; go.mod/go.sum verify clean and no stale version is referenced in code.

2. Code quality

Quality: Approved

Verified the load-bearing correctness and security properties:

  • No double-instrumentation. The most important risk for this kind of "wrap everything" change is a client being timed twice (2Γ— metrics, no runtime signal). It cannot happen here. octo-lib's public Instrument() is idempotent via a mutex-guarded global map and is the only path octoredis uses; the lib's internal wrapClient (used by the lib's own New/NewWithOptions for the ctx.GetRedisConn() data plane) operates on freshly-constructed clients that never pass back through Instrument(). The control-plane and data-plane instrumentation paths never converge on the same *rd.Client.
  • TLS not weakened. NewInstrumentedClient builds options via MustBuildOptions, which applies TLSConfig from cfg.DB and only then runs caller overrides (overrides cannot drop TLS). Behavior is identical to the pre-PR rd.NewClient(MustBuildOptions(...)).
  • No fail-open regression. The instrumentation hook (WrapProcess) only observes timing and forwards the original error unchanged; redis.Nil is normalized for metrics only, not for the caller. Rate-limiter and OIDC fail-open/closed posture is unaffected.
  • No high-cardinality / sensitive leakage. The observer reports only the lowercased command name (e.g. get, eval) β€” no keys, args, uids, or tokens β€” consistent with the dependency/op/status label contract.
  • Lifecycle is sound. Instrument() pins clients in a process-global map (never GC'd), so the lib mandates startup-singleton usage only. Every migrated call site qualifies: constructed once in New()/Route() (run once at module setup) or behind sync.Once/mutex guards (SharedUIDRateLimiter, the sharedRateRedis helpers). No per-request or hot-loop construction.
  • Tests + build. go build ./... passes; go vet ./pkg/redis/... clean; the new pkg/redis/instrument_test.go passes under -race. The tests cleverly prove the hook fires even when the command fails (unreachable 127.0.0.1:1), which validates the plumbing without a live Redis.

Non-blocking notes (P2 / nits)

  • P2 (test hygiene): modules/user/api_authcode_token_redis_test.go:29 still constructs its client via the old rd.NewClient(octoredis.MustBuildOptions(...)) pattern rather than the new chokepoint. Test-only, no production impact, but migrating it keeps the "single chokepoint" invariant honest and prevents the old pattern from being copied forward.
  • P2 (observer restore in tests): instrument_test.go cleanup sets SetRedisObserver(nil) rather than capturing/restoring the prior observer. Fine today (no package-level default observer), but capture-and-restore would be more robust if a suite-level observer is ever introduced.
  • Nit: the updated main.go coverage comment is accurate; consider a brief note that RegisterPoolCollectors still only registers the ratelimit pool by name (pre-existing; pool metrics for the other singletons are not individually registered, though command-latency now covers them).

3. Overall verdict

APPROVE. Spec is fully met with no scope creep, and the implementation preserves TLS, error semantics, and lifecycle correctness with no double-counting. Only P2/nit items remain, none blocking merge.

4. Items a human may wish to spot-check (security-sensitive)

  • Confirm in a staging environment that dependency="redis" metrics for an OIDC/rate-limit path show a single command count per request (sanity check against the double-counting class of bug, even though static analysis rules it out).
  • The Instrument global map intentionally retains client references for the process lifetime β€” acceptable given all sites are startup singletons; worth keeping in mind if any future caller constructs instrumented clients dynamically.

Address #495 review feedback in one batch:

- Source guard (OctoBoooot): TestNoRawRedisClientOutsideChokepoint walks the repo
  and fails if any non-test production file outside pkg/redis constructs a redis
  client via raw rd.NewClient/redis.NewClient. Pins the PR's core invariant β€” a
  future site can't silently reopen the dependency="redis" blind spot β€” the same
  way Test*NoLegacyResponseError pins the i18n invariant.
- Defensive copy (Octo-Q): InstrumentedClientFromOptions now copies *rd.Options
  before handing it to go-redis (which mutates defaults in place), matching
  octo-lib redis.NewWithOptions; protects against caller reuse of the same opts.
- Migrate the last old-pattern site (yujiawei): a user test still used
  rd.NewClient(octoredis.MustBuildOptions(...)); route it through the chokepoint
  too so the invariant holds in tests as well.
- Test nil-check symmetry (Octo-Q nit).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HHTDHbMPAVkNvYdQkh2ts3
@an9xyz an9xyz dismissed stale reviews from yujiawei, mochashanyao, and Jerry-Xin via 8f7bce0 June 28, 2026 03:15
@github-actions github-actions Bot added size/L PR size: L and removed size/M PR size: M labels Jun 28, 2026

@OctoBoooot OctoBoooot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(redis): instrument raw redis clients via octoredis chokepoint constructor (#495) β€” delta @ 8f7bce0

Verdict: Approve β€” the delta pins the chokepoint invariant my prior minor asked for and folds in both peer nits. No blockers, no majors. CI check-sprint now passes (the only thing that held my prior verdict to COMMENT); Build/Test/Vet/Lint pending. Upgrading COMMENT β†’ APPROVE.

Delta from prior COMMENT @ af60fa53: 1 commit β€” "test(redis): pin the chokepoint invariant + review nits".

Resolved β€” byte-verified

  • βœ“ Source-guard test added (my minor) β€” chokepoint_guard_test.go walks the repo and fails if rd.NewClient(/redis.NewClient( appears in any non-test .go outside pkg/redis. I adversarially confirmed it's not a no-op: planted a rd.NewClient in a temp modules/zzfaketest/fake.go β†’ the test FAILS with the violation path; removed it β†’ passes. Correctly skips the chokepoint dir + _test.go, and uses \b so octoredis.NewClient (if ever added) won't false-match. This is the invariant the PR's whole thesis rests on, now regression-proof β€” modeled on the existing i18n NoLegacyResponseError guard.
  • βœ“ Defensive copy in InstrumentedClientFromOptions (mochashanyao 🟑) β€” local := *opts; rd.NewClient(&local). Correct: go-redis writes its defaults into the value fields, so the shallow copy prevents mutating a caller-reused *rd.Options; the shared *tls.Config is only read, not mutated, so shallow suffices. The comment cites parity with the lib's NewWithOptions handling.
  • βœ“ nil-check consistency (mochashanyao 🟑) β€” both Instrument tests now assert c == nil before use.
  • βœ“ One more test site migrated β€” api_authcode_token_redis_test.go switched its raw rd.NewClient β†’ NewInstrumentedClient, which the guard doesn't even require (it excludes _test.go). Consistency beyond what's enforced.

Full pkg/redis suite + build of touched packages pass.

Praise

  • The guard test is the right way to make "single chokepoint" durable. A prose claim that no raw client remains is true at one SHA; a source-walking test makes it true at every future SHA, and it fails loudly with the exact file path and the remediation in the message. Verified it actually catches a violation rather than just passing on the clean tree β€” which is the difference between a guard and a decoration.
  • The defensive copy matches the lib's own contract instead of inventing a new one. Pointing the comment at octo-lib redis.NewWithOptions's identical handling means the two constructors now have the same options-aliasing semantics β€” no surprise where one mutates the caller's struct and the other doesn't.

Out of scope (informational)

  • Build/Test/Vet/Lint pending at review time; the delta is a self-contained guard test + a one-line defensive copy + test nits. pkg/redis is green locally.
  • Minor scope note (not a finding): the guard excludes _test.go, so a test could still build an uninstrumented raw client β€” acceptable, since tests don't emit production metrics, and the author migrated the one such site anyway.

@Jerry-Xin Jerry-Xin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is in scope and correctly routes Redis control-plane clients through an instrumentation chokepoint.

πŸ”΄ Blocking: None.

πŸ’¬ Non-blocking:

  • 🟑 Warning: pkg/redis/options.go uses liboredis.Instrument, whose idempotency guard retains client references globally. Current migrated call sites are process-lived, so this is acceptable, but future short-lived uses of this public constructor could leak clients. Consider documenting β€œstartup/singleton clients only” more explicitly or exposing a no-retain constructor from the library later.
  • πŸ”΅ Suggestion: pkg/redis/redis.go still mutates caller-owned opts.MaxRetries before the new copying constructor runs. This was existing behavior, but it now differs from the updated library wrapper’s copy-before-default pattern. Copying first would make the wrapper semantics cleaner.
  • πŸ”΅ Suggestion: pkg/redis/chokepoint_guard_test.go only catches rd.NewClient and redis.NewClient aliases. That matches current code, but an AST/import-path based guard would be harder to bypass accidentally.

βœ… Highlights:

  • The relevance gate passes: Redis observability is part of this backend’s runtime behavior.
  • The central constructors preserve Redis config/TLS flow and install hooks before returning clients.
  • The hook-install tests cover both new construction paths without requiring a live Redis.
  • go test ./pkg/redis/... passed. go test ./pkg/redis/... ./modules/common/... failed in modules/common due an existing local test database migration-state issue, not this patch path.

@OctoBoooot OctoBoooot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurring with @Jerry-Xin's 🟑#2 β€” byte-verified, and it corrects an overstatement in my APPROVE. The defensive copy is real but one frame too late on the Conn path:

  • InstrumentedClientFromOptions now does local := *opts before rd.NewClient βœ“.
  • But NewWithOptions (redis.go:31-33) mutates the caller's struct in place β€” opts.MaxRetries = 3 β€” before calling InstrumentedClientFromOptions. So a caller passing a shared *rd.Options to NewWithOptions still gets MaxRetries written under them; the copy guards the go-redis-internal writes but not this pre-copy mutation.

I praised the copy as giving "the same options-aliasing semantics" as the lib path β€” that's not accurate for NewWithOptions, which still mutates the input. My mistake.

Reachability today: both callers pass a freshly-built Options (&rd.Options{...} literal; MustBuildOptions(cfg) allocates fresh), and MaxRetries=3 is benign β€” so no live bug, hence 🟑 not a defect. Fix is one line: copy first, then default β€” local := *opts; if local.MaxRetries == 0 { local.MaxRetries = 3 }; return &Conn{client: InstrumentedClientFromOptions(&local)} (and InstrumentedClientFromOptions can keep its own copy or take the pre-copied value).

🟑#1 (document Instrument call-before-share / client lifecycle) and 🟑#3 (harden the guard with import-path/AST matching instead of the \b(rd|redis)\.NewClient regex, which a differently-aliased import could slip past) are both fair non-blocking improvements. Verdict unchanged: APPROVE stands.

@yujiawei yujiawei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review β€” PR #495 (octo-server)

Reviewed at head SHA 8f7bce0af322de3723e716f1f88e5f41fb227b75 against merge-base f438ff42. Build (go build ./...), go vet ./pkg/redis/..., and the new pkg/redis tests all pass locally. This is the follow-up that closes the dependency="redis" coverage gap left by the earlier dependency-metrics work β€” the ~15 bare rd.NewClient(...) control-plane clients (rate limiters, OIDC stores/locks, bot registry, auth, health probe, ETL lock) were previously invisible to the redis latency metric.

1. Spec compliance

Spec: βœ…

  • Missing work: none. Two chokepoint constructors are added (NewInstrumentedClient, InstrumentedClientFromOptions), every previously-bare rd.NewClient(octoredis.MustBuildOptions(...)) site is routed through them, a source guard test pins the invariant, two tests prove the hook actually fires, and the octo-lib dependency is bumped to the version exposing Instrument().
  • Over-build: none. No new flags, fields, or endpoints beyond the stated goal.
  • Deviation: none. TLS is still funnelled through BuildOptions; instrumentation happens at construction time, before the client is shared β€” satisfying octo-lib's "instrument before share/clone" contract. The one WithContext(ctx) clone (health readiness probe, modules/common/health.go:175) is correct because the original client is instrumented at construction and go-redis v6 clone() copies the wrapped process field.

The instrument tests are the key strength here: they assert a real GET reaches the observer rather than just asserting the constructor returns non-nil, so this PR demonstrably does what its commit message claims.

2. Code quality

Quality: Approved

No P0/P1 issues. The defensive shallow-copy in InstrumentedClientFromOptions (pkg/redis/options.go:78) is correct β€” rd.NewClient calls opt.init() which mutates the options in place, so copying before construction is the right call and matches octo-lib's own pattern.

The following are non-blocking P2 suggestions (consider for a follow-up; none should hold up merge):

  • P2 β€” exported Conn path can pin clients in a process-global map. pkg/redis/redis.go:36 now routes NewWithOptions through InstrumentedClientFromOptions, which calls the public liboredis.Instrument(). That registers the client in octo-lib's never-evicted instrumented map. octo-lib deliberately avoids this for its own Conn by using the internal wrapClient. This is harmless today: octoredis.NewWithOptions / octoredis.New / pkg/db.NewRedis* have zero production callers (verified across the repo), and all 19 bare-client sites are startup singletons (the SharedUIDRateLimiter path is additionally guarded by a once-flag, so its 22 call sites build a single client). It only becomes a leak if a future caller uses the exported Conn constructor for short-lived clients. Worth a doc-comment on NewWithOptions warning it's for process-lifetime clients only.

  • P2 β€” NewWithOptions doesn't fully mirror octo-lib's copy/nil semantics. It mutates the caller's *rd.Options (MaxRetries) before the copy and panics on a nil opts, whereas octo-lib copies first and tolerates nil. The new InstrumentedClientFromOptions itself also dereferences opts without a nil check (pkg/redis/options.go:78). A if opts == nil { return nil } guard (or explicit panic message) would make the exported helper friendlier. Low impact since current callers always pass non-nil.

  • P2 β€” source guard has evasion blind spots. pkg/redis/chokepoint_guard_test.go:23 matches only rd.NewClient( / redis.NewClient(. A differently-aliased import (rdb.NewClient, dot import) or a different constructor (NewFailoverClient, NewClusterClient, NewRing, NewUniversalClient) would silently bypass both the guard and the instrumentation. None of those constructors exist in the repo today, so this is forward-looking hardening only β€” broaden the regex (e.g. \.New(Client|FailoverClient|ClusterClient|Ring|UniversalClient)\() if Sentinel/Cluster support is ever added.

3. Security notes (security-sensitive PR)

The instrumented clients sit on auth-adjacent paths (login/OIDC rate limiters, OIDC state/bind/logout stores, token setter). The change is observe-only: WrapProcess times the command and forwards to the observer without altering the returned error, and redis.Nil is normalised to non-error so cache misses don't pollute error rates. No behavioural change to rate-limiting or auth logic, no secrets touched, TLS config preserved through BuildOptions. Nothing here requires manual security sign-off beyond confirming the metric wiring, which the tests cover.

4. Verdict

Spec βœ… and Quality Approved β†’ APPROVE. The three P2s are polish, not blockers.

@mochashanyao mochashanyao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Octo-Q Β· automated review]

Verdict: Approve β€” no blocking findings; notes below (data-flow traced).


octo-server PR#495 Review Report

Reviewer: Octo-Q (automated review)
PR: #495
Head SHA: 8f7bce0af322de3723e716f1f88e5f41fb227b75
Base: f438ff42 (main)
Scope: 24 files, +235 / -46

1. Summary

PR introduces octoredis.NewInstrumentedClient and octoredis.InstrumentedClientFromOptions as a chokepoint constructor for raw *rd.Client instances, ensuring all ~15 previously-uninstrumented call sites feed into dependency="redis" metrics. Mechanical replacement of rd.NewClient(octoredis.MustBuildOptions(...)) β†’ octoredis.NewInstrumentedClient(...). Adds a source-code guard test (TestNoRawRedisClientOutsideChokepoint) and instrumentation smoke tests.

2. Verification Conclusions

Item Status Evidence
liboredis.Instrument exists in pinned octo-lib version βœ… octo-lib@v0.0.0-20260628015025-0c34e6f108c4/pkg/redis/instrument.go β€” full implementation with mutex-guarded idempotence map
Instrument is idempotent (no double-wrap) βœ… instrumentedMu sync.Mutex + instrumented = map[*rd.Client]struct{}{} β€” checked before wrapClient; TestInstrumentIdempotent confirms
No double-instrumentation on ctx.GetRedisConn() path βœ… octo-lib's Conn creates its own *rd.Client via internal NewWithOptions β†’ Instrument; octo-server's Conn creates independent *rd.Client via InstrumentedClientFromOptions β€” two separate client instances, never cross-wrapped
Shallow copy (local := *opts) is safe for go-redis v6 βœ… go-redis v6 NewClient only mutates value-type fields via opt.init(); no pointer/slice mutation. Consistent with octo-lib's identical pattern
SetRedisObserver is process-global atomic singleton βœ… atomic.Pointer[RedisObserver] β€” Store/Load, panic-recovery in reportRedis, tested
All 15+ call sites migrated consistently βœ… Each call site: rd.NewClient(octoredis.MustBuildOptions(cfg, func(o *rd.Options){...})) β†’ octoredis.NewInstrumentedClient(cfg, func(o *rd.Options){...}). Same overrides preserved.
Only one rd.NewClient remains in production code (inside chokepoint) βœ… pkg/redis/options.go:79 β€” correctly excluded by guard test
Guard test regex covers known aliases βœ… `\b(rd
InstrumentedClientFromOptions used correctly in health.go βœ… readinessRedisOptions returns a fresh *rd.Options from MustBuildOptions β€” no shared pointer risk
NewWithOptions in pkg/redis/redis.go routes through chokepoint βœ… Creates independent *rd.Client, single wrap. Production callers: dead code only (pkg/db/redis.go has 0 callers)

3. Findings

P2 β€” NewWithOptions mutates caller's *rd.Options before copy

File: pkg/redis/redis.go:36-38
Severity: P2 (maintainability / side-effect hygiene)
Diff-scope: pre-existing behavior, unchanged by this PR (was already mutating via direct rd.NewClient(opts))

func NewWithOptions(opts *rd.Options) *Conn {
    if opts.MaxRetries == 0 {
        opts.MaxRetries = 3   // mutates caller's struct
    }
    return &Conn{client: InstrumentedClientFromOptions(opts)} // copies here
}

InstrumentedClientFromOptions does a defensive copy (local := *opts), but opts.MaxRetries = 3 runs on the original struct before the copy. octo-lib's equivalent copies first, then mutates. Currently not reachable in production (the only caller path pkg/db/redis.go has 0 callers), but if future code reuses the same *rd.Options, MaxRetries would be silently changed.

Suggestion: Move the mutation after the copy:

func NewWithOptions(opts *rd.Options) *Conn {
    local := *opts
    if local.MaxRetries == 0 {
        local.MaxRetries = 3
    }
    return &Conn{client: InstrumentedClientFromOptions(&local)}
}

P2 β€” Guard test regex does not cover struct-literal rd.Client{} construction

File: pkg/redis/chokepoint_guard_test.go:32
Severity: P2 (test coverage gap, low risk)
Diff-scope: new code (the guard itself is introduced by this PR)

The regex \b(rd|redis)\.NewClient\( catches the standard constructor path but would miss &rd.Client{} or rd.Options{...} literal construction. In practice go-redis v6 doesn't expose a useful zero-value Client, so the bypass risk is theoretical. But if someone wraps &rd.Client{} for testing or edge cases, the guard won't catch it.

Suggestion: Consider adding &\s*(rd|redis)\.Client\{ to the regex as defense-in-depth.

No P0 or P1 findings.

4. Data-Flow Trace

Consumed Data Upstream Source Runtime Flow Verified
*rd.Client returned by NewInstrumentedClient rd.NewClient(&local) inside InstrumentedClientFromOptions β†’ liboredis.Instrument(c) wraps WrapProcess βœ… β€” WrapProcess installed before client is returned to caller; all subsequent commands flow through hook β†’ reportRedis β†’ SetRedisObserver callback
*rd.Client returned by InstrumentedClientFromOptions (health probe) readinessRedisOptions β†’ MustBuildOptions(cfg, overrides) returns fresh *rd.Options β†’ defensive copy β†’ rd.NewClient β†’ Instrument βœ… β€” single wrap, no shared pointer
Observer callback (metrics.ObserveRedisCmd) Set once at main.go:248 via libredis.SetRedisObserver β†’ stored in atomic.Pointer βœ… β€” every instrumented client's WrapProcess calls reportRedis which Loads the pointer; panic-recovery in reportRedis prevents observer crash from taking down Redis calls
TLS config in instrumented clients MustBuildOptions β†’ BuildOptions β†’ liboredis.BuildTLSConfig when cfg.DB.RedisTLS == true β†’ stored in opts.TLSConfig βœ… β€” defensive copy shares TLS pointer (read-only by go-redis v6 init()), TLS config correctly applied
Conn wrapping in pkg/redis/redis.go NewWithOptions β†’ InstrumentedClientFromOptions β†’ fresh *rd.Client + single Instrument βœ… β€” independent from ctx.GetRedisConn()'s client; no double-wrap

5. Blind-Point Checklist (R5)

C1 β€” Dual-path parity: N/A. No symmetric add/remove or subscribe/unsubscribe paths touched. All changes are parallel mechanical substitutions of the same pattern.

C2 β€” Control-flow ordering / nested reuse: Clear. Instrument is idempotent (mutex + map guard), so even if accidentally called twice on the same client, no double-wrap occurs. WrapProcess hook ordering: installed once at construction, before any commands flow. No nested or re-entrant call sites.

C3 β€” Authorization boundary β‰  capability boundary: N/A. No permission/jail/tool exposure changes. This is a metrics instrumentation PR.

C4 β€” Authorization lifecycle / container-member state cascade: N/A. No auth changes.

C5 β€” Build β‰  runtime path: Clear. The changes are pure Go source-level substitutions. No build artifacts, extensions, CLI tools, relative paths, or environment-dependent behavior introduced. The go.mod bump to octo-lib is a standard dependency update; the new Instrument / SetRedisObserver functions exist in the pinned version.

C6 β€” Governance / policy / security document self-consistency: N/A. No documentation or policy changes.

6. Cross-Round Blocker Recheck (R6)

N/A β€” first review of this PR.


[Octo-Q] verdict: APPROVE + This PR cleanly closes the dependency="redis" observability gap by routing all raw *rd.Client construction through an instrumented chokepoint. The implementation is mechanically correct: Instrument is idempotent, no double-instrumentation exists (verified by tracing both ctx.GetRedisConn() and bare-client paths), the shallow copy is safe for go-redis v6, and the guard test effectively prevents regression. Two P2 suggestions (pre-copy mutation in NewWithOptions, guard regex coverage for struct literals) are non-blocking improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies-changed This PR modifies dependency files needs-human-review size/L PR size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants