
feat(cache): add in flight deduping #4459

Open

MasterPtato wants to merge 1 commit into 03-18-fix_cache_clean_up_lib from 03-19-feat_cache_add_in_flight_deduping


@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app bot commented Mar 19, 2026

🚅 Deployed to the rivet-pr-4459 environment in rivet-frontend

Service | Status | Updated (UTC)
frontend-cloud | 😴 Sleeping | Mar 24, 2026 at 7:53 am
frontend-inspector | 😴 Sleeping | Mar 24, 2026 at 1:03 am
website | ❌ Build Failed | Mar 19, 2026 at 8:53 pm
mcp-hub | ✅ Success | Mar 19, 2026 at 8:52 pm
ladle | ❌ Build Failed | Mar 19, 2026 at 8:51 pm

Contributor Author

MasterPtato commented Mar 19, 2026

MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 175706a to 17d21f5 on March 19, 2026 20:53
MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 75e1e38 to bc5d6d3 on March 19, 2026 20:53
@pkg-pr-new

pkg-pr-new bot commented Mar 19, 2026


@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4459

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4459

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4459

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4459

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4459

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4459

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4459

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4459

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4459

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4459

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4459

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4459

commit: 6570bf2

@claude

claude bot commented Mar 19, 2026

Code Review: feat(cache): add in flight deduping

The core idea of in-flight request deduplication is sound and addresses a real cache-stampede problem, where many concurrent misses for the same key all invoke the getter. Here are my findings:

Critical Issue 1: Lease not released on getter error. If the getter returns an Err, tokio::try_join! yields that Err and the ? propagates it immediately, so the lease-release loop never executes. Every subsequent request for those keys then stalls for the full 5-second wait on each attempt, until the process restarts. A guard/defer pattern would ensure cleanup regardless of the error path.
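A minimal sketch of the guard/defer pattern. It uses a std `Mutex<HashSet<String>>` as a stand-in for the PR's scc::HashMap-based in-flight map and plain functions instead of tokio; `LeaseGuard` and `lead_fetch` are hypothetical names, not the crate's API. Because the release happens in `Drop`, it also runs when `?` returns a getter error early.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical guard: owns the leases this request acquired and releases
/// them when dropped, whether the getter succeeded, failed, or panicked.
pub struct LeaseGuard<'a> {
    in_flight: &'a Mutex<HashSet<String>>,
    keys: Vec<String>,
}

impl<'a> LeaseGuard<'a> {
    /// Claims every key not already in flight; the guard owns only those.
    pub fn acquire(in_flight: &'a Mutex<HashSet<String>>, wanted: &[&str]) -> Self {
        let mut map = in_flight.lock().unwrap();
        let mut keys = Vec::new();
        for key in wanted {
            // `insert` returns false if another request already holds this lease.
            if map.insert(key.to_string()) {
                keys.push(key.to_string());
            }
        }
        drop(map);
        LeaseGuard { in_flight, keys }
    }
}

impl Drop for LeaseGuard<'_> {
    fn drop(&mut self) {
        // Recover from poisoning so cleanup never panics during unwinding.
        let mut map = self
            .in_flight
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        for key in &self.keys {
            map.remove(key);
        }
    }
}

/// Hypothetical lease holder: the getter error propagates via `?`,
/// but `_guard` is still dropped and the leases are released.
pub fn lead_fetch(
    in_flight: &Mutex<HashSet<String>>,
    keys: &[&str],
    getter: impl FnOnce() -> Result<Vec<String>, String>,
) -> Result<Vec<String>, String> {
    let _guard = LeaseGuard::acquire(in_flight, keys);
    let values = getter()?;
    Ok(values)
}
```

In the async code the same idea applies, with one caveat: `Drop` cannot `.await`, so a guard there would need a synchronous removal call or would have to spawn its cleanup as a task.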

Critical Issue 2: Broadcast not sent on empty resolution or getter error. broadcast_tx.send is only called when entries_values is non-empty. If the getter resolves no values or returns an error, waiters block until IN_FLIGHT_TIMEOUT (5 seconds) elapses. The broadcast should be sent unconditionally after the getter completes. Combined with issue 1, a getter error causes both a leaked lease and a 5-second stall for all waiters.
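A std-only sketch of the unconditional signal, assuming threads and a `Condvar` in place of tokio's `broadcast` (`Flight`, `Outcome`, `finish`, and `wait` are hypothetical names). The leader calls `finish` on every path, including an empty result or a failure, so waiters only hit the timeout if the leader never completes at all.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical outcome published by the lease holder. `Resolved` may be
/// empty (nothing found); `Failed` tells waiters to fall back to the getter.
#[derive(Clone, Debug, PartialEq)]
pub enum Outcome {
    Resolved(Vec<(String, String)>),
    Failed,
}

/// Hypothetical per-flight completion cell.
#[derive(Default)]
pub struct Flight {
    outcome: Mutex<Option<Outcome>>,
    done: Condvar,
}

impl Flight {
    /// Called exactly once after the getter finishes, whatever it returned.
    pub fn finish(&self, outcome: Outcome) {
        *self.outcome.lock().unwrap() = Some(outcome);
        self.done.notify_all();
    }

    /// Waits up to `timeout`; `None` means the leader never finished.
    pub fn wait(&self, timeout: Duration) -> Option<Outcome> {
        let guard = self.outcome.lock().unwrap();
        // The condition is checked under the lock before sleeping, so a
        // waiter that arrives after `finish` returns immediately.
        let (guard, _timed_out) = self
            .done
            .wait_timeout_while(guard, timeout, |outcome| outcome.is_none())
            .unwrap();
        guard.clone()
    }
}
```

Unlike a broadcast channel, this model stores the outcome, so late waiters cannot miss it; with `tokio::sync::broadcast` the send itself has to be unconditional.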

Moderate Issue 3: HashMap iteration order creates implicit coupling. In req_config.rs, keys and cache_keys are unzipped from ctx.entries() in arbitrary HashMap iteration order, and keys is then zipped with the cached_values returned by the driver. This works only because both were derived from the same iterator in the same pass, which makes it fragile. A Vec pairing (Key, RawCacheKey) would make the relationship explicit and safe. The same issue applies in the waiting-keys path (succeeded_keys / succeeded_cache_keys).
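A sketch of the explicit pairing, with hypothetical `Key`/`RawCacheKey` aliases and a made-up `cache:{key}` format standing in for the crate's real types. Keys and cache keys are built as one `Vec` of pairs, the driver receives cache keys in that order, and results are zipped back against the same `Vec`, so HashMap iteration order stops mattering.

```rust
/// Hypothetical stand-ins for the crate's `Key` and `RawCacheKey` types.
pub type Key = u32;
pub type RawCacheKey = String;

/// Builds (key, cache key) pairs in a single pass over the entries.
pub fn pair_keys(entries: impl IntoIterator<Item = Key>) -> Vec<(Key, RawCacheKey)> {
    entries
        .into_iter()
        .map(|key| (key, format!("cache:{key}")))
        .collect()
}

/// The slice handed to the driver, in the same order as `pairs`.
pub fn cache_keys(pairs: &[(Key, RawCacheKey)]) -> Vec<RawCacheKey> {
    pairs.iter().map(|(_, ck)| ck.clone()).collect()
}

/// Zips driver results back onto their keys; `None` is a cache miss.
pub fn match_results(
    pairs: &[(Key, RawCacheKey)],
    cached_values: Vec<Option<Vec<u8>>>,
) -> Vec<(Key, Option<Vec<u8>>)> {
    // `zip` would silently truncate on a length mismatch, so check first.
    assert_eq!(pairs.len(), cached_values.len(), "driver must return one value per key");
    pairs
        .iter()
        .map(|(key, _)| *key)
        .zip(cached_values)
        .collect()
}
```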

Moderate Issue 4: Rate limit tests silently removed. integration.rs contained test_rate_limit_basic and test_rate_limit_ip_isolation. These do not appear in any of the new test files (fetch.rs, in_flight.rs, ttl.rs). If rate limiting is still a feature of this crate, these tests should be preserved.

Minor Issue 5: The timeout_falls_back_to_getter test adds 5 seconds to the test suite, because it necessarily waits out IN_FLIGHT_TIMEOUT. Making the timeout configurable via a cfg(test) override or a parameter on CacheInner would allow faster test runs.
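One way to make the timeout injectable, sketched with std `mpsc` in place of tokio. `InFlightConfig` and `wait_for_leader` are hypothetical; only the 5 s default and the `IN_FLIGHT_TIMEOUT` name come from the PR. Production code would use `Default`, while a test constructs the config with a timeout of a few milliseconds.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Production default, mirroring the 5 s IN_FLIGHT_TIMEOUT.
pub const IN_FLIGHT_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical config that would live on the cache (e.g. on CacheInner).
pub struct InFlightConfig {
    pub timeout: Duration,
}

impl Default for InFlightConfig {
    fn default() -> Self {
        InFlightConfig { timeout: IN_FLIGHT_TIMEOUT }
    }
}

impl InFlightConfig {
    /// Returns true if the leader signalled completion in time; false means
    /// "fall back and call the getter yourself".
    pub fn wait_for_leader(&self, rx: &Receiver<()>) -> bool {
        match rx.recv_timeout(self.timeout) {
            Ok(()) => true,
            // Timed out, or the leader dropped its sender without signalling.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => false,
        }
    }
}
```

Compared with a `cfg(test)` constant, a field also lets individual tests choose different timeouts.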

Minor Issue 6: Inline await changes latency behavior on cache misses. The cache write was previously done in a background task; now it is awaited inline. This is likely intentional (to ensure broadcast happens after write), but it adds write latency to every cache miss response. A short comment documenting the trade-off would help future readers.

Positive Observations: Switching GetterCtx from Vec to HashMap removes the O(n) deduplication step and makes key lookups O(1) on average. scc::HashMap is an appropriate choice for concurrent in-flight tracking. Test coverage for the deduplication logic is solid: single waiter, multiple waiters, independent keys, mixed cached/in-flight, and timeout fallback are all covered. Removing the anyhow glob import aligns with the project style guidelines. Changing the Driver::get signature to take a slice of RawCacheKey is a sound ergonomics improvement.

MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from bc5d6d3 to b395fae on March 21, 2026 01:55
MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 17d21f5 to de18421 on March 21, 2026 01:55
MasterPtato mentioned this pull request Mar 21, 2026
MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from de18421 to e65f84d on March 24, 2026 00:30
MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from b395fae to 6570bf2 on March 24, 2026 00:30
MasterPtato mentioned this pull request Mar 24, 2026
MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 6570bf2 to b46226c on March 24, 2026 00:36
MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch 2 times, most recently from 65200cf to ea32d90, on March 25, 2026 00:05
MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from b46226c to 66ec30f on March 25, 2026 00:05
@claude

claude bot commented Mar 25, 2026

Good overall approach. Deduplicating concurrent cache misses with a broadcast-based mechanism is the right design. The refactor of GetterCtx from Vec to HashMap is a clean improvement. A few issues need attention before merge.


Critical: Lease not released on getter error

In req_config.rs, if the lease-holder getter fails, tokio::try_join! short-circuits and returns early. The broadcast_tx.send and in_flight.remove_async calls are never reached.

Effect: Waiters hang for the full IN_FLIGHT_TIMEOUT (5 s) before falling back. The in_flight entries are never removed and leak for the lifetime of CacheInner. Every future request for those keys will find the stale entry, subscribe to a broadcast channel that will never receive a message, and time out.

Fix: use a RAII guard so cleanup always runs on drop, even on error.


Bug: Broadcast skipped when getter resolves no values

The broadcast send lives inside if !entries_values.is_empty(), but lease removal happens unconditionally afterwards. If the getter resolves nothing (entity not found), entries_values is empty, so the broadcast is never sent even though the leases are removed. Waiters hang for the full 5 s before timing out. The result is still correct, but the common not-found path pays a severe latency penalty.

Fix: move broadcast_tx.send outside the if so it fires unconditionally after the write attempt.


Ordering: remove from in_flight before broadcasting

Current order: (1) write to cache, (2) broadcast, (3) remove the lease from in_flight. Between steps 2 and 3, a new request can find the stale entry and subscribe after the message has already been sent; because a broadcast receiver only sees values sent after it subscribes, that request waits the full 5 s. Swap the order: remove from in_flight first, then broadcast. Requests arriving after the removal do a fresh cache read (already populated) and return immediately.
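A std-only model of the suggested ordering. `Broadcast` here is a hypothetical stand-in that mimics the relevant tokio property (a subscriber only sees messages sent after it subscribes); `join_flight` and `finish_flight` are also hypothetical. Waiters subscribe while holding the map lock, and the leader removes the entry before sending, so no request can find the entry after the message has gone out.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical one-shot broadcast: a receiver only gets messages sent
/// after it subscribed, as with tokio's broadcast channel.
#[derive(Default)]
pub struct Broadcast {
    subscribers: Mutex<Vec<Sender<()>>>,
}

impl Broadcast {
    pub fn subscribe(&self) -> Receiver<()> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Delivers to current subscribers only; later subscribers miss it.
    pub fn send(&self) {
        for tx in self.subscribers.lock().unwrap().drain(..) {
            let _ = tx.send(());
        }
    }
}

pub type InFlight = Mutex<HashMap<String, Arc<Broadcast>>>;

/// Waiter side: subscribe under the map lock, or `None` if no lease exists
/// (the caller then reads the cache directly).
pub fn join_flight(in_flight: &InFlight, key: &str) -> Option<Receiver<()>> {
    in_flight.lock().unwrap().get(key).map(|b| b.subscribe())
}

/// Leader side, after the cache write: remove the lease first, then send.
pub fn finish_flight(in_flight: &InFlight, key: &str) {
    // The map lock is released at the end of this statement.
    let flight = in_flight.lock().unwrap().remove(key);
    if let Some(flight) = flight {
        flight.send();
    }
}
```

Anyone who found the entry subscribed before it was removed and therefore before the send; anyone arriving later finds no entry and reads the already-populated cache.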


Test coverage gaps

The new in_flight.rs tests are excellent. Two additional cases worth adding:

  1. Getter error propagation: the lease-holder getter returns Err; verify that in_flight is cleaned up and waiters do not hang indefinitely.
  2. All-miss getter: the lease-holder getter resolves no values; verify that waiters receive the broadcast promptly rather than timing out after 5 s.

Minor

broadcast::channel is created with capacity 16 even when leased_keys is empty, and only one message is ever sent on it, so a capacity of 1 is sufficient (and the channel could be skipped when there are no leased keys).


Overall: the core mechanism is sound, but the error-path cleanup is a real correctness/leak bug and the broadcast-skip-on-empty is a meaningful latency regression on not-found paths. Both should be fixed before merging.

MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 66ec30f to 97b9cfd on March 26, 2026 01:18
MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from ea32d90 to ddfa969 on March 26, 2026 01:18
MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from ddfa969 to bed6ca4 on March 26, 2026 20:50
MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 97b9cfd to 3fc4f7f on March 26, 2026 20:50

Labels

None yet

Projects

None yet


1 participant