fix: Use worker binding for R2 cache uploads to avoid API rate limits #1099

isaacrowntree wants to merge 3 commits into opennextjs:main from
Conversation
🦋 Changeset detected — latest commit: 37e5981. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Thanks for the PR @isaacrowntree
Sorry @isaacrowntree

I had something simpler in mind.

- Derive a wrangler config from the project's wrangler config. What needs to be propagated is the R2 cache binding. It should have `remote` set to `true` if we want to populate the remote R2 (`false` otherwise).
- Start a local worker using this config with a POST endpoint.
- Send requests to this POST endpoint.
Disclaimer: quick review, not much time tonight but I hope it makes at least some sense.
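The derivation step the review sketches could look roughly like this. This is a hypothetical illustration, not the PR's code — the config shape, field names, and `deriveCachePopulateConfig` are all assumptions:

```typescript
// Hypothetical sketch: derive a minimal wrangler config that carries only
// the project's R2 cache binding. Shapes and names are assumptions.
interface R2BindingConfig {
  binding: string;
  bucket_name: string;
  jurisdiction?: string;
}

interface DerivedWranglerConfig {
  name: string;
  main: string;
  compatibility_date: string;
  r2_buckets: Array<R2BindingConfig & { remote?: boolean }>;
}

function deriveCachePopulateConfig(
  cacheBinding: R2BindingConfig,
  handlerPath: string,
  useRemote: boolean
): DerivedWranglerConfig {
  return {
    name: "open-next-cache-populate",
    main: handlerPath,
    compatibility_date: "2026-01-01",
    // Only the R2 cache binding is propagated; `remote: true` targets the
    // real bucket, while omitting the flag keeps writes local.
    r2_buckets: [{ ...cacheBinding, ...(useRemote && { remote: true }) }],
  };
}
```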
packages/cloudflare/src/cli/build/open-next/compile-cache-populate-handler.ts
Hey @vicb, thanks for the review! I've reworked the approach based on your feedback.

What changed: replaced the deploy→populate→redeploy cycle with your suggested approach: a local worker.

What was removed:

Net result is ~470 fewer lines. All 206 tests pass. Let me know if this is closer to what you had in mind.
```ts
const tempWranglerConfig = {
  name: "open-next-cache-populate",
  main: handlerPath,
  compatibility_date: "2024-12-01",
```
Suggested change:

```diff
- compatibility_date: "2024-12-01",
+ compatibility_date: "2026-01-01",
```
Done — updated to "2026-01-01". Also removed the temp config file entirely by switching to unstable_startWorker with programmatic bindings (see comment on #305).
```ts
binding: R2_CACHE_BINDING_NAME,
bucket_name: binding.bucket_name,
...(binding.jurisdiction && { jurisdiction: binding.jurisdiction }),
...(useRemote && { remote: true }),
```
Suggested change:

```diff
- ...(useRemote && { remote: true }),
+ remote: useRemote,
```
Done — simplified to remote: useRemote. Now passed directly in the unstable_startWorker bindings config.
```ts
env: {
  ...process.env,
  CLOUDFLARE_LOAD_DEV_VARS_FROM_DOT_ENV: "false",
},
```
I don't think we need it for this simple worker
Suggested change:

```diff
- env: {
-   ...process.env,
-   CLOUDFLARE_LOAD_DEV_VARS_FROM_DOT_ENV: "false",
- },
```
Removed — the entire spawn approach is gone now that we use unstable_startWorker, so this env var is no longer relevant.
```ts
 *
 * @returns The local URL and a function to stop the worker.
 */
function startWranglerDev(
```
Could you please use the unstable_startWorker?
This should make the code quite simpler.
Done — replaced the manual spawn + stdout parsing with `unstable_startWorker`. This also eliminated the temp wrangler config file since bindings are now passed programmatically. The code is significantly simpler:

```ts
const worker = await unstable_startWorker({
  name: "open-next-cache-populate",
  entrypoint: handlerPath,
  compatibilityDate: "2026-01-01",
  bindings: {
    [R2_CACHE_BINDING_NAME]: {
      type: "r2_bucket",
      bucket_name: binding.bucket_name,
      remote: useRemote,
    },
  },
  dev: { server: { port: 0 }, inspector: false, watch: false, liveReload: false },
});
```

Uses `worker.ready`, `worker.url`, and `worker.dispose()` for lifecycle management.
```ts
  value: fs.readFileSync(fullPath, "utf8"),
}));

const result = await sendBatchWithRetry(workerUrl, entries);
```
Given that the server is local, there is probably no need to batch (which might consume a lot of memory if you have a lot of assets).
Sending a few assets at a time should be good enough and less memory intensive.
Good call — reduced the default batch size from 100 to 5. Since the server is local, the overhead per request is negligible and this keeps memory usage low even for projects with thousands of prerendered pages.
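The small-batch approach could be sketched as below; `chunk`, `sendInBatches`, and `sendBatch` are illustrative stand-ins, not the PR's actual implementation:

```typescript
// Sketch (assumed names): send entries in small batches so only a handful
// of asset bodies are held in memory at once.
const DEFAULT_BATCH_SIZE = 5;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function sendInBatches<T>(
  entries: T[],
  sendBatch: (batch: T[]) => Promise<void>,
  batchSize = DEFAULT_BATCH_SIZE
): Promise<void> {
  // Only one small batch is in flight at a time, keeping memory usage flat
  // even with thousands of prerendered pages.
  for (const batch of chunk(entries, batchSize)) {
    await sendBatch(batch);
  }
}
```

Since the worker is local, the extra per-request overhead of many small POSTs is negligible compared with holding every asset body in one giant payload.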
The templates folder stores the part of the worker that will be deployed as the app.
What about creating a workers folder at the same level and adding r2 to the file name?
Done — moved to packages/cloudflare/src/cli/workers/r2-cache-populate-handler.ts. Created a new workers/ directory at the same level as templates/, with "r2" in the filename.
```ts
}

export default {
  fetch: (request: Request, env: CachePopulateEnv) => handleCachePopulate(request, env),
```
Maybe the fetch handler should check the method (POST) and the pathname, and if OK call populateCache.
I think it would be easier to test populateCache that way.
Done — the fetch handler now checks method and pathname, then delegates to populateCache:

```ts
export default {
  async fetch(request: Request, env: CachePopulateEnv): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "POST" && url.pathname === "/populate") {
      return populateCache(request, env);
    }
    return new Response("Not found", { status: 404 });
  },
};
```

populateCache is exported separately and can be tested directly without going through routing. Added routing tests as well.
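For reference, a minimal self-contained version of this routing shape can be exercised directly. This is a sketch assuming Node 18+'s global `Request`/`Response`; the env type and the `populateCache` body here are stubs, not the real handler:

```typescript
// Sketch: the routing shape with a stubbed populateCache, runnable on
// Node 18+ where Request/Response are globals.
type CachePopulateEnv = Record<string, unknown>;

async function populateCache(_request: Request, _env: CachePopulateEnv): Promise<Response> {
  // The real handler writes entries to the R2 binding; stubbed here.
  return new Response("ok", { status: 200 });
}

const handler = {
  async fetch(request: Request, env: CachePopulateEnv): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "POST" && url.pathname === "/populate") {
      return populateCache(request, env);
    }
    return new Response("Not found", { status: 404 });
  },
};
```

Because `populateCache` is a plain exported function, it can be unit-tested without constructing requests at all, while the routing itself stays a thin, separately testable shell.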
vicb left a comment
Nice to see this PR taking a great shape!
Thanks for your work on this
commit:
Uses a local wrangler dev worker with a remote R2 binding to populate the R2 incremental cache, bypassing Cloudflare API rate limits that cause failures with large numbers of prerendered pages.

Flow:
1. Derive temp config with only the R2 binding (remote: true)
2. Start wrangler dev locally via unstable_startWorker
3. Send batched cache entries to the local worker
4. Worker writes directly to R2 via binding (no API limits)
5. Stop worker, then proceed with normal deploy

Features:
- Exponential backoff retry (3 attempts)
- Partial failure handling (207 Multi-Status)
- Progress bar with tqdm
- Jurisdiction support

Squashed from 5 commits for clean rebase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
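The five-step flow in this commit message could be sketched roughly as below, with the worker factory and batch sender injected so the orchestration runs without wrangler; all names here are illustrative assumptions, not the PR's code:

```typescript
// Sketch of the populate flow with injected dependencies (assumed shapes).
interface PopulateWorker {
  url: string;
  dispose(): Promise<void>;
}

async function runCachePopulate<T>(
  entries: T[],
  startWorker: () => Promise<PopulateWorker>, // steps 1-2: derive config, start worker
  sendBatch: (url: string, batch: T[]) => Promise<void>,
  batchSize = 5
): Promise<void> {
  const worker = await startWorker();
  try {
    for (let i = 0; i < entries.length; i += batchSize) {
      // steps 3-4: send batched entries; the worker writes via its R2 binding
      await sendBatch(worker.url, entries.slice(i, i + batchSize));
    }
  } finally {
    // step 5: always stop the worker before the normal deploy proceeds
    await worker.dispose();
  }
}
```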
0f3be28 to 9bd33d4
Change cwd to a temp directory before calling unstable_startWorker to prevent it from reading the project's wrangler.jsonc and merging all bindings (DOs, KV, services, AI) into the cache populate worker. This fixes hangs in CI when the project has Durable Objects or service bindings that the cache populate worker doesn't export.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes for R2 cache population:

1. Change cwd to a temp directory before calling unstable_startWorker to prevent it from reading the project's wrangler.jsonc and merging all bindings (DOs, KV, services, AI) into the cache populate worker. This fixes hangs in CI when the project has Durable Objects or service bindings that the cache populate worker doesn't export.

2. Retry individual failed entries from 207 (partial failure) responses with exponential backoff, rather than only retrying on full HTTP errors. Transient R2 503 errors no longer cause permanent data loss.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
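The 207 retry behavior described here might look roughly like this sketch; the response body shape (a `failed` index list) and the function names are assumptions for illustration, not the actual implementation:

```typescript
// Sketch (assumed shapes): retry only the entries a 207 response reports
// as failed, with exponential backoff between attempts.
interface BatchResult {
  status: number;   // 200 = all persisted, 207 = partial failure
  failed: number[]; // indices of entries that did not persist
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sendWithRetry<T>(
  entries: T[],
  send: (batch: T[]) => Promise<BatchResult>,
  maxAttempts = 3,
  baseDelayMs = 100
): Promise<T[]> {
  let pending = entries;
  for (let attempt = 0; attempt < maxAttempts && pending.length > 0; attempt++) {
    // Exponential backoff: 0, base, 2*base, ...
    if (attempt > 0) await sleep(baseDelayMs * 2 ** (attempt - 1));
    const result = await send(pending);
    if (result.status === 200) return [];
    // 207: narrow the working set to only the entries that failed,
    // so a transient 503 on one object never re-uploads the whole batch.
    pending = result.failed.map((i) => pending[i]);
  }
  return pending; // entries that never succeeded after all attempts
}
```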
Hey @vicb — saw your follow-up in #1121, great to see this being explored further! We've been running this version in production on campermate.com and wanted to share some findings from getting it working end-to-end:

Config isolation is critical. The biggest issue we hit was unstable_startWorker discovering and merging the project's wrangler config into the populate worker. Fix: pass a config path pointing at an empty wrangler.toml in a temp directory.
Summary
Fixes the R2 cache upload failure for large Next.js applications with thousands of prerendered pages. The issue is caused by the Cloudflare API rate limit of 1,200 requests per 5 minutes when using wrangler's `r2 bulk put` command.

This approach has been successfully running in production on campermate.com since the initial fix, populating 15+ cache entries per deploy with zero failures.
Approach

Uses `unstable_startWorker` (wrangler) to spin up an isolated local worker with a remote R2 binding, bypassing Cloudflare API rate limits entirely.

How it works

1. Create an empty `wrangler.toml` in a temp directory to isolate the worker from the project's config
2. Start `unstable_startWorker` with only the R2 cache binding
3. Send cache entries via `worker.fetch` directly

Key design decisions

- Config isolation via empty `wrangler.toml` — `unstable_startWorker` uses `findUpSync` from `path.dirname(entrypoint)` to discover config. Without isolation, it picks up the project's full `wrangler.toml` and binds ALL project resources (DOs, KV, AI, etc.) to the worker, causing hangs and errors. Passing `config: emptyConfigPath` short-circuits the search.
- `worker.fetch` instead of `global.fetch(worker.url)` — Using the worker's fetch method directly avoids an unnecessary HTTP hop through localhost. The `DispatchFetch` type (miniflare) is compatible at runtime, requiring only a type cast: `worker.fetch as unknown as typeof globalThis.fetch`.
- Retry logic with exponential backoff — R2 writes can fail transiently (especially remote). The worker returns 207 with error details on partial failure, and the caller retries only the failed entries.
- Single code path — always uses `unstable_startWorker` for both local and remote targets. The `remote` flag on the R2 binding controls which bucket is used.

Why `unstable_startWorker` over Miniflare directly?

We evaluated using `new Miniflare()` directly, which would avoid the config discovery issue entirely (fully programmatic, no `findUpSync`). However:

- `unstable_startWorker` with `remote: true` handles all auth/proxy plumbing to write to actual Cloudflare R2 buckets automatically. With raw Miniflare, you'd need to manually construct a `RemoteProxyConnectionString` and handle the auth flow.
- For the local target (`target: "local"`), Miniflare would actually be cleaner — no config discovery, no temp files, and direct access via `mf.getR2Bucket()`.
- The `wrangler.toml` workaround is a one-liner fix for a well-understood problem, making it an acceptable trade-off vs reimplementing wrangler's remote proxy logic.

Files changed

- `populate-cache.ts` — replaced `wrangler r2 bulk put` with the `unstable_startWorker` approach; config isolation; retry logic
- `r2-cache-populate-handler.ts`
- `populate-cache.spec.ts` — `worker.fetch` pattern

Test plan

- `NEXT_INC_CACHE_R2_BUCKET` binding (no leaked project bindings)

Related