
fix: Use worker binding for R2 cache uploads to avoid API rate limits#1099

Open
isaacrowntree wants to merge 3 commits into opennextjs:main from isaacrowntree:fix/r2-cache-upload-via-binding

Conversation


@isaacrowntree commented Jan 28, 2026

Summary

Fixes the R2 cache upload failure for large Next.js applications with thousands of prerendered pages. The issue is caused by the Cloudflare API rate limit of 1,200 requests per 5 minutes when using wrangler's r2 bulk put command.

This approach has been successfully running in production on campermate.com since the initial fix, populating 15+ cache entries per deploy with zero failures.

Approach

Uses unstable_startWorker (wrangler) to spin up an isolated local worker with a remote R2 binding, bypassing Cloudflare API rate limits entirely.

How it works

  1. Create an empty wrangler.toml in a temp directory to isolate the worker from the project's config
  2. Start a local worker via unstable_startWorker with only the R2 cache binding
  3. Send batched cache entries to the worker via worker.fetch directly
  4. Retry on failures — handles both partial failures (207) and network errors with exponential backoff
  5. Clean up — dispose worker and remove temp directory

Key design decisions

  • Config isolation via empty wrangler.toml — unstable_startWorker uses findUpSync from path.dirname(entrypoint) to discover config. Without isolation, it picks up the project's full wrangler.toml and binds ALL project resources (DOs, KV, AI, etc.) to the worker, causing hangs and errors. Passing config: emptyConfigPath short-circuits the search.

  • worker.fetch instead of global.fetch(worker.url) — Using the worker's fetch method directly avoids an unnecessary HTTP hop through localhost. The DispatchFetch type (miniflare) is compatible at runtime, requiring only a type cast: worker.fetch as unknown as typeof globalThis.fetch.

  • Retry logic with exponential backoff — R2 writes can fail transiently (especially remote). The worker returns 207 with error details on partial failure, and the caller retries only the failed entries.

  • Single code path — always uses unstable_startWorker for both local and remote targets. The remote flag on the R2 binding controls which bucket is used.
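
The retry behavior described above can be sketched as follows. This is a minimal illustration, not the PR's exact implementation: the name sendBatchWithRetry appears in the diff, but the response shape for a 207 Multi-Status reply (here, a list of failed keys) is an assumption.

```typescript
interface CacheEntry {
  key: string;
  value: string;
}

// Assumed outcome shape: `send` posts one batch to the worker; a 207
// outcome lists the keys that failed. The real worker's payload may differ.
interface BatchResult {
  status: number;
  failedKeys: string[];
}

async function sendBatchWithRetry(
  send: (entries: CacheEntry[]) => Promise<BatchResult>,
  entries: CacheEntry[],
  maxAttempts = 3
): Promise<void> {
  let pending = entries;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await send(pending);
      if (result.status === 200) return;
      // 207 Multi-Status: retry only the entries that failed.
      pending = pending.filter((e) => result.failedKeys.includes(e.key));
      if (pending.length === 0) return;
    } catch {
      // Network error: the whole batch is retried.
    }
    if (attempt < maxAttempts) {
      // Exponential backoff: 100ms, 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** (attempt - 1)));
    }
  }
  throw new Error(`Failed to populate ${pending.length} cache entries`);
}
```

Injecting `send` rather than hard-coding worker.fetch keeps the backoff logic testable with a fake transport.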

Why unstable_startWorker over Miniflare directly?

We evaluated using new Miniflare() directly, which would avoid the config discovery issue entirely (fully programmatic, no findUpSync). However:

  • Remote R2 is the dealbreaker — unstable_startWorker with remote: true handles all auth/proxy plumbing to write to actual Cloudflare R2 buckets automatically. With raw Miniflare, you'd need to manually construct a RemoteProxyConnectionString and handle the auth flow.
  • For local-only R2 (e.g. target: "local"), Miniflare would actually be cleaner — no config discovery, no temp files, and direct access via mf.getR2Bucket().
  • The empty wrangler.toml workaround is a one-liner fix for a well-understood problem, making it an acceptable trade-off vs reimplementing wrangler's remote proxy logic.

Files changed

File Change
populate-cache.ts Replace wrangler r2 bulk put with unstable_startWorker approach, config isolation, retry logic
r2-cache-populate-handler.ts Standalone worker for R2 writes (batch POST endpoint, concurrent writes)
populate-cache.spec.ts Updated tests for worker.fetch pattern

Test plan

  • TypeScript compiles without errors
  • All tests pass
  • Production validated — deployed to campermate.com, 15 cache entries populated in ~9 seconds with isolated worker bindings
  • Worker shows only NEXT_INC_CACHE_R2_BUCKET binding (no leaked project bindings)
  • Retry logic handles 207 partial failures and network errors

Related


changeset-bot bot commented Jan 28, 2026

🦋 Changeset detected

Latest commit: 37e5981

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@opennextjs/cloudflare Minor



vicb commented Jan 28, 2026

Thanks for the PR @isaacrowntree
I can't promise I'll get to it today but by tomorrow EOD for sure


vicb commented Jan 29, 2026

Sorry @isaacrowntree
I wasted my day today... it happens...
I'll try to review tomorrow

@vicb (Contributor) left a comment

I had something simpler in mind.

  1. Derive a wrangler config from the project's wrangler config. What needs to be propagated is the R2 cache binding. It should have remote set to true if we want to populate the remote R2 (false otherwise).
  2. Start a local worker using this config with a POST endpoint
  3. Send requests to this POST endpoint.

Disclaimer: quick review, not much time tonight but I hope it makes at least some sense.

@isaacrowntree
Author

Hey @vicb, thanks for the review! I've reworked the approach based on your feedback.

What changed:

Replaced the deploy→populate→redeploy cycle with your suggested approach: a local wrangler dev worker with a remote R2 binding. The flow is now:

  1. Derive a temp wrangler config from the project's config (just the R2 binding with remote: true)
  2. Start wrangler dev locally with a standalone worker that has a POST endpoint
  3. Send batched cache entries to the local worker, which writes to R2 via the binding
  4. Stop the worker, clean up, then deploy normally

What was removed:

  • --cacheMethod flag and dual code paths — single path for all cache sizes
  • Token auth system — local worker doesn't need it
  • The esbuild compilation step — wrangler handles TS natively
  • Changes to deploy.ts, worker.ts, cloudflare-context.ts, build.ts, run-wrangler.ts — all reverted to main
  • compile-cache-populate-handler.ts — deleted

Net result is ~470 fewer lines. The remote flag on the R2 binding controls local vs remote target, so the same code path handles both populateCache local and populateCache remote.

All 206 tests pass. Let me know if this is closer to what you had in mind.

const tempWranglerConfig = {
name: "open-next-cache-populate",
main: handlerPath,
compatibility_date: "2024-12-01",
Contributor

Suggested change
compatibility_date: "2024-12-01",
compatibility_date: "2026-01-01",

Author

Done — updated to "2026-01-01". Also removed the temp config file entirely by switching to unstable_startWorker with programmatic bindings (see comment on #305).

binding: R2_CACHE_BINDING_NAME,
bucket_name: binding.bucket_name,
...(binding.jurisdiction && { jurisdiction: binding.jurisdiction }),
...(useRemote && { remote: true }),
Contributor

Suggested change
...(useRemote && { remote: true }),
remote: useRemote,

Author

Done — simplified to remote: useRemote. Now passed directly in the unstable_startWorker bindings config.

Comment on lines +324 to +327
env: {
...process.env,
CLOUDFLARE_LOAD_DEV_VARS_FROM_DOT_ENV: "false",
},
Contributor

I don't think we need it for this simple worker

Suggested change
env: {
...process.env,
CLOUDFLARE_LOAD_DEV_VARS_FROM_DOT_ENV: "false",
},

Author

Removed — the entire spawn approach is gone now that we use unstable_startWorker, so this env var is no longer relevant.

*
* @returns The local URL and a function to stop the worker.
*/
function startWranglerDev(
Contributor

Could you please use the unstable_startWorker?

This should make the code quite simpler.

Author

Done — replaced the manual spawn + stdout parsing with unstable_startWorker. This also eliminated the temp wrangler config file since bindings are now passed programmatically. The code is significantly simpler:

const worker = await unstable_startWorker({
  name: "open-next-cache-populate",
  entrypoint: handlerPath,
  compatibilityDate: "2026-01-01",
  bindings: {
    [R2_CACHE_BINDING_NAME]: {
      type: "r2_bucket",
      bucket_name: binding.bucket_name,
      remote: useRemote,
    },
  },
  dev: { server: { port: 0 }, inspector: false, watch: false, liveReload: false },
});

Uses worker.ready, worker.url, and worker.dispose() for lifecycle management.

value: fs.readFileSync(fullPath, "utf8"),
}));

const result = await sendBatchWithRetry(workerUrl, entries);
Contributor

Given that the server is local, there is probably no need to batch (which might consume a lot of memory if you have a lot of assets).

Sending a few assets at a time should be good enough and less memory intensive.

Author

Good call — reduced the default batch size from 100 to 5. Since the server is local, the overhead per request is negligible and this keeps memory usage low even for projects with thousands of prerendered pages.
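
Splitting the work into small fixed-size batches keeps only a handful of file bodies in memory at once. A minimal sketch (the helper name is illustrative; the batch size of 5 matches the default described above):

```typescript
// Yield fixed-size slices of `items` so only one batch of file bodies
// needs to be read and held in memory at a time.
function* toBatches<T>(items: readonly T[], batchSize = 5): Generator<T[]> {
  for (let i = 0; i < items.length; i += batchSize) {
    yield items.slice(i, i + batchSize);
  }
}
```

The key point is to read file contents inside the per-batch loop (mapping each path to its body just before sending) rather than loading all assets upfront.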

Contributor

The templates folder stores the part of the worker that will be deployed as the app.

What about creating a workers folder at the same level and adding r2 to the file name?

Author

Done — moved to packages/cloudflare/src/cli/workers/r2-cache-populate-handler.ts. Created a new workers/ directory at the same level as templates/, with "r2" in the filename.

}

export default {
fetch: (request: Request, env: CachePopulateEnv) => handleCachePopulate(request, env),
Contributor

Maybe the fetch handler should check the method (POST) and the pathname, and if OK call populateCache.

I think it would be easier to test populateCache that way.

Author

Done — the fetch handler now checks method and pathname, then delegates to populateCache:

export default {
  async fetch(request: Request, env: CachePopulateEnv): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "POST" && url.pathname === "/populate") {
      return populateCache(request, env);
    }
    return new Response("Not found", { status: 404 });
  },
};

populateCache is exported separately and can be tested directly without going through routing. Added routing tests as well.
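
With routing and populateCache separated, the handler can be exercised with plain Request objects (Request/Response are globals in Node 18+ as well as in workerd). A sketch with populateCache stubbed out, so only the routing shown above is under test:

```typescript
type CachePopulateEnv = Record<string, unknown>;

// Stand-in for the real populateCache, so routing can be tested alone.
const populateCache = async (_request: Request, _env: CachePopulateEnv) =>
  new Response("ok", { status: 200 });

const handler = {
  async fetch(request: Request, env: CachePopulateEnv): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "POST" && url.pathname === "/populate") {
      return populateCache(request, env);
    }
    return new Response("Not found", { status: 404 });
  },
};
```

A test then calls handler.fetch(new Request("http://worker/populate", { method: "POST" }), {}) and asserts on the status, with no server or wrangler process involved.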

@vicb (Contributor) left a comment

Nice to see this PR taking a great shape!

Thanks for your work on this


pkg-pr-new bot commented Feb 4, 2026


npm i https://pkg.pr.new/@opennextjs/cloudflare@1099

commit: 07c3ce9

Uses a local wrangler dev worker with a remote R2 binding to populate
the R2 incremental cache, bypassing Cloudflare API rate limits that
cause failures with large numbers of prerendered pages.

Flow:
1. Derive temp config with only the R2 binding (remote: true)
2. Start wrangler dev locally via unstable_startWorker
3. Send batched cache entries to the local worker
4. Worker writes directly to R2 via binding (no API limits)
5. Stop worker, then proceed with normal deploy

Features:
- Exponential backoff retry (3 attempts)
- Partial failure handling (207 Multi-Status)
- Progress bar with tqdm
- Jurisdiction support

Squashed from 5 commits for clean rebase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@isaacrowntree force-pushed the fix/r2-cache-upload-via-binding branch from 0f3be28 to 9bd33d4 on March 11, 2026 01:14
isaacrowntree and others added 2 commits March 11, 2026 12:28
Change cwd to a temp directory before calling unstable_startWorker to
prevent it from reading the project's wrangler.jsonc and merging all
bindings (DOs, KV, services, AI) into the cache populate worker.

This fixes hangs in CI when the project has Durable Objects or service
bindings that the cache populate worker doesn't export.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes for R2 cache population:

1. Change cwd to a temp directory before calling unstable_startWorker
   to prevent it from reading the project's wrangler.jsonc and merging
   all bindings (DOs, KV, services, AI) into the cache populate worker.
   This fixes hangs in CI when the project has Durable Objects or
   service bindings that the cache populate worker doesn't export.

2. Retry individual failed entries from 207 (partial failure) responses
   with exponential backoff, rather than only retrying on full HTTP
   errors. Transient R2 503 errors no longer cause permanent data loss.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@isaacrowntree
Author

Hey @vicb — saw your follow-up in #1121, great to see this being explored further!

We've been running this version in production on campermate.com and wanted to share some findings from getting it working end-to-end:

Config isolation is critical

The biggest issue we hit was unstable_startWorker picking up the project's full wrangler.toml via findUpSync from path.dirname(entrypoint). This meant the worker got ALL project bindings (Durable Objects, KV, AI, etc.), causing it to hang indefinitely during cache population.

Fix: Pass config: emptyConfigPath pointing to an empty wrangler.toml in a temp directory. This short-circuits the config file discovery so the worker only gets the explicitly-passed R2 binding.
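
The isolation step itself is plain Node stdlib. A sketch under the description above (the helper name is illustrative; emptyConfigPath is the value described in the fix):

```typescript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Create a throwaway directory holding an empty wrangler.toml so that
// wrangler's config discovery (findUpSync) stops there instead of walking
// up to the project's real wrangler.toml / wrangler.jsonc.
function createEmptyConfig(): { emptyConfigPath: string; cleanup: () => void } {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "opennext-r2-populate-"));
  const emptyConfigPath = path.join(dir, "wrangler.toml");
  fs.writeFileSync(emptyConfigPath, "");
  return {
    emptyConfigPath,
    cleanup: () => fs.rmSync(dir, { recursive: true, force: true }),
  };
}
```

The resulting path is then passed as config: emptyConfigPath to unstable_startWorker, and cleanup() runs after worker.dispose().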

worker.fetch > global.fetch(worker.url)

Using worker.fetch directly instead of fetching via the URL avoids an unnecessary localhost HTTP hop. The DispatchFetch type from miniflare is compatible at runtime — just needs a type cast.

Retry logic for transient R2 failures

Remote R2 writes can fail transiently. The worker returns 207 with error details on partial failure, and the caller retries only the failed entries with exponential backoff.

Why not Miniflare directly?

We also evaluated new Miniflare() directly, which would sidestep the config discovery issue entirely. The blocker is remote R2 — unstable_startWorker with remote: true handles all the auth/proxy plumbing automatically. For a local-only target, Miniflare would be cleaner (fully programmatic, no temp files, direct mf.getR2Bucket() access).

Happy to coordinate if you'd like to incorporate any of these fixes into #1121!



Development

Successfully merging this pull request may close these issues.

[BUG] R2 cache upload fails with 500 Internal Server Error during deploy (2400 prerendered pages)

2 participants