
[WIP] Use Workers binding for R2 cache population#1121

Open
vicb wants to merge 10 commits into main from vicb/r2-remote-binding

Conversation

@vicb
Contributor

@vicb vicb commented Feb 9, 2026

Based on #1099

Fixes #1110, #1088

Key differences:

  • The Worker is simpler and only uploads 1 entry
  • The retry logic is at the calling site (in the populateCache command)
  • Better error handling
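
As a rough illustration of retry logic living at the calling site, here is a minimal sketch of a retry wrapper with exponential backoff. The names `backoffDelayMs` and `withRetry`, and the default delays, are hypothetical and not taken from this PR's code.

```typescript
// Hypothetical helpers for illustration only; not the PR's implementation.

/** Delay before retry number `attempt` (0-based): doubles each time, capped at `maxMs`. */
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs `task` and retries it when it rejects, up to `maxRetries` extra attempts.
 * The last error is rethrown once the retries are exhausted.
 */
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseMs = 250
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= maxRetries) throw error;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

With a wrapper like this, the Worker can stay a thin "write one entry" endpoint while the caller decides how often and how long to retry a failed upload.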

TODO

  • Better handling of the case where the remote R2 does not exist
  • Handle jurisdiction, ...
  • Cleanup

OpenClaude/Opus helped me write that code

isaacrowntree and others added 5 commits February 9, 2026 18:01
For large deployments with 500+ prerendered pages, the R2 cache upload
now uses the worker's R2 binding directly instead of wrangler's r2 bulk
put command. This bypasses the Cloudflare API rate limit.

New deploy flow for large R2 caches:
1. Deploy worker with a temporary cache populate token
2. Send cache entries directly to /_open-next/cache/populate endpoint
3. Worker writes to R2 using its binding (no API rate limits)
4. Redeploy without the token to secure the endpoint

Features:
- Automatic threshold: binding approach for caches with 500+ entries
- Batched uploads with configurable batch size (default: 100)
- Retry logic with exponential backoff
- Secure: Temporary token removed after cache population

Closes #1088

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add tests for generateCachePopulateToken
- Add comprehensive tests for handleCachePopulate handler:
  - Request method validation
  - Token authentication
  - Invalid JSON handling
  - Successful writes
  - Partial failures
  - Empty entries

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Users can now explicitly choose the cache population method:
- auto: Use binding for 500+ entries, wrangler CLI otherwise (default)
- wrangler: Always use wrangler CLI (original behavior)
- binding: Always use worker binding (bypasses API rate limits)

This provides full backward compatibility while giving users control
over the cache population strategy.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… remote binding

Reworks the R2 cache upload approach per reviewer feedback. Instead of
deploying the worker with a token, populating, then redeploying, we now
use a local wrangler dev worker with a remote R2 binding.

Key changes:
- Replace deploy/populate/redeploy cycle with local wrangler dev approach
- Remove --cacheMethod flag and dual code paths
- Remove token auth system (local worker needs no auth)
- Remove esbuild compilation step (wrangler accepts TS natively)
- Revert deploy.ts, worker.ts, cloudflare-context.ts, build.ts, run-wrangler.ts to main
- Delete compile-cache-populate-handler.ts

The new flow derives a temp wrangler config with the R2 binding set to
remote: true, starts wrangler dev locally, sends batched cache entries
to the worker, then stops it. Single code path for all cache sizes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
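
The config derivation described in this commit message could look roughly like the sketch below. The `WranglerConfigLike` shape and the `withRemoteR2` helper are assumptions made for illustration; only the `r2_buckets` entries and the `remote: true` flag come from the discussion above.

```typescript
// Minimal shape of the config fields used here (an assumption, not wrangler's full types).
interface R2BucketBinding {
  binding: string;
  bucket_name: string;
  remote?: boolean;
}

interface WranglerConfigLike {
  name: string;
  r2_buckets?: R2BucketBinding[];
  [key: string]: unknown;
}

/**
 * Returns a copy of `config` in which the R2 binding named `bindingName`
 * is marked `remote: true`. The input object is not mutated.
 */
function withRemoteR2(config: WranglerConfigLike, bindingName: string): WranglerConfigLike {
  return {
    ...config,
    r2_buckets: (config.r2_buckets ?? []).map((bucket) =>
      bucket.binding === bindingName ? { ...bucket, remote: true } : bucket
    ),
  };
}
```

Returning a copy keeps the user's own config untouched, so the derived config can be discarded once the cache has been populated.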
…ndler to workers/

- Replace manual spawn + stdout parsing with wrangler's unstable_startWorker API
- Pass R2 bindings programmatically, eliminating temp config files
- Move handler from templates/ to workers/r2-cache-populate-handler.ts
- Separate fetch routing from populateCache logic for testability
- Reduce default batch size from 100 to 5 (local server, lower memory)
- Update compatibility_date to 2026-01-01
- Simplify remote binding config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
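
To illustrate why separating fetch routing from the populate logic helps testability, here is a minimal sketch. The `/populate` path, the `BucketLike` and `CacheEntry` shapes, and all function names are hypothetical; the real handler lives in workers/r2-cache-populate-handler.ts and may look quite different.

```typescript
// Hypothetical sketch; `BucketLike` stands in for the small part of an R2 binding used here.
interface BucketLike {
  put(key: string, value: string): Promise<unknown>;
}

interface CacheEntry {
  key: string;
  value: string;
}

type Route = "populate" | "method_not_allowed" | "not_found";

/** Pure routing decision: no I/O, so it can be unit tested directly. */
function route(method: string, pathname: string): Route {
  if (pathname !== "/populate") return "not_found";
  return method === "POST" ? "populate" : "method_not_allowed";
}

/** Writes a single entry through the bucket binding. */
async function populateCache(bucket: BucketLike, entry: CacheEntry): Promise<void> {
  await bucket.put(entry.key, entry.value);
}

/** Thin fetch handler that only wires routing and the populate logic together. */
async function handleFetch(request: Request, bucket: BucketLike): Promise<Response> {
  switch (route(request.method, new URL(request.url).pathname)) {
    case "not_found":
      return new Response("Not found", { status: 404 });
    case "method_not_allowed":
      return new Response("Method not allowed", { status: 405 });
    case "populate":
      try {
        await populateCache(bucket, (await request.json()) as CacheEntry);
        return new Response("OK");
      } catch (error) {
        return new Response(String(error), { status: 500 });
      }
  }
}
```

Because `populateCache` only depends on `BucketLike`, a test can pass an in-memory fake instead of starting a worker.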
@changeset-bot

changeset-bot bot commented Feb 9, 2026

🦋 Changeset detected

Latest commit: f25e9c0

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@opennextjs/cloudflare Minor

@pkg-pr-new

pkg-pr-new bot commented Feb 9, 2026

npm i https://pkg.pr.new/@opennextjs/cloudflare@1121

commit: f25e9c0

@grabmateusz

@vicb FYI, when consuming this PR in my project, with the same setup as the one on which the rclone deployment succeeds, the pipeline fails with:

Populating remote R2 incremental cache...
<--- Last few GCs --->
[657:0x44626000]     8669 ms: Scavenge (reduce) (interleaved) 4091.6 (4103.8) -> 4091.6 (4103.8) MB, pooled: 0 MB, 0.71 / 0.00 ms  (average mu = 0.887, current mu = 0.740) allocation failure; 
[657:0x44626000]     8711 ms: Mark-Compact (reduce) 4094.5 (4106.7) -> 4094.5 (4106.7) MB, pooled: 0 MB, 39.05 / 0.00 ms  (+ 2.5 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 64 ms) (average mu = 0.758, current
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
 1: 0xe36068 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0x1202550 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 3: 0x1202827 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0x1430105  [node]
 5: 0x1449999 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 6: 0x139b758 v8::internal::StackGuard::HandleInterrupts(v8::internal::StackGuard::InterruptLevel) [node]
 7: 0x1858a77 v8::internal::Runtime_StackGuardWithGap(int, unsigned long*, v8::internal::Isolate*) [node]
 8: 0x1db6476  [node]
Aborted (core dumped)

@vicb
Contributor Author

vicb commented Feb 9, 2026

@vicb FYI, when consuming this PR in my project, with the same setup as the one on which the rclone deployment succeeds, the pipeline fails with:

Thanks for trying.

Could you please let me know how many files you have and the total size?

@vicb
Contributor Author

vicb commented Feb 9, 2026

@vicb FYI, when consuming this PR in my project, with the same setup as the one on which the rclone deployment succeeds, the pipeline fails with:

Oops it was missing my last change here.
Before that the code would load ALL the files into memory before starting, no wonder you got an OOM.
It should be better now.
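
For context on the memory issue, a lazy batching helper along these lines keeps only one batch of file contents in memory at a time. This is a minimal sketch under the assumption that entries can be loaded one by one; `loadInBatches` and its `load` callback are hypothetical names, not the PR's code.

```typescript
/**
 * Lazily loads items in batches: `load` is only called for the keys of the
 * batch currently being built, so earlier batches can be garbage collected.
 */
function* loadInBatches<K, V>(
  keys: Iterable<K>,
  load: (key: K) => V,
  batchSize: number
): Generator<V[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  let batch: V[] = [];
  for (const key of keys) {
    batch.push(load(key));
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly smaller, batch.
  if (batch.length > 0) yield batch;
}
```

In a real populate command, `load` would read a cache file from disk and each yielded batch would be uploaded before the next one is read.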

@vicb vicb force-pushed the vicb/r2-remote-binding branch from ff208d4 to 9c69f84 Compare February 9, 2026 21:38
@grabmateusz

Hi @vicb, I tried your newest branch and it fails as well. This time it is a regression related to our use of https://opennext.js.org/cloudflare/howtos/custom-worker.
The error is as follows:

Populating remote R2 incremental cache...
file:<project>/node_modules/@opennextjs/cloudflare/dist/cli/commands/populate-cache.js:165
            reject(new Error(message));
                   ^

Error: Your Worker depends on the following Durable Objects, which are not exported in your entrypoint file: DOQueueHandler, <OurCustomDurableObjects>.
You should export these objects from your entrypoint, node_modules/@opennextjs/cloudflare/dist/cli/workers/r2-cache.js.

@vicb
Contributor Author

vicb commented Feb 10, 2026

Hi @vicb, I tried your newest branch and it fails as well. This time it is a regression related to our use of https://opennext.js.org/cloudflare/howtos/custom-worker. The error is as follows:

Any chance you can share your code privately to move forward?

@grabmateusz

Hi @vicb,
You've written in the previous comments:

Could you please let me know how many files you have and the total size?

For the tenant I use for tests, the build produces a cache of 13.3k entries with a total size of 2.7 GB (it is our biggest tenant, but that does not mean it is the biggest data set the code needs to handle ;) ).

Any chance you can share your code privately to move forward?

The only thing I can offer is to build a project-agnostic reproduction repository that lets you verify that your code passes. I should be able to have it ready by the end of the week.

@vicb
Contributor Author

vicb commented Feb 10, 2026

The only thing I can offer is to build a project-agnostic reproduction repository that lets you verify that your code passes. I should be able to have it ready by the end of the week.

That would be great and no rush.

I guess you can use the rclone pre-release for the time being?

It looks like we are close to landing this PR. If it turns out to take longer, we can revisit landing rclone. SGTY?

@grabmateusz

Sounds good, I will update you once I have the reproduction repository.

@grabmateusz

grabmateusz commented Feb 16, 2026

Hi @vicb,
I've taken the time to create a reproduction repository.
Most of the code (API / Next.js app) was written by Claude Sonnet, but it is just a mock "issue reproduction" repository.
Please read the README.md, where I've described the crucial parts.

In this repository you can consume your own build of opennextjs-cloudflare with your approach to R2 population. By default, the main branch currently consumes the package from #1116 so that the deployment passes.

Have fun, and ping me in case of problems or doubts.

P.S. I've left the number of routes at 1k, but feel free to make it more "challenging" and support bigger values (5-10k) in opennextjs-cloudflare.
Example of the R2 upload for 10k routes on the current main of the reproduction repository:
Screenshot 2026-02-16 at 11 09 11

@grabmateusz

Hi @vicb ,
last week you wrote:

I guess you can use the rclone pre-release for the time being?

Unfortunately that's not an option, we need to have a stable release consumed on our project for internal reasons.

It looks like we are close to landing this PR. If it turns out to take longer, we can revisit landing rclone. SGTY?

Will you be able to look into this PR and make it work on the reproduction repository, or should we revisit landing rclone?

@vicb
Contributor Author

vicb commented Feb 19, 2026

I'm out this week but I'll move this forward next week

@grabmateusz

Great, thanks

@vicb
Contributor Author

vicb commented Feb 25, 2026

Great, thanks

Looking today

@vicb
Contributor Author

vicb commented Feb 26, 2026

Hi @vicb, I've taken the time to create a reproduction repository. Most of the code (API / Next.js app) was written by Claude Sonnet, but it is just a mock "issue reproduction" repository. Please read the README.md, where I've described the crucial parts.

In this repository you can consume your own build of opennextjs-cloudflare with your approach to R2 population. By default, the main branch currently consumes the package from #1116 so that the deployment passes.

Have fun, and ping me in case of problems or doubts.

P.S. I've left the number of routes at 1k, but feel free to make it more "challenging" and support bigger values (5-10k) in opennextjs-cloudflare. Example of the R2 upload for 10k routes on the current main of the reproduction repository: Screenshot 2026-02-16 at 11 09 11

Sorry, got diverted but looking at this right now.

Could you add instructions on how to reproduce, what is the expected behavior and observed behavior?

Do I need to deploy the API, can/should I run it locally?

What error do you see, ...

Happy to have a chat/VC if it helps

@mgallagher

mgallagher commented Mar 2, 2026

We're having the same original issue as @grabmateusz as we're attempting to migrate our large site from Pages to Workers. The vast majority of the OpenNext builds fail on cache population - so I was hoping this branch could help. I won't be able to provide a reproduction repository, but please let me know if there are any details I can provide to help narrow down the issue. I would also be happy to get on a chat or a call any time. In the meantime, here is the error we're seeing and the configs we have:

Original error on @opennextjs/cloudflare v1.16.5
Populating local R2 incremental cache...
<monorepo-project>/node_modules/@opennextjs/cloudflare/dist/cli/commands/populate-cache.js:165
            reject(new Error(message));
                   ^

Error: Your Worker depends on the following Durable Objects, which are not exported in your entrypoint file: DOQueueHandler.
You should export these objects from your entrypoint, node_modules/@opennextjs/cloudflare/dist/cli/workers/r2-cache.js.
    at exports.unstable_DevEnv.<anonymous> (<monorepo-project>/chroma-cboe-com/node_modules/@opennextjs/cloudflare/dist/cli/commands/populate-cache.js:165:20)

After installing the build from the pkg-pr-new bot in this thread, I am seeing this:

09:36:18.547	Populating remote R2 incremental cache...
09:36:18.959	ERROR Failed to populate the remote R2 cache. Does the bucket "chroma-cboe-com-cache" exist?
09:36:19.007	  0%| | 1/1652 [00:00:<00:00:, 0.00it/s]
[progress bar output for entries 2/1652 through 25/1652 omitted]
  1%| | 26/1652 [00:00:<00:07:, 220.34it/s]
09:36:19.101	Failed: error occurred while running deploy command

The bucket it references definitely exists. We've had a few successful builds on v1.16.5, and the bucket has 15 GB of data.

Configs:

wrangler.jsonc
{
  "$schema": "../../node_modules/wrangler/config-schema.json",
  "main": ".open-next/worker.js",
  "name": "chroma-cboe-com",
  "compatibility_date": "2025-03-25",
  "compatibility_flags": [
    "nodejs_compat",
    "global_fetch_strictly_public"
  ],
  "build": {
    "watch_dir": "sites/chroma-cboe-com/*",
    "cwd": "sites/chroma-cboe-com/"
  },
  "assets": {
    "directory": ".open-next/assets",
    "binding": "ASSETS"
  },
  "vars": {
    "NODE_VERSION": "24.11.1"
  },
  "services": [
    {
      "binding": "WORKER_SELF_REFERENCE",
      "service": "chroma-cboe-com"
    }
  ],
  "r2_buckets": [
    {
      "binding": "NEXT_INC_CACHE_R2_BUCKET",
      "bucket_name": "chroma-cboe-com-cache"
    }
  ],
  "durable_objects": {
    "bindings": [
      {
        "name": "NEXT_CACHE_DO_QUEUE",
        "class_name": "DOQueueHandler"
      }
    ]
  },
  "migrations": [
    {
      "tag": "v1",
      "new_sqlite_classes": ["DOQueueHandler"]
    }
  ]
}
open-next.config.ts
import { defineCloudflareConfig } from '@opennextjs/cloudflare';
import r2IncrementalCache from '@opennextjs/cloudflare/overrides/incremental-cache/r2-incremental-cache';
import { withRegionalCache } from '@opennextjs/cloudflare/overrides/incremental-cache/regional-cache';
import doQueue from '@opennextjs/cloudflare/overrides/queue/do-queue';

// Note that this config mainly supports time-based revalidation for ISR and fetch
// We do not need On-Demand revalidation at this time
export default defineCloudflareConfig({
  incrementalCache: withRegionalCache(r2IncrementalCache, {
    mode: 'long-lived',
  }),
  queue: doQueue,
});

@grabmateusz

@mgallagher you might want to check #1116

Development

Successfully merging this pull request may close these issues.

[BUG] Can't upload R2 incremental cache - recurring 503 Service Unavailable

4 participants