Improve backup/restore transfer speeds with parallelization #666

Open
scuffi wants to merge 2 commits into main from patch/improve-backup-restore-speeds

Conversation

scuffi (Contributor) commented May 1, 2026

Summary

This PR improves backup and restore performance for larger archives by:

  • Switching the default squashfs compression to lz4 with configurable compression threads.
  • Adding multipart R2 uploads for large backup archives.
  • Adding parallel ranged downloads during restore to reduce archive download time.
  • Introducing backup options for compression, compressThreads, and multipart.

Details

Large backups now upload through R2 multipart upload when enabled, while smaller backups continue using the existing single presigned PUT path. Restores now download larger archives in parallel ranges before mounting the squashfs archive in the container.
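
As a rough illustration of that branching (the threshold constant and exact call signatures below are assumptions; only the behaviour and the uploadBackupPresigned/uploadBackupMultipart names come from this PR and its review):

// Sketch only: multipart for large archives, single presigned PUT otherwise.
// MULTIPART_MIN_SIZE is a placeholder for whatever threshold the PR uses.
if (options.multipart !== false && archiveSize >= MULTIPART_MIN_SIZE) {
  await this.uploadBackupMultipart(archivePath, archiveSize);
} else {
  await this.uploadBackupPresigned(archivePath, archiveSize);
}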

The new options allow callers to tune backup behavior (a usage sketch follows the list):

  • compression: choose lz4, zstd, or gzip
  • compressThreads: configure mksquashfs processor count
  • multipart: disable multipart upload when needed
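
A hypothetical usage sketch; the createBackup method name and overall call shape are assumptions for illustration, and only the option names and allowed values come from this PR:

// Sketch only -- not the exact SDK signature.
const backup = await sandbox.createBackup({
  dir: '/workspace/project',
  compression: 'zstd',   // 'lz4' (default) | 'zstd' | 'gzip'
  compressThreads: 4,    // mksquashfs processor count
  multipart: false,      // force the existing single presigned PUT path
});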

changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: cf36149

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name                 Type
@cloudflare/sandbox  Patch

pkg-pr-new Bot commented May 1, 2026

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@666

commit: cf36149

github-actions Bot commented May 1, 2026

🐳 Docker Images Published

Variant   Image
Default   cloudflare/sandbox:0.0.0-pr-666-cf36149
Python    cloudflare/sandbox:0.0.0-pr-666-cf36149-python
OpenCode  cloudflare/sandbox:0.0.0-pr-666-cf36149-opencode
Musl      cloudflare/sandbox:0.0.0-pr-666-cf36149-musl
Desktop   cloudflare/sandbox:0.0.0-pr-666-cf36149-desktop

Usage:

FROM cloudflare/sandbox:0.0.0-pr-666-cf36149

Version: 0.0.0-pr-666-cf36149


📦 Standalone Binary

For arbitrary Dockerfiles:

COPY --from=cloudflare/sandbox:0.0.0-pr-666-cf36149 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]

Download via GitHub CLI:

gh run download 25219761951 -n sandbox-binary

Extract from Docker:

docker run --rm cloudflare/sandbox:0.0.0-pr-666-cf36149 cat /container-server/sandbox > sandbox && chmod +x sandbox

whoiskatrin (Collaborator) commented

/bonk review this PR

ask-bonk (Bot) left a comment

PR Review: Improve backup/restore transfer speeds with parallelization

Good feature — switching from s3fs lazy mounts to direct downloads with parallel range fetches, and adding multipart upload for large archives, should meaningfully improve throughput. The architecture is sound: the DO orchestrates presigned URLs and S3 multipart lifecycle while the container does the actual I/O via shell pipelines.

I have several findings ranging from a likely bug to minor nits.


Issues

1. doCreateBackupLocal does not pass compression or compressThreads to createArchive (bug)

doCreateBackup correctly destructures and forwards the new options, but doCreateBackupLocal only destructures the old fields and calls createArchive without compression or compressThreads:

// sandbox.ts:4949-4954
const {
  dir, name, ttl = BACKUP_DEFAULT_TTL_SECONDS,
  gitignore = false, excludes = []
} = options;
// sandbox.ts:5042-5046
const createResult = await this.client.backup.createArchive(
  dir, archivePath, backupSession,
  { gitignore, excludes: normalizedExcludes }  // missing compression, compressThreads
);

This means local-dev backups always use server defaults (lz4/8 threads), ignoring user-provided compression and compressThreads. Should be an easy fix to destructure and forward them.
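
A minimal sketch of that fix, reusing the option names shown above (the default values here are assumptions and should mirror doCreateBackup):

// sandbox.ts (sketch) -- destructure the new fields in doCreateBackupLocal...
const {
  dir, name, ttl = BACKUP_DEFAULT_TTL_SECONDS,
  gitignore = false, excludes = [],
  compression = 'lz4', compressThreads
} = options;

// ...and forward them to createArchive alongside the existing options.
const createResult = await this.client.backup.createArchive(
  dir, archivePath, backupSession,
  { gitignore, excludes: normalizedExcludes, compression, compressThreads }
);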

2. No upload verification after multipart upload (correctness gap)

uploadBackupPresigned does a bucket.head(r2Key) after upload to verify the object landed correctly (and even has a helpful local-dev mismatch hint). uploadBackupMultipart relies solely on the CompleteMultipartUpload response being 2xx. Consider adding the same bucket.head() size verification for consistency — it's a cheap call and catches the same class of local-dev/remote-R2 mismatch issues.
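
A sketch of what that check could look like at the end of uploadBackupMultipart (the bucket binding, r2Key, and expectedSize names follow the description above; the surrounding error handling is an assumption):

// Sketch only: verify the completed multipart object exists and has the expected size.
const head = await bucket.head(r2Key);
if (!head || head.size !== expectedSize) {
  throw new Error(
    `Backup upload verification failed for ${r2Key}: expected ${expectedSize} bytes, ` +
    `found ${head ? head.size : 'no object'}`
  );
}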

3. compression parameter is interpolated directly into the shell command without validation (security concern)

In backup-service.ts:237:

`-comp ${compression}`,

The compression parameter is typed as 'gzip' | 'lz4' | 'zstd' in TypeScript, but at the HTTP handler boundary it arrives as an untrusted JSON field. The handler passes it through with only a ?? default:

body.compression ?? 'lz4'

There is no runtime validation that the value is actually one of the three allowed strings. If a caller sends compression: "lz4; rm -rf /", it would be interpolated into the shell command. While the handler is only reachable from the DO (not directly from user code), defense-in-depth suggests validating the value against the allowed set in either the handler or the service, similar to how archivePath is validated.

4. handleUploadParts does not validate individual part fields (defense-in-depth)

The handler validates that parts is a non-empty array but doesn't check that each part has valid partNumber, url, offset, and size fields. The url field in particular is passed directly to curl in a shell command. While shellEscape is used, validating that url starts with https:// and that numeric fields are non-negative integers would be a stronger defensive boundary.
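
A sketch of the kind of per-part validation this suggests (field names follow the description above; the errorResponse helper is an assumption standing in for however the handler reports bad requests):

// Sketch only: reject malformed part descriptors before any shell command is built.
for (const part of parts) {
  const { partNumber, url, offset, size } = part ?? {};
  if (
    !Number.isInteger(partNumber) || partNumber < 1 ||
    typeof url !== 'string' || !url.startsWith('https://') ||
    !Number.isInteger(offset) || offset < 0 ||
    !Number.isInteger(size) || size <= 0
  ) {
    return errorResponse(400, `Invalid part descriptor (partNumber=${String(partNumber)})`);
  }
}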

5. Stale JSDoc on downloadBackupParallel (nit)

The JSDoc says "Parts are assembled with cat then atomically moved to the final path" but the implementation uses dd with oflag=seek_bytes to write ranges directly into a pre-allocated file, then mv. The test even explicitly asserts expect(downloadCommand).not.toContain('cat '). The comment should be updated to reflect the actual implementation.

6. wait $J{i} captures exit code of the last pipeline component only (subtle correctness)

In the upload script (backup-service.ts:457):

wait $J0; E0=$?

When backgrounding dd ... | curl ... &, $! captures the PID of the last process in the pipeline (curl). wait $Jn returns curl's exit code. If dd fails (e.g. the archive file was deleted or offset is wrong), dd exits non-zero but curl may still exit 0 (it just uploads truncated data). The upload would appear to succeed but the part would be incomplete.

The download path has the same pattern but is lower risk since curl -f would fail if the server returned a partial response. For upload, consider using set -o pipefail in the script prefix, or at minimum documenting this limitation.
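
One way to make the collected exit codes reflect dd failures is to enable pipefail and background each pipeline as a subshell, so wait returns the pipeline's status rather than curl's alone. This is an illustrative variant, not the exact script built in backup-service.ts, and the variable names are placeholders:

// Sketch only: script prefix plus one part, expressed as the kind of template
// string backup-service.ts assembles.
const uploadScriptSketch = `
set -o pipefail
( dd if="$ARCHIVE" iflag=skip_bytes,count_bytes bs=4M skip="$OFFSET0" count="$SIZE0" \\
    | curl -sf -T - "$URL0" ) & J0=$!
wait $J0; E0=$?   # with pipefail plus the subshell, E0 is non-zero if dd OR curl failed
`;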

7. BACKUP_DOWNLOAD_MAX_PARTS is declared but never used (dead code)

const BACKUP_DOWNLOAD_MAX_PARTS = 64;

calculatePartCount uses BACKUP_MULTIPART_MAX_PARTS for the cap. Even if this constant is intended for future use, it should be removed until it is actually needed.

Minor observations

  • The sizeCheck.stdout in the restore path (sandbox.ts:5309) uses .catch(() => ({ stdout: '0' })) which swallows errors cleanly, but the returned object doesn't have exitCode or stderr properties that execWithSession normally returns. This works because only .stdout is read, but it's a bit fragile.

  • The changeset looks good — single @cloudflare/sandbox patch entry with a user-facing description.

  • Test coverage is reasonable: the new parallel download test verifies the shell script structure (truncate, dd seek, no cat). Consider adding a test for the multipart upload path, at least verifying that uploadBackupMultipart falls back to presigned when numParts <= 1; a rough sketch follows this list.
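
A rough vitest-style sketch of that fallback test (the sandbox object, method visibility, and spying approach are assumptions; only the uploadBackupMultipart/uploadBackupPresigned names and the numParts <= 1 condition come from this review):

// Sketch only: assert the multipart path delegates to the presigned path for tiny archives.
it('falls back to presigned upload when only one part is needed', async () => {
  const presignedSpy = vi
    .spyOn(sandbox as any, 'uploadBackupPresigned')
    .mockResolvedValue(undefined);

  await (sandbox as any).uploadBackupMultipart('/tmp/small.sqfs', 1024); // well under the multipart threshold

  expect(presignedSpy).toHaveBeenCalledTimes(1);
});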

ask-bonk (Bot) left a comment

Inline comments on key findings.

backup-service.ts
   shellEscape(dir),
   shellEscape(archivePath),
-  '-comp zstd',
+  `-comp ${compression}`,

ask-bonk (Bot) commented:
The compression parameter is interpolated directly into the shell command without runtime validation. While the TypeScript type constrains it to 'gzip' | 'lz4' | 'zstd', the HTTP handler accepts it from untrusted JSON with only a ?? default — no allowlist check.

Consider validating at the top of createArchive:

const ALLOWED_COMPRESSION = new Set(['gzip', 'lz4', 'zstd']);
if (!ALLOWED_COMPRESSION.has(compression)) {
  return serviceError({
    message: `Invalid compression algorithm: ${compression}`,
    code: ErrorCode.INVALID_BACKUP_CONFIG,
    details: { compression }
  });
}

* For archives >= BACKUP_DOWNLOAD_PARALLEL_MIN_SIZE, uses BACKUP_DOWNLOAD_PARALLEL_PARTS
* concurrent curl processes (each downloading a byte-range) to maximise both
* network and disk-write throughput. Parts are assembled with cat then
* atomically moved to the final path.

ask-bonk (Bot) commented:
Nit: Stale JSDoc — the implementation uses dd with oflag=seek_bytes to write ranges directly into a pre-allocated file (via truncate -s), not cat. The test at line 2065 even explicitly asserts expect(downloadCommand).not.toContain('cat ').

Suggested update:

 * ...concurrent curl processes (each downloading a byte-range) that write
 * directly into a pre-allocated temp file via dd seek, then mv to the final path.

ask-bonk (Bot) commented May 1, 2026

The review has been posted. Here's a summary of what I found:

Review Summary for PR #666

Posted a detailed review comment and 2 inline comments on the PR. Key findings:

Bug:

  • doCreateBackupLocal does not forward compression or compressThreads to createArchive, so local-dev backups silently ignore these user-provided options.

Security/correctness:

  • The compression parameter is interpolated directly into a shell command without runtime validation against the allowed set (gzip, lz4, zstd). TypeScript types don't protect at the HTTP boundary.
  • No upload verification (bucket.head()) after multipart upload, unlike the presigned upload path which has it.
  • handleUploadParts doesn't validate individual part fields (especially url which goes into shell commands).
  • wait $J{i} only captures the exit code of the last pipeline component (curl), not dd. A failing dd could result in truncated part uploads that appear successful.

Minor:

  • Stale JSDoc referencing cat assembly when the implementation uses dd with oflag=seek_bytes.
  • BACKUP_DOWNLOAD_MAX_PARTS constant is declared but never used.

scuffi marked this pull request as ready for review May 1, 2026 15:23