Improve backup/restore transfer speeds with parallelization #666

Open
scuffi wants to merge 2 commits into main from patch/improve-backup-restore-speeds

Conversation

scuffi (Contributor) commented May 1, 2026

Summary

This PR improves backup and restore performance for larger archives by:

  • Switching the default squashfs compression to lz4 with configurable compression threads.
  • Adding multipart R2 uploads for large backup archives.
  • Adding parallel ranged downloads during restore to reduce archive download time.
  • Introducing backup options for compression, compressThreads, and multipart.

Details

Large backups now upload through R2 multipart upload when enabled, while smaller backups continue using the existing single presigned PUT path. Restores now download larger archives in parallel ranges before mounting the squashfs archive in the container.
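
As a rough illustration of that branching (the threshold constant and exact call signatures below are assumptions; only the behaviour and the uploadBackupPresigned/uploadBackupMultipart names come from this PR and its review):

// Sketch only: multipart for large archives, single presigned PUT otherwise.
// MULTIPART_MIN_SIZE is a placeholder for whatever threshold the PR uses.
if (options.multipart !== false && archiveSize >= MULTIPART_MIN_SIZE) {
  await this.uploadBackupMultipart(archivePath, archiveSize);
} else {
  await this.uploadBackupPresigned(archivePath, archiveSize);
}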

The new options allow callers to tune backup behavior (a usage sketch follows the list):

  • compression: choose lz4, zstd, or gzip
  • compressThreads: configure mksquashfs processor count
  • multipart: disable multipart upload when needed
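
A hypothetical usage sketch; the createBackup method name and overall call shape are assumptions for illustration, and only the option names and allowed values come from this PR:

// Sketch only -- not the exact SDK signature.
const backup = await sandbox.createBackup({
  dir: '/workspace/project',
  compression: 'zstd',   // 'lz4' (default) | 'zstd' | 'gzip'
  compressThreads: 4,    // mksquashfs processor count
  multipart: false,      // force the existing single presigned PUT path
});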

changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: cf36149

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name                 Type
@cloudflare/sandbox  Patch

pkg-pr-new Bot commented May 1, 2026

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@666

commit: cf36149

github-actions Bot commented May 1, 2026

🐳 Docker Images Published

Variant   Image
Default   cloudflare/sandbox:0.0.0-pr-666-cf36149
Python    cloudflare/sandbox:0.0.0-pr-666-cf36149-python
OpenCode  cloudflare/sandbox:0.0.0-pr-666-cf36149-opencode
Musl      cloudflare/sandbox:0.0.0-pr-666-cf36149-musl
Desktop   cloudflare/sandbox:0.0.0-pr-666-cf36149-desktop

Usage:

FROM cloudflare/sandbox:0.0.0-pr-666-cf36149

Version: 0.0.0-pr-666-cf36149


📦 Standalone Binary

For arbitrary Dockerfiles:

COPY --from=cloudflare/sandbox:0.0.0-pr-666-cf36149 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]

Download via GitHub CLI:

gh run download 25219761951 -n sandbox-binary

Extract from Docker:

docker run --rm cloudflare/sandbox:0.0.0-pr-666-cf36149 cat /container-server/sandbox > sandbox && chmod +x sandbox

whoiskatrin (Collaborator) commented

/bonk review this PR

ask-bonk (Bot) left a comment

PR Review: Improve backup/restore transfer speeds with parallelization

Good feature — switching from s3fs lazy mounts to direct downloads with parallel range fetches, and adding multipart upload for large archives, should meaningfully improve throughput. The architecture is sound: the DO orchestrates presigned URLs and S3 multipart lifecycle while the container does the actual I/O via shell pipelines.

I have several findings ranging from a likely bug to minor nits.


Issues

1. doCreateBackupLocal does not pass compression or compressThreads to createArchive (bug)

doCreateBackup correctly destructures and forwards the new options, but doCreateBackupLocal only destructures the old fields and calls createArchive without compression or compressThreads:

// sandbox.ts:4949-4954
const {
  dir, name, ttl = BACKUP_DEFAULT_TTL_SECONDS,
  gitignore = false, excludes = []
} = options;
// sandbox.ts:5042-5046
const createResult = await this.client.backup.createArchive(
  dir, archivePath, backupSession,
  { gitignore, excludes: normalizedExcludes }  // missing compression, compressThreads
);

This means local-dev backups always use server defaults (lz4/8 threads), ignoring user-provided compression and compressThreads. Should be an easy fix to destructure and forward them.
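
A minimal sketch of that fix, reusing the option names shown above (the default values here are assumptions and should mirror doCreateBackup):

// sandbox.ts (sketch) -- destructure the new fields in doCreateBackupLocal...
const {
  dir, name, ttl = BACKUP_DEFAULT_TTL_SECONDS,
  gitignore = false, excludes = [],
  compression = 'lz4', compressThreads
} = options;

// ...and forward them to createArchive alongside the existing options.
const createResult = await this.client.backup.createArchive(
  dir, archivePath, backupSession,
  { gitignore, excludes: normalizedExcludes, compression, compressThreads }
);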

2. No upload verification after multipart upload (correctness gap)

uploadBackupPresigned does a bucket.head(r2Key) after upload to verify the object landed correctly (and even has a helpful local-dev mismatch hint). uploadBackupMultipart relies solely on the CompleteMultipartUpload response being 2xx. Consider adding the same bucket.head() size verification for consistency — it's a cheap call and catches the same class of local-dev/remote-R2 mismatch issues.
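
A sketch of what that check could look like at the end of uploadBackupMultipart (the bucket binding, r2Key, and expectedSize names follow the description above; the surrounding error handling is an assumption):

// Sketch only: verify the completed multipart object exists and has the expected size.
const head = await bucket.head(r2Key);
if (!head || head.size !== expectedSize) {
  throw new Error(
    `Backup upload verification failed for ${r2Key}: expected ${expectedSize} bytes, ` +
    `found ${head ? head.size : 'no object'}`
  );
}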

3. compression parameter is interpolated directly into the shell command without validation (security concern)

In backup-service.ts:237:

`-comp ${compression}`,

The compression parameter is typed as 'gzip' | 'lz4' | 'zstd' in TypeScript, but at the HTTP handler boundary it arrives as an untrusted JSON field. The handler passes it through with only a ?? default:

body.compression ?? 'lz4'

There is no runtime validation that the value is actually one of the three allowed strings. If a caller sends compression: "lz4; rm -rf /", it would be interpolated into the shell command. While the handler is only reachable from the DO (not directly from user code), defense-in-depth suggests validating the value against the allowed set in either the handler or the service, similar to how archivePath is validated.

4. handleUploadParts does not validate individual part fields (defense-in-depth)

The handler validates that parts is a non-empty array but doesn't check that each part has valid partNumber, url, offset, and size fields. The url field in particular is passed directly to curl in a shell command. While shellEscape is used, validating that url starts with https:// and that numeric fields are non-negative integers would be a stronger defensive boundary.
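
A sketch of the kind of per-part validation this suggests (field names follow the description above; the errorResponse helper is an assumption standing in for however the handler reports bad requests):

// Sketch only: reject malformed part descriptors before any shell command is built.
for (const part of parts) {
  const { partNumber, url, offset, size } = part ?? {};
  if (
    !Number.isInteger(partNumber) || partNumber < 1 ||
    typeof url !== 'string' || !url.startsWith('https://') ||
    !Number.isInteger(offset) || offset < 0 ||
    !Number.isInteger(size) || size <= 0
  ) {
    return errorResponse(400, `Invalid part descriptor (partNumber=${String(partNumber)})`);
  }
}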

5. Stale JSDoc on downloadBackupParallel (nit)

The JSDoc says "Parts are assembled with cat then atomically moved to the final path" but the implementation uses dd with oflag=seek_bytes to write ranges directly into a pre-allocated file, then mv. The test even explicitly asserts expect(downloadCommand).not.toContain('cat '). The comment should be updated to reflect the actual implementation.

6. wait $J{i} captures exit code of the last pipeline component only (subtle correctness)

In the upload script (backup-service.ts:457):

wait $J0; E0=$?

When backgrounding dd ... | curl ... &, $! captures the PID of the last process in the pipeline (curl). wait $Jn returns curl's exit code. If dd fails (e.g. the archive file was deleted or offset is wrong), dd exits non-zero but curl may still exit 0 (it just uploads truncated data). The upload would appear to succeed but the part would be incomplete.

The download path has the same pattern but is lower risk since curl -f would fail if the server returned a partial response. For upload, consider using set -o pipefail in the script prefix, or at minimum documenting this limitation.
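
One way to make the collected exit codes reflect dd failures is to enable pipefail and background each pipeline as a subshell, so wait returns the pipeline's status rather than curl's alone. This is an illustrative variant, not the exact script built in backup-service.ts, and the variable names are placeholders:

// Sketch only: script prefix plus one part, expressed as the kind of template
// string backup-service.ts assembles.
const uploadScriptSketch = `
set -o pipefail
( dd if="$ARCHIVE" iflag=skip_bytes,count_bytes bs=4M skip="$OFFSET0" count="$SIZE0" \\
    | curl -sf -T - "$URL0" ) & J0=$!
wait $J0; E0=$?   # with pipefail plus the subshell, E0 is non-zero if dd OR curl failed
`;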

7. BACKUP_DOWNLOAD_MAX_PARTS is declared but never used (dead code)

const BACKUP_DOWNLOAD_MAX_PARTS = 64;

calculatePartCount uses BACKUP_MULTIPART_MAX_PARTS for the cap. Even if this constant is intended for future use, it should be removed until it is actually needed.

Minor observations

  • The sizeCheck.stdout in the restore path (sandbox.ts:5309) uses .catch(() => ({ stdout: '0' })) which swallows errors cleanly, but the returned object doesn't have exitCode or stderr properties that execWithSession normally returns. This works because only .stdout is read, but it's a bit fragile.

  • The changeset looks good — single @cloudflare/sandbox patch entry with a user-facing description.

  • Test coverage is reasonable: the new parallel download test verifies the shell script structure (truncate, dd seek, no cat). Consider adding a test for the multipart upload path, at least verifying that uploadBackupMultipart falls back to presigned when numParts <= 1; a rough sketch follows this list.
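
A rough vitest-style sketch of that fallback test (the sandbox object, method visibility, and spying approach are assumptions; only the uploadBackupMultipart/uploadBackupPresigned names and the numParts <= 1 condition come from this review):

// Sketch only: assert the multipart path delegates to the presigned path for tiny archives.
it('falls back to presigned upload when only one part is needed', async () => {
  const presignedSpy = vi
    .spyOn(sandbox as any, 'uploadBackupPresigned')
    .mockResolvedValue(undefined);

  await (sandbox as any).uploadBackupMultipart('/tmp/small.sqfs', 1024); // well under the multipart threshold

  expect(presignedSpy).toHaveBeenCalledTimes(1);
});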

ask-bonk (Bot) left a comment

Inline comments on key findings.

backup-service.ts
   shellEscape(dir),
   shellEscape(archivePath),
-  '-comp zstd',
+  `-comp ${compression}`,

ask-bonk (Bot) commented:
The compression parameter is interpolated directly into the shell command without runtime validation. While the TypeScript type constrains it to 'gzip' | 'lz4' | 'zstd', the HTTP handler accepts it from untrusted JSON with only a ?? default — no allowlist check.

Consider validating at the top of createArchive:

const ALLOWED_COMPRESSION = new Set(['gzip', 'lz4', 'zstd']);
if (!ALLOWED_COMPRESSION.has(compression)) {
  return serviceError({
    message: `Invalid compression algorithm: ${compression}`,
    code: ErrorCode.INVALID_BACKUP_CONFIG,
    details: { compression }
  });
}

* For archives >= BACKUP_DOWNLOAD_PARALLEL_MIN_SIZE, uses BACKUP_DOWNLOAD_PARALLEL_PARTS
* concurrent curl processes (each downloading a byte-range) to maximise both
* network and disk-write throughput. Parts are assembled with cat then
* atomically moved to the final path.

ask-bonk (Bot) commented:
Nit: Stale JSDoc — the implementation uses dd with oflag=seek_bytes to write ranges directly into a pre-allocated file (via truncate -s), not cat. The test at line 2065 even explicitly asserts expect(downloadCommand).not.toContain('cat ').

Suggested update:

 * ...concurrent curl processes (each downloading a byte-range) that write
 * directly into a pre-allocated temp file via dd seek, then mv to the final path.

ask-bonk (Bot) commented May 1, 2026

The review has been posted. Here's a summary of what I found:

Review Summary for PR #666

Posted a detailed review comment and 2 inline comments on the PR. Key findings:

Bug:

  • doCreateBackupLocal does not forward compression or compressThreads to createArchive, so local-dev backups silently ignore these user-provided options.

Security/correctness:

  • The compression parameter is interpolated directly into a shell command without runtime validation against the allowed set (gzip, lz4, zstd). TypeScript types don't protect at the HTTP boundary.
  • No upload verification (bucket.head()) after multipart upload, unlike the presigned upload path which has it.
  • handleUploadParts doesn't validate individual part fields (especially url which goes into shell commands).
  • wait $J{i} only captures the exit code of the last pipeline component (curl), not dd. A failing dd could result in truncated part uploads that appear successful.

Minor:

  • Stale JSDoc referencing cat assembly when the implementation uses dd with oflag=seek_bytes.
  • BACKUP_DOWNLOAD_MAX_PARTS constant is declared but never used.

scuffi marked this pull request as ready for review May 1, 2026 15:23