Improve backup/restore transfer speeds with parallelization#666
🦋 Changeset detected

Latest commit: cf36149

The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.

🐳 Docker Images Published

Usage:

```dockerfile
FROM cloudflare/sandbox:0.0.0-pr-666-cf36149
```

📦 Standalone Binary, for arbitrary Dockerfiles:

```dockerfile
COPY --from=cloudflare/sandbox:0.0.0-pr-666-cf36149 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]
```

Download via GitHub CLI: `gh run download 25219761951 -n sandbox-binary`

Extract from Docker: `docker run --rm cloudflare/sandbox:0.0.0-pr-666-cf36149 cat /container-server/sandbox > sandbox && chmod +x sandbox`
/bonk review this PR
PR Review: Improve backup/restore transfer speeds with parallelization
Good feature — switching from s3fs lazy mounts to direct downloads with parallel range fetches, and adding multipart upload for large archives, should meaningfully improve throughput. The architecture is sound: the DO orchestrates presigned URLs and S3 multipart lifecycle while the container does the actual I/O via shell pipelines.
I have several findings ranging from a likely bug to minor nits.
Issues
1. doCreateBackupLocal does not pass compression or compressThreads to createArchive (bug)
doCreateBackup correctly destructures and forwards the new options, but doCreateBackupLocal only destructures the old fields and calls createArchive without compression or compressThreads:
```ts
// sandbox.ts:4949-4954
const {
  dir, name, ttl = BACKUP_DEFAULT_TTL_SECONDS,
  gitignore = false, excludes = []
} = options;

// sandbox.ts:5042-5046
const createResult = await this.client.backup.createArchive(
  dir, archivePath, backupSession,
  { gitignore, excludes: normalizedExcludes } // missing compression, compressThreads
);
```

This means local-dev backups always use the server defaults (lz4 / 8 threads), ignoring user-provided `compression` and `compressThreads`. Should be an easy fix to destructure and forward them.
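A minimal sketch of the fix, with hypothetical names (`BackupOptions` and `buildArchiveOptions` are illustrative, not the PR's actual code; only the option fields come from the review):

```typescript
// Illustrative sketch only: interface and helper names are hypothetical,
// but the option fields match those described in the review.
interface BackupOptions {
  dir: string;
  name?: string;
  gitignore?: boolean;
  excludes?: string[];
  compression?: 'gzip' | 'lz4' | 'zstd';
  compressThreads?: number;
}

// Destructure the new fields alongside the old ones and forward them,
// so the local-dev path honours the caller's choices too.
function buildArchiveOptions(options: BackupOptions) {
  const { gitignore = false, excludes = [], compression, compressThreads } = options;
  return { gitignore, excludes, compression, compressThreads };
}
```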
2. No upload verification after multipart upload (correctness gap)
uploadBackupPresigned does a bucket.head(r2Key) after upload to verify the object landed correctly (and even has a helpful local-dev mismatch hint). uploadBackupMultipart relies solely on the CompleteMultipartUpload response being 2xx. Consider adding the same bucket.head() size verification for consistency — it's a cheap call and catches the same class of local-dev/remote-R2 mismatch issues.
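The suggested check could be factored into a small helper; a sketch under the assumption that `bucket.head()` returns the object's size or nothing (the helper name is hypothetical):

```typescript
// Hypothetical helper sketching the suggested verification: after
// CompleteMultipartUpload, confirm the object exists and its size matches
// the local archive. A miss or mismatch indicates the same class of
// local-dev vs remote-R2 issues the presigned path already catches.
function verifyUploadedObject(
  headResult: { size: number } | null,
  expectedSize: number
): { ok: boolean; reason?: string } {
  if (headResult === null) {
    return { ok: false, reason: 'object not found after multipart upload' };
  }
  if (headResult.size !== expectedSize) {
    return { ok: false, reason: `size mismatch: got ${headResult.size}, expected ${expectedSize}` };
  }
  return { ok: true };
}
```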
3. compression parameter is interpolated directly into the shell command without validation (security concern)
In backup-service.ts:237:
```ts
`-comp ${compression}`,
```

The `compression` parameter is typed as `'gzip' | 'lz4' | 'zstd'` in TypeScript, but at the HTTP handler boundary it arrives as an untrusted JSON field. The handler passes it through with only a `??` default:

```ts
body.compression ?? 'lz4'
```

There is no runtime validation that the value is actually one of the three allowed strings. If a caller sends `compression: "lz4; rm -rf /"`, it would be interpolated into the shell command. While the handler is only reachable from the DO (not directly from user code), defense-in-depth suggests validating the value against the allowed set in either the handler or the service, similar to how `archivePath` is validated.
4. handleUploadParts does not validate individual part fields (defense-in-depth)
The handler validates that parts is a non-empty array but doesn't check that each part has valid partNumber, url, offset, and size fields. The url field in particular is passed directly to curl in a shell command. While shellEscape is used, validating that url starts with https:// and that numeric fields are non-negative integers would be a stronger defensive boundary.
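A defensive validation along those lines might look like this (the helper name is hypothetical; the field names are the ones the review describes):

```typescript
// Hypothetical part validator: reject anything that could smuggle unexpected
// content into the curl command line, even though shellEscape is also applied.
interface UploadPart {
  partNumber: number;
  url: string;
  offset: number;
  size: number;
}

function isValidUploadPart(p: UploadPart): boolean {
  return (
    Number.isInteger(p.partNumber) && p.partNumber >= 1 &&
    Number.isInteger(p.offset) && p.offset >= 0 &&
    Number.isInteger(p.size) && p.size > 0 &&
    p.url.startsWith('https://')
  );
}
```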
5. Stale JSDoc on downloadBackupParallel (nit)
The JSDoc says "Parts are assembled with cat then atomically moved to the final path" but the implementation uses dd with oflag=seek_bytes to write ranges directly into a pre-allocated file, then mv. The test even explicitly asserts expect(downloadCommand).not.toContain('cat '). The comment should be updated to reflect the actual implementation.
6. wait $J{i} captures exit code of the last pipeline component only (subtle correctness)
In the upload script (backup-service.ts:457):
```sh
wait $J0; E0=$?
```

When backgrounding `dd ... | curl ... &`, `$!` captures the PID of the last process in the pipeline (curl), so `wait $Jn` returns curl's exit code. If dd fails (e.g. the archive file was deleted or the offset is wrong), dd exits non-zero but curl may still exit 0 (it just uploads truncated data). The upload would appear to succeed even though the part is incomplete.
The download path has the same pattern but is lower risk since curl -f would fail if the server returned a partial response. For upload, consider using set -o pipefail in the script prefix, or at minimum documenting this limitation.
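One way to apply the `set -o pipefail` suggestion in the script generator; a sketch only, since the real script assembly in backup-service.ts is more involved:

```typescript
// Sketch: prepend `set -o pipefail` so that when a backgrounded
// `dd ... | curl ... &` pipeline is waited on, a non-zero dd exit
// propagates to `wait` instead of being masked by curl's exit 0.
function buildUploadScript(partPipelines: string[]): string {
  const lines: string[] = ['set -o pipefail'];
  // Background each part's pipeline and remember its job PID...
  partPipelines.forEach((cmd, i) => lines.push(`${cmd} & J${i}=$!`));
  // ...then collect each pipeline's exit status.
  partPipelines.forEach((_, i) => lines.push(`wait $J${i}; E${i}=$?`));
  return lines.join('\n');
}
```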
7. BACKUP_DOWNLOAD_MAX_PARTS is declared but never used (dead code)
```ts
const BACKUP_DOWNLOAD_MAX_PARTS = 64;
```

`calculatePartCount` uses `BACKUP_MULTIPART_MAX_PARTS` for the cap. Unless this constant is about to be wired in, it should be removed until needed.
Minor observations
- The `sizeCheck.stdout` in the restore path (sandbox.ts:5309) uses `.catch(() => ({ stdout: '0' }))`, which swallows errors cleanly, but the returned object doesn't have the `exitCode` or `stderr` properties that `execWithSession` normally returns. This works because only `.stdout` is read, but it's a bit fragile.
- The changeset looks good — single `@cloudflare/sandbox` patch entry with a user-facing description.
- Test coverage is reasonable: the new parallel download test verifies the shell script structure (truncate, dd seek, no cat). Consider adding a test for the multipart upload path (at least verifying that `uploadBackupMultipart` falls back to presigned when `numParts <= 1`).
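That suggested fallback test could target a decision helper shaped like this (the part-size constant and function name are assumptions; the review only states that `numParts <= 1` should fall back to presigned):

```typescript
// Hypothetical decision logic for the suggested test: multipart upload only
// engages when the archive splits into more than one part.
const ASSUMED_PART_SIZE = 64 * 1024 * 1024; // illustrative 64 MiB part size

function shouldUseMultipart(archiveSize: number, multipartEnabled: boolean): boolean {
  if (!multipartEnabled) return false;
  const numParts = Math.ceil(archiveSize / ASSUMED_PART_SIZE);
  return numParts > 1; // numParts <= 1 falls back to the single presigned PUT
}
```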
```diff
  shellEscape(dir),
  shellEscape(archivePath),
- '-comp zstd',
+ `-comp ${compression}`,
```
The compression parameter is interpolated directly into the shell command without runtime validation. While the TypeScript type constrains it to 'gzip' | 'lz4' | 'zstd', the HTTP handler accepts it from untrusted JSON with only a ?? default — no allowlist check.
Consider validating at the top of createArchive:
```ts
const ALLOWED_COMPRESSION = new Set(['gzip', 'lz4', 'zstd']);
if (!ALLOWED_COMPRESSION.has(compression)) {
  return serviceError({
    message: `Invalid compression algorithm: ${compression}`,
    code: ErrorCode.INVALID_BACKUP_CONFIG,
    details: { compression }
  });
}
```

```ts
 * For archives >= BACKUP_DOWNLOAD_PARALLEL_MIN_SIZE, uses BACKUP_DOWNLOAD_PARALLEL_PARTS
 * concurrent curl processes (each downloading a byte-range) to maximise both
 * network and disk-write throughput. Parts are assembled with cat then
 * atomically moved to the final path.
```
Nit: Stale JSDoc — the implementation uses dd with oflag=seek_bytes to write ranges directly into a pre-allocated file (via truncate -s), not cat. The test at line 2065 even explicitly asserts expect(downloadCommand).not.toContain('cat ').
Suggested update:

```ts
 * ...concurrent curl processes (each downloading a byte-range) that write
 * directly into a pre-allocated temp file via dd seek, then mv to the final path.
```
The review has been posted. Here's a summary of what I found:

Review Summary for PR #666

Posted a detailed review comment and 2 inline comments on the PR. Key findings:

Bug: doCreateBackupLocal drops the new `compression` and `compressThreads` options (finding 1).

Security/correctness: no post-upload verification for multipart uploads, unvalidated `compression` and part fields at the handler boundary, and masked pipeline exit codes in the upload script (findings 2, 3, 4, 6).

Minor: stale JSDoc on downloadBackupParallel and the unused `BACKUP_DOWNLOAD_MAX_PARTS` constant (findings 5, 7).
Summary
This PR improves backup and restore performance for larger archives by:
- defaulting to `lz4` with configurable compression threads.
- adding the `compression`, `compressThreads`, and `multipart` options.

Details
Large backups now upload through R2 multipart upload when enabled, while smaller backups continue using the existing single presigned PUT path. Restores now download larger archives in parallel ranges before mounting the squashfs archive in the container.
The new options allow callers to tune backup behavior:
- `compression`: choose `lz4`, `zstd`, or `gzip`
- `compressThreads`: configure the mksquashfs processor count
- `multipart`: disable multipart upload when needed
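As a hedged illustration, a caller might combine the options like this (only the field names and allowed values come from the PR description; the surrounding API call is omitted):

```typescript
// Option bag sketch: field names and allowed values are taken from the PR
// description; the defaults noted in comments come from the review text.
const backupOptions = {
  compression: 'zstd' as const, // 'lz4' (the default), 'zstd', or 'gzip'
  compressThreads: 4,           // mksquashfs processor count
  multipart: false,             // opt out of multipart upload when needed
};
```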