feat(download): resume via Range header on transient stream failures #52
rubenhensen merged 1 commit into main from
Conversation
Cryptify's `FileServer` already supports HTTP Range, so a stream-level failure mid-download (network drop, idle timeout) can now resume from the byte offset reached rather than restarting from zero. The consumer sees a single contiguous stream regardless of how many internal retries happened. Implements the design @dobby-coder proposed on #48, which pushes back on the original "wrap `withRetry`" sketch in the issue body.

Key shape changes from that initial sketch:

- **Source-owned retry loop inside the outer `ReadableStream`.** `withRetry`'s `Promise<T>` shape resolves before the stream is read, leaving mid-stream errors with no way to re-enter the loop. The new driver lives in the stream's `start()` and turns each underlying GET into one `attempt`. Same backoff helpers (`delayWithFullJitter`, `sleep`, `classifyCryptifyError`) — different driver, no parallel implementation.
- **No `tee()` for the byte counter.** Bytes flow through the source's own `read()` → `enqueue()` path, so `received += value.byteLength` at the single forwarding site. One source of truth for the resume offset.
- **Strict `206` + `Content-Range` check on resume.** Range is a request hint — caching proxies and misconfigured CDNs can ignore it and return `200 OK` with the full body from byte 0. Splicing that onto a consumer at offset N corrupts the file silently. Resume now asserts `status === 206` and that `Content-Range`'s start byte equals the requested offset; either mismatch is classified as fail-not-retry (the upstream isn't going to start honouring Range on the next try).
- **Attempt counter shared across resumes.** A flapping connection that delivers some bytes per attempt now exhausts the budget instead of looping forever. Test pins this — one byte per attempt, three attempts, then errors.

Public-API note: `downloadFileWithRetry` now returns the `ReadableStream` synchronously (no `Promise` wrapper).
Stream-level errors propagate to the consumer's first failing `read()` call rather than via the function's promise. The single existing call site in `src/crypto/decrypt.ts:45` works unchanged because `Promise.all` accepts non-thenable values.

Tests:

- Clean download (1 attempt) — bytes match, no Range header on first GET.
- 502 → 200 retry — fresh GET (no Range) since `received === 0`.
- Mid-stream failure → 206 resume from offset — bytes contiguous.
- 200 on resume → fail-not-retry (silent-rewind protection).
- 206 with mismatched Content-Range start → fail-not-retry.
- 404 on first read → no retry, propagates as NetworkError(404).
- Flapping (1 byte per attempt × 3) — exhausts budget, errors after 3.
- Caller-driven abort → AbortError, no retry.

Helper test fix worth noting: `controller.error()` empties any queued chunks, so the test-only `streamThenError` is now pull-based to guarantee chunks are delivered before the error fires.

72 tests pass; type-check clean.

Closes #48. Refs #117, #136, #145, #47.
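To make the shape concrete, here is a minimal sketch of a source-owned retry loop of the kind the PR describes. All names (`downloadWithResume`, `maxAttempts`) are illustrative, not the SDK's actual identifiers, and backoff, timeouts, and error classification are elided:

```typescript
// Sketch: the ReadableStream's start() owns the retry loop, so a mid-stream
// failure can re-enter it and issue a Range request from the current offset.
// Illustrative only — not the SDK's downloadFileWithRetry.
function downloadWithResume(url: string, maxAttempts = 3): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      let received = 0; // single source of truth for the resume offset
      let attempt = 0;  // shared across resumes: flapping exhausts the budget
      while (true) {
        attempt += 1;
        try {
          const headers: HeadersInit =
            received > 0 ? { Range: `bytes=${received}-` } : {};
          const response = await fetch(url, { headers });
          if (received > 0 && response.status !== 206) {
            // A Range-stripping proxy rewound to byte 0: fail without retrying.
            controller.error(new Error(`expected 206 on resume, got ${response.status}`));
            return;
          }
          const reader = response.body!.getReader();
          for (;;) {
            const { done, value } = await reader.read();
            if (done) {
              controller.close();
              return;
            }
            received += value.byteLength; // counted at the single forwarding site
            controller.enqueue(value);
          }
        } catch (err) {
          if (attempt >= maxAttempts) {
            controller.error(err);
            return;
          }
          // backoff with jitter would go here before the next attempt
        }
      }
    },
  });
}
```

Because `received` and `attempt` live in the same closure as the loop, the resume offset and the retry budget survive across attempts without any `tee()` or external state.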
/dobby please review this implementation against your design feedback on #48. Key things to verify: the source-owned retry loop replaces withRetry on the download path, the strict 206 + Content-Range checks (incl. start-offset match), the shared attempt counter across resumes, and that classifyCryptifyError works as-is for stream-level TypeErrors. The public-API shift to a synchronous ReadableStream return is also worth a sanity check at the decrypt.ts:45 call site.
Dobby is happy to help! Dobby will pull #48 to refresh on the design feedback, then check the source-owned retry loop, the strict 206 + Content-Range guards, the shared attempt counter across resumes, the classifyCryptifyError reuse, and the synchronous ReadableStream return at decrypt.ts:45. Dobby will run typecheck and tests too. Findings coming as a PR review!
Review against design feedback on #48
Verified the implementation against each point of the design feedback. Built (`npm run build`), typechecked (`tsc --noEmit`), and ran tests (`vitest run` — 72 passed, including the 8 new download-retry cases) on the branch.
✅ 1. Source-owned retry loop replaces withRetry on the download path
downloadFileWithRetry (`src/api/cryptify.ts:305-380`) is now a synchronous factory that returns a ReadableStream whose `start()` owns the `while (true)` loop. Each underlying GET is one attempt. withRetry is no longer among the imports actually used on this path — confirmed by reading the file. The chunk-PUT path (storeChunkWithRetry) still uses withRetry, which is right: that one fits the `Promise<T>` shape.
This matches the §1 sketch from the design feedback essentially line-for-line, including reuse of classifyCryptifyError, delayWithFullJitter, sleep, and withTimeout.
✅ 2. Strict 206 + Content-Range checks (incl. start-offset match)
downloadRange (cryptify.ts:238-277) does the right thing in the right order:
- `!response.ok` → `NetworkError` with the response status (so 5xx routes through to `classifyCryptifyError === 'retry'` and 4xx through to `'fail'`, which is correct).
- `response.status !== 206` (e.g. a `200 OK` from a Range-stripping proxy) → `NetworkError`. Since `status` is a 2xx, `classifyCryptifyError` will see it as `< 500` and classify as `fail` — exactly the silent-rewind protection §4 asked for.
- `parseContentRangeStart(contentRange) !== offset` → also `NetworkError`, also fail-not-retry.
parseContentRangeStart (`cryptify.ts:283-289`) is tight: `^\s*bytes\s+(\d+)-` — anything not matching the standard form returns `null`, which then mismatches `offset` and trips the assertion. Conservative and correct.
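For reference, the parse-and-fail behaviour described above can be reconstructed as a standalone function (a sketch from the review's description, not the SDK's actual source):

```typescript
// Parse the start byte of a Content-Range header: "bytes 100-199/1000" → 100.
// Anything off the standard form (including the wildcard "bytes */1000" or a
// missing header) returns null, which then mismatches the requested offset
// and aborts the resume. Reconstruction for illustration only.
function parseContentRangeStart(contentRange: string | null): number | null {
  if (contentRange === null) return null;
  const match = /^\s*bytes\s+(\d+)-/.exec(contentRange);
  return match ? Number(match[1]) : null;
}
```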
Both branches are covered by tests (`tests/api.test.ts:684-734`), and I confirmed the 200-on-resume → fail test exercises the offset path (after 2 bytes have been delivered) — corruption would be silent without this guard.
✅ 3. Shared attempt counter across resumes
let attempt = 0 is declared at cryptify.ts:312 — outside the while loop — and attempt += 1 runs unconditionally at the top of every iteration (cryptify.ts:320). A flapping connection that delivers one byte per attempt now exhausts the budget. The "shares the attempt counter across resumes" test (api.test.ts:753-827) constructs exactly that adversarial server (one byte per attempt → error) and asserts exactly 3 calls before the consumer's read() rejects. Solid.
✅ 4. classifyCryptifyError works as-is for stream-level TypeErrors
Confirmed by reading src/util/retry.ts:68-79 — the existing classifier already returns 'retry' for TypeError. The test at api.test.ts:654-682 uses new TypeError('network reset') thrown from inside the stream's pull, and that error successfully drives a Range-based resume. No classifier changes needed, as the design feedback predicted (§3).
One small empirical note: in real fetch, mid-stream reads can also surface as DOMException with name === 'TypeError' in some browsers' historical implementations. The classifier's instanceof TypeError may not catch those. Cryptify SDK targets are modern enough that the standard TypeError branch covers the live cases, but it's worth being aware of if a future browser reverts. Not a blocker.
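As an illustration of the retry/fail boundary being relied on here (a hypothetical sketch, not the actual `src/util/retry.ts`; `NetworkError` below is a stand-in class):

```typescript
// Hypothetical sketch of the classification boundary: stream-level TypeErrors
// (how fetch surfaces network resets) retry; HTTP >= 500 retries; everything
// else — including the 200-on-resume NetworkError — fails without retry.
class NetworkError extends Error {
  constructor(public status: number, message: string) {
    super(message);
  }
}

function classify(err: unknown): "retry" | "fail" {
  if (err instanceof NetworkError) {
    return err.status >= 500 ? "retry" : "fail"; // 200-on-resume lands here as "fail"
  }
  if (err instanceof TypeError) {
    return "retry"; // fetch surfaces mid-stream network resets as TypeError
  }
  return "fail";
}
```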
✅ 5. Public-API shift to synchronous ReadableStream return — decrypt.ts:45 sanity check
`decrypt.ts:43-46`:

```ts
const [vk, fileStream] = await Promise.all([
  fetchVerificationKey(pkgUrl, headers),
  downloadFileWithRetry(cryptifyUrl, uuid, retry, signal),
]);
```

`Promise.all` accepts non-thenable values and resolves them as-is, so this works unchanged. `fileStream` is still typed as `ReadableStream<Uint8Array>`. The PR body's claim that "404 now surfaces on the consumer's first `read()` rather than the function's promise" is the right framing — and the "errors stream on first read with 404 (no retry)" test (`api.test.ts:736-751`) pins that behaviour. `StreamUnsealer.new(readable, vk)` will see the error via the consumer's `read()` and reject from within WASM, which is consistent with how it already handles other stream errors.
The only thing this changes for the existing call site is where the error surfaces (consumer vs. caller of downloadFileWithRetry), not whether errors propagate. The single existing call site is fine.
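The `Promise.all` behaviour being leaned on is standard and worth pinning in one's head: non-thenable entries resolve to themselves, unwrapped and unchanged. A minimal illustration (names are stand-ins for the actual decrypt.ts values):

```typescript
// Promise.all resolves non-thenable entries as-is, which is why a
// now-synchronous ReadableStream can sit next to a real promise unchanged.
async function demo(): Promise<void> {
  const pendingKey = Promise.resolve("vk"); // stands in for fetchVerificationKey(...)
  const plainStream = new ReadableStream(); // stands in for downloadFileWithRetry(...)
  const [vk, fileStream] = await Promise.all([pendingKey, plainStream]);
  if (vk !== "vk") throw new Error("promise entry should resolve normally");
  if (fileStream !== plainStream) throw new Error("non-thenable entry should pass through as-is");
}
```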
Minor observations (non-blocking)
- `cancel()` is a no-op for the in-flight fetch. `cryptify.ts:370-378` documents this honestly. If a caller cancels the consumer-facing `ReadableStream` without aborting via `signal`, the underlying GET still runs to completion (its body is then dropped). This is a small resource leak per cancellation but matches what the design feedback implied (the caller's `signal` is the primary abort mechanism). Could be tightened later by maintaining an internal `AbortController` combined with the caller's signal — out of scope here.
- `cleanup()` is called both on success and in `catch` — that's fine because `clearTimeout` on a fired/cleared timer is a no-op. Worth a one-line comment if the file ever sees a refactor, but not needed now.
- Stall watchdog still out of scope — the design feedback (§6) explicitly punted this, and the PR body confirms. Good discipline; tracking it separately is the right call.
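The "internal `AbortController` combined with the caller's signal" tightening mentioned above could look roughly like this (a sketch assuming `AbortSignal.any`, available in Node 20+ and recent browsers; names are illustrative):

```typescript
// Combine the caller's signal with an internal controller so that cancelling
// the consumer-facing stream can also abort the in-flight fetch, while the
// caller's own signal keeps working unchanged. Sketch only.
function combinedSignal(callerSignal?: AbortSignal): {
  signal: AbortSignal;
  abort: (reason?: unknown) => void;
} {
  const internal = new AbortController();
  const signal = callerSignal
    ? AbortSignal.any([callerSignal, internal.signal]) // fires when either aborts
    : internal.signal;
  return { signal, abort: (reason) => internal.abort(reason) };
}
```

The stream's `cancel()` would call the returned `abort()`, and the combined `signal` would be the one passed to `fetch`.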
Verdict
This implementation lands cleanly against the §1–§7 feedback from #48. No changes needed. Recommend marking ready and merging once any remaining manual smoke tests (StreamUnsealer over a spliced stream, killed-mid-download against local cryptify) check out.
Thanks @dobby-coder for the thorough review and the approval. Acknowledging the three non-blocking observations:
Stall watchdog stays out — separate axis, will track when there's a concrete need. No code changes, but I wanted to leave the trail visible on the PR for the next reviewer.
🎉 This PR is included in version 1.5.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Thanks @dobby-coder for the approval — no changes needed on this PR. The three non-blocking observations (cancel() no-op for in-flight fetch, cleanup() called on both paths, stall watchdog out of scope) match my reading. Stall detection will be its own issue when there's a concrete need. |
Picks up encryption4all/postguard-js#52 (resumable downloads via Range header on transient stream failures). Public-API change worth flagging for reviewers: `downloadFileWithRetry` now returns the ReadableStream synchronously rather than wrapped in a Promise — but the only consumer in this repo (via `Opened.decrypt` / `inspectSealed` in pg-js itself) is unaffected, since `Promise.all` accepts non-thenable values as-is. No website-side code change needed. `npm run check`, `lint`, and `build` clean.
@rubenhensen all dobby checks green — ready for your review and merge when you've run the manual smoke tests called out in the test plan. |
Implements the resumable-download design from #48 — incorporating the review feedback @dobby-coder posted on the issue, which pushed back hard (and correctly) on the original "wrap withRetry" sketch in the issue body.
Shape
- **Source-owned retry loop inside the outer `ReadableStream`.** `withRetry`'s `Promise<T>` shape resolves before the stream is read, leaving mid-stream errors with no way to re-enter the loop. The new driver lives in the stream's `start()` and turns each underlying GET into one `attempt`. Same backoff helpers — different driver.
- **No `tee()`** — bytes flow through the source's own `read()` → `enqueue()` path, so `received += value.byteLength` at the single forwarding site.
- **Strict `206` + `Content-Range` check on resume.** Range is a request hint, not a contract. Caching proxies and misconfigured CDNs can ignore it and return `200 OK` with the full body from byte 0; splicing that onto a consumer at offset N corrupts the file silently. Resume asserts both `status === 206` and that `Content-Range`'s start byte equals the requested offset — either mismatch is classified as fail-not-retry (the upstream isn't going to start honouring Range on the next try).
- **`classifyCryptifyError` works as-is for stream-level errors** — confirmed by test rather than pre-emptively changed.

Public-API change
`downloadFileWithRetry` now returns the `ReadableStream` synchronously (no `Promise` wrapper). Stream-level errors propagate to the consumer's first failing `read()` call rather than via the function's promise. The single existing call site in `src/crypto/decrypt.ts:45` works unchanged because `Promise.all` accepts non-thenable values.

This is a behavioural shift only for callers that did `await downloadFileWithRetry(...)` and caught a synchronous rejection for non-stream-level errors (e.g. 404). With the new shape, those errors surface on the consumer's first `read()` call, which is the right place — they're stream-level events, not function-result events.

Test plan
- Clean download (1 attempt) — bytes match, no Range header on first GET.
- 502 → 200 retry — fresh GET (no Range) since `received === 0`.
- Mid-stream failure → 206 resume from offset — bytes contiguous.
- 200 on resume → fail-not-retry (silent-rewind protection).
- 206 with mismatched `Content-Range` start → fail-not-retry.
- 404 on first read → no retry, propagates as `NetworkError(404)`.
- Flapping (1 byte per attempt × 3) — exhausts budget, errors after 3.
- Caller-driven abort → `AbortError`, no retry.
- `npx tsc --noEmit` clean.
- `npx vitest run` — 72 passed (was 66, added 6).
- Manual: with `nc` between Cryptify and the browser, kill the connection mid-download → SDK resumes when reconnected, file downloads complete and decrypts cleanly.
- Manual: `StreamUnsealer.new(stream, vk)` consumes an internally-spliced stream without choking — the only existing consumer at `src/crypto/decrypt.ts:62`.

Refs
- #48 — design feedback (`RetryOptions`, `classifyCryptifyError`, jitter helpers reused)

Out of scope (tracked separately)
- Stall watchdog (no `enqueue` within a window) — separate axis from `downloadTimeoutMs`, deserves its own design.