
feat(download): resume via Range header on transient stream failures#52

Merged
rubenhensen merged 1 commit into main from feat/resumable-downloads on May 7, 2026

Conversation

@rubenhensen
Contributor

Implements the resumable-download design from #48 — incorporating the review feedback @dobby-coder posted on the issue, which pushed back hard (and correctly) on the original "wrap withRetry" sketch in the issue body.

Shape

  • Source-owned retry loop inside the outer ReadableStream. withRetry's Promise<T> shape resolves before the stream is read, leaving mid-stream errors with no way to re-enter the loop. The new driver lives in the stream's start() and turns each underlying GET into one attempt. Same backoff helpers — different driver.
  • No tee() — bytes flow through the source's own read() → enqueue() path, so received += value.byteLength at the single forwarding site.
  • Strict 206 + Content-Range check on resume. Range is a request hint, not a contract. Caching proxies and misconfigured CDNs can ignore it and return 200 OK with the full body from byte 0; splicing that onto a consumer at offset N corrupts the file silently. Resume asserts both status === 206 and that Content-Range's start byte equals the requested offset — either mismatch is classified as fail-not-retry (the upstream isn't going to start honouring Range on the next try).
  • Attempt counter shared across resumes. A flapping connection that delivers some bytes per attempt now exhausts the budget instead of looping forever.
  • Reuses existing classifyCryptifyError for stream-level errors — confirmed by test rather than pre-emptively changed.
  • Stall-watchdog explicitly out of scope (separate axis, would balloon the change).
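Concretely, the shape above reduces to a loop like this — a minimal sketch, assuming a hypothetical `fetchRange(offset)` for the underlying GET and a pluggable classifier standing in for `classifyCryptifyError`; it is not the PR's exact code:

```typescript
type Classification = 'retry' | 'fail';

// Sketch of a source-owned retry loop: the driver lives in start(),
// each underlying GET is one attempt, and the byte counter sits at the
// single forwarding site so resume offsets have one source of truth.
function resumableStream(
  fetchRange: (offset: number) => Promise<ReadableStream<Uint8Array>>,
  classify: (err: unknown) => Classification,
  maxAttempts = 3,
): ReadableStream<Uint8Array> {
  let received = 0; // resume offset — single source of truth
  let attempt = 0;  // shared across resumes: a budget, not a per-GET counter
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      while (true) {
        attempt += 1;
        try {
          // the real code sends `Range: bytes=<received>-` when received > 0
          const body = await fetchRange(received);
          const reader = body.getReader();
          for (;;) {
            const { done, value } = await reader.read();
            if (done) {
              controller.close();
              return;
            }
            received += value.byteLength; // the single forwarding site
            controller.enqueue(value);
          }
        } catch (err) {
          if (attempt >= maxAttempts || classify(err) === 'fail') {
            controller.error(err); // surfaces on the consumer's next read()
            return;
          }
          // the PR sleeps with full-jitter backoff here before looping
        }
      }
    },
  });
}
```

Because the loop lives inside `start()`, a mid-stream rejection from `reader.read()` re-enters the `while` directly — exactly what the `Promise<T>` shape of `withRetry` could not do.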

Public-API change

downloadFileWithRetry now returns the ReadableStream synchronously (no Promise wrapper). Stream-level errors propagate to the consumer's first failing read() call rather than via the function's promise. The single existing call site in src/crypto/decrypt.ts:45 works unchanged because Promise.all accepts non-thenable values.

This is a behavioural shift only for callers that did await downloadFileWithRetry(...) and caught a synchronous rejection for non-stream-level errors (e.g. 404). With the new shape, those errors surface on the consumer's first read() call, which is the right place — they're stream-level events, not function-result events.
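The `Promise.all` behaviour this relies on is specified: non-thenable array entries are treated as already-resolved values. A minimal illustration (names here are illustrative, not the SDK's):

```typescript
// Promise.all resolves non-thenable entries as-is, so a function that
// stops returning Promise<ReadableStream> and returns the stream
// directly keeps working unchanged inside an existing Promise.all.
async function demo(): Promise<[string, number]> {
  return Promise.all([
    Promise.resolve('vk'), // thenable: awaited
    42,                    // plain value: passed through unchanged
  ]);
}
```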

Test plan

  • Clean download (1 attempt) — bytes match, no Range header on first GET
  • 502 → 200 retry — fresh GET (no Range) since received === 0
  • Mid-stream failure → 206 resume from offset — bytes contiguous
  • 200 on resume → fail-not-retry (silent-rewind protection)
  • 206 with mismatched Content-Range start → fail-not-retry
  • 404 on first read → no retry, propagates as NetworkError(404)
  • Flapping connection (1 byte per attempt × 3) — exhausts budget, errors after 3
  • Caller-driven abort → AbortError, no retry
  • npx tsc --noEmit clean
  • npx vitest run — 72 passed (was 66, added 6)
  • Manual: against a local Cryptify, with nc proxying between Cryptify and the browser; kill the connection mid-download → SDK resumes when reconnected, the file downloads completely and decrypts cleanly
  • Manual: confirm StreamUnsealer.new(stream, vk) consumes an internally-spliced stream without choking — the only existing consumer at src/crypto/decrypt.ts:62

Refs

Out of scope (tracked separately)

  • Cross-refresh download resume (persisting partial blob in IndexedDB so a refresh picks up where the previous session left off). Most downloads are short enough that in-session retry covers the common case.
  • Idle-stall detection (stall watchdog timer that resets on each enqueue) — separate axis from downloadTimeoutMs, deserves its own design.

Cryptify's `FileServer` already supports HTTP Range, so a stream-level
failure mid-download (network drop, idle timeout) can now resume from
the byte offset reached rather than restarting from zero. The consumer
sees a single contiguous stream regardless of how many internal retries
happened.

Implements the design @dobby-coder proposed on #48,
which pushes back on the original "wrap withRetry" sketch in the issue
body. Key shape changes from that initial sketch:

- **Source-owned retry loop inside the outer `ReadableStream`.**
  `withRetry`'s `Promise<T>` shape resolves before the stream is read,
  leaving mid-stream errors with no way to re-enter the loop. The new
  driver lives in the stream's `start()` and turns each underlying GET
  into one `attempt`. Same backoff helpers (`delayWithFullJitter`,
  `sleep`, `classifyCryptifyError`) — different driver, no parallel
  implementation.
- **No `tee()`** for the byte counter. Bytes flow through the source's
  own `read()` → `enqueue()` path, so `received += value.byteLength`
  at the single forwarding site. One source of truth for the resume
  offset.
- **Strict `206` + `Content-Range` check on resume.** Range is a
  request hint — caching proxies and misconfigured CDNs can ignore it
  and return `200 OK` with the full body from byte 0. Splicing that
  onto a consumer at offset N corrupts the file silently. Resume now
  asserts `status === 206` and that `Content-Range`'s start byte
  equals the requested offset; either mismatch is classified as
  fail-not-retry (the upstream isn't going to start honouring Range
  on the next try).
- **Attempt counter shared across resumes.** A flapping connection
  that delivers some bytes per attempt now exhausts the budget instead
  of looping forever. Test pins this — one byte per attempt, three
  attempts, then errors.
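
The strict resume guard above can be sketched as a small validator — the function name and plain `Error` are illustrative stand-ins (the PR routes these through `NetworkError` and its classifier):

```typescript
// Sketch of the resume guards, in the order the commit describes:
// reject anything that is not a 206 whose Content-Range start byte
// equals the requested offset. Splicing a full-body 200 onto a
// consumer already at offset N would corrupt the file silently.
function assertResumable(
  status: number,
  contentRange: string | null,
  offset: number,
): void {
  if (status !== 206) {
    // e.g. a 200 OK from a Range-stripping proxy
    throw new Error(`expected 206 Partial Content, got ${status}`);
  }
  // conservative parse: anything not matching "bytes <start>-" fails
  const m = /^\s*bytes\s+(\d+)-/.exec(contentRange ?? '');
  if (m === null || Number(m[1]) !== offset) {
    throw new Error(`Content-Range start does not match requested offset ${offset}`);
  }
}
```

Both failure branches are fail-not-retry: an upstream that ignores Range once will keep ignoring it.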

Public-API note: `downloadFileWithRetry` now returns the
`ReadableStream` synchronously (no `Promise` wrapper). Stream-level
errors propagate to the consumer's first failing `read()` call rather
than via the function's promise. The single existing call site in
`src/crypto/decrypt.ts:45` works unchanged because `Promise.all`
accepts non-thenable values.

Tests:
- Clean download (1 attempt) — bytes match, no Range header on first GET.
- 502 → 200 retry — fresh GET (no Range) since `received === 0`.
- Mid-stream failure → 206 resume from offset — bytes contiguous.
- 200 on resume → fail-not-retry (silent-rewind protection).
- 206 with mismatched Content-Range start → fail-not-retry.
- 404 on first read → no retry, propagates as NetworkError(404).
- Flapping (1 byte per attempt × 3) — exhausts budget, errors after 3.
- Caller-driven abort → AbortError, no retry.

Helper test fix worth noting: `controller.error()` empties any queued
chunks, so the test-only `streamThenError` is now pull-based to
guarantee chunks are delivered before the error fires.
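
A pull-based helper of that shape might look like this (a sketch — the PR's actual `streamThenError` is not shown here):

```typescript
// Test-only stream that delivers every chunk, then errors. Pull-based:
// each chunk is handed out on demand, so the reader has consumed it
// before error() fires. A push-based start() that enqueues everything
// and then calls error() would discard the still-queued chunks.
function streamThenError(
  chunks: Uint8Array[],
  err: Error,
): ReadableStream<Uint8Array> {
  let i = 0;
  return new ReadableStream<Uint8Array>({
    pull(controller) {
      if (i < chunks.length) {
        controller.enqueue(chunks[i++]);
      } else {
        controller.error(err);
      }
    },
  });
}
```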

72 tests pass; type-check clean.

Closes #48.
Refs #117, #136, #145, #47.
@rubenhensen
Contributor Author

/dobby please review this implementation against your design feedback on #48. Key things to verify: the source-owned retry loop replaces withRetry on the download path, the strict 206 + Content-Range checks (incl. start-offset match), the shared attempt counter across resumes, and that classifyCryptifyError works as-is for stream-level TypeErrors. The public-API shift to a synchronous ReadableStream return is also worth a sanity check at the decrypt.ts:45 call site.

@dobby-coder
Contributor

dobby-coder Bot commented May 7, 2026

Dobby is happy to help! Dobby will pull #48 to refresh on the design feedback, then check the source-owned retry loop, the strict 206 + Content-Range guards, the shared attempt counter across resumes, the classifyCryptifyError reuse, and the synchronous ReadableStream return at decrypt.ts:45. Dobby will run typecheck and tests too. Findings coming as a PR review!

Contributor

dobby-coder Bot left a comment


Review against design feedback on #48

Verified the implementation against each point of the design feedback. Built (npm run build), typechecked (tsc --noEmit), and ran tests (vitest run — 72 passed, including the 8 new download-retry cases) on the branch.

✅ 1. Source-owned retry loop replaces withRetry on the download path

downloadFileWithRetry (src/api/cryptify.ts:305-380) is now a synchronous factory that returns a ReadableStream whose start() owns the while(true) loop. Each underlying GET is one attempt. withRetry is no longer imported on this path — confirmed by reading the file. The chunk-PUT path (storeChunkWithRetry) still uses withRetry, which is right: that one fits the Promise<T> shape.

This matches the §1 sketch from the design feedback essentially line-for-line, including reuse of classifyCryptifyError, delayWithFullJitter, sleep, and withTimeout.

✅ 2. Strict 206 + Content-Range checks (incl. start-offset match)

downloadRange (cryptify.ts:238-277) does the right thing in the right order:

  1. !response.ok → NetworkError with the response status (so 5xx routes through classifyCryptifyError to 'retry' and 4xx to 'fail', which is correct).
  2. response.status !== 206 (e.g. a 200 OK from a Range-stripping proxy) → NetworkError. Since status is a 2xx, classifyCryptifyError will see it as < 500 and classify as fail — exactly the silent-rewind protection §4 asked for.
  3. parseContentRangeStart(contentRange) !== offset → also NetworkError, also fail-not-retry.

parseContentRangeStart (cryptify.ts:283-289) is tight: ^\s*bytes\s+(\d+)- — anything not matching the standard form returns null, which then mismatches offset and trips the assertion. Conservative and correct.

Both branches are covered by tests (tests/api.test.ts:684-734), and I confirmed the 200-on-resume → fail-not-retry test exercises the offset path (after 2 bytes have been delivered) — corruption would be silent without this guard.

✅ 3. Shared attempt counter across resumes

let attempt = 0 is declared at cryptify.ts:312 — outside the while loop — and attempt += 1 runs unconditionally at the top of every iteration (cryptify.ts:320). A flapping connection that delivers one byte per attempt now exhausts the budget. The "shares the attempt counter across resumes" test (api.test.ts:753-827) constructs exactly that adversarial server (one byte per attempt → error) and asserts exactly 3 calls before the consumer's read() rejects. Solid.

✅ 4. classifyCryptifyError works as-is for stream-level TypeErrors

Confirmed by reading src/util/retry.ts:68-79 — the existing classifier already returns 'retry' for TypeError. The test at api.test.ts:654-682 uses new TypeError('network reset') thrown from inside the stream's pull, and that error successfully drives a Range-based resume. No classifier changes needed, as the design feedback predicted (§3).

One small empirical note: in real fetch, mid-stream reads can also surface as DOMException with name === 'TypeError' in some browsers' historical implementations. The classifier's instanceof TypeError may not catch those. Cryptify SDK targets are modern enough that the standard TypeError branch covers the live cases, but it's worth being aware of if a future browser reverts. Not a blocker.

✅ 5. Public-API shift to synchronous ReadableStream return — decrypt.ts:45 sanity check

decrypt.ts:43-46:

const [vk, fileStream] = await Promise.all([
  fetchVerificationKey(pkgUrl, headers),
  downloadFileWithRetry(cryptifyUrl, uuid, retry, signal),
]);

Promise.all accepts non-thenable values and resolves them as-is, so this works unchanged. fileStream is still typed as ReadableStream<Uint8Array>. The PR body's claim that "404 now surfaces on the consumer's first read() rather than the function's promise" is the right framing — and the "errors stream on first read with 404 (no retry)" test (api.test.ts:736-751) pins that behaviour. StreamUnsealer.new(readable, vk) will see the error via the consumer's read() and reject from within WASM, which is consistent with how it already handles other stream errors.

The only thing this changes for the existing call site is where the error surfaces (consumer vs. caller of downloadFileWithRetry), not whether errors propagate. The single existing call site is fine.


Minor observations (non-blocking)

  • cancel() is a no-op for the in-flight fetch. cryptify.ts:370-378 documents this honestly. If a caller cancels the consumer-facing ReadableStream without aborting via signal, the underlying GET still runs to completion (its body is then dropped). This is a small resource leak per cancellation but matches what the design feedback implied (caller's signal is the primary abort mechanism). Could be tightened later by maintaining an internal AbortController combined with the caller's signal — out of scope here.
  • cleanup() is called both on success and in catch — that's fine because clearTimeout on a fired/cleared timer is a no-op. Worth a one-line comment if the file ever sees a refactor, but not needed now.
  • Stall watchdog still out of scope — the design feedback (§6) explicitly punted this, and the PR body confirms. Good discipline; tracking it separately is the right call.
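
The tightening suggested in the first bullet — an internal controller combined with the caller's signal — could be sketched like this (`linkedAbort` is a hypothetical helper, not code from the PR; it uses only the baseline AbortController API):

```typescript
// Sketch: one signal that fires when either the caller aborts or the
// stream source decides to cancel, so cancel() could actually stop an
// in-flight GET instead of letting it run to completion.
function linkedAbort(caller?: AbortSignal): {
  signal: AbortSignal;
  abort: (reason?: unknown) => void;
} {
  const inner = new AbortController();
  if (caller) {
    if (caller.aborted) {
      inner.abort(caller.reason);
    } else {
      // forward the caller's abort (and its reason) to the inner signal
      caller.addEventListener('abort', () => inner.abort(caller.reason), { once: true });
    }
  }
  return { signal: inner.signal, abort: (reason) => inner.abort(reason) };
}
```

In runtimes that ship `AbortSignal.any`, the forwarding collapses to `AbortSignal.any([caller, inner.signal])`; the listener version above only assumes the baseline API.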

Verdict

This implementation lands cleanly against the §1–§7 feedback from #48. No changes needed. Recommend marking ready and merging once any remaining manual smoke tests (StreamUnsealer over a spliced stream, killed-mid-download against local cryptify) check out.

@rubenhensen
Contributor Author

Thanks @dobby-coder for the thorough review and the approval. Acknowledging the three non-blocking observations:

  • cancel() no-op for in-flight fetch — agreed, deliberately out of scope. The caller-driven signal is the documented abort path; tightening this with an internal AbortController combined with the caller's signal is a cleanup worth its own PR.
  • cleanup() called both on success and in catch — clearTimeout is idempotent so this is intentional. Won't add a comment unless the file sees a wider refactor.
  • DOMException masquerading as TypeError in legacy browsers — noting it. The classifier's instanceof TypeError covers what fetch actually throws on modern browsers; if a future regression surfaces, we'll widen the branch.

Stall watchdog stays out — separate axis, will track when there's a concrete need.

No code changes, but I wanted to leave the trail visible on the PR for the next reviewer.

@rubenhensen rubenhensen merged commit 95af57c into main May 7, 2026
2 checks passed
@rubenhensen rubenhensen deleted the feat/resumable-downloads branch May 7, 2026 19:00
@github-actions

github-actions Bot commented May 7, 2026

🎉 This PR is included in version 1.5.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@rubenhensen
Contributor Author

Thanks @dobby-coder for the approval — no changes needed on this PR. The three non-blocking observations (cancel() no-op for in-flight fetch, cleanup() called on both paths, stall watchdog out of scope) match my reading. Stall detection will be its own issue when there's a concrete need.

rubenhensen added a commit to encryption4all/postguard-website that referenced this pull request May 7, 2026
Picks up encryption4all/postguard-js#52 (resumable downloads via
Range header on transient stream failures). Public-API change worth
flagging for reviewers: `downloadFileWithRetry` now returns the
ReadableStream synchronously rather than wrapped in a Promise — but
the only consumer in this repo (via `Opened.decrypt` / `inspectSealed`
in pg-js itself) is unaffected, since `Promise.all` accepts non-thenable
values as-is. No website-side code change needed.

`npm run check`, `lint`, and `build` clean.
@rubenhensen
Contributor Author

@rubenhensen all dobby checks green — ready for your review and merge when you've run the manual smoke tests called out in the test plan.


Successfully merging this pull request may close these issues.

feat(download): resume downloads via Range on transient failure