icloud-photos-downloader · MadsenDev · Mar 3, 2026 · Mar 3, 2026 · Apr 23, 2026
diff --git a/.python-version b/.python-version
@@ -0,0 +1 @@
+3.13.2
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,7 @@
+# Agent Workflow Notes
+
+## Mandatory checklist discipline
+- Before starting implementation work, review `TODO.md` and select the items being addressed.
+- After every code change, immediately update `TODO.md` by checking completed items and adjusting status text when needed.
+- Do not finish a coding task without reconciling `TODO.md` to match the actual repository state.
+
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,18 @@
 
 ## Unreleased
 
+## 1.33.0 (2026-03-03)
+
+- feat: add resilient stateful engine mode with SQLite task/checkpoint persistence and lease-based recovery of interrupted tasks (`--state-db`)
+- feat: add unified retry/backoff configuration for metadata and downloads (`--max-retries`, backoff controls, `--respect-retry-after`)
+- feat: add bounded adaptive download concurrency and chunked streaming controls (`--download-workers`, `--download-chunk-bytes`)
+- feat: add download integrity verification controls (`--verify-size`, `--verify-checksum`) and URL-refresh retry path for expired download URLs
+- feat: add machine-readable run outputs and observability options (`--log-format json`, `--metrics-json`)
+- feat: add state DB maintenance controls (`--state-db-prune-completed-days`, `--state-db-vacuum`)
+- feat: add explicit mode contract, exit semantics, graceful cancellation/requeue behavior, and repeated-throttling alerting
+- docs: add architecture note, migration/concurrency/troubleshooting guides, and benchmark docs for workers/chunk sizing
+- test: extend integration coverage for checkpoint resume, range edge cases, shutdown behavior, mode parity, and URL refresh flows
+
 ## 1.32.2 (2025-09-01)
 
 - fix: HTTP response content not captured for authentication and non-streaming requests [#1240](https://github.com/icloud-photos-downloader/icloud_photos_downloader/issues/1240)
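
The "range edge cases" in the test entry above (tracked as `TODO.md` §4.3) come down to one decision. After requesting `Range: bytes=<offset>-` for an existing `.part` file, the body may be appended only if the server replies `206 Partial Content` with a `Content-Range` that starts at that offset. A minimal Python sketch of that decision follows. It assumes a hypothetical `plan_resume` helper that looks only at the status code and headers, and it is not taken from the project's code:

```python
import re
from enum import Enum

# RFC 9110 Content-Range for a byte range, e.g. "bytes 1000-1999/5000".
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


class ResumeAction(Enum):
    APPEND = "append"          # body continues the existing .part file
    OVERWRITE = "overwrite"    # body is the whole file: truncate .part, then write
    RETRY_FULL = "retry_full"  # body unusable: delete .part, re-request without Range


def plan_resume(status: int, headers: dict[str, str], part_size: int) -> ResumeAction:
    """Decide what to do with a response to `Range: bytes=<part_size>-`.

    Hypothetical helper for illustration; assumes part_size > 0 and that
    transient errors (429/503) were already handled by the retry policy.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 206:
        match = _CONTENT_RANGE.fullmatch(lowered.get("content-range", "").strip())
        if match and int(match.group(1)) == part_size:
            return ResumeAction.APPEND
        # A 206 for a different range would corrupt the file if appended.
        return ResumeAction.RETRY_FULL
    if status == 200:
        # The server ignored the Range header and sent the full body.
        return ResumeAction.OVERWRITE
    # 416 (range not satisfiable) or anything unexpected: start over cleanly.
    return ResumeAction.RETRY_FULL
```

A `200` means the server ignored `Range`, so appending its body would duplicate the first `offset` bytes. Checking the `Content-Range` start also guards against the subtler case of a `206` for a different range.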

diff --git a/Improving iCloud Photos Downloader resilience and efficiency at scale.pdf b/Improving iCloud Photos Downloader resilience and efficiency at scale.pdf
diff --git a/README.md b/README.md
@@ -1,16 +1,48 @@
-# !!!! [Looking for MAINTAINER for this project](https://github.com/icloud-photos-downloader/icloud_photos_downloader/issues/1305) !!!!
-
 # iCloud Photos Downloader [![Quality Checks](https://github.com/icloud-photos-downloader/icloud_photos_downloader/workflows/Quality%20Checks/badge.svg)](https://github.com/icloud-photos-downloader/icloud_photos_downloader/actions/workflows/quality-checks.yml) [![Build and Package](https://github.com/icloud-photos-downloader/icloud_photos_downloader/workflows/Produce%20Artifacts/badge.svg)](https://github.com/icloud-photos-downloader/icloud_photos_downloader/actions/workflows/produce-artifacts.yml) [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 
-- A command-line tool to download all your iCloud photos.
+A command-line tool to download all your iCloud photos.
+
 - Works on Linux, Windows, and macOS; laptop, desktop, and NAS
 - Available as an executable for direct downloading and through package managers/ecosystems ([Docker](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#docker), [PyPI](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#pypi), [AUR](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#aur), [npm](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#npm))
-- Developed and maintained by volunteers (we are always looking for [help](CONTRIBUTING.md)). 
+- Developed and maintained by volunteers (we are always looking for [help](CONTRIBUTING.md))
+
+See [Documentation](https://icloud-photos-downloader.github.io/icloud_photos_downloader/) for more details. Also, check [Issues](https://github.com/icloud-photos-downloader/icloud_photos_downloader/issues).
 
-See [Documentation](https://icloud-photos-downloader.github.io/icloud_photos_downloader/) for more details. Also, check [Issues](https://github.com/icloud-photos-downloader/icloud_photos_downloader/issues)
+> [!IMPORTANT]
+> The project is currently looking for a maintainer. See [issue #1305](https://github.com/icloud-photos-downloader/icloud_photos_downloader/issues/1305).
 
 We aim to release new versions once a week (Friday), if there is something worth delivering.
 
+## New in 1.33.0 (Resilient Engine)
+
+- Unified retry with jittered exponential backoff across metadata and download requests (`429`/`503`-aware, honors `Retry-After`)
+- Adaptive throttling cooldown and bounded download workers (`--download-workers`)
+- Optional persistent SQLite task/checkpoint state (`--state-db`) for resumable long runs
+- Configurable streaming chunk size (`--download-chunk-bytes`)
+- Optional integrity verification (`--verify-size`, `--verify-checksum`)
+- JSON logging and run metrics export (`--log-format json`, `--metrics-json`)
+- State DB maintenance controls (`--state-db-prune-completed-days`, `--state-db-vacuum`)
+
+## Engine Modes
+
+| Mode | Best For |
+| --- | --- |
+| Classic (stateless) | Small libraries and simple one-off runs |
+| Stateful engine | Large libraries and long-running/resumable jobs |
+
+Stateful mode is enabled with `--state-db`. It stores per-asset download tasks and enumeration checkpoints in SQLite, resumes deterministically after interruption, and safely requeues stale in-progress tasks on restart.
+
+Backward compatibility: if `--state-db` is not used, `icloudpd` behaves like previous stateless versions, skipping files that already exist on disk.
+
+### Key Performance and Resilience Flags
+
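+- `--state-db [PATH]`: enable the stateful engine and its SQLite state DB
+- `--download-workers N`: number of concurrent download workers
+- `--max-retries N`: retry limit shared by metadata and download requests
+- `--download-chunk-bytes N`: streaming chunk size in bytes
+- `--verify-size` / `--verify-checksum`: post-download integrity checks
+- `--no-remote-count`: skip album count calls to reduce API requests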
+
 ## iCloud Prerequisites
 
 To make iCloud Photo Downloader work, ensure the iCloud account is configured with the following settings, otherwise Apple Servers will return an ACCESS_DENIED error:
@@ -22,7 +54,7 @@ To make iCloud Photo Downloader work, ensure the iCloud account is configured wi
 ## Install and Run
 
 There are three ways to run `icloudpd`:
-1. Download executable for your platform from the GitHub [Release](https://github.com/icloud-photos-downloader/icloud_photos_downloader/releases/tag/v1.32.2) and run it
+1. Download executable for your platform from the GitHub [Release](https://github.com/icloud-photos-downloader/icloud_photos_downloader/releases/tag/v1.33.0) and run it
 1. Use package manager to install, update, and, in some cases, run ([Docker](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#docker), [PyPI](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#pypi), [AUR](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#aur), [npm](https://icloud-photos-downloader.github.io/icloud_photos_downloader/install.html#npm))
 1. Build and run from the source
 

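The "adaptive throttling cooldown and bounded download workers" item in the README above is tracked in `TODO.md` §3.2 as AIMD (additive increase, multiplicative decrease) or an equivalent scheme. The following is a minimal sketch of the AIMD part, assuming a hypothetical `AimdLimiter` class. Under this limiter, the number of in-flight downloads never exceeds the `--download-workers` value, halves on each throttle signal (`429`/`503`), and grows by one after a run of successes. It illustrates the technique only and is not the project's limiter.

```python
import threading


class AimdLimiter:
    """Hypothetical AIMD cap on in-flight downloads (not the project's class)."""

    def __init__(self, max_workers: int, successes_per_increase: int = 10) -> None:
        if max_workers < 1:
            raise ValueError("max_workers must be >= 1")
        self.max_workers = max_workers  # e.g. the --download-workers value
        self.successes_per_increase = successes_per_increase
        self.limit = max_workers        # current allowed in-flight downloads
        self._in_flight = 0
        self._streak = 0                # successes since the last limit change
        self._cond = threading.Condition()

    def acquire(self) -> None:
        """Block until a slot is free under the current limit."""
        with self._cond:
            while self._in_flight >= self.limit:
                self._cond.wait()
            self._in_flight += 1

    def release(self, throttled: bool) -> None:
        """Free a slot and adapt the limit from the request outcome."""
        with self._cond:
            self._in_flight -= 1
            if throttled:
                # Multiplicative decrease on 429/503: halve, keep at least one.
                self.limit = max(1, self.limit // 2)
                self._streak = 0
            else:
                self._streak += 1
                if self._streak >= self.successes_per_increase:
                    # Additive increase, never above max_workers.
                    self.limit = min(self.max_workers, self.limit + 1)
                    self._streak = 0
            self._cond.notify_all()
```

Each worker would call `acquire()` before a download and `release(throttled=...)` after it. The global cooldown configured by `--throttle-cooldown-seconds` is a separate mechanism and is not modeled here.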
diff --git a/TODO.md b/TODO.md
@@ -0,0 +1,180 @@
+# iCloud Photos Downloader Improvement Checklist
+
+Last updated: 2026-03-03
+
+Use this as the source of truth for implementation progress. Every code change should update this file.
+
+## 0. Project hygiene and tracking
+- [x] Add/update architecture note describing current pipeline and target pipeline.
+- [x] Keep this checklist aligned with actual implemented code and tests.
+- [ ] For each completed task, reference the related PR/commit in this file.
+- [x] Keep changelog/release notes in sync when user-facing flags/behavior change.
+- [x] Ensure local development/testing uses Python 3.13 in `.venv` to match project constraints.
+
+## 1. Unified retry and backoff (metadata + downloads)
+### 1.1 Policy and configuration
+- [x] Define one retry policy module shared by metadata calls and file downloads.
+- [x] Add CLI option: `--max-retries` (default target: 6).
+- [x] Add CLI option: `--backoff-base-seconds`.
+- [x] Add CLI option: `--backoff-max-seconds`.
+- [x] Add CLI option: `--respect-retry-after/--no-respect-retry-after`.
+- [x] Add CLI option: `--throttle-cooldown-seconds`.
+- [x] Ensure defaults preserve safe behavior for existing users.
+
+### 1.2 Error classification
+- [x] Classify fatal auth/config errors as no-retry (invalid creds, MFA unavailable, ADP/web-disabled).
+- [x] Classify session-invalid errors as re-auth-then-retry.
+- [x] Classify transient errors as retryable (429, 503, timeouts, connection resets, throttling-like denials).
+- [x] Centralize retry decision logging (attempt, reason, next delay).
+
+### 1.3 Integration points
+- [x] Apply shared retry policy to album/asset enumeration calls.
+- [x] Apply shared retry policy to download calls.
+- [x] Remove/replace duplicated ad-hoc retry loops in existing code paths.
+- [x] Add jitter to exponential backoff.
+- [x] Honor `Retry-After` when present on retryable responses.
+
+### 1.4 Verification
+- [x] Unit tests for retry classifier.
+- [x] Unit tests for backoff math and jitter bounds.
+- [x] Unit tests for `Retry-After` handling.
+- [x] Integration tests: metadata retry behavior under simulated 429/503.
+- [x] Integration tests: download retry behavior under simulated 429/503/reset.
+
+## 2. Persistent state DB and resumable task queue
+### 2.1 Data model
+- [x] Add `--state-db` option (or equivalent path option) with sensible default.
+- [x] Create DB initialization/migration path.
+- [x] Create `assets` table.
+- [x] Create `tasks` table with status/attempt/error fields.
+- [x] Create `checkpoints` table for pagination progress.
+- [x] Add indexes for task leasing and status filtering.
+
+### 2.2 Enumeration persistence
+- [x] Persist enumerated assets in batches.
+- [x] Persist tasks per asset version.
+- [x] Save checkpoint every page (or configurable page interval).
+- [x] Resume enumeration from checkpoint after restart.
+
+### 2.3 Worker/task lifecycle
+- [x] Add task states: `pending`, `in_progress`, `done`, `failed`.
+- [x] Add lease timestamp/owner for `in_progress`.
+- [x] Requeue stale leased tasks on startup.
+- [x] Track per-task attempts and last error.
+
+### 2.4 Verification
+- [x] Unit tests for DB schema creation and migrations.
+- [x] Unit tests for lease/requeue behavior.
+- [x] Integration test: crash mid-run and resume without redoing completed tasks.
+- [x] Integration test: checkpoint resume after partial enumeration.
+
+### 2.5 URL freshness
+- [x] Detect expired/invalid persisted download URLs and refresh asset version metadata.
+- [x] Add task/state marker for URL refresh path (e.g., `needs_url_refresh`) and retry flow.
+
+## 3. Bounded adaptive concurrency
+### 3.1 CLI and defaults
+- [x] Add `--download-workers` option (default target: 4).
+- [x] Keep metadata enumeration single-threaded by default.
+- [x] Document deprecation relationship with `--threads-num`.
+
+### 3.2 Limiting and adaptation
+- [x] Implement shared account-level limiter for download workers.
+- [x] Separate metadata and download request budgets (if needed by code design).
+- [x] Implement AIMD or equivalent adaptive reduction on throttling events.
+- [x] Add global cool-down behavior when repeated throttle signals occur.
+
+### 3.3 Session/cookie safety
+- [x] Audit all session/cookie writes under concurrent access.
+- [x] Add locking or redesign to avoid concurrent write races.
+- [x] Ensure no cookie/session corruption under multithreaded runs.
+
+### 3.4 Verification
+- [x] Unit tests for limiter/token bucket behavior.
+- [x] Concurrency tests for session persistence safety.
+- [x] Integration tests for worker pool drain/stop/restart behavior.
+- [x] Benchmark runs at workers = 1, 2, 4, 8 and record throughput + error rate.
+
+## 4. Download efficiency and integrity
+### 4.1 Throughput improvements
+- [x] Add `--download-chunk-bytes` option (default target: 262144).
+- [x] Replace fixed 1 KiB streaming chunk with configurable larger chunk.
+- [x] Verify memory usage remains bounded by worker count and chunk size.
+- [x] Benchmark chunk-size/verification combinations for throughput vs CPU tradeoff.
+
+### 4.2 Integrity checks
+- [x] Add `--verify-size/--no-verify-size` option.
+- [x] Add `--verify-checksum/--no-verify-checksum` option.
+- [x] Validate downloaded file size against expected metadata.
+- [x] Implement optional checksum validation strategy.
+- [x] Store local checksum/result in state DB when enabled.
+
+### 4.3 Range resume hardening
+- [x] Keep `.part` resume behavior with `Range` requests.
+- [x] Detect non-`206` response when resuming and safely restart partial file.
+- [x] Add corruption-safe handling for mismatched range behavior.
+
+### 4.4 Verification
+- [x] Unit tests for chunk-size configuration and defaults.
+- [x] Unit tests for size verification success/failure.
+- [x] Unit tests for checksum verification success/failure.
+- [x] Integration tests for resume with partial files and range edge cases.
+
+## 5. Request volume and enumeration efficiency
+- [x] Add `--album-page-size` option (target range: 50-500).
+- [x] Add `--no-remote-count` option to skip expensive album count calls.
+- [x] Reduce redundant metadata queries where possible.
+- [x] Add/align chunked date-based run options (`since/until added date` behavior).
+- [x] Document clear behavior differences between added-date and created-date usage.
+- [x] Add tests for new pagination and remote-count toggles.
+
+## 6. Observability and operations
+### 6.1 Logging
+- [x] Add structured JSON log mode.
+- [x] Include stable fields (`run_id`, `asset_id`, `attempt`, `http_status`, etc.).
+- [x] Ensure sensitive data redaction remains enforced.
+
+### 6.2 Metrics and health
+- [x] Add metrics endpoint or export path (if compatible with current stack).
+- [x] Track throughput, retries, throttle events, queue depth, success gap.
+- [x] Add low-disk-space warning/error classification.
+- [x] Provide JSON stats snapshot output suitable for GUI wrappers (`--metrics-json`).
+
+### 6.3 Alerts and notifications
+- [x] Add alert condition for repeated throttling.
+- [x] Keep MFA expiry notification path working with new engine.
+- [x] Add docs for recommended operational thresholds.
+
+## 7. Documentation and migration
+- [x] Update CLI reference docs for all new options.
+- [x] Add migration guide: stateless mode vs stateful mode.
+- [x] Document compatibility and unchanged default behavior.
+- [x] Document concurrency limitations and safe defaults.
+- [x] Add troubleshooting guide for throttling/session issues.
+
+## 8. Runtime semantics and operability hardening
+### 8.1 Mode contract
+- [x] Define explicit legacy/stateless mode contract (no DB required, filesystem skip semantics).
+- [x] Define explicit stateful engine mode contract (resume guarantees, task-state semantics).
+- [x] Add integration tests asserting mode-specific behavior and parity expectations.
+
+### 8.2 Exit and summary semantics
+- [x] Define process exit code contract (success, partial success, fatal auth/config, cancelled, stalled).
+- [x] Emit machine-readable end-of-run summary with totals/failures/error location hints.
+
+### 8.3 Cancellation and shutdown
+- [x] Handle SIGINT/SIGTERM with graceful stop (drain or safe requeue of in-flight work).
+- [x] Ensure clean shutdown is distinguishable from crash and restart behavior is deterministic.
+
+### 8.4 State DB growth and retention
+- [x] Add DB retention/pruning policy (completed task cleanup / capped error history).
+- [x] Document and/or automate WAL checkpointing and vacuum guidance.
+
+## 9. Final validation before release
+- [ ] Full test suite passes.
+- [x] New tests added for each new subsystem.
+- [x] Lint/type checks pass.
+- [ ] Manual end-to-end dry run on small sample library.
+- [ ] Manual end-to-end run with injected transient failures.
+- [x] Confirm no regressions in naming/dedup/folder behavior.
+- [x] Confirm watch mode behavior is unchanged unless explicitly modified.
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -0,0 +1,42 @@
+# Engine Architecture
+
+This note describes the downloader's execution pipeline in both modes, including the resilient stateful engine added in 1.33.0.
+
+## Modes
+
+- `legacy_stateless`:
+  - No state DB required.
+  - Filesystem existence checks drive skip/retry behavior.
+  - Preserves legacy CLI expectations.
+- `stateful_engine`:
+  - Uses SQLite state DB (`--state-db`) for assets, tasks, and checkpoints.
+  - Supports deterministic resume with leased task recovery.
+
+## Pipeline Stages
+
+1. Authenticate and initialize per-user run context (`run_id`, retry/limiter/metrics).
+2. Enumerate remote assets (single-threaded) and persist checkpoints/tasks in stateful mode.
+3. Download via bounded worker pool with adaptive limiter.
+4. Apply unified retry/backoff policy for metadata and downloads (with jitter and `Retry-After`).
+5. Verify integrity (size and optional checksum).
+6. Persist task outcomes and emit end-of-run summary (machine-readable + human logs).
+
+## Resilience Guarantees
+
+- Shared retry classifier for transient vs fatal errors.
+- Re-auth on session-invalid failures.
+- URL freshness path for expired download URLs (`401`/`403`/`410`) with one metadata refresh retry.
+- Graceful cancellation (`SIGINT`/`SIGTERM`) with safe requeue semantics.
+- Restart safety for stale leases and pagination checkpoints.
+
+## Throughput and Safety Controls
+
+- Configurable chunked streaming (`--download-chunk-bytes`) with bounded memory behavior.
+- Adaptive concurrency (`--download-workers`) with throttle backoff/cooldown.
+- Optional remote-count skip (`--no-remote-count`) and album page-size tuning (`--album-page-size`) to reduce API pressure.
+
+## Operability
+
+- Structured JSON logs (`--log-format json`).
+- JSON metrics snapshot (`--metrics-json`) for wrappers/GUI integration.
+- State DB maintenance options (`--state-db-prune-completed-days`, `--state-db-vacuum`).
diff --git a/docs/benchmark_download_chunks.md b/docs/benchmark_download_chunks.md
@@ -0,0 +1,42 @@
+# Download Chunk Benchmark (Synthetic)
+
+Date: 2026-03-03
+
+Command used:
+
+```bash
+.venv/bin/python scripts/benchmark_download_chunks.py \
+  --workers 4 \
+  --size-mib 8 \
+  --iterations 2 \
+  --chunk-bytes 65536 262144 1048576
+```
+
+## Throughput vs CPU
+
+| Chunk bytes | Verify checksum | Avg throughput (MiB/s) | Avg CPU time (s) |
+|---|---|---:|---:|
+| 65536 | no | 277.90 | 0.0505 |
+| 65536 | yes | 137.00 | 0.1496 |
+| 262144 | no | 663.89 | 0.0297 |
+| 262144 | yes | 207.23 | 0.1291 |
+| 1048576 | no | 704.66 | 0.0293 |
+| 1048576 | yes | 181.65 | 0.1437 |
+
+Notes:
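+- Without checksum verification, 256 KiB chunks deliver about 2.4x the throughput of 64 KiB chunks (663.89 vs 277.90 MiB/s) in this synthetic stream test.
+- Enabling checksum verification costs roughly 3x to 5x the CPU time and cuts throughput by about 50-75%, as expected.
+- With checksum disabled, `262144` and `1048576` are close on throughput and CPU cost; with checksum enabled, `262144` is faster (207.23 vs 181.65 MiB/s), so it remains a good default.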
+
+## Memory boundedness verification
+
+A dedicated integration test verifies streaming memory remains bounded during large transfers:
+
+```bash
+.venv/bin/python -m pytest \
+  tests/test_download_config.py::DownloadConfigTestCase::test_download_response_streaming_memory_is_bounded -q
+```
+
+Test behavior:
+- Streams a 64 MiB response to disk using `--download-chunk-bytes=65536`.
+- Asserts peak traced memory stays below 8 MiB (well below transferred bytes), confirming bounded streaming behavior.