iCloud Photos Downloader Improvement Checklist

Last updated: 2026-03-03

Use this as the source of truth for implementation progress. Every code change should update this file.

0. Project hygiene and tracking

  • Add/update architecture note describing current pipeline and target pipeline.
  • Keep this checklist aligned with actual implemented code and tests.
  • For each completed task, reference the related PR/commit in this file.
  • Keep changelog/release notes in sync when user-facing flags/behavior change.
  • Ensure local development/testing uses Python 3.13 in .venv to match project constraints.

1. Unified retry and backoff (metadata + downloads)

1.1 Policy and configuration

  • Define one retry policy module shared by metadata calls and file downloads.
  • Add CLI option: --max-retries (default target: 6).
  • Add CLI option: --backoff-base-seconds.
  • Add CLI option: --backoff-max-seconds.
  • Add CLI option: --respect-retry-after/--no-respect-retry-after.
  • Add CLI option: --throttle-cooldown-seconds.
  • Ensure defaults preserve safe behavior for existing users.

1.2 Error classification

  • Classify fatal auth/config errors as no-retry (invalid creds, MFA unavailable, ADP/web-disabled).
  • Classify session-invalid errors as re-auth-then-retry.
  • Classify transient errors as retryable (429, 503, timeouts, connection resets, throttling-like denials).
  • Centralize retry decision logging (attempt, reason, next delay).

1.3 Integration points

  • Apply shared retry policy to album/asset enumeration calls.
  • Apply shared retry policy to download calls.
  • Remove/replace duplicated ad-hoc retry loops in existing code paths.
  • Add jitter to exponential backoff.
  • Honor Retry-After when present on retryable responses.

1.4 Verification

  • Unit tests for retry classifier.
  • Unit tests for backoff math and jitter bounds.
  • Unit tests for Retry-After handling.
  • Integration tests: metadata retry behavior under simulated 429/503.
  • Integration tests: download retry behavior under simulated 429/503/reset.

2. Persistent state DB and resumable task queue

2.1 Data model

  • Add --state-db option (or equivalent path option) with sensible default.
  • Create DB initialization/migration path.
  • Create assets table.
  • Create tasks table with status/attempt/error fields.
  • Create checkpoints table for pagination progress.
  • Add indexes for task leasing and status filtering.
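
One possible shape for the state DB, assuming SQLite via the standard-library `sqlite3` module. Only the table names and the status values come from this checklist; every column name, the `user_version` migration marker, and the `init_db` helper are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS assets (
    asset_id TEXT PRIMARY KEY,
    filename TEXT,
    added_at TEXT
);
CREATE TABLE IF NOT EXISTS tasks (
    task_id     INTEGER PRIMARY KEY,
    asset_id    TEXT NOT NULL REFERENCES assets(asset_id),
    version     TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'in_progress', 'done', 'failed')),
    attempts    INTEGER NOT NULL DEFAULT 0,
    last_error  TEXT,
    lease_owner TEXT,
    leased_at   REAL,
    UNIQUE (asset_id, version)
);
CREATE TABLE IF NOT EXISTS checkpoints (
    scope       TEXT PRIMARY KEY,
    page_offset INTEGER NOT NULL,
    updated_at  REAL NOT NULL
);
-- Supports "next pending task" and "stale in_progress lease" lookups.
CREATE INDEX IF NOT EXISTS idx_tasks_status_lease ON tasks (status, leased_at);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema if missing and record a schema version."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.execute("PRAGMA user_version = 1")  # bump on each migration
    conn.commit()
    return conn
```

The `UNIQUE (asset_id, version)` constraint is what would let enumeration be re-run after a restart without creating duplicate tasks.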

2.2 Enumeration persistence

  • Persist enumerated assets in batches.
  • Persist tasks per asset version.
  • Save checkpoint every page (or configurable page interval).
  • Resume enumeration from checkpoint after restart.

2.3 Worker/task lifecycle

  • Add task states: pending, in_progress, done, failed.
  • Add lease timestamp/owner for in_progress.
  • Requeue stale leased tasks on startup.
  • Track per-task attempts and last error.

2.4 Verification

  • Unit tests for DB schema creation and migrations.
  • Unit tests for lease/requeue behavior.
  • Integration test: crash mid-run and resume without redoing completed tasks.
  • Integration test: checkpoint resume after partial enumeration.

2.5 URL freshness

  • Detect expired/invalid persisted download URLs and refresh asset version metadata.
  • Add task/state marker for URL refresh path (e.g., needs_url_refresh) and retry flow.

3. Bounded adaptive concurrency

3.1 CLI and defaults

  • Add --download-workers option (default target: 4).
  • Keep metadata enumeration single-threaded by default.
  • Document deprecation relationship with --threads-num.

3.2 Limiting and adaptation

  • Implement shared account-level limiter for download workers.
  • Separate metadata and download request budgets (if needed by code design).
  • Implement AIMD or equivalent adaptive reduction on throttling events.
  • Add global cool-down behavior when repeated throttle signals occur.
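
For the adaptive reduction item, a bare-bones AIMD (additive increase, multiplicative decrease) controller could look like the sketch below. The class name, the halving factor, and the one-step increase per clean interval are assumptions, and the cool-down logic is deliberately left out.

```python
class AimdLimit:
    """Track how many download workers may run at once."""

    def __init__(self, max_workers: int = 4, min_workers: int = 1, decrease_factor: float = 0.5):
        self.max_workers = max_workers
        self.min_workers = min_workers
        self.decrease_factor = decrease_factor
        self.limit = max_workers

    def on_throttle(self) -> int:
        """Multiplicative decrease after a throttling signal (e.g. HTTP 429)."""
        self.limit = max(self.min_workers, int(self.limit * self.decrease_factor))
        return self.limit

    def on_clean_interval(self) -> int:
        """Additive increase after an interval with no throttling signals."""
        self.limit = min(self.max_workers, self.limit + 1)
        return self.limit
```

Workers would consult `limit` before starting a new download, so a burst of throttling shrinks concurrency quickly while recovery happens one worker at a time.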

3.3 Session/cookie safety

  • Audit all session/cookie writes under concurrent access.
  • Add locking or redesign to avoid concurrent write races.
  • Ensure no cookie/session corruption under multithreaded runs.
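
One way to approach the locking item is to serialize writes with a lock and replace the file atomically, so readers never observe a half-written session. The `SessionStore` class and the JSON file format here are hypothetical; the existing session and cookie persistence may be structured differently.

```python
import json
import os
import tempfile
import threading


class SessionStore:
    """Persist session data with one writer at a time and atomic replacement."""

    def __init__(self, path: str):
        self._path = os.path.abspath(path)
        self._lock = threading.Lock()

    def save(self, cookies: dict) -> None:
        directory = os.path.dirname(self._path)
        with self._lock:
            # Write to a temp file in the same directory, then swap it in.
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as handle:
                    json.dump(cookies, handle)
                os.replace(tmp_path, self._path)
            except BaseException:
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)
                raise

    def load(self) -> dict:
        with self._lock:
            with open(self._path, encoding="utf-8") as handle:
                return json.load(handle)
```

The lock only protects threads inside one process; running several processes against the same session file would need a different mechanism.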

3.4 Verification

  • Unit tests for limiter/token bucket behavior.
  • Concurrency tests for session persistence safety.
  • Integration tests for worker pool drain/stop/restart behavior.
  • Benchmark runs at workers = 1, 2, 4, 8 and record throughput + error rate.

4. Download efficiency and integrity

4.1 Throughput improvements

  • Add --download-chunk-bytes option (default target: 262144).
  • Replace fixed 1 KiB streaming chunk with configurable larger chunk.
  • Verify memory usage remains bounded by worker count and chunk size.
  • Benchmark chunk-size/verification combinations for throughput vs CPU tradeoff.

4.2 Integrity checks

  • Add --verify-size/--no-verify-size option.
  • Add --verify-checksum/--no-verify-checksum option.
  • Validate downloaded file size against expected metadata.
  • Implement optional checksum validation strategy.
  • Store local checksum/result in state DB when enabled.

4.3 Range resume hardening

  • Keep .part resume behavior with Range requests.
  • Detect non-206 response when resuming and safely restart partial file.
  • Add corruption-safe handling for mismatched range behavior.

4.4 Verification

  • Unit tests for chunk-size configuration and defaults.
  • Unit tests for size verification success/failure.
  • Unit tests for checksum verification success/failure.
  • Integration tests for resume with partial files and range edge cases.

5. Request volume and enumeration efficiency

  • Add --album-page-size option (target range: 50-500).
  • Add --no-remote-count option to skip expensive album count calls.
  • Reduce redundant metadata queries where possible.
  • Add or align options for chunked, date-bounded runs (since/until semantics based on added date).
  • Document clear behavior differences between added-date and created-date usage.
  • Add tests for new pagination and remote-count toggles.

6. Observability and operations

6.1 Logging

  • Add structured JSON log mode.
  • Include stable fields (run_id, asset_id, attempt, http_status, etc.).
  • Ensure sensitive data redaction remains enforced.
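
A structured log mode could be built on the standard `logging` module with a custom formatter, as sketched below. The stable field names come from the list above; the `JsonFormatter` class, the set of sensitive field names, and the `[REDACTED]` placeholder are assumptions.

```python
import json
import logging

STABLE_FIELDS = ("run_id", "asset_id", "attempt", "http_status")
SENSITIVE_FIELDS = ("password", "cookie", "token")  # hypothetical names


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Fields passed via `extra=` become attributes on the record.
        for name in STABLE_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        # Keep the key so its presence is visible, but never emit the value.
        for name in SENSITIVE_FIELDS:
            if hasattr(record, name):
                payload[name] = "[REDACTED]"
        return json.dumps(payload, sort_keys=True)
```

A caller would attach it with `handler.setFormatter(JsonFormatter())` and log with `logger.info("retrying", extra={"run_id": run_id, "attempt": 2})`.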

6.2 Metrics and health

  • Add metrics endpoint or export path (if compatible with current stack).
  • Track throughput, retries, throttle events, queue depth, success gap.
  • Add low-disk-space warning/error classification.
  • Provide JSON stats snapshot output suitable for GUI wrappers (--metrics-json).

6.3 Alerts and notifications

  • Add alert condition for repeated throttling.
  • Keep MFA expiry notification path working with new engine.
  • Add docs for recommended operational thresholds.

7. Documentation and migration

  • Update CLI reference docs for all new options.
  • Add migration guide: stateless mode vs stateful mode.
  • Document compatibility and unchanged default behavior.
  • Document concurrency limitations and safe defaults.
  • Add troubleshooting guide for throttling/session issues.

9. Runtime Semantics and Operability Hardening

9.1 Mode contract

  • Define explicit legacy/stateless mode contract (no DB required, filesystem skip semantics).
  • Define explicit stateful engine mode contract (resume guarantees, task-state semantics).
  • Add integration tests asserting mode-specific behavior and parity expectations.

9.2 Exit and summary semantics

  • Define process exit code contract (success, partial success, fatal auth/config, cancelled, stalled).
  • Emit machine-readable end-of-run summary with totals/failures/error location hints.

9.3 Cancellation and shutdown

  • Handle SIGINT/SIGTERM with graceful stop (drain or safe requeue of in-flight work).
  • Ensure clean shutdown is distinguishable from crash and restart behavior is deterministic.

9.4 State DB growth and retention

  • Add DB retention/pruning policy (completed task cleanup / capped error history).
  • Document and/or automate WAL checkpointing and vacuum guidance.
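
If the state DB is SQLite, retention and compaction might look like the sketch below. The `finished_at` column and the `prune_and_compact` helper are assumptions; `PRAGMA wal_checkpoint(TRUNCATE)` and `VACUUM` are standard SQLite statements, and `VACUUM` has to run outside a transaction.

```python
import sqlite3


def prune_and_compact(conn: sqlite3.Connection, cutoff: float) -> int:
    """Delete completed tasks finished before `cutoff`, then compact the file."""
    with conn:  # commit the delete before compacting
        cur = conn.execute(
            "DELETE FROM tasks WHERE status = 'done' AND finished_at < ?",
            (cutoff,),
        )
        deleted = cur.rowcount
    # Fold the write-ahead log back into the main file (harmless outside WAL mode).
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    # Rebuild the file to release the pages freed by the delete.
    conn.execute("VACUUM")
    return deleted
```

`VACUUM` rewrites the whole database, so on large libraries it is probably better run occasionally than after every session.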

8. Final validation before release

  • Full test suite passes.
  • New tests added for each new subsystem.
  • Lint/type checks pass.
  • Manual end-to-end dry run on small sample library.
  • Manual end-to-end run with injected transient failures.
  • Confirm no regressions in naming/dedup/folder behavior.
  • Confirm watch mode behavior is unchanged unless explicitly modified.