Skip to content

feat: callback-based progress tracking for uploads and downloads#19

Open
assafvayner wants to merge 11 commits intomainfrom
assaf/progress
Open

feat: callback-based progress tracking for uploads and downloads#19
assafvayner wants to merge 11 commits intomainfrom
assaf/progress

Conversation

@assafvayner
Copy link
Copy Markdown
Collaborator

@assafvayner assafvayner commented Apr 6, 2026

Summary

  • Adds a ProgressHandler trait and structured ProgressEvent types (UploadEvent, DownloadEvent) that allow callers to receive real-time progress callbacks during file transfers
  • Upload events carry phase information (PreparingCheckingUploadModeUploadingCommitting) plus aggregate byte progress
  • Download events carry per-file byte progress (StartedInProgressComplete) and aggregate xet progress
  • All five transfer params structs gain an optional progress: Progress field (Option<Arc<dyn ProgressHandler>>) that defaults to None — existing callers are unaffected
  • The hfrs CLI wires up an indicatif-based handler (CliProgressHandler) that renders multi-progress bars, respects --quiet and HF_HUB_DISABLE_PROGRESS_BARS
  • Xet uploads emit progress before/after commit via GroupProgressReport
  • Removes the polling-based progress tracking design spec (superseded by the callback approach)

New files

File Purpose
huggingface_hub/src/types/progress.rs Core types: ProgressHandler, ProgressEvent, UploadEvent, DownloadEvent, FileProgress, FileStatus, UploadPhase, Progress, emit()
huggingface_hub/src/bin/hfrs/progress.rs CliProgressHandler with indicatif MultiProgress

Key design decisions

  • Nested enum (ProgressEvent::Upload(UploadEvent)) separates upload/download concerns at the type level
  • Phase on every upload event so consumers always know the current phase from any single event — no state tracking needed
  • Delta-only download progress — only files whose state changed since last event
  • Progress type alias (Option<Arc<dyn ProgressHandler>>) keeps params structs clean
  • Blocking API unaffected — progress passes through as part of the params struct via existing sync_api! macros

Test plan

  • Unit tests in types/progress.rs — event recording, phase progression, batched file complete, None handler is no-op, Send+Sync compile check (6 tests)
  • Integration tests in tests/download_test.rs — download with progress to local dir, download with progress to cache, download without progress handler (3 tests)
  • cargo +nightly fmt --check passes
  • cargo clippy --all-features -D warnings passes
  • cargo test -p huggingface-hub --lib — 67 tests pass
  • cargo build --release succeeds
  • CI runs cargo test -p huggingface-hub --all-features which picks up all new unit and integration tests automatically

Callback-based progress system for upload/download with ProgressHandler
trait, ProgressEvent enum, xet polling bridge, and indicatif CLI bars
matching the Python hf CLI style.
Design for adding TransferTask<T> with ProgressHandle to download/upload
APIs, using IntoFuture for backward-compatible .await and lazy polling of
hf-xet progress objects.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poll not push

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push, not poll

Account for blocking API macro refactor (sync_api!/sync_api_stream!/
sync_api_async_stream! in macros.rs), new download methods
(download_file_stream, download_file_to_bytes), and lib.rs glob
re-export pattern. The polling design gains a sync_api_transfer! macro
variant and updated blocking API section. The callback design notes
that macros forward progress transparently via params structs.
Add a ProgressHandler trait and structured ProgressEvent types that
allow callers to receive real-time progress updates during file
transfers. The library emits events for upload phases (preparing,
checking upload mode, uploading, committing) and per-file download
byte progress. The hfrs CLI wires up an indicatif-based handler
that renders progress bars matching the Python hf CLI style.
@assafvayner assafvayner changed the title progress plan feat: callback-based progress tracking for uploads and downloads Apr 9, 2026
- Move inline `use` statements to top of file (xet.rs, download_test.rs)
- Update Project Layout in CLAUDE.md with new progress.rs module
- Add examples/progress.rs demonstrating ProgressHandler usage
- Replace unwrap() with expect()/unwrap_or_else() in CLI progress.rs
- Remove trivial doc comments that restate type names
- Wire progress through snapshot_download (was being ignored)
Doc comments (///) on public types generate cargo doc output and
should not be stripped as trivial comments. Updated CLAUDE.md to
distinguish inline comments (no restating code) from doc comments
(required on all public API surface).
Add doc comments to ProgressHandler::on_progress(), FileStatus
variants, CliProgressHandler, and progress_disabled().
@assafvayner assafvayner marked this pull request as ready for review April 9, 2026 07:27
- Fix nested lifecycle: snapshot_download no longer emits duplicate
  Start/Complete through child download_file calls. Introduced
  download_file_inner() without lifecycle events; download_concurrently
  uses it while the public download_file() wraps with Start/Complete.

- Add xet download progress: all three xet download methods now accept
  a Progress handle and spawn a polling task that emits AggregateProgress
  every 100ms during group.finish(). Cache path also emits per-file
  Complete (was missing entirely).

- Add xet upload progress: replace single-shot before/after emit with
  a polling task that reports UploadEvent::Progress{Uploading} every
  100ms during commit().await.

- Emit per-file Complete events for xet batch downloads in
  snapshot_download so the CLI files_bar increments correctly.
Resolve conflicts in repo_params (params moved to types/repo_params.rs
on main), add progress fields to the new param location.

Update download_test.rs:
- Read-only tests use prod_api() (HF_PROD_TOKEN in CI)
- Upload progress tests use hub_ci_api() (HF_CI_TOKEN in CI)
- New tests: upload_file_with_progress, create_commit_with_progress
  (multiple files), upload_with_no_progress_handler
- All write tests create temporary repos on hub-ci and clean up
@assafvayner
Copy link
Copy Markdown
Collaborator Author

Code review

Found 2 issues:

  1. download_file emits Start { total_bytes: 0 } before the HEAD request, so the CLI bytes progress bar is never created for single-file non-xet downloads. CliProgressHandler only creates the bar when total_bytes > 0 && total_files == 1, which never holds. The design spec says total_bytes should be known at Start (from the HEAD response).

pub async fn download_file(&self, params: &RepoDownloadFileParams) -> Result<PathBuf> {
progress::emit(
&params.progress,
ProgressEvent::Download(DownloadEvent::Start {
total_files: 1,
total_bytes: 0,
}),
);
let result = self.download_file_inner(params).await;
if result.is_ok() {

  1. For inline (non-xet/non-LFS) uploads, the bytes_bar is created in the Start handler but never updated. bytes_bar.set_position() is only called when phase == UploadPhase::Uploading (line 213), but inline uploads skip the Uploading phase entirely -- they go Preparing -> CheckingUploadMode -> Committing. The bar stays at position 0 for the entire upload.

if *phase == UploadPhase::Uploading
&& let Some(ref bar) = state.bytes_bar
{
bar.set_length(*total_bytes);
bar.set_position(*bytes_completed);
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant