feat: add progress bar. ref #44. #89

Merged
tompollard merged 2 commits into main from tp/progress_bar
Apr 19, 2026
@tompollard
Member

As highlighted in #44, it would be helpful to provide the user with feedback on progress during creation of Croissant metadata, particularly for large datasets.

This pull request:

  • Moves creator parsing and CSV warning logic before the progress context
  • Adds a per-file progress bar during metadata generation
  • Keeps the spinner for the save/validate phase

Testing:

  • Run croissant-baker --input <dataset_dir> on a dataset with multiple files and confirm that the per-file progress bar is displayed
  • Verify that the progress bar shows the file count, percentage, and current file name
  • Verify that the save/validate spinner still works as before
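For reference, the display described above (file count, percentage, current file name) can be sketched with the standard library alone. This is an illustrative sketch, not the code in this pull request: `format_progress` and `scan_with_progress` are hypothetical names, and the bar layout is an assumption.

```python
import sys
from typing import List, TextIO


def format_progress(done: int, total: int, current: str,
                    label: str = "Scanning files...", width: int = 20) -> str:
    """Return one progress line: label, bar, count, percentage, current file."""
    # Treat an empty dataset as complete to avoid dividing by zero.
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label} [{bar}] {done}/{total} {fraction:.0%} {current}"


def scan_with_progress(paths: List[str], stream: TextIO = sys.stderr) -> None:
    """Redraw a single progress line once per file (hypothetical helper)."""
    total = len(paths)
    for index, path in enumerate(paths, start=1):
        # Per-file metadata generation would run here.
        stream.write("\r" + format_progress(index, total, path))
        stream.flush()
    stream.write("\n")


if __name__ == "__main__":
    scan_with_progress(["admissions.csv", "patients.csv", "README.txt"])
```

Writing to stderr keeps the progress line separate from any output a caller might capture on stdout.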

tompollard requested a review from rafiattrach on April 16, 2026 21:28
Collaborator

@rafiattrach left a comment

Nice work! Thanks for taking care of it, @tompollard! I tested this locally on both the bundled MIMIC-IV Demo (32 files) and full MIMIC-IV (~10 GB, 31 compressed CSVs). The progress bar works well, there is no performance regression, and the output is identical.

One thing I noticed: the progress bar total includes discovered files that have no handler (e.g. index.html, README.txt). On the demo dataset it shows 38/38 but the summary says "Files: 32". Since we don't know which files have handlers until we try each one, the simplest fix would be changing the label from "Processing files..." to "Scanning files..." — that way the count covering all discovered files feels natural.
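The mismatch between the bar total and the summary count can be illustrated with a small sketch. Everything here is an assumption for illustration: `HANDLERS`, `describe_csv`, and `scan` are hypothetical names, and matching handlers by file suffix is a simplification rather than the project's actual dispatch logic.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

Handler = Callable[[Path], dict]


def describe_csv(path: Path) -> dict:
    """Hypothetical handler returning minimal metadata for a CSV file."""
    return {"name": path.name, "format": "text/csv"}


# Hypothetical registry: file suffix -> handler.
HANDLERS: Dict[str, Handler] = {".csv": describe_csv}


def scan(paths: Iterable[Path]) -> Tuple[int, int]:
    """Return (files scanned, files handled)."""
    scanned = 0
    handled = 0
    for path in paths:
        scanned += 1  # the progress bar advances for every discovered file
        handler = HANDLERS.get(path.suffix.lower())
        if handler is None:
            continue  # e.g. index.html, README.txt: counted but not processed
        handler(path)
        handled += 1  # only these files appear in the summary count
    return scanned, handled


if __name__ == "__main__":
    files = [Path("patients.csv"), Path("index.html"), Path("README.txt")]
    print(scan(files))
```

Because a file is only known to be unhandled once it has been tried, the bar total has to come from the discovered-file count, which is why a "Scanning files..." label fits better than "Processing files...".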

@tompollard
Member Author

Thanks Rafi!

> Since we don't know which files have handlers until we try each one, the simplest fix would be changing the label from "Processing files..." to "Scanning files..."

Good catch, I updated the text as suggested.

tompollard merged commit 8c77c11 into main on Apr 19, 2026
3 checks passed
tompollard deleted the tp/progress_bar branch on April 19, 2026 13:48