rafiattrach
left a comment
Nice work! Thanks for taking care of it @tompollard ! I tested this locally on both the bundled MIMIC-IV Demo (32 files) and full MIMIC-IV (~10 GB, 31 compressed CSVs). Progress bar works well, no performance regression, and output is identical.
One thing I noticed: the progress bar total includes discovered files that have no handler (e.g. index.html, README.txt). On the demo dataset it shows 38/38 but the summary says "Files: 32". Since we don't know which files have handlers until we try each one, the simplest fix would be changing the label from "Processing files..." to "Scanning files..." — that way the count covering all discovered files feels natural.
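A minimal sketch of why the two counts differ, assuming a hypothetical `find_handler` lookup and a hypothetical `scan_files` loop (neither name is taken from the croissant-baker code base): the progress total is fixed from the list of discovered files before any handler is tried, while the summary counts only the files that a handler accepted.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple


def find_handler(path: Path) -> Optional[str]:
    """Hypothetical lookup: return a handler name, or None if unsupported."""
    if path.name.endswith((".csv", ".csv.gz")):
        return "csv"
    return None


def scan_files(
    paths: Iterable[Path],
    on_advance: Callable[[int, int], None] = lambda done, total: None,
) -> Tuple[int, int]:
    """Return (files scanned, files with a handler)."""
    paths = list(paths)
    total = len(paths)  # progress total: every discovered file
    handled = 0
    for done, path in enumerate(paths, start=1):
        # Only by trying each file do we learn whether a handler exists.
        if find_handler(path) is not None:
            handled += 1
        on_advance(done, total)  # the bar advances for every file, handled or not
    return total, handled


discovered = [
    Path("patients.csv.gz"),
    Path("admissions.csv"),
    Path("labevents.csv.gz"),
    Path("index.html"),
    Path("README.txt"),
]
scanned, handled = scan_files(discovered)  # scanned == 5, handled == 3
```

With a "Scanning files..." label, the bar reaching 5/5 here reads naturally next to a summary of 3 processed files.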
Thanks Rafi!
Good catch, I updated the text as suggested.
As highlighted in #44, it would be helpful to provide the user with feedback on progress during creation of Croissant metadata, particularly for large datasets.
This pull request:
Testing:
Run croissant-baker --input <dataset_dir> on a dataset with multiple files and confirm that the per-file progress bar is displayed.