Save labels.txt in artifact for imagefolderdataset #3145

AshAnand34 · 2025-05-03T05:19:21Z

Pull Request Template

Checklist

Confirmed that cargo run-checks command has been executed.
Made sure the book is up to date with changes in this PR.

Tip

Want more detailed macro error diagnostics? This is especially useful for debugging tensor-related tests:

RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Zmacro-backtrace" cargo run-checks

Related Issues/PRs

#2761

Changes

The issue requested adding functionality to save a labels.txt file in the artifact directory when using the ImageFolderDataset. This feature helps preserve the mapping between class indices and their corresponding label names, which is particularly valuable for:

Maintaining label order and names without requiring the original dataset structure
Reducing errors in model prediction interpretation
Simplifying the mapping of numerical predictions back to their corresponding labels
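To illustrate the last point, a consumer could read labels.txt back and map a predicted class index to its name. This is a minimal sketch. It assumes the file holds one label per line in class-index order, as the PR describes. `parse_labels` and `label_for` are hypothetical helpers and are not part of burn-dataset.

```rust
/// Parse the contents of a labels.txt file: line `i` holds the name of
/// class index `i`. `str::lines` also strips a trailing "\r" on each line.
fn parse_labels(contents: &str) -> Vec<String> {
    contents.lines().map(String::from).collect()
}

/// Map a predicted class index back to its label; `None` if out of range.
fn label_for(labels: &[String], predicted: usize) -> Option<&str> {
    labels.get(predicted).map(String::as_str)
}
```

In practice, `contents` would come from `std::fs::read_to_string` on the saved file, and `predicted` would be the argmax of the model output.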

The solution implemented:

Modified the with_items method in ImageFolderDataset to automatically create a labels.txt file
Added functionality to write class names to the file in index order
Hard-coded the artifact directory to /tmp/burn-dataset (a temporary choice)
Added proper error handling for file operations
Updated the documentation to reflect this new feature

Testing

Existing test suite for ImageFolderDataset
Error handling with directory creation failures and invalid paths.


codecov · 2025-05-03T12:40:37Z

Codecov Report

Attention: Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.31%. Comparing base (5a437b0) to head (6fc0d47).

Files with missing lines	Patch %	Lines
crates/burn-dataset/src/vision/image_folder.rs	92.85%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3145      +/-   ##
==========================================
- Coverage   81.32%   81.31%   -0.02%     
==========================================
  Files         817      817              
  Lines      117804   117818      +14     
==========================================
- Hits        95802    95800       -2     
- Misses      22002    22018      +16



laggui

Thanks for taking on this request!

I have a couple of comments.

laggui · 2025-05-06T14:18:51Z

crates/burn-dataset/Cargo.toml

-serde = { workspace = true, features = ["std", "derive"] }
-serde_json = { workspace = true, features = ["std"] }


We should keep pointing to workspace dependencies

laggui · 2025-05-06T14:19:27Z

crates/burn-dataset/src/audio/speech_commands.rs

+/// Format for saving labels
+#[derive(Debug, Clone, Copy)]
+pub enum LabelFormat {
+    /// Text format with one label per line
+    Txt,
+    /// JSON format with an array of labels
+    Json,
+    /// YAML format with an array of labels
+    Yaml,
+}


While the flexibility is nice, I would stick to a single format. We can remove the enum and just save a text file.
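Without the enum, the format is one label per line, where line i holds the name of class i. The following is a minimal sketch of such a writer and is not the PR's code. `write_labels` is a hypothetical helper. It is generic over `std::io::Write`, so the same function can target a file or an in-memory buffer.

```rust
use std::io::{self, Write};

/// Write class names one per line, in the order given;
/// the line position is the class index.
fn write_labels<W: Write>(mut writer: W, labels: &[&str]) -> io::Result<()> {
    for label in labels {
        writeln!(writer, "{label}")?;
    }
    Ok(())
}
```

Because `&mut Vec<u8>` implements `Write`, the output can be checked in memory without touching the filesystem.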

laggui · 2025-05-06T14:24:31Z

crates/burn-dataset/src/dataset/dataframe.rs

+///
+/// This struct provides a way to access data and labels from a Polars DataFrame
+/// as if it were a Dataset of type I with labels of type L.
+pub struct LabeledDataframeDataset<I, L> {


Not sure this specialization is required 🤔 a dataframe dataset can be used by the user to get the labels from the appropriate field for each item.

Maybe I am missing something. Can you provide more details if you think this should be added?

laggui · 2025-05-06T14:25:16Z

crates/burn-dataset/src/dataset/in_memory.rs


 /// Dataset where all items are stored in ram.
 pub struct InMemDataset<I> {
    items: Vec<I>,
 }

+/// Dataset where all items and their labels are stored in ram.
+pub struct LabeledInMemDataset<I, L> {


Similar argument here. The InMemDataset is generic over the items. So if items contain a label/class, it is entirely up to the user 🤔
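The sketch below illustrates the reviewer's point: when the item type carries its own label, the labels can be read from the items, and no labeled variant of the dataset is needed. It uses assumptions throughout. The `Dataset` trait here is a local stand-in that mirrors the `get`/`len` shape of burn's trait and is not imported from burn-dataset. `ImageItem`, `InMem`, and `labels` are hypothetical names.

```rust
/// Local stand-in mirroring the shape of burn's `Dataset` trait.
trait Dataset<I> {
    fn get(&self, index: usize) -> Option<I>;
    fn len(&self) -> usize;
}

/// Hypothetical user-defined item that carries its own label.
#[derive(Clone, Debug)]
struct ImageItem {
    pixels: Vec<u8>,
    label: usize,
}

/// Simplified in-memory dataset: generic over `I`, unaware of labels.
struct InMem<I> {
    items: Vec<I>,
}

impl<I: Clone> Dataset<I> for InMem<I> {
    fn get(&self, index: usize) -> Option<I> {
        self.items.get(index).cloned()
    }
    fn len(&self) -> usize {
        self.items.len()
    }
}

/// Labels come straight from the items; no extra trait is needed.
fn labels<D: Dataset<ImageItem>>(dataset: &D) -> Vec<usize> {
    (0..dataset.len())
        .filter_map(|i| dataset.get(i))
        .map(|item| item.label)
        .collect()
}
```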

laggui · 2025-05-06T14:27:11Z

crates/burn-dataset/src/dataset/base.rs

+
+/// The labeled dataset trait defines a dataset that contains labeled data.
+/// It extends the basic Dataset trait with functionality to handle labels.
+pub trait LabeledDataset<I, L>: Dataset<I>


Trait might be overkill

laggui · 2025-05-06T14:29:54Z

crates/burn-dataset/src/vision/image_folder.rs

+        // Sort classes by index to ensure consistent ordering
+        let mut sorted_classes: Vec<_> = classes_map.iter().collect();
+        sorted_classes.sort_by_key(|(_, idx)| *idx);


This would also sort classes provided by the user given that public constructors also go through this path.

When parsing folder names automatically, we already sort the names. But I don't think we should sort classes provided by the user. This could lead to unexpected behavior.
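One way to get this behavior is to sort only names that are discovered from folders, and to keep caller-provided classes in the order they were given. This is a minimal sketch. `index_classes` and `index_discovered` are hypothetical helpers and are not the functions in image_folder.rs.

```rust
use std::collections::HashMap;

/// Assign indices in exactly the order the names are given
/// (used as-is for user-provided classes).
fn index_classes(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(idx, name)| (name.to_string(), idx))
        .collect()
}

/// Names parsed from folders: sort and deduplicate first so the
/// class indices are deterministic regardless of traversal order.
fn index_discovered(mut names: Vec<&str>) -> HashMap<String, usize> {
    names.sort_unstable();
    names.dedup();
    index_classes(&names)
}
```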

laggui · 2025-05-06T14:33:30Z

crates/burn-dataset/src/vision/image_folder.rs

+        // Save labels.txt in the artifact directory
+        let artifact_dir = "/tmp/burn-dataset";
+        let labels_path = std::path::Path::new(&artifact_dir).join("labels.txt");
+
+        // Create parent directories if they don't exist
+        if let Some(parent) = labels_path.parent() {
+            std::fs::create_dir_all(parent)
+                .map_err(|e| ImageLoaderError::IOError(e.to_string()))?;
+        }
+
+        // Write labels to file
+        let mut labels_file = std::fs::File::create(&labels_path)
+            .map_err(|e| ImageLoaderError::IOError(e.to_string()))?;
+
+        // Sort classes by index to ensure consistent ordering
+        let mut sorted_classes: Vec<_> = classes_map.iter().collect();
+        sorted_classes.sort_by_key(|(_, idx)| *idx);
+
+        for (class_name, _) in sorted_classes {
+            writeln!(labels_file, "{}", class_name)
+                .map_err(|e| ImageLoaderError::IOError(e.to_string()))?;
+        }
+


I think I would move this to a separate method instead, e.g., dataset.save_classes(path) where one could pass the current artifact/experiment dir.

It is not always required, but having an opt-in method could make more sense.
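The suggested opt-in method might look roughly like this sketch. `ClassifiedDataset` and its `classes` field are hypothetical stand-ins and are not burn-dataset types. The caller passes the artifact or experiment directory instead of relying on a hard-coded /tmp/burn-dataset path. Errors are returned as `std::io::Error` rather than the PR's `ImageLoaderError`.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Hypothetical dataset holding class names in index order.
struct ClassifiedDataset {
    classes: Vec<String>,
}

impl ClassifiedDataset {
    /// Opt-in: write `labels.txt` into `dir` (one class per line,
    /// line i = class i), creating `dir` if needed. Returns the file path.
    fn save_classes<P: AsRef<Path>>(&self, dir: P) -> io::Result<PathBuf> {
        let dir = dir.as_ref();
        fs::create_dir_all(dir)?;
        let path = dir.join("labels.txt");
        let mut writer = BufWriter::new(File::create(&path)?);
        for class in &self.classes {
            writeln!(writer, "{class}")?;
        }
        // Flush explicitly so write errors are reported, not dropped.
        writer.flush()?;
        Ok(path)
    }
}
```

A training script would then call `save_classes` once with its artifact directory, and only when it actually needs the mapping.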

github-actions · 2025-06-06T12:12:07Z

This PR has been marked as stale because it has not been updated for over a month

AshAnand34 added 3 commits May 2, 2025 21:50

using hard-coded value temporarily

3c37644

Added documentation about labels.txt, which is created inside the artifact directory for ImageFolderDataset

ea79799

minor formatting fixes

6fc0d47

Created a LabeledDataset trait to be applied to many other types of datasets

88863c2

AshAnand34 marked this pull request as ready for review May 4, 2025 00:46

Add json and yaml support to labels

82b4303

laggui requested changes May 6, 2025


github-actions bot added the stale label ("The issue or pr has been open for too long") Jun 6, 2025

laggui closed this Jun 6, 2025
