Merge pull request #228 from huggingface/update-dataset-duration

Deep-unlearning · web-flow · commit 57b779cbc21c · 2025-11-26T12:05:13.000+01:00
Change how audio duration is computed: use the decoded audio array and sampling rate instead of the file path
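
For context, here is a minimal sketch of what the array-based call computes, assuming mono audio already decoded into a 1-D array: with `y` and `sr` supplied, `librosa.get_duration` amounts to dividing the number of samples by the sampling rate. The `example` dict and the `duration_seconds` helper are hypothetical stand-ins for illustration and are not part of this change.

```python
# Hypothetical stand-in for one decoded entry of minds["audio"].
# In the real dataset "array" is typically a NumPy array; len() behaves
# the same way for 1-D (mono) audio.
example = {"array": [0.0] * 16000, "sampling_rate": 8000}


def duration_seconds(audio):
    """Return the clip length in seconds: number of samples / sampling rate."""
    return len(audio["array"]) / audio["sampling_rate"]


print(duration_seconds(example))  # 2.0
```

Because this works on the decoded samples, it does not depend on the audio file still being readable at the stored path.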
diff --git a/chapters/en/chapter1/preprocessing.mdx b/chapters/en/chapter1/preprocessing.mdx
@@ -95,7 +95,9 @@ dataset. However, we can create one, filter based on the values in that column,
 
 ```py
 # use librosa to get example's duration from the audio file
-new_column = [librosa.get_duration(path=x) for x in minds["path"]]
+new_column = [
+    librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]
+]
 minds = minds.add_column("duration", new_column)
 
 # use 🤗 Datasets' `filter` method to apply the filtering function