Add audio resampling to fix Whisper sampling rate mismatch #205

McVyp · 2025-04-23T08:31:40Z

This addresses an issue in the Whisper feature extraction example in the Preprocessing Audio Data documentation, where the example code doesn't handle sampling rate mismatches.

I encountered a ValueError when using the example code with the MINDS14 dataset (8 kHz), because Whisper requires 16 kHz audio. The documentation mentions using cast_column for resampling in another section, but that approach didn't resolve the error when used with the Whisper feature extractor.

Changes made:

- Updated the prepare_dataset function to explicitly handle sampling rate mismatches
- Added direct resampling with librosa, which successfully resolves the error
- Ensured the example works end-to-end with datasets that have different sampling rates
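
For readers without the diff at hand, the change described above could look roughly like the sketch below. It is not the merged code: the helper `resample_if_needed`, the `TARGET_SAMPLING_RATE` constant, and the injectable `resample_fn` parameter are assumptions added for illustration. Only `librosa.resample` (with keyword-only `orig_sr` and `target_sr`) and the `feature_extractor(..., sampling_rate=..., padding=True)` call follow existing library usage.

```python
TARGET_SAMPLING_RATE = 16_000  # Whisper expects 16 kHz input


def resample_if_needed(array, sampling_rate, target_sr=TARGET_SAMPLING_RATE, resample_fn=None):
    """Return (array, sampling_rate), resampled to target_sr only on a mismatch.

    resample_fn is a hypothetical hook; when omitted, librosa.resample is used.
    """
    if sampling_rate == target_sr:
        return array, sampling_rate
    if resample_fn is None:
        import librosa  # imported lazily so the no-mismatch path has no dependency

        resample_fn = librosa.resample
    resampled = resample_fn(array, orig_sr=sampling_rate, target_sr=target_sr)
    return resampled, target_sr


def prepare_dataset(example, feature_extractor, resample_fn=None):
    """Extract features from one example, resampling its audio first if required."""
    audio = example["audio"]
    array, sampling_rate = resample_if_needed(
        audio["array"], audio["sampling_rate"], resample_fn=resample_fn
    )
    # The feature extractor is passed the rate of the (possibly resampled) audio.
    return feature_extractor(array, sampling_rate=sampling_rate, padding=True)
```

With a loaded dataset, this would presumably be applied through `dataset.map(prepare_dataset, fn_kwargs={"feature_extractor": feature_extractor})`, where `feature_extractor` is a Whisper feature extractor created beforehand.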

HuggingFaceDocBuilderDev · 2025-10-06T19:57:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

McVyp and others added 2 commits April 23, 2025 17:19

Add audio resampling to fix Whisper sampling rate mismatch

5827be5

Merge branch 'main' into fix-whisper-sampling-rate-doc

e501f20

fix code formatting

283cb2c

Deep-unlearning self-requested a review October 13, 2025 08:30

Deep-unlearning approved these changes Oct 13, 2025

Deep-unlearning merged commit b5a2f28 into huggingface:main Oct 13, 2025
2 checks passed

McVyp deleted the fix-whisper-sampling-rate-doc branch October 14, 2025 01:06

3 participants