Hi,
I’m looking for hands-on expert guidance on fine-tuning Whisper for Heimish / Hasidic Yiddish (Hebrew script).
Goal
High-quality transcription from seconds-long clips up to multi-hour audio
End-to-end transcription in ~1–3 minutes
ASR core must be production-ready and handoff-ready for later web deployment (not building the website yet)
Current state
Whisper-small experimental fine-tune
Output quality inconsistent (phonetic drift, mixed words)
What I already have
Clean 16 kHz WAV audio
Human-verified transcripts (I’m an experienced transcriber)
Proper UTF-8 CSV metadata
Long-audio chunking is understood (not my blocker)
What I need
Practical guidance from someone who has:
fine-tuned Whisper for dialects / low-resource languages
improved linguistic accuracy and output stability
Not looking for
setup / CUDA issues
basic Whisper explanations
model theory
If you’ve done similar work or can point me to the right person, I’d appreciate it.
Happy to continue privately.
Thanks,