This repository was archived by the owner on Jun 9, 2025. It is now read-only.
I would be very grateful for any insight. I am out of my depth with this and feel out of place even posting on GitHub. A while back I blindly followed one of Jarod's tutorials to clone a voice. I copied all the settings and used a good 10-minute recording of speech, and the results were amazing: as good as ElevenLabs, just slower.
I tried to reproduce this with a different voice, but now I'm running into problems. During the initial Whisper transcription step of training, I keep getting this error after it reaches 100% and begins to parse: "RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous"
I've managed to get past this, but I'm not sure how: basically just by re-saving my WAV file and trying different sample rates. From what I've read on the wiki pages the sample rate shouldn't actually matter, so maybe I've just been brute-forcing it. The audio isn't corrupt as far as I can tell: it plays back fine in media players and the waveform looks healthy.
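For reference, here is a minimal sketch of how the WAV header could be compared before and after re-saving, using only Python's standard `wave` module. The function name `describe_wav` and the file names in the usage comment are hypothetical, and this only reads the header. It does not show what Whisper or Tortoise do with the file.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file.

    The standard-library wave module can raise wave.Error for
    encodings it does not support, which is itself a useful hint
    that the file is not plain PCM.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            # Guard against a zero sample rate in a damaged header.
            "duration_s": frames / rate if rate else 0.0,
        }


# Hypothetical usage: compare the original file with the re-saved one.
# print(describe_wav("voice_original.wav"))
# print(describe_wav("voice_resaved.wav"))
```

If the two files differ in channel count or sample width as well as sample rate, that would suggest the re-save changed more than the sample rate.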
When I do manage to get beyond this point, all my models are terrible, nothing like the successful model I made with a different voice the first time. Is this tensor error likely to be the cause, perhaps because something is still affecting how Tortoise handles the WAV files and the transcription txt when it actually trains?
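One thing that could be ruled out is the transcription txt itself. The sketch below assumes each line has the form `audio_path|transcribed text`. That layout is an assumption and not confirmed here, and `find_bad_entries` is a hypothetical helper. It only flags lines with a missing separator, a missing path, or empty text, and says nothing about whether such lines cause the tensor error.

```python
def find_bad_entries(lines, sep="|"):
    """Return (line_number, reason) pairs for suspicious entries.

    Assumes one 'audio_path|text' entry per line; the separator is a
    parameter in case the real file uses a different one.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped, not reported
        path, found, text = line.partition(sep)
        if not found:
            bad.append((number, "missing separator"))
        elif not path.strip():
            bad.append((number, "missing audio path"))
        elif not text.strip():
            bad.append((number, "empty transcription"))
    return bad


# Hypothetical usage with a transcription file:
# with open("train.txt", encoding="utf-8") as handle:
#     for number, reason in find_bad_entries(handle):
#         print(f"line {number}: {reason}")
```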