Replies: 2 comments
Hi @Giak1234. I'm currently traveling on vacation, but I'll try to give you some tips.
AFAIK this won't work offline out of the box. Try to set up and run the transcription on a machine with Internet access; the model will be downloaded on the first run. Then copy the entire contents of the C:\Users\your_user\.huggingface folder to the machine without Internet access and see if that works.
To transcribe the audio tracks of videos, add video; to the mimesToProcess option in the same config file. I also suggest you try to run the transcription on the CPU first. The IPED Python plugins package you downloaded works only on the CPU, as stated by its name. Once that works, try running on the GPU. For that, you will need to install the required Python libraries with GPU support; please check our manual in the wiki for further details.
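The cache-copy step above can be sketched with Python's standard library. The source path is the default Hugging Face cache location mentioned above; the destination is just an example staging folder to carry over to the offline machine (both paths are assumptions you should adjust):

```python
import shutil
from pathlib import Path

def copy_hf_cache(src: Path, dst: Path) -> Path:
    """Copy the whole Hugging Face cache folder (downloaded models) to dst."""
    return Path(shutil.copytree(src, dst, dirs_exist_ok=True))

# Assumed default cache path on Windows; D:\transfer is a hypothetical
# staging folder you would carry to the offline machine.
src = Path.home() / ".huggingface"
dst = Path(r"D:\transfer\.huggingface")

# Only meaningful on the machine that already ran a transcription once.
if src.exists():
    copy_hf_cache(src, dst)
```

On the offline machine, the copied folder then goes back under C:\Users\your_user\ so the libraries find the model without trying to download it.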
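For the video formats, the change would look something like the excerpt below in conf/AudioTranscriptTaskConfig.txt. This is only an illustration of appending video; to the existing list; check the exact key and current value in your own config file:

```
# AudioTranscriptTaskConfig.txt (excerpt, illustrative)
# append "video;" so video files are also sent to transcription
mimesToProcess = audio;video;
```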
I've tried everything (I think)... installed all kinds of Python libraries required during testing: CUDA, PyTorch, faster-whisper, CTranslate2, Hugging Face... but nothing works!
Good evening,
I’m looking for a solution to make audio/video file transcription (in various formats) work with Whisper.
Is anyone able to share the correct setup for Windows 11 (with a GPU featuring 16GB dedicated VRAM and 128GB RAM) to use a model larger than “medium”, for example “large-v2”?
Here is what I have already done:
I’m working offline, so as indicated in the file iped-4-2-2/conf/AudioTranscriptTaskConfig.txt, I downloaded the model from the faster-whisper-large-v2 repository on Hugging Face, already converted for use with CTranslate2 (the backend faster-whisper uses): https://huggingface.co/guillaumekln/faster-whisper-large-v2
I placed the downloaded files in the folders I created (fastwhisper and large-v2) under the path:
iped-4-2-2/models/fastwhisper/large-v2
(containing: config.json, model.bin, tokenizer.json, vocabulary.txt)
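Since a missing or misplaced file is a common cause of model-loading errors, here is a quick standard-library check that the local model folder contains the files listed above (the folder path is the one created in the previous step; adjust it to your install):

```python
from pathlib import Path

# File names taken from the model folder contents described above.
REQUIRED = {"config.json", "model.bin", "tokenizer.json", "vocabulary.txt"}

def missing_model_files(model_dir: Path) -> set:
    """Return the names of required model files absent from model_dir."""
    present = {p.name for p in model_dir.iterdir()} if model_dir.is_dir() else set()
    return REQUIRED - present

# Example path from the setup described above.
model_dir = Path(r"iped-4-2-2/models/fastwhisper/large-v2")
missing = missing_model_files(model_dir)
if missing:
    print("Missing model files:", sorted(missing))
```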
I replaced the Python folder (iped-4-2-2/python) with the one included in IPED-4.2.x_plugins_Whisper_FaceRecognition_Win64_CPU.zip
I configured the parameters in AudioTranscriptTaskConfig.txt as follows:
whisperModel = large-v2
device = gpu
precision = int8
batchSize = 1
I installed CUDA Toolkit > 12 with all required dependencies and added the environment variables
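One thing worth sanity-checking after that install is whether the CUDA environment variables are actually visible to the process running IPED. A minimal standard-library sketch, assuming a Windows setup where the CUDA Toolkit installer sets CUDA_PATH and its bin folder must appear on PATH for the runtime DLLs to be found:

```python
import os

def cuda_env_report(env: dict) -> dict:
    """Report whether CUDA_PATH is set and whether it appears on PATH.

    Windows is assumed (the setup above is Windows 11), so the ';'
    PATH separator is hardcoded.
    """
    cuda_path = env.get("CUDA_PATH", "")
    on_path = bool(cuda_path) and any(
        cuda_path.lower() in part.lower() for part in env.get("PATH", "").split(";")
    )
    return {"CUDA_PATH set": bool(cuda_path), "CUDA on PATH": on_path}

# Inspect the real environment of the current process:
print(cuda_env_report(dict(os.environ)))
```

If either entry comes back False, the transcription process may not be able to locate the CUDA runtime even though the Toolkit is installed.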
When I run the command
iped.exe -d image.dd -o output
I get the following error:
“Processing Error: Error loading 'large-v2' transcription model.”
Any suggestions?