There is unexpected behavior when attempting to transcribe non-english text.
The following code:
import whisper_s2t
model = whisper_s2t.load_model(model_identifier="large-v3", backend='TensorRT-LLM')
files = ['nonenglish-testfile.wav']
lang_codes = None
tasks = ['transcribe']
out = model.transcribe_with_vad(files,
lang_codes=lang_codes,
tasks=tasks,
batch_size=24)
Will return English text even if the spoken language is not English. If I pass in a language code, then the model will return that language, but this is manual and will also fail for speech that has multiple languages.
Openai and huggingface implementations of whisper have built in detection and do not have this problem.
Was something lost during the model conversion to tensorRT?
There is unexpected behavior when attempting to transcribe non-english text.
The following code:
Will return English text even if the spoken language is not English. If I pass in a language code, then the model will return that language, but this is manual and will also fail for speech that has multiple languages.
Openai and huggingface implementations of whisper have built in detection and do not have this problem.
Was something lost during the model conversion to tensorRT?