Transcribe Task Translates Without Language Code

I am seeing unexpected behavior when transcribing non-English speech.

The following code:

```python
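import whisper_s2t

# Load Whisper large-v3 on the TensorRT-LLM backend
model = whisper_s2t.load_model(model_identifier="large-v3", backend='TensorRT-LLM')
files = ['nonenglish-testfile.wav']
lang_codes = None  # no language code supplied; expecting automatic detection
tasks = ['transcribe']  # transcribe only, no translation requested

out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                batch_size=24)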
```
This returns English text even when the spoken language is not English. If I pass in a language code, the model returns text in that language, but that requires knowing the language in advance and also fails for audio that contains multiple languages.

The OpenAI and Hugging Face implementations of Whisper have built-in language detection and do not have this problem.
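As a possible stopgap, the language could be detected up front with the reference `openai-whisper` package and then passed in as `lang_codes`. The following is only a rough sketch under some assumptions: `pick_language`, `detect_lang_codes`, the `min_confidence` threshold, and the choice of the `base` model as detector are hypothetical names and values of mine, not part of WhisperS2T. It also only inspects the first 30 seconds of each file, so it does not help with audio that mixes languages.

```python
from typing import Dict, List, Optional


def pick_language(probs: Dict[str, float], min_confidence: float = 0.5) -> Optional[str]:
    """Return the most probable language code, or None if it is below the threshold.

    `min_confidence` is an arbitrary, hypothetical cutoff chosen for illustration.
    """
    if not probs:
        return None
    code, confidence = max(probs.items(), key=lambda item: item[1])
    return code if confidence >= min_confidence else None


def detect_lang_codes(files: List[str], detector_size: str = "base") -> List[Optional[str]]:
    """Detect one language code per file using openai-whisper (hypothetical helper)."""
    import whisper  # openai-whisper, imported lazily so pick_language works without it

    detector = whisper.load_model(detector_size)
    codes = []
    for path in files:
        # Only the first 30 seconds of audio are used for detection.
        audio = whisper.pad_or_trim(whisper.load_audio(path))
        mel = whisper.log_mel_spectrogram(audio, n_mels=detector.dims.n_mels).to(detector.device)
        _, probs = detector.detect_language(mel)
        codes.append(pick_language(probs))
    return codes
```

The result of `detect_lang_codes(files)` could then be passed as `lang_codes` to `transcribe_with_vad`. Entries may be `None` when detection is not confident, so those files would still need a manually chosen code.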

Was something lost during the model conversion to TensorRT?

Transcribe Task Translates Without Language Code #84

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Transcribe Task Translates Without Language Code #84

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions