
Conversation

jzyeezy commented May 15, 2025

The BatchedInferencePipeline#transcribe method's type hints technically support passing an already-encoded token list as the initial_prompt. The code path should only try to encode the provided prompt when it is a string.

This fixes the bug where faster_whisper fails with TypeError: TextInputSequence must be str when a list of integers is passed as the initial_prompt.
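A minimal sketch of the guard described above; `prompt_to_tokens` and the `encode` parameter are illustrative stand-ins for faster_whisper's internals, not its actual API:

```python
from typing import Iterable, List, Union

def prompt_to_tokens(
    initial_prompt: Union[str, Iterable[int]],
    encode,  # tokenizer.encode-style callable: str -> List[int]
) -> List[int]:
    # Only a string goes through the tokenizer; an already-tokenized
    # prompt is passed through untouched, which avoids the
    # "TextInputSequence must be str" TypeError.
    if isinstance(initial_prompt, str):
        return encode(" " + initial_prompt.strip())
    return list(initial_prompt)
```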

MahmoudAshraf97 (Collaborator) left a comment

Thanks for your contribution! You still need to handle the case where options.initial_prompt is None.
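Folding the reviewer's point into the earlier sketch, a None prompt would simply yield no tokens (again illustrative, not the library's real code):

```python
from typing import Iterable, List, Optional, Union

def prompt_to_tokens(
    initial_prompt: Optional[Union[str, Iterable[int]]],
    encode,  # tokenizer.encode-style callable: str -> List[int]
) -> List[int]:
    if initial_prompt is None:
        # No prompt supplied: nothing to prepend.
        return []
    if isinstance(initial_prompt, str):
        return encode(" " + initial_prompt.strip())
    # Already-tokenized prompt: use as-is.
    return list(initial_prompt)
```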

MahmoudAshraf97 (Collaborator) commented

Also, even if the type hints allow it, why would you use a tokenized prompt instead of a string? Note that different Whisper models use different tokenizers, so a tokenized prompt is not generalizable.
I would prefer to change the type hint to accept only str.
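For reference, the narrowing suggested here would look roughly like this (a sketch of the annotation only, not the full transcribe signature):

```python
from typing import Iterable, Optional, Union

# Current hint: a pre-tokenized prompt is nominally allowed.
InitialPromptCurrent = Optional[Union[str, Iterable[int]]]

# Proposed hint: accept only a plain string, letting each model's
# own tokenizer handle the encoding.
InitialPromptProposed = Optional[str]
```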

jzyeezy (Author) commented May 19, 2025

> Why would you use a tokenized string instead of a string?

@MahmoudAshraf97 gotcha. I only discovered this bug while experimenting with the parameter to see whether providing the prompt in this format has any upside. I'm happy to update the signature of this parameter to accept only a string.
