Whisper BiDec: a Blueprint by Mozilla.ai for aligning text transcriptions in Speech-to-Text applications

Whisper BiDec enables the user to "re-adjust" OpenAI's Whisper models to user predefined texts, leading to improved transcription accuracy for specific terms, names, or phrases. This is particularly useful in domains with specialized vocabulary or when dealing with uncommon names or smaller Whisper models.

Example Results

dileesh.mp4

Whisper tiny before and after biasing with the text: "Dileesh Pothan":

Without Bias	With Bias
The rich potent as an Indian film director from Kerala who works in the Malayalam film industry.	Dileesh Pothan is an Indian film director from Kerala who works in the Malayalam film industry.

Quick-start

Get started now with one of the following options:

Try it on our Hugging Face Space

Try it locally

You can also install and use BiDec locally, either via the command line or through a graphical interface app.

First, install the necessary dependencies:

git clone [email protected]:mozilla-ai/speech-to-text-alignment.git
cd speech-to-text-alignment
pip instal -r requirements.txt

Graphical Interface App

Start the graphical interface app by running:

python demo/app.py

Command Line Interface

Run python -m whisper_bidec --text <text_file> <wav_file> [<wav_file>...] to transcribe WAV files and get CSV output like:

path_to_1.wav|text 1 without bias|text 1 with bias
path_to_2.wav|text 2 without bias|text 2 with bias
...

The text file should contain a list of sentences that you want to bias Whisper towards. These need to have the correct casing and punctuation. You can add multiple --text files.

Increase --bias-towards-lm to get transcripts more like the example sentences (default: 0.5).

Increase --unk-logprob to allow more words outside the example sentences (default: -5, must be less than 0) or decrease it to restrict words to example sentences (e.g., -10).

Usage Example

Test transcribing the WAV file without any bias:

python3 -m whisper_bidec example_data/ecobee.wav

This outputs CSV with the format wav file|text without bias|text with bias like:

what's the temperature of the EcoBee.wav|What's the temperature of the incubi?|What's the temperature of the incubi?

Without bias, the WAV file is incorrectly transcribed as "What's the temperature of the incubi?"

Let's add a few example sentences that will bias Whisper towards the "EcoBee" device:

cat > example_sentences.txt <<EOF
What's the temperature of the EcoBee?
What is the temperature of the EcoBee?
EOF

Now we can see the corrected transcript:

python3 -m whisper_bidec --text ecobee_example.txt example_data/ecobee.wav
what's the temperature of the EcoBee.wav|What's the temperature of the incubi?|What's the temperature of the EcoBee?

The bias can be adjusted with --bias-towards-lm <BIAS> which defaults to 0.5. Increasing this value will bias Whisper more towards the example sentences.

Troubleshooting

If you run into issues, check our Troubleshooting section before opening a new issue.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Contributing

Contributions are welcome! To get started, you can check out the CONTRIBUTING.md file.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github		.github
demo		demo
example_data		example_data
images		images
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Whisper BiDec: a Blueprint by Mozilla.ai for aligning text transcriptions in Speech-to-Text applications

Example Results

Quick-start

Try it on our Hugging Face Space

Try it locally

Graphical Interface App

Command Line Interface

Usage Example

Troubleshooting

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

License

mozilla-ai/speech-to-text-alignment

Folders and files

Latest commit

History

Repository files navigation

Whisper BiDec: a Blueprint by Mozilla.ai for aligning text transcriptions in Speech-to-Text applications

Example Results

Quick-start

Try it on our Hugging Face Space

Try it locally

Graphical Interface App

Command Line Interface

Usage Example

Troubleshooting

License

Contributing

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Packages