Aligning / Biasing Speech-to-Text models to a predefined set of words or phrases.


mozilla-ai/speech-to-text-alignment


Blueprint title

This blueprint guides you to ...

Quick-start

Create a virtual environment and install the dependencies:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install 'whisper-bidec @ https://github.com/OHF-Voice/whisper-bidec/archive/refs/tags/v0.0.1.tar.gz'

Download an example WAV file:

wget "https://github.com/OHF-Voice/whisper-bidec/raw/refs/heads/main/tests/wav/what's%20the%20temperature%20of%20the%20EcoBee.wav"

Test transcribing the WAV file without any bias:

python3 -m whisper_bidec "what's the temperature of the EcoBee.wav"

This prints one pipe-separated line per WAV file, in the format wav file|text without bias|text with bias, for example:

what's the temperature of the EcoBee.wav|What's the temperature of the incubi?|What's the temperature of the incubi?
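To post-process these results in a script, each line can be split on the | delimiter. A minimal POSIX shell sketch using the example line above; it assumes a transcript never contains a | character itself:

```shell
# Split one line of whisper_bidec output into its three fields:
# wav file | text without bias | text with bias
line="what's the temperature of the EcoBee.wav|What's the temperature of the incubi?|What's the temperature of the incubi?"

# IFS='|' splits only on the pipe, so spaces inside each field are kept.
IFS='|' read -r wav_file unbiased biased <<EOF
$line
EOF

printf 'file:     %s\nunbiased: %s\nbiased:   %s\n' "$wav_file" "$unbiased" "$biased"
```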

Because no example sentences were supplied, the two transcripts are identical: without bias, the WAV file is incorrectly transcribed as "What's the temperature of the incubi?"

Let's add a few example sentences that will bias Whisper towards the "EcoBee" device:

cat > example_sentences.txt <<EOF
What's the temperature of the EcoBee?
What is the temperature of the EcoBee?
EOF
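When more than one device needs to be recognized, a sentences file like this could also be generated from a list of names. A small shell sketch; the device names (EcoBee, Nest) and the output file name device_sentences.txt are illustrative only, and whether adding more names helps or hurts accuracy has not been checked here:

```shell
# Hypothetical vocabulary: device names to bias towards.
# In this sketch, names must not contain spaces (the list is word-split).
devices="EcoBee Nest"

# Write two phrasings of the same question for every device.
for device in $devices; do
  printf "What's the temperature of the %s?\n" "$device"
  printf "What is the temperature of the %s?\n" "$device"
done > device_sentences.txt
```

The generated file would then be passed with --text device_sentences.txt, just like example_sentences.txt.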

Passing the example sentences with --text now produces the corrected transcript in the third field:

python3 -m whisper_bidec --text example_sentences.txt "what's the temperature of the EcoBee.wav"
what's the temperature of the EcoBee.wav|What's the temperature of the incubi?|What's the temperature of the EcoBee?
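With many WAV files, it can help to list only those where the bias actually changed the transcript. A sketch using awk; bias_changed is a hypothetical helper name, and it assumes every result line has exactly the three pipe-separated fields shown above (other lines are ignored):

```shell
# Read whisper_bidec output on stdin and print the files whose biased
# transcript (field 3) differs from the unbiased one (field 2).
bias_changed() {
  awk -F'|' 'NF == 3 && $2 != $3 { print $1 ": " $2 " -> " $3 }'
}
```

For example: python3 -m whisper_bidec --text example_sentences.txt "what's the temperature of the EcoBee.wav" | bias_changed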

The strength of the bias can be adjusted with --bias-towards-lm <BIAS>, which defaults to 0.5. Higher values bias Whisper more strongly towards the example sentences.
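To choose a value for --bias-towards-lm, one option is to transcribe the same file at several bias levels and compare the results. A sketch of a hypothetical sweep_bias helper that uses only the --text and --bias-towards-lm flags shown in this README; the valid range of the flag is not documented here, so the values you try are up to you:

```shell
# Command used to run the transcriber. Kept in a variable so it can be
# replaced (for example by a stub) without editing the function.
WHISPER_BIDEC=${WHISPER_BIDEC:-"python3 -m whisper_bidec"}

# Hypothetical helper: transcribe one WAV file once per bias value and
# prefix each output line with the bias that produced it.
# Usage: sweep_bias WAV_FILE SENTENCES_FILE BIAS...
sweep_bias() {
  wav_file=$1
  sentences=$2
  shift 2
  for bias in "$@"; do
    printf '%s|' "$bias"
    # $WHISPER_BIDEC is intentionally unquoted so it splits into words.
    $WHISPER_BIDEC --text "$sentences" --bias-towards-lm "$bias" "$wav_file"
  done
}
```

For example, sweep_bias "what's the temperature of the EcoBee.wav" example_sentences.txt 0.25 0.5 1.0 would print lines of the form bias|wav file|text without bias|text with bias, assuming whisper_bidec writes only its result line to stdout.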

How it Works

Prerequisites

  • System requirements:

    • OS: Windows, macOS, or Linux
    • Python 3.10 or higher
    • Minimum RAM:
    • Disk space:
  • Dependencies:

    • Dependencies listed in pyproject.toml
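Before installing, the Python requirement above can be checked from the shell. A minimal sketch; python_ok is a hypothetical helper name:

```shell
# Return 0 if python3 is at least 3.10, 1 if it is older, 2 if missing.
python_ok() {
  command -v python3 >/dev/null 2>&1 || return 2
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

if python_ok; then
  echo "Found $(python3 --version 2>&1)"
else
  echo "Python 3.10 or higher is required" >&2
fi
```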

Troubleshooting

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Contributing

Contributions are welcome! To get started, you can check out the CONTRIBUTING.md file.
