Speech to text on Azure

Overview

This repository provides a comprehensive demonstration of multiple speech-to-text transcription approaches available on Microsoft Azure. It showcases various methods for converting spoken language to written text, from Speech Service SDK functionality to advanced AI models. The project includes:

Working examples of all major Azure speech transcription services
Interactive demos using Gradio for easy testing without coding
Jupyter notebooks with detailed code examples and explanations
Utility functions for real-world speech processing scenarios

Ideal for developers looking to implement speech-to-text functionality in their applications, data scientists working with audio data, or anyone interested in exploring Azure's speech recognition capabilities.

Features

Real-time speech to text from microphone and wav file
Fast Transcription using Azure AI Speech Service REST API
OpenAI Whisper model via Azure OpenAI
Speaker diarization (identifying different speakers in conversations)
Azure OpenAI GPT-4o-transcribe model for high-accuracy transcription
NEW WebSocket-based real-time transcription with OpenAI Realtime API
Streaming transcription for both completed audio files and live audio

Quick Start (commands for git bash on Windows)

1. Create `.env` file locally

Create a new file .env with API keys and region details for Azure speech-to-text services.

# Azure Speech Service credentials (required for basic speech recognition, diarization, and fast transcription)
SPEECH_KEY=<your_azure_speech_service_key>
SERVICE_REGION=<your_azure_region> # e.g. uksouth

# Azure OpenAI Service credentials (required for Whisper model)
AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>
AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>
AZURE_OPENAI_DEPLOYMENT_ID=whisper

# Azure OpenAI GPT-4o credentials (required for GPT-4o-transcribe model)
AZURE_OPENAI_GPT4O_API_KEY=<your_azure_openai_gpt4o_api_key>
AZURE_OPENAI_GPT4O_ENDPOINT=<your_azure_openai_gpt4o_endpoint>
AZURE_OPENAI_GPT4O_DEPLOYMENT_ID=<your_azure_openai_gpt4o_deployment_id>

# Direct OpenAI API credentials (optional, for using OpenAI's services directly)
OPENAI_API_KEY=<your_openai_api_key>

IMPORTANT NOTE: The endpoint URL formats for Azure OpenAI services differ based on the service:

For Whisper model (AZURE_OPENAI_ENDPOINT):

Requires the full endpoint URL including deployment path and API version

Format: https://<resource-name>.cognitiveservices.azure.com/openai/deployments/<deployment-id>/audio/translations?api-version=2024-06-01

Example: https://myopenai.cognitiveservices.azure.com/openai/deployments/whisper/audio/translations?api-version=2024-06-01

For GPT-4o-transcribe model (AZURE_OPENAI_GPT4O_ENDPOINT):

Requires only the domain name without https:// or any paths

Format: <resource-name>.openai.azure.com

Example: myopenai.openai.azure.com

The code handles these different formats appropriately - for GPT-4o-transcribe, the domain is used to construct the proper WebSocket URL for real-time transcription.

Prerequisites on Azure

Before using this repository, you'll need to set up the following Azure resources:

Azure AI Speech Service
- Create an Azure AI Speech resource in the Azure portal
- Note the key and region for your SPEECH_KEY and SERVICE_REGION variables
- Required for: Basic speech recognition, diarization, and fast transcription
Azure OpenAI Service for Whisper
- Create an Azure OpenAI resource
- Deploy the Whisper model with your chosen deployment name
- Note the endpoint and API key for your AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY variables
- Set AZURE_OPENAI_DEPLOYMENT_ID to your deployment name (e.g., "whisper")
- Required for: OpenAI Whisper transcription
Azure OpenAI Service for GPT-4o-transcribe
- Create an Azure OpenAI resource (can be the same as for Whisper)
- Deploy the GPT-4o model and enable transcription capability
- Note the endpoint URI including the deployment path for AZURE_OPENAI_GPT4O_ENDPOINT
- Note the API key for AZURE_OPENAI_GPT4O_API_KEY
- Set AZURE_OPENAI_GPT4O_DEPLOYMENT_ID to your deployment name
- Required for: GPT-4o-transcribe and real-time WebSocket transcription
OpenAI API (Optional)
- Only needed if you want to use OpenAI services directly instead of through Azure
- Get an API key from OpenAI and set it as OPENAI_API_KEY

Each component of this repository can be used independently based on which Azure resources you have available.

2. Install

python -m venv .venv

source .venv/Scripts/activate

pip install -r requirements.txt -r requirements-dev.txt

3. Notebook

Review azure-speech-to-text.ipynb.

Or skip directly to next part.

4. Run Demo

python gradio/app.py

5. Play with the demo on Browser

http://127.0.0.1:7860

6. Debug Mode

To run the application in debug mode, set the DEBUG environment variable to True:

export DEBUG=True

Contributing (Committing changes)

Install pre-commit for basic checks and fixes before commit.

pre-commit install

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
data		data
diagrams		diagrams
gradio		gradio
notebooks		notebooks
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
mp3_to_wav_utility.py		mp3_to_wav_utility.py
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speech to text on Azure

Overview

Features

Quick Start (commands for git bash on Windows)

1. Create `.env` file locally

Prerequisites on Azure

2. Install

3. Notebook

4. Run Demo

5. Play with the demo on Browser

6. Debug Mode

Contributing (Committing changes)

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Speech to text on Azure

Overview

Features

Quick Start (commands for git bash on Windows)

1. Create .env file locally

Prerequisites on Azure

2. Install

3. Notebook

4. Run Demo

5. Play with the demo on Browser

6. Debug Mode

Contributing (Committing changes)

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

1. Create `.env` file locally

Packages