This repository provides a comprehensive demonstration of multiple speech-to-text transcription approaches available on Microsoft Azure. It showcases various methods for converting spoken language to written text, from Speech Service SDK functionality to advanced AI models. The project includes:
- Working examples of all major Azure speech transcription services
- Interactive demos using Gradio for easy testing without coding
- Jupyter notebooks with detailed code examples and explanations
- Utility functions for real-world speech processing scenarios
Ideal for developers looking to implement speech-to-text functionality in their applications, data scientists working with audio data, or anyone interested in exploring Azure's speech recognition capabilities.
- Real-time speech to text from microphone and wav file
- Fast Transcription using Azure AI Speech Service REST API
- OpenAI Whisper model via Azure OpenAI
- Speaker diarization (identifying different speakers in conversations)
- Azure OpenAI GPT-4o-transcribe model for high-accuracy transcription
- NEW WebSocket-based real-time transcription with OpenAI Realtime API
- Streaming transcription for both completed audio files and live audio
Create a new file .env with API keys and region details for Azure speech-to-text services.
# Azure Speech Service credentials (required for basic speech recognition, diarization, and fast transcription)
SPEECH_KEY=<your_azure_speech_service_key>
SERVICE_REGION=<your_azure_region> # e.g. uksouth
# Azure OpenAI Service credentials (required for Whisper model)
AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>
AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>
AZURE_OPENAI_DEPLOYMENT_ID=whisper
# Azure OpenAI GPT-4o credentials (required for GPT-4o-transcribe model)
AZURE_OPENAI_GPT4O_API_KEY=<your_azure_openai_gpt4o_api_key>
AZURE_OPENAI_GPT4O_ENDPOINT=<your_azure_openai_gpt4o_endpoint>
AZURE_OPENAI_GPT4O_DEPLOYMENT_ID=<your_azure_openai_gpt4o_deployment_id>
# Direct OpenAI API credentials (optional, for using OpenAI's services directly)
OPENAI_API_KEY=<your_openai_api_key>IMPORTANT NOTE: The endpoint URL formats for Azure OpenAI services differ based on the service:
For Whisper model (AZURE_OPENAI_ENDPOINT):
- Requires the full endpoint URL including deployment path and API version
- Format:
https://<resource-name>.cognitiveservices.azure.com/openai/deployments/<deployment-id>/audio/translations?api-version=2024-06-01- Example:
https://myopenai.cognitiveservices.azure.com/openai/deployments/whisper/audio/translations?api-version=2024-06-01For GPT-4o-transcribe model (AZURE_OPENAI_GPT4O_ENDPOINT):
- Requires only the domain name without https:// or any paths
- Format:
<resource-name>.openai.azure.com- Example:
myopenai.openai.azure.comThe code handles these different formats appropriately - for GPT-4o-transcribe, the domain is used to construct the proper WebSocket URL for real-time transcription.
Before using this repository, you'll need to set up the following Azure resources:
-
Azure AI Speech Service
- Create an Azure AI Speech resource in the Azure portal
- Note the key and region for your
SPEECH_KEYandSERVICE_REGIONvariables - Required for: Basic speech recognition, diarization, and fast transcription
-
Azure OpenAI Service for Whisper
- Create an Azure OpenAI resource
- Deploy the Whisper model with your chosen deployment name
- Note the endpoint and API key for your
AZURE_OPENAI_ENDPOINTandAZURE_OPENAI_API_KEYvariables - Set
AZURE_OPENAI_DEPLOYMENT_IDto your deployment name (e.g., "whisper") - Required for: OpenAI Whisper transcription
-
Azure OpenAI Service for GPT-4o-transcribe
- Create an Azure OpenAI resource (can be the same as for Whisper)
- Deploy the GPT-4o model and enable transcription capability
- Note the endpoint URI including the deployment path for
AZURE_OPENAI_GPT4O_ENDPOINT - Note the API key for
AZURE_OPENAI_GPT4O_API_KEY - Set
AZURE_OPENAI_GPT4O_DEPLOYMENT_IDto your deployment name - Required for: GPT-4o-transcribe and real-time WebSocket transcription
-
OpenAI API (Optional)
- Only needed if you want to use OpenAI services directly instead of through Azure
- Get an API key from OpenAI and set it as
OPENAI_API_KEY
Each component of this repository can be used independently based on which Azure resources you have available.
python -m venv .venv
source .venv/Scripts/activate
pip install -r requirements.txt -r requirements-dev.txt
Review azure-speech-to-text.ipynb.
Or skip directly to next part.
python gradio/app.py
To run the application in debug mode, set the DEBUG environment variable to True:
export DEBUG=True
Install pre-commit for basic checks and fixes before commit.
pre-commit install
