| 1 | +# Video Translation Agent |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +The Video Translation Agent is an AI-powered agent that enables users to translate videos between languages with customizable speech and subtitle options. It leverages Azure AI Foundry, Azure AI Cognitive Services, and Azure Blob Storage to provide high-quality video translations. |
| 6 | + |
| 7 | +## Demo |
| 8 | + |
| 9 | +<div align="center"> |
| 10 | + <video width="800" controls> |
| 11 | + <source src="./assets/demo.mp4" type="video/mp4"> |
| 12 | + Your browser does not support the video tag. |
| 13 | + </video> |
| 14 | +</div> |
| 15 | + |
| 16 | +You can find the demo video at `./assets/demo.mp4` in the repository. |
| 17 | + |
| 18 | +## Use Cases |
| 19 | + |
| 20 | +1. **Corporate Communications** 🏢 - Translate training videos, executive messages, and internal communications to reach global employees in their native languages. |
| 21 | + |
| 22 | +2. **Educational Content** 🎓 - Enable educators to translate course materials and lectures for international students, expanding the reach of educational content. |
| 23 | + |
| 24 | +3. **Marketing and Sales** 📊 - Localize marketing videos and sales presentations for different regions and markets, increasing engagement with international audiences. |
| 25 | + |
| 26 | +## Architecture Overview |
| 27 | + |
| 28 | +The Video Translation Agent uses: |
| 29 | +- **Azure AI Agents** - For conversation management and orchestration |
| 30 | +- **Semantic Kernel** - For creating the AI Agent and managing plugins |
| 31 | +- **Azure AI's Cognitive Services Video Translation API** - For the core video translation functionality |
| 32 | +- **Azure Blob Storage** - For secure file storage and retrieval |
| 33 | + |
| 34 | +<div align="center"> |
| 35 | + <img src="./assets/architecture-diagram.png" alt="Architecture Diagram" width="800"/> |
| 36 | +</div> |
| 37 | + |
| 38 | +## Setup Instructions |
| 39 | + |
| 40 | +### Prerequisites |
| 41 | + |
| 42 | +- [Python 3.11.10 or later](https://www.python.org/downloads/) |
| 43 | +- [Git](https://git-scm.com/downloads) |
| 44 | +- Azure resources: |
| 45 | + - **Azure subscription**: [Create a free account](https://azure.microsoft.com/free/) if you don't already have one |
| 46 | + - **Azure AI Speech Services**: |
| 47 | + - Create a [Speech resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices) in a [supported region for video translation](https://learn.microsoft.com/azure/ai-services/speech-service/regions#speech-service) |
| 48 | + - **Azure Blob Storage**: |
| 49 | + - [Create a storage account](https://learn.microsoft.com/azure/storage/common/storage-account-create) with a container configured for video uploads |
| 50 | + - Provide video access in one of two ways: |
| 51 | + - Generate a [SAS token](https://learn.microsoft.com/azure/storage/common/storage-sas-overview) and give the resulting video URL directly to the agent |
| 52 | + - Tell the agent the local file path of your video; the agent then uploads the video to the storage account and generates the URL |
| 53 | + - **Azure CLI**: [Install](https://learn.microsoft.com/cli/azure/install-azure-cli) and authenticate using `az login` |
| 54 | + |
| 55 | +### Setting up Azure AI Agent Service |
| 56 | + |
| 57 | +Follow the [official Azure AI Agent Service documentation](https://learn.microsoft.com/azure/ai-services/agents/quickstart?pivots=ai-foundry-portal) quickstart. This guides you through creating a hub and project in the Azure AI Foundry Portal and deploying your model. Once completed, you'll have the necessary credentials to configure your environment variables in the next steps. |
| 58 | + |
| 59 | +### Quickstart |
| 60 | + |
| 61 | +1. Clone the repository, then `cd` into the `video-translation-agent` folder in your terminal. |
| 62 | + |
| 63 | +2. Create and activate a virtual environment: |
| 64 | + |
| 65 | +```bash |
| 66 | +python -m venv .venv |
| 67 | +source .venv/bin/activate # On Windows, use: .venv\Scripts\activate |
| 68 | +``` |
| 69 | + |
| 70 | +3. Install the required dependencies: |
| 71 | + |
| 72 | +```bash |
| 73 | +pip install -r requirements.txt |
| 74 | +``` |
| 75 | + |
| 76 | +4. Create a `.env` file based on the `.env.sample` provided: |
| 77 | + |
| 78 | +```bash |
| 79 | +cp .env.sample .env |
| 80 | +``` |
| 81 | + |
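| 82 | +5. Update the `.env` file with the credentials from the Azure AI Agent Service setup and the details of your Speech and Storage resources. |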
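
Before starting the agent, it can help to confirm that no value was left blank. The following is a minimal sketch, not part of the repository: it assumes both files use plain `KEY=VALUE` lines, and the helper names `read_env_keys` and `unset_keys` are hypothetical. It compares `.env` against `.env.sample` so that no variable names need to be guessed.

```python
from pathlib import Path


def read_env_keys(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(sample_path: Path, env_path: Path) -> list[str]:
    """Return keys listed in the sample file that are missing or empty in .env."""
    expected = read_env_keys(sample_path)
    actual = read_env_keys(env_path)
    return [key for key in expected if not actual.get(key)]


if __name__ == "__main__":
    sample, env = Path(".env.sample"), Path(".env")
    if sample.exists() and env.exists():
        missing = unset_keys(sample, env)
        print("All keys set." if not missing else f"Still unset: {', '.join(missing)}")
    else:
        print("Expected both .env.sample and .env in the current directory.")
```

Run it from the `video-translation-agent` folder; any key it reports still needs a value in `.env`.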
| 83 | + |
| 84 | +### Running the Video Translation Agent |
| 85 | + |
| 86 | +Start the agent with: |
| 87 | + |
| 88 | +```bash |
| 89 | +python template.py |
| 90 | +``` |
| 91 | + |
| 92 | +The agent will guide you through the process of translating videos, providing helpful prompts for required information. To exit the agent at any time, simply type `exit` or `quit` in the console. |
| 93 | + |
| 94 | +## Sample Data |
| 95 | + |
| 96 | +### Video Requirements |
| 97 | + |
| 98 | +- .mp4 format video file |
| 99 | +- Less than 5 GB in size |
| 100 | +- Shorter than 4 hours in duration |
| 101 | +- Verify your [source and target languages are supported](https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=video-translation) |
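
The format, size, and duration limits above can be checked locally before handing a file to the agent. This is an illustrative sketch only: `check_video` is a hypothetical helper, "5 GB" is assumed to mean 5 × 1024³ bytes, and the duration must be supplied by the caller (for example, from a media tool of your choice), since the standard library cannot read it from the file.

```python
import os

MAX_SIZE_BYTES = 5 * 1024**3        # "less than 5 GB" (binary gigabytes assumed)
MAX_DURATION_SECONDS = 4 * 60 * 60  # "shorter than 4 hours"


def check_video(path: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of requirement violations; an empty list means the file passes."""
    problems = []
    if os.path.splitext(path)[1].lower() != ".mp4":
        problems.append("file must be in .mp4 format")
    if size_bytes >= MAX_SIZE_BYTES:
        problems.append("file must be smaller than 5 GB")
    if duration_seconds >= MAX_DURATION_SECONDS:
        problems.append("video must be shorter than 4 hours")
    return problems
```

For a file on disk, `os.path.getsize(path)` provides the `size_bytes` argument. Language support still has to be verified against the linked documentation.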
| 102 | + |
| 103 | +### Sample Files for Testing |
| 104 | + |
| 105 | +You can provide the agent with this video URL directly: [es-ES-TryOutOriginal.mp4](https://speechstudioprodpublicsa.blob.core.windows.net/ttsvoice/VideoTranslation/PublicDoc/SampleData/es-ES-TryOutOriginal.mp4). |
| 106 | + |
| 107 | +The repository also includes sample files in the `sample_data/` directory to help you test the agent: |
| 108 | + |
| 109 | +- **Sample Video in English**: `sample_data/en-US-TryoutOriginalTTSIntro.mp4` |
| 110 | +- **Sample Video in Spanish**: `sample_data/es-ES-TryOutOriginalElSalvadorMicrosoftIntro.mp4` |
| 111 | + |
| 112 | +## Example Agent Interaction |
| 113 | + |
| 114 | +### Video Translation |
| 115 | + |
| 116 | +Translate videos from one language to another with the following options: |
| 117 | +- Source and target languages |
| 118 | +- Voice type (Platform Voice or Personal Voice) |
| 119 | +- Speaker count customization |
| 120 | +- Subtitle character limit adjustment |
| 121 | +- Embedded subtitle options |
| 122 | + |
| 123 | +### File Management |
| 124 | + |
| 125 | +The agent can work with: |
| 126 | +- Local video files (automatically uploads to Azure Blob Storage) |
| 127 | +- Remote video URLs (direct processing) |
| 128 | +- WebVTT subtitle files for custom iterations |
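
The split between local files (uploaded first) and remote URLs (processed directly) implies some check on the input string. How the agent actually makes that decision is not shown here; the sketch below is one plausible approach, with `classify_video_source` as a hypothetical name, that treats only `http`/`https` links as remote.

```python
from urllib.parse import urlparse


def classify_video_source(source: str) -> str:
    """Return "url" for http(s) links and "local" for everything else."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "local"
```

Checking the scheme explicitly matters because a Windows path such as `C:\Videos\my_presentation.mp4` also parses with a one-letter "scheme" and must still count as local.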
| 129 | + |
| 130 | +### Translation Management |
| 131 | + |
| 132 | +Manage your translations with commands to: |
| 133 | +- List all translations |
| 134 | +- Get details about specific translations |
| 135 | +- Create new iterations with customized subtitles |
| 136 | +- Delete translations |
| 137 | + |
| 138 | +### Translating a Video |
| 139 | + |
| 140 | +``` |
| 141 | +You: I'd like to translate a video from English to Japanese |
| 142 | +Agent: I can help you translate your video from English to Japanese. First, I need to know if you have a local video file on your computer or a URL to a video stored online. |
| 143 | + |
| 144 | +You: I have a local file at C:\Videos\my_presentation.mp4 |
| 145 | +Agent: Great! I'll need to upload that file to Azure Blob Storage first so it can be processed. |
| 146 | +[Agent uploads file] |
| 147 | +Video uploaded successfully! |
| 148 | +Access URL: https://yourstorageaccount.blob.core.windows.net/videos/my_presentation.mp4 |
| 149 | +Secure access URL (valid for 24 hours): https://yourstorageaccount.blob.core.windows.net/videos/my_presentation.mp4?sv=... |
| 150 | + |
| 151 | +Now I need a few more details: |
| 152 | +1. Would you like to use "PlatformVoice" or "PersonalVoice" for the translation? PlatformVoice is the standard option. |
| 153 | +[...] |
| 154 | +``` |
| 155 | + |
| 156 | + |
| 157 | +``` |
| 158 | +You: I need to translate our new training video from English to Spanish, French, and German for our international facilities. The video is hosted on our secure server. |
| 159 | + |
| 160 | +Agent: I'd be happy to help translate your training video. Do you have a URL for the video, or is it stored locally on your computer? |
| 161 | + |
| 162 | +You: It's available at https://yourstorageaccount.blob.core.windows.net/videos/my_training_video.mp4 |
| 163 | + |
| 164 | +Agent: Perfect. I'll use this URL directly for the translation process. Now I need some details for the translation: |
| 165 | +1. Would you like to enable subtitles for accessibility? |
| 166 | +2. How many speakers are in your video? This helps with speaker identification. |
| 167 | + |
| 168 | +You: Yes, please enable subtitles. We have 2 speakers - a narrator and a demonstrator. |
| 169 | + |
| 170 | +Agent: I'll set up the translations with subtitles and optimize for 2 speakers. |
| 171 | + |
| 172 | +Starting translation process for all three languages: |
| 173 | +- English → Spanish (es-ES): Processing... |
| 174 | +- English → French (fr-FR): Processing... |
| 175 | +- English → German (de-DE): Processing... |
| 176 | +``` |
| 177 | + |
| 178 | +### Creating an Iteration with Custom Subtitles |
| 179 | + |
| 180 | +``` |
| 181 | +You: I want to improve the subtitles for my translation |
| 182 | +Agent: I can help you create a new iteration with custom subtitles. Do you have the Translation ID for the video? |
| 183 | + |
| 184 | +You: Yes, it's 06202024123045_en-US_ja-JP_PlatformVoice |
| 185 | +Agent: Great! Now I need the URL to your WebVTT subtitle file and the type of WebVTT file you're providing. |
| 186 | +[...] |
| 187 | +``` |
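
The Translation IDs in these examples appear to follow the pattern `MMDDYYYYHHMMSS_<source>_<target>_<voice>`. That pattern is inferred from the sample IDs only and is not a documented contract. Under that assumption, a small parser (the `TranslationId` class and `parse_translation_id` function are hypothetical names) could recover the parts:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TranslationId:
    created: datetime
    source_locale: str
    target_locale: str
    voice_kind: str


def parse_translation_id(translation_id: str) -> TranslationId:
    """Split an ID of the assumed form MMDDYYYYHHMMSS_source_target_voice."""
    parts = translation_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected translation ID format: {translation_id!r}")
    stamp, source, target, voice = parts
    # strptime raises ValueError if the timestamp part does not match.
    created = datetime.strptime(stamp, "%m%d%Y%H%M%S")
    return TranslationId(created, source, target, voice)
```

For example, `parse_translation_id("06202024123045_en-US_ja-JP_PlatformVoice")` yields a creation time of 2024-06-20 12:30:45 with `en-US` as the source and `ja-JP` as the target locale.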
| 188 | + |
| 189 | +### Listing Previous Translations |
| 190 | + |
| 191 | +``` |
| 192 | +You: Could you please provide me with my five most recent translations? |
| 193 | +Agent: I'll retrieve your 5 most recent translations: |
| 194 | + |
| 195 | +1. Translation ID: 07152023153045_en-US_ja-JP_PlatformVoice |
| 196 | + - Source Language: English (en-US) |
| 197 | + - Target Language: Japanese (ja-JP) |
| 198 | + - Created: July 15, 2023 15:30:45 |
| 199 | + - Status: Completed |
| 200 | + - Video URL: https://yourstorageaccount.blob.core.windows.net/videos/quarterly_review.mp4 |
| 201 | + |
| 202 | +2. Translation ID: 07102023091230_en-US_fr-FR_PlatformVoice |
| 203 | + - Source Language: English (en-US) |
| 204 | + - Target Language: French (fr-FR) |
| 205 | + - Created: July 10, 2023 09:12:30 |
| 206 | + - Status: Completed |
| 207 | + - Video URL: https://yourstorageaccount.blob.core.windows.net/videos/product_launch.mp4 |
| 208 | + |
| 209 | +3. Translation ID: 07052023142215_en-US_es-ES_PlatformVoice |
| 210 | + - Source Language: English (en-US) |
| 211 | + - Target Language: Spanish (es-ES) |
| 212 | + - Created: July 5, 2023 14:22:15 |
| 213 | + - Status: Completed |
| 214 | + - Video URL: https://yourstorageaccount.blob.core.windows.net/videos/training_module_3.mp4 |
| 215 | + |
| 216 | +4. Translation ID: 06302023103345_en-US_de-DE_PlatformVoice |
| 217 | + - Source Language: English (en-US) |
| 218 | + - Target Language: German (de-DE) |
| 219 | + - Created: June 30, 2023 10:33:45 |
| 220 | + - Status: Completed |
| 221 | + - Video URL: https://yourstorageaccount.blob.core.windows.net/videos/executive_message.mp4 |
| 222 | + |
| 223 | +5. Translation ID: 06252023081120_en-US_it-IT_PlatformVoice |
| 224 | + - Source Language: English (en-US) |
| 225 | + - Target Language: Italian (it-IT) |
| 226 | + - Created: June 25, 2023 08:11:20 |
| 227 | + - Status: Completed |
| 228 | + - Video URL: https://yourstorageaccount.blob.core.windows.net/videos/customer_testimonial.mp4 |
| 229 | + |
| 230 | +Would you like to get more details about any specific translation or perform any actions with these translations? |
| 231 | +``` |
| 232 | + |
| 233 | +## ⚠️ Troubleshooting |
| 234 | + |
| 235 | +Common issues and solutions: |
| 236 | + |
| 237 | +- **Authentication Errors**: Ensure you're logged in with `az login` and have the proper permissions. |
| 238 | +- **File Upload Issues**: Check your storage account name and permissions. |
| 239 | +- **Translation Failures**: Verify your video format is supported and your Cognitive Services endpoint is correct. |
| 240 | + |
| 241 | +### Logging Configuration |
| 242 | + |
| 243 | +By default, logging is set to the ERROR level. If you need more detailed output for debugging or monitoring, you can adjust the logging level: |
| 244 | + |
| 245 | +```bash |
| 246 | +# Options include: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| 247 | +export LOG_LEVEL=DEBUG # On Windows, use: set LOG_LEVEL=DEBUG |
| 248 | +``` |
| 249 | + |
| 250 | +You can also modify the logging configuration directly in the code by updating the `logging.basicConfig()` call in `template.py`: |
| 251 | + |
| 252 | +```python |
| 253 | +# Example: Change to INFO level logging with more detailed format |
| 254 | +logging.basicConfig( |
| 255 | + level=logging.INFO, |
| 256 | + format='%(asctime)s [%(levelname)s] %(name)s: %(message)s', |
| 257 | + datefmt='%Y-%m-%d %H:%M:%S' |
| 258 | +) |
| 259 | +``` |
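
The two approaches can be combined so that the environment variable drives the `basicConfig()` call. How `template.py` actually reads `LOG_LEVEL` is not shown in this README, so treat the following as a sketch under that assumption; `resolve_log_level` is a hypothetical helper that falls back to ERROR when the variable is unset or unrecognized.

```python
import logging
import os


def resolve_log_level(default: str = "ERROR") -> int:
    """Map the LOG_LEVEL environment variable to a logging constant."""
    name = os.environ.get("LOG_LEVEL", default).strip().upper()
    level = logging.getLevelName(name)
    # getLevelName returns an int for known names and a string otherwise.
    return level if isinstance(level, int) else logging.ERROR


logging.basicConfig(
    level=resolve_log_level(),
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
```

Falling back to ERROR rather than raising keeps a typo in `LOG_LEVEL` from stopping the agent at startup.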
| 260 | + |
| 261 | +## Resources |
| 262 | +- [Semantic Kernel Official Documentation](https://learn.microsoft.com/semantic-kernel/overview/) |
| 263 | +- [Azure AI Agent Service Documentation](https://learn.microsoft.com/azure/ai-services/agents/) |
| 264 | +- [Azure AI Speech Service: Video Translation Documentation](https://learn.microsoft.com/azure/ai-services/speech-service/video-translation-overview) |
| 265 | +- [Azure AI Foundry Documentation](https://learn.microsoft.com/azure/ai-foundry/) |
| 266 | +- [Azure AI Speech Service Documentation](https://learn.microsoft.com/azure/ai-services/speech-service/) |
| 267 | +- [Azure Blob Storage Documentation](https://learn.microsoft.com/azure/storage/blobs/) |
| 268 | +- [Azure AI Services Overview](https://learn.microsoft.com/azure/ai-services/) |