---
title: 'Deploy DeepSeek R1 with Azure AI Foundry and Gradio'
description: 'Create a chat interface for the Thinking LLM model using Azure AI Foundry, DeepSeek, and Gradio.'
pubDate: 'Jan 29 2025'
heroImage: '/cookbook/thinking-llm-hero.png'
tags: ["Azure AI", "Azure AI Foundry", "Model catalog", "Models-as-a-Service", "MaaS", "Generative AI", "Gradio", "Advanced reasoning", "LLM"]
---

In this walkthrough, I'll show you how to create a "thinking LLM" chat interface with thought bubbles using DeepSeek R1, the Azure AI Foundry SDK, and Gradio.

> This post is based on a Jupyter notebook example I created. You can use it alongside this walkthrough. Find it here: [DeepSeek R1 with Azure AI Foundry and Gradio](https://github.com/nicholasdbrady/cookbook/blob/main/examples/deepseek/deepseek-r1-with-azure-ai-foundry-and-gradio.ipynb)

## Table of Contents
- [Introduction](#introduction)
- [DeepSeek R1 on Azure AI Foundry](#deepseek-r1-on-azure-ai-foundry)
- [Benefits of Using DeepSeek R1 on Azure AI Foundry](#benefits-of-using-deepseek-r1-on-azure-ai-foundry)
- [Prerequisites](#prerequisites)
- [Setting Up the ChatCompletionsClient](#step-1-setting-up-the-chatcompletionsclient)
- [Implementing a Streaming Response Function](#step-2-implementing-a-streaming-response-function)
- [Creating the Gradio Interface](#step-3-creating-the-gradio-interface)
- [Conclusion](#conclusion)

## Introduction

[DeepSeek-R1 on GitHub](https://github.com/deepseek-ai/DeepSeek-R1)

**DeepSeek R1** has gained widespread attention for its advanced reasoning capabilities, excelling in language processing, scientific problem-solving, and coding. With 671B total parameters, 37B active parameters, and a 128K context length, it pushes the boundaries of AI-driven reasoning ([Explore DeepSeek R1 on Azure AI Foundry](https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek)). Benchmarking and evaluation results highlight its performance against other models, showcasing its effectiveness in reasoning tasks ([Evaluation Results](https://github.com/deepseek-ai/DeepSeek-R1/tree/main?tab=readme-ov-file#4-evaluation-results)). Building on prior models, DeepSeek R1 integrates Chain-of-Thought (CoT) reasoning, reinforcement learning (RL), and fine-tuning on curated datasets to achieve state-of-the-art performance. This tutorial will walk you through how to deploy DeepSeek R1 from [Azure AI Foundry's model catalog](https://ai.azure.com/explore/models/) and integrate it with [Gradio](https://www.gradio.app/) to build a real-time streaming chatbot specifically for thinking LLMs like **DeepSeek R1**.

### DeepSeek R1 on Azure AI Foundry

On **January 29, 2025**, Microsoft announced that **DeepSeek R1** is now available on **Azure AI Foundry** and **GitHub**, making it part of a growing portfolio of over **1,800 AI models** available for enterprise use. With this integration, businesses can deploy DeepSeek R1 using **serverless APIs**, ensuring seamless scalability, security, and compliance with Microsoft’s responsible AI principles. ([Azure AI Foundry announcement](https://azure.microsoft.com/en-us/blog/deepseek-r1-on-azure-ai-foundry))

### Benefits of Using DeepSeek R1 on Azure AI Foundry

- **Enterprise-Ready AI:** DeepSeek R1 is available as a trusted, scalable, and secure AI model, backed by Microsoft's infrastructure.
- **Optimized Model Evaluation:** Built-in tools allow developers to benchmark performance and compare outputs across different models.
- **Security and Responsible AI:** DeepSeek R1 has undergone **rigorous safety evaluations**, including automated assessments, security reviews, and **Azure AI Content Safety** integration for filtering potentially harmful content.
- **Flexible Deployment:** Developers can deploy the model via the **Azure AI Studio, Azure CLI, ARM templates**, or Python SDK.

By combining **DeepSeek R1**'s robust language understanding with Gradio's interactive capabilities, you can create a powerful chatbot application that processes and responds to user inputs in real time. This tutorial will walk you through the necessary steps, from setting up the DeepSeek API client to building a responsive Gradio interface, ensuring a comprehensive understanding of the integration process.

[Return to top](#top)

## Prerequisites

Before we begin, ensure you have the following:

- **Python 3.8+** installed on your system.
- An **Azure AI Foundry** model deployment with an endpoint. If you haven't deployed DeepSeek R1 as a serverless API yet, follow the steps in [Deploy models as serverless APIs](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless).
- The required Python packages installed:
  ```sh
  pip install azure-ai-inference gradio
  ```
- Environment variables set for your Azure AI credentials (a quick sanity check follows this list):
  ```sh
  export AZURE_INFERENCE_ENDPOINT="https://your-endpoint-name.region.inference.ai.azure.com"
  export AZURE_INFERENCE_CREDENTIAL="your-api-key"
  ```
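
As that sanity check, you can confirm both variables are visible to Python before going any further. A minimal sketch (the variable names match the exports above):

```python
import os

# Fail fast if either Azure AI Foundry credential is missing.
for var in ("AZURE_INFERENCE_ENDPOINT", "AZURE_INFERENCE_CREDENTIAL"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing environment variable: {var}")
print("Azure AI Foundry credentials found.")
```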

[Return to top](#top)

## Step 1: Setting Up the ChatCompletionsClient

First, import the required libraries and create a `ChatCompletionsClient` pointed at your serverless endpoint:

```python
import os
import gradio as gr
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage, AssistantMessage
from azure.core.credentials import AzureKeyCredential
from gradio import ChatMessage
from typing import Iterator

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"])
    # If you're authenticating with Microsoft Entra ID, use DefaultAzureCredential()
    # or other supported credentials instead of AzureKeyCredential.
)
```
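
Before wiring the client into Gradio, it's worth making a one-off, non-streaming call to confirm the endpoint and key actually work. A minimal smoke test (the prompt is arbitrary):

```python
# One-off smoke test: a single, non-streaming completion.
test_response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Say hello in one short sentence."),
    ]
)
print(test_response.choices[0].message.content)
```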

[Return to top](#top)

## Step 2: Implementing a Streaming Response Function

Next, define a generator that converts the Gradio chat history into Azure AI Inference messages, streams the completion, and routes anything between `<think>` and `</think>` into a collapsible "thought bubble":

```python
def stream_response(user_message: str, messages: list) -> Iterator[list]:
    # Note: the full chat history (including the latest user turn) arrives in
    # `messages`; `user_message` simply mirrors what was stored in msg_store.
    if not messages:
        messages = []

    # Convert Gradio chat history into Azure AI Inference messages
    azure_messages = [SystemMessage(content="You are a helpful assistant.")]
    for msg in messages:
        print(f"Gradio ChatMessage: {msg}") # Debug print
        if isinstance(msg, ChatMessage):
            azure_msg = UserMessage(content=msg.content) if msg.role == "user" else AssistantMessage(content=msg.content)
        elif isinstance(msg, dict) and "role" in msg and "content" in msg:
            azure_msg = UserMessage(content=msg["content"]) if msg["role"] == "user" else AssistantMessage(content=msg["content"])
        else:
            continue
        azure_messages.append(azure_msg)

    # Ensure only serializable objects are sent to Azure
    azure_messages = [msg.dict() if hasattr(msg, "dict") else msg for msg in azure_messages]

    response = client.complete(messages=azure_messages, stream=True)

    # Initialize buffers
    thought_buffer = ""
    response_buffer = ""
    inside_thought = False

    for update in response:
        if update.choices:
            current_chunk = update.choices[0].delta.content
            if not current_chunk:
                continue  # Skip role-only or empty deltas

            if "<think>" in current_chunk:
                inside_thought = True
                print("Entering thought processing mode.")
                messages.append(ChatMessage(role="assistant", content="", metadata={"title": "🧠 R1 Thinking...", "status": "pending"}))
                yield messages
                continue
            elif "</think>" in current_chunk:
                inside_thought = False
                messages[-1] = ChatMessage(
                    role="assistant",
                    content=thought_buffer.strip(),
                    metadata={"title": "🧠 R1 Thinking...", "status": "done"}
                )
                yield messages # Yield the thought message immediately
                thought_buffer = ""
                continue

            if inside_thought:
                thought_buffer += current_chunk
                messages[-1] = ChatMessage(
                    role="assistant",
                    content=thought_buffer,
                    metadata={"title": "🧠 R1 Thinking...", "status": "pending"}
                )
                yield messages # Yield the thought message as it updates
            else:
                response_buffer += current_chunk
                if messages and isinstance(messages[-1], ChatMessage) and messages[-1].role == "assistant" and (not messages[-1].metadata or "title" not in messages[-1].metadata):
                    messages[-1] = ChatMessage(role="assistant", content=response_buffer)
                else:
                    messages.append(ChatMessage(role="assistant", content=response_buffer))
                yield messages
```
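
DeepSeek R1 wraps its chain-of-thought in `<think>...</think>` tags, so the function above is essentially a small state machine keyed on those markers. To see that logic in isolation, here is a standalone sketch that applies the same parsing to a list of made-up chunks (no Azure call involved; the chunk contents are invented for illustration):

```python
# Standalone illustration of the <think> / </think> state machine used above.
fake_chunks = ["<think>", "The user wants a sum. ", "2 + 2 = 4.", "</think>", "The answer ", "is 4."]

inside_thought = False
thought, answer = "", ""
for chunk in fake_chunks:
    if "<think>" in chunk:
        inside_thought = True   # Everything until </think> is reasoning
        continue
    if "</think>" in chunk:
        inside_thought = False  # Back to the user-facing answer
        continue
    if inside_thought:
        thought += chunk
    else:
        answer += chunk

print("Thought:", thought.strip())  # The user wants a sum. 2 + 2 = 4.
print("Answer:", answer.strip())    # The answer is 4.
```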

[Return to top](#top)

## Step 3: Creating the Gradio Interface

Finally, wire everything together in a `gr.Blocks` app. Two `submit` listeners fire in order: the first stores the message and clears the textbox, the second appends the user turn to the chat and hands off to `stream_response`:

```python
with gr.Blocks(title="DeepSeek R1 with Azure AI Foundry", fill_height=True, fill_width=True) as demo:
    title = gr.Markdown("## DeepSeek R1 with Azure AI Foundry 🤭")
    chatbot = gr.Chatbot(
        type="messages",
        label="DeepSeek-R1",
        render_markdown=True,
        show_label=False,
        scale=1,
    )

    input_box = gr.Textbox(
        lines=1,
        submit_btn=True,
        show_label=False,
    )

    # Holds the submitted text while the textbox is cleared
    msg_store = gr.State("")

    # First listener: stash the message and clear the input box immediately.
    input_box.submit(lambda msg: (msg, msg, ""), inputs=[input_box], outputs=[msg_store, input_box, input_box], queue=False)
    # Second listener: append the user turn to the chat history, then stream the model's reply.
    input_box.submit(lambda msg, chat: (ChatMessage(role="user", content=msg), chat + [ChatMessage(role="user", content=msg)]), inputs=[msg_store, chatbot], outputs=[msg_store, chatbot], queue=False).then(
        stream_response, inputs=[msg_store, chatbot], outputs=chatbot
    )

    demo.launch()
```
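
By default, `launch()` serves the app locally at `http://127.0.0.1:7860`. If you want the demo reachable from other machines, `launch()` accepts standard networking options; a sketch (all three parameters are regular Gradio options, adjust to taste):

```python
# Instead of the bare demo.launch() above: bind to all interfaces on a fixed
# port and request a temporary public share link from Gradio.
demo.launch(server_name="0.0.0.0", server_port=7860, share=True)
```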

## Conclusion

In this tutorial, we built a **streaming chatbot** using **DeepSeek R1** on **Azure AI Foundry** with **Gradio**.

We covered:

- Setting up the DeepSeek R1 model on Azure.
- Creating and handling chat completion requests.
- Implementing real-time streaming responses.
- Deploying the chatbot using Gradio.

Get started today by visiting **[Azure AI Foundry](https://azure.microsoft.com/en-us/products/ai-services/ai-foundry)** and **[DeepSeek on GitHub](https://github.com/deepseek-ai/DeepSeek-R1)**.

Happy coding! 🚀