|
8 | 8 | "\n", |
9 | 9 | "## Introduction\n", |
10 | 10 | "\n", |
11 | | - "**DeepSeek R1** has gained widespread attention for its advanced reasoning capabilities, excelling in language processing, scientific problem-solving, and coding. With 671B total parameters, 37B active parameters, and a 128K context length, it pushes the boundaries of AI-driven reasoning ([Explore DeepSeek R1 on Azure AI Foundry](https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek)). Benchmarking and evaluation results highlight its performance against other models, showcasing its effectiveness in reasoning tasks ([Evaluation Results](https://github.com/deepseek-ai/DeepSeek-R1/tree/main?tab=readme-ov-file#4-evaluation-results)). Building on prior models, DeepSeek R1 integrates Chain-of-Thought (CoT) reasoning, reinforcement learning (RL), and fine-tuning on curated datasets to achieve state-of-the-art performance. This tutorial will walk you through how to deploy DeepSeek R1 from [Azure AI Foundry's model catalog](https://ai.azure.com/explore/models/) and integrate it with [Gradio](https://www.gradio.app/) to build a real-time streaming chatbot specifically for thinking LLMs like **DeepSeek R1**.\n", |
12 | | - "\n", |
| 11 | + "**DeepSeek R1** has gained widespread attention for its advanced reasoning capabilities, excelling in language processing, scientific problem-solving, and coding. With 671B total parameters, 37B active parameters, and a 128K context length, it pushes the boundaries of AI-driven reasoning ([Explore DeepSeek R1 on Azure AI Foundry](https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek)). Benchmarking and evaluation results highlight its performance against other models, showcasing its effectiveness in reasoning tasks ([Evaluation Results](https://github.com/deepseek-ai/DeepSeek-R1/tree/main?tab=readme-ov-file#4-evaluation-results)). Building on prior models, DeepSeek R1 integrates Chain-of-Thought (CoT) reasoning, reinforcement learning (RL), and fine-tuning on curated datasets to achieve state-of-the-art performance. This tutorial will walk you through how to deploy DeepSeek R1 from [Azure AI Foundry's model catalog](https://ai.azure.com/explore/models/) and integrate it with [Gradio](https://www.gradio.app/) to build a real-time streaming chatbot specifically for thinking LLMs like **DeepSeek R1**." |
| 12 | + ] |
| 13 | + }, |
| 14 | + { |
| 15 | + "cell_type": "markdown", |
| 16 | + "metadata": {}, |
| 17 | + "source": [ |
13 | 18 | "### DeepSeek R1 on Azure AI Foundry\n", |
14 | 19 | "\n", |
15 | | - "On **January 29, 2025**, Microsoft announced that **DeepSeek R1** is now available on **Azure AI Foundry** and **GitHub**, making it part of a growing portfolio of over **1,800 AI models** available for enterprise use. With this integration, businesses can deploy DeepSeek R1 using **serverless APIs**, ensuring seamless scalability, security, and compliance with Microsoft’s responsible AI principles. ([Azure AI Foundry announcement](https://azure.microsoft.com/en-us/blog/deepseek-r1-on-azure-ai-foundry))\n", |
16 | | - "\n", |
| 20 | + "On **January 29, 2025**, Microsoft announced that **DeepSeek R1** is now available on **Azure AI Foundry** and **GitHub**, making it part of a growing portfolio of over **1,800 AI models** available for enterprise use. With this integration, businesses can deploy DeepSeek R1 using **serverless APIs**, ensuring seamless scalability, security, and compliance with Microsoft’s responsible AI principles. ([Azure AI Foundry announcement](https://azure.microsoft.com/en-us/blog/deepseek-r1-on-azure-ai-foundry))" |
| 21 | + ] |
| 22 | + }, |
| 23 | + { |
| 24 | + "cell_type": "markdown", |
| 25 | + "metadata": {}, |
| 26 | + "source": [ |
17 | 27 | "### Benefits of Using DeepSeek R1 on Azure AI Foundry\n", |
18 | 28 | "\n", |
19 | 29 | "- **Enterprise-Ready AI:** DeepSeek R1 is available as a trusted, scalable, and secure AI model, backed by Microsoft's infrastructure.\n", |
|
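Consuming the serverless deployment is a one-client affair: point the `azure-ai-inference` package's `ChatCompletionsClient` at your deployment's endpoint and key. Below is a minimal sketch of a one-shot (non-streaming) call, assuming a deployed DeepSeek R1 serverless endpoint and the same `AZURE_INFERENCE_ENDPOINT` / `AZURE_INFERENCE_CREDENTIAL` environment variables the notebook's code uses; the example prompt is illustrative.

```python
# Minimal sketch: one-shot (non-streaming) call to a DeepSeek R1 serverless
# deployment on Azure AI Foundry. Assumes `pip install azure-ai-inference`
# and the endpoint/key environment variables used elsewhere in this notebook.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],  # e.g. https://<deployment>.<region>.inference.ai.azure.com
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Why is the sky blue?"),
    ],
    max_tokens=2048,
)

# DeepSeek R1 prefixes its answer with a <think>...</think> reasoning trace,
# which the chatbot later in this notebook renders as a collapsible section.
print(response.choices[0].message.content)
```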
203 | 213 | "- Implementing real-time streaming responses.\n", |
204 | 214 | "- Deploying the chatbot using Gradio.\n", |
205 | 215 | "\n", |
206 | | - "Get started today by visiting **[Azure AI Foundry](https://azure.microsoft.com/en-us/products/ai-services/ai-foundry)** and **[DeepSeek on GitHub](https://github.com/DeepSeekAI/DeepSeek-R1)**.\n", |
| 216 | + "Get started today by visiting **[DeepSeek R1 on Azure AI Foundry Model Catalog](https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek)** and **[DeepSeek on GitHub Models](https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1)**.\n", |
207 | 217 | "\n", |
208 | 218 | "Happy coding! 🚀" |
209 | 219 | ] |
210 | | - }, |
211 | | - { |
212 | | - "cell_type": "code", |
213 | | - "execution_count": null, |
214 | | - "metadata": {}, |
215 | | - "outputs": [ |
216 | | - { |
217 | | - "name": "stdout", |
218 | | - "output_type": "stream", |
219 | | - "text": [ |
220 | | - "* Running on local URL: http://127.0.0.1:7910\n", |
221 | | - "\n", |
222 | | - "To create a public link, set `share=True` in `launch()`.\n" |
223 | | - ] |
224 | | - }, |
225 | | - { |
226 | | - "data": { |
227 | | - "text/html": [ |
228 | | - "<div><iframe src=\"http://127.0.0.1:7910/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>" |
229 | | - ], |
230 | | - "text/plain": [ |
231 | | - "<IPython.core.display.HTML object>" |
232 | | - ] |
233 | | - }, |
234 | | - "metadata": {}, |
235 | | - "output_type": "display_data" |
236 | | - }, |
237 | | - { |
238 | | - "name": "stdout", |
239 | | - "output_type": "stream", |
240 | | - "text": [ |
241 | | - "Gradio ChatMessage: {'role': 'user', 'metadata': {}, 'content': 'hey there', 'options': []}\n", |
242 | | - "Converted to Azure Message: {'role': 'user', 'content': 'hey there'}\n", |
243 | | - "Final Azure Messages: [{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'hey there'}]\n", |
244 | | - "Using parameters - Temperature: 0.7, Top P: 1, Max Tokens: 2048\n", |
245 | | - "Entering thought processing mode.\n" |
246 | | - ] |
247 | | - } |
248 | | - ], |
249 | | - "source": [ |
250 | | - "import os\n", |
251 | | - "import gradio as gr\n", |
252 | | - "from azure.ai.inference import ChatCompletionsClient\n", |
253 | | - "from azure.ai.inference.models import SystemMessage, UserMessage, AssistantMessage\n", |
254 | | - "from azure.core.credentials import AzureKeyCredential\n", |
255 | | - "from gradio import ChatMessage\n", |
256 | | - "from typing import Iterator\n", |
257 | | - "\n", |
258 | | - "###############################################################################\n", |
259 | | - "# 1) Create the ChatCompletionsClient\n", |
260 | | - "###############################################################################\n", |
261 | | - "client = ChatCompletionsClient(\n", |
262 | | - " endpoint=os.environ[\"AZURE_INFERENCE_ENDPOINT\"], # e.g. \"https://my-r1-endpoint.eastus2.inference.ai.azure.com\"\n", |
263 | | - " credential=AzureKeyCredential(os.environ[\"AZURE_INFERENCE_CREDENTIAL\"])\n", |
264 | | - " # If you're authenticating with Microsoft Entra ID, use DefaultAzureCredential() \n", |
265 | | - " # or other supported credentials instead of AzureKeyCredential.\n", |
266 | | - ")\n", |
267 | | - "\n", |
268 | | - "###############################################################################\n", |
269 | | - "# 2) Stream response function for Gradio\n", |
270 | | - "###############################################################################\n", |
271 | | - "def stream_response(user_message: str, messages: list, temperature: float, top_p: float, max_tokens: int) -> Iterator[list]:\n", |
272 | | - " if not messages:\n", |
273 | | - " messages = []\n", |
274 | | - " \n", |
275 | | - " # Convert Gradio chat history into Azure AI Inference messages\n", |
276 | | - " azure_messages = [SystemMessage(content=\"You are a helpful assistant.\")]\n", |
277 | | - " for msg in messages:\n", |
278 | | - " print(f\"Gradio ChatMessage: {msg}\") # Debug print\n", |
279 | | - " if isinstance(msg, ChatMessage):\n", |
280 | | - " azure_msg = UserMessage(content=msg.content) if msg.role == \"user\" else AssistantMessage(content=msg.content)\n", |
281 | | - " elif isinstance(msg, dict) and \"role\" in msg and \"content\" in msg:\n", |
282 | | - " azure_msg = UserMessage(content=msg[\"content\"]) if msg[\"role\"] == \"user\" else AssistantMessage(content=msg[\"content\"])\n", |
283 | | - " else:\n", |
284 | | - " continue\n", |
285 | | - " print(f\"Converted to Azure Message: {azure_msg}\") # Debug print\n", |
286 | | - " azure_messages.append(azure_msg)\n", |
287 | | - " \n", |
288 | | - " # Ensure only serializable objects are sent to Azure\n", |
289 | | - " azure_messages = [msg.dict() if hasattr(msg, \"dict\") else msg for msg in azure_messages]\n", |
290 | | - " \n", |
291 | | - " print(f\"Final Azure Messages: {azure_messages}\") # Debug print\n", |
292 | | - " print(f\"Using parameters - Temperature: {temperature}, Top P: {top_p}, Max Tokens: {max_tokens}\")\n", |
293 | | - " \n", |
294 | | - " response = client.complete(messages=azure_messages, stream=True, temperature=temperature, top_p=top_p, max_tokens=max_tokens)\n", |
295 | | - " \n", |
296 | | - " # Initialize buffers\n", |
297 | | - " thought_buffer = \"\"\n", |
298 | | - " response_buffer = \"\"\n", |
299 | | - " inside_thought = False\n", |
300 | | - " \n", |
301 | | - " for update in response:\n", |
302 | | - " if update.choices:\n", |
303 | | - " current_chunk = update.choices[0].delta.content\n", |
304 | | - " \n", |
305 | | - " if \"<think>\" in current_chunk:\n", |
306 | | - " inside_thought = True\n", |
307 | | - " print(\"Entering thought processing mode.\")\n", |
308 | | - " messages.append(ChatMessage(role=\"assistant\", content=\"\", metadata={\"title\": \"🧠 R1 Thinking...\", \"status\": \"pending\"}))\n", |
309 | | - " yield messages\n", |
310 | | - " continue\n", |
311 | | - " elif \"</think>\" in current_chunk:\n", |
312 | | - " inside_thought = False\n", |
313 | | - " messages[-1] = ChatMessage(\n", |
314 | | - " role=\"assistant\",\n", |
315 | | - " content=thought_buffer.strip(),\n", |
316 | | - " metadata={\"title\": \"🧠 R1 Thinking...\", \"status\": \"done\"}\n", |
317 | | - " )\n", |
318 | | - " yield messages # Yield the thought message immediately\n", |
319 | | - " thought_buffer = \"\"\n", |
320 | | - " continue\n", |
321 | | - " \n", |
322 | | - " if inside_thought:\n", |
323 | | - " thought_buffer += current_chunk\n", |
324 | | - " messages[-1] = ChatMessage(\n", |
325 | | - " role=\"assistant\",\n", |
326 | | - " content=thought_buffer,\n", |
327 | | - " metadata={\"title\": \"🧠 R1 Thinking...\", \"status\": \"pending\"}\n", |
328 | | - " )\n", |
329 | | - " yield messages # Yield the thought message as it updates\n", |
330 | | - " else:\n", |
331 | | - " response_buffer += current_chunk\n", |
332 | | - " if messages and isinstance(messages[-1], ChatMessage) and messages[-1].role == \"assistant\" and (not messages[-1].metadata or \"title\" not in messages[-1].metadata):\n", |
333 | | - " messages[-1] = ChatMessage(role=\"assistant\", content=response_buffer)\n", |
334 | | - " else:\n", |
335 | | - " messages.append(ChatMessage(role=\"assistant\", content=response_buffer))\n", |
336 | | - " yield messages\n", |
337 | | - "\n", |
338 | | - "###############################################################################\n", |
339 | | - "# 3) Gradio UI\n", |
340 | | - "###############################################################################\n", |
341 | | - "brand_theme = gr.themes.Default(\n", |
342 | | - " primary_hue=\"blue\",\n", |
343 | | - " secondary_hue=\"blue\",\n", |
344 | | - " neutral_hue=\"gray\",\n", |
345 | | - " font=[\"Segoe UI\", \"Arial\", \"sans-serif\"],\n", |
346 | | - " font_mono=[\"Courier New\", \"monospace\"]\n", |
347 | | - ").set(\n", |
348 | | - " button_primary_background_fill=\"#0f6cbd\",\n", |
349 | | - " button_primary_background_fill_hover=\"#115ea3\",\n", |
350 | | - " button_primary_background_fill_hover_dark=\"#4f52b2\",\n", |
351 | | - " button_primary_background_fill_dark=\"#5b5fc7\",\n", |
352 | | - " button_primary_text_color=\"#ffffff\",\n", |
353 | | - " body_background_fill=\"#f5f5f5\",\n", |
354 | | - " block_background_fill=\"#ffffff\",\n", |
355 | | - " body_text_color=\"#242424\",\n", |
356 | | - " body_text_color_subdued=\"#616161\",\n", |
357 | | - " block_border_color=\"#d1d1d1\",\n", |
358 | | - " input_background_fill=\"#ffffff\",\n", |
359 | | - " input_border_color=\"#d1d1d1\",\n", |
360 | | - " input_border_color_focus=\"#0f6cbd\",\n", |
361 | | - ")\n", |
362 | | - "\n", |
363 | | - "with gr.Blocks(title=\"DeepSeek R1 with Azure AI Foundry\", theme=brand_theme, css=\"footer {visibility: hidden}\", fill_height=True, fill_width=True) as demo:\n", |
364 | | - " title = gr.Markdown(\"## DeepSeek R1 with Azure AI Foundry 🤭\")\n", |
365 | | - " chatbot = gr.Chatbot(\n", |
366 | | - " type=\"messages\",\n", |
367 | | - " label=\"DeepSeek-R1\",\n", |
368 | | - " render_markdown=True,\n", |
369 | | - " show_label=False,\n", |
370 | | - " scale=1,\n", |
371 | | - " )\n", |
372 | | - " \n", |
373 | | - " input_box = gr.Textbox(\n", |
374 | | - " lines=1,\n", |
375 | | - " submit_btn=True,\n", |
376 | | - " show_label=False,\n", |
377 | | - " )\n", |
378 | | - " \n", |
379 | | - " with gr.Accordion(\"Model Parameters\", open=False):\n", |
380 | | - " temperature = gr.Slider(0.0, 2.0, value=0.7, step=0.1, label=\"Temperature\", info=\"Controls randomness\", interactive=True)\n", |
381 | | - " top_p = gr.Slider(0.0, 1.0, value=1.0, step=0.1, label=\"Top P\", info=\"Nucleus sampling\", interactive=True)\n", |
382 | | - " max_tokens = gr.Slider(0, 4096, value=2048, step=128, label=\"Max Tokens\", info=\"Limits response length\", interactive=True)\n", |
383 | | - " reset_button = gr.Button(\"Reset Defaults\")\n", |
384 | | - " \n", |
385 | | - " reset_button.click(lambda: (0.7, 1.0, 2048), outputs=[temperature, top_p, max_tokens])\n", |
386 | | - " \n", |
387 | | - " msg_store = gr.State(\"\")\n", |
388 | | - " input_box.submit(lambda msg: (msg, msg, \"\"), inputs=[input_box], outputs=[msg_store, input_box, input_box], queue=False)\n", |
389 | | - " input_box.submit(lambda msg, chat: (ChatMessage(role=\"user\", content=msg), chat + [ChatMessage(role=\"user\", content=msg)]), inputs=[msg_store, chatbot], outputs=[msg_store, chatbot], queue=False).then(\n", |
390 | | - " stream_response, inputs=[msg_store, chatbot, temperature, top_p, max_tokens], outputs=chatbot\n", |
391 | | - " )\n", |
392 | | - " \n", |
393 | | - " demo.launch()\n" |
394 | | - ] |
395 | 220 | } |
396 | 221 | ], |
397 | 222 | "metadata": { |
|
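The removed cell above also shows the core streaming pattern for thinking LLMs: iterate over the chunked response and route tokens between a "thinking" buffer and an answer buffer using DeepSeek R1's `<think>`/`</think>` sentinels. Here is a condensed, Gradio-free sketch of that pattern; it reuses the `client` from the earlier sketch, and the `or ""` guard on `delta.content` is an added assumption, since trailing stream updates may carry no content.

```python
# Condensed sketch of the streaming pattern used by the notebook's chatbot:
# tokens between <think> and </think> go to a reasoning buffer, the rest to
# the answer buffer. Reuses `client` from the sketch above.
from azure.ai.inference.models import SystemMessage, UserMessage

stream = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="hey there"),
    ],
    stream=True,
    temperature=0.7,
    top_p=1.0,
    max_tokens=2048,
)

thought, answer = "", ""
inside_thought = False
for update in stream:
    if not update.choices:
        continue
    chunk = update.choices[0].delta.content or ""  # guard: last updates may be empty
    if "<think>" in chunk:
        inside_thought = True
    elif "</think>" in chunk:
        inside_thought = False
    elif inside_thought:
        thought += chunk
    else:
        answer += chunk

print("R1 thinking:", thought.strip())
print("Final answer:", answer.strip())
```

In the full notebook UI, these same two buffers drive Gradio `ChatMessage` updates, with `metadata={"title": "🧠 R1 Thinking...", "status": ...}` marking the collapsible reasoning bubble while the answer streams beneath it.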