1 | 1 | { |
2 | 2 | "cells": [ |
3 | 3 | { |
4 | 4 | "cell_type": "markdown", |
5 | | - "id": "220629c8-17aa-45f6-ac81-0ca31e165412", |
| 5 | + "id": "dce042eb-d3ad-463c-ac41-4e0895e67c2a", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "# OpenAI Module Demo" |
| 8 | + "# Using MLRun Hub Module for OpenAI Proxy App" |
| 9 | + ] |
| 10 | + }, |
| 11 | + { |
| 12 | + "cell_type": "markdown", |
| 13 | + "id": "58850fbe-ef31-4e36-9154-d8ca3d212532", |
| 14 | + "metadata": {}, |
| 15 | + "source": [ |
| 16 | + "This notebook walks through the process of importing an OpenAI proxy application from an MLRun Hub module and deploying it as part of your MLRun project. \n", |
| 17 | + "\n", |
| 18 | + "The module provides a flexible FastAPI endpoint that exposes the following OpenAI URLs: chat completions, responses, and embeddings. So you can generate text, query models, and work with vector representations.\n", |
| 19 | + "\n", |
| 20 | + "\n", |
| 21 | + "**Note** - Before running this notebook please generate an .env file with the following credentials \n", |
| 22 | + "\n", |
| 23 | + "```\n", |
| 24 | + "OPENAI_BASE_URL=\"..\"\n", |
| 25 | + "OPENAI_API_KEY=\"..\"\n", |
| 26 | + "\n", |
| 27 | + "# optional:\n", |
| 28 | + "OPENAI_DEFAULT_MODEL=\"..\" # by default uses gpt-4o-mini, it can changed by using this key\n", |
| 29 | + "```\n" |
9 | 30 | ] |
10 | 31 | }, |
11 | 32 | { |
12 | 33 | "cell_type": "code", |
13 | 34 | "execution_count": null, |
14 | | - "id": "967b4d5d-7250-40bf-8149-de11e1e3244c", |
| 35 | + "id": "9262e948-a1b3-4a9e-8b5f-cfa3310bb875", |
15 | 36 | "metadata": {}, |
16 | 37 | "outputs": [], |
17 | 38 | "source": [ |
18 | 39 | "import mlrun\n", |
19 | | - "import pandas as pd" |
| 40 | + "import os\n", |
| 41 | + "import pandas as pd\n", |
| 42 | + "from dotenv import load_dotenv\n", |
| 43 | + "load_dotenv()" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "id": "3ae0cf4f-0183-42e0-9dda-7ba6a5cfcc7b", |
| 49 | + "metadata": {}, |
| 50 | + "source": [ |
| 51 | + "Load or create a project and set credentials." |
20 | 52 | ] |
21 | 53 | }, |
22 | 54 | { |
23 | 55 | "cell_type": "code", |
24 | 56 | "execution_count": null, |
25 | | - "id": "17d208f4-a00a-42ef-a849-0fa79bed10cb", |
| 57 | + "id": "ea80c1fb-014d-4db1-95d3-71e6cd362a87", |
26 | 58 | "metadata": {}, |
27 | 59 | "outputs": [], |
28 | 60 | "source": [ |
29 | | - "project = mlrun.get_or_create_project(\"fastapi-openai\",user_project=True,context=\"./src\")" |
| 61 | + "project = mlrun.get_or_create_project(\"openai-module\", user_project=True)\n", |
| 62 | + "\n", |
| 63 | + "project.set_secrets({\n", |
| 64 | + " \"OPENAI_BASE_URL\": os.environ[\"OPENAI_BASE_URL\"],\n", |
| 65 | + " \"OPENAI_API_KEY\": os.environ[\"OPENAI_API_KEY\"],\n", |
| 66 | + " \"OPENAI_DEFAULT_MODEL\": os.environ[\"OPENAI_DEFAULT_MODEL\"]\n", |
| 67 | + "})" |
| 68 | + ] |
| 69 | + }, |
| 70 | + { |
| 71 | + "cell_type": "markdown", |
| 72 | + "id": "c59fd225-f719-4881-a643-f41553b529d6", |
| 73 | + "metadata": {}, |
| 74 | + "source": [ |
| 75 | + "### Import the OpenAI proxy module from the Hub" |
30 | 76 | ] |
31 | 77 | }, |
32 | 78 | { |
33 | 79 | "cell_type": "code", |
34 | | - "execution_count": null, |
35 | | - "id": "67c93a0d-8240-48b8-808e-9cd0af418309", |
| 80 | + "execution_count": 3, |
| 81 | + "id": "5d294a0a-0500-464e-b8f7-c3c5f02bcc45", |
36 | 82 | "metadata": {}, |
37 | 83 | "outputs": [], |
38 | 84 | "source": [ |
39 | | - "app = mlrun.import_module(\"hub://openai\")" |
| 85 | + "openai_module = mlrun.import_module(\"hub://openai_proxy_app\")" |
40 | 86 | ] |
41 | 87 | }, |
42 | 88 | { |
43 | 89 | "cell_type": "code", |
44 | | - "execution_count": null, |
45 | | - "id": "93e67d6a-5f53-4bda-b0b5-4e2977088139", |
| 90 | + "execution_count": 4, |
| 91 | + "id": "104a64c4-6707-4b01-b6b8-503e023f03a3", |
| 92 | + "metadata": { |
| 93 | + "scrolled": true, |
| 94 | + "tags": [] |
| 95 | + }, |
| 96 | + "outputs": [], |
| 97 | + "source": [ |
| 98 | + "# Instantiate the module with your MLRun project and deploy it \n", |
| 99 | + "openai_obj = openai_module.OpenAIModule(project)\n", |
| 100 | + "openai_obj.openai_proxy_app.deploy()" |
| 101 | + ] |
| 102 | + }, |
| 103 | + { |
| 104 | + "cell_type": "markdown", |
| 105 | + "id": "c2720a12-9b60-42cc-b50b-96a6bfb3d7b0", |
| 106 | + "metadata": {}, |
| 107 | + "source": [ |
| 108 | + "## Examples of OpenAI app API's " |
| 109 | + ] |
| 110 | + }, |
| 111 | + { |
| 112 | + "cell_type": "markdown", |
| 113 | + "id": "47469ec3-8345-439c-933b-1fd16a994939", |
| 114 | + "metadata": {}, |
| 115 | + "source": [ |
| 116 | + "### Chat completions API\n", |
| 117 | + "This example asks for the three largest countries in Europe and their capitals and returns a standard chat completion response." |
| 118 | + ] |
| 119 | + }, |
| 120 | + { |
| 121 | + "cell_type": "code", |
| 122 | + "execution_count": 6, |
| 123 | + "id": "6beba771-a011-4952-b00c-416289b67179", |
| 124 | + "metadata": {}, |
| 125 | + "outputs": [], |
| 126 | + "source": [ |
| 127 | + "response = openai_obj.openai_proxy_app.invoke(\n", |
| 128 | + " path=\"/v1/chat/completions\",\n", |
| 129 | + " body={\n", |
| 130 | + " \"model\": \"gpt-4o-mini\",\n", |
| 131 | + " \"messages\": [{\"role\": \"user\", \"content\": \"What are the 3 largest countries in Europe and what are their capitals names\"}],\n", |
| 132 | + " },\n", |
| 133 | + " method=\"POST\",\n", |
| 134 | + ")" |
| 135 | + ] |
| 136 | + }, |
| 137 | + { |
| 138 | + "cell_type": "markdown", |
| 139 | + "id": "f2aabeff-9d68-44e5-9ea6-5ddd43e4caed", |
| 140 | + "metadata": {}, |
| 141 | + "source": [ |
| 142 | + "### Go over the OpenAI response" |
| 143 | + ] |
| 144 | + }, |
| 145 | + { |
| 146 | + "cell_type": "code", |
| 147 | + "execution_count": 7, |
| 148 | + "id": "9acfacde-63c8-4f4d-a767-7a4cd33e6ac8", |
| 149 | + "metadata": {}, |
| 150 | + "outputs": [ |
| 151 | + { |
| 152 | + "name": "stdout", |
| 153 | + "output_type": "stream", |
| 154 | + "text": [ |
| 155 | + "The three largest countries in Europe by area are:\n", |
| 156 | + "\n", |
| 157 | + "1. **Russia** (part of it is in Europe) - Capital: Moscow\n", |
| 158 | + "2. **Ukraine** - Capital: Kyiv\n", |
| 159 | + "3. **France** - Capital: Paris\n", |
| 160 | + "\n", |
| 161 | + "Note that while Russia is the largest country in the world, only a portion of its landmass is in Europe.\n" |
| 162 | + ] |
| 163 | + } |
| 164 | + ], |
| 165 | + "source": [ |
| 166 | + "data = response.json()\n", |
| 167 | + "text = data[\"choices\"][0][\"message\"][\"content\"]\n", |
| 168 | + "print(text)" |
| 169 | + ] |
| 170 | + }, |
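| | + { |
| | + "cell_type": "markdown", |
| | + "id": "b1f2c3d4-0a1b-4c2d-8e3f-5a6b7c8d9e0f", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "A chat completion response also carries a `usage` block with token counts. The cell below is a minimal sketch that prints it, reusing the `data` dict parsed above; the field names follow the standard OpenAI chat completions schema." |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "id": "c2d3e4f5-1b2c-4d3e-9f4a-6b7c8d9e0f1a", |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# Print the token counts reported alongside the completion\n", |
| | + "usage = data[\"usage\"]\n", |
| | + "print(usage[\"prompt_tokens\"], usage[\"completion_tokens\"], usage[\"total_tokens\"])" |
| | + ] |
| | + }, |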
| 171 | + { |
| 172 | + "cell_type": "markdown", |
| 173 | + "id": "a63c02d1-939a-4d74-b3a2-d0cad6ea65f5", |
| 174 | + "metadata": {}, |
| 175 | + "source": [ |
| 176 | + "### Embedding with the Deployed OpenAI Proxy\n", |
| 177 | + "This example sends a short sentence to the embeddings endpoint and extracts the returned vector from the response payload. \n", |
| 178 | + "The result is a numeric embedding you can use for similarity search, clustering, or downstream model features." |
| 179 | + ] |
| 180 | + }, |
| 181 | + { |
| 182 | + "cell_type": "code", |
| 183 | + "execution_count": 8, |
| 184 | + "id": "62251f4d-817e-4f7c-8b09-13f0c9f5085b", |
| 185 | + "metadata": { |
| 186 | + "scrolled": true, |
| 187 | + "tags": [] |
| 188 | + }, |
| 189 | + "outputs": [], |
| 190 | + "source": [ |
| 191 | + "import json\n", |
| 192 | + "\n", |
| 193 | + "response = openai_obj.openai_proxy_app.invoke(\n", |
| 194 | + " path=\"/v1/embeddings\",\n", |
| 195 | + " body={\n", |
| 196 | + " \"model\": \"text-embedding-3-small\",\n", |
| 197 | + " \"input\": \"Kubernetes whispers to its pods at night\"\n", |
| 198 | + " },\n", |
| 199 | + " method=\"POST\",\n", |
| 200 | + ")" |
| 201 | + ] |
| 202 | + }, |
| 203 | + { |
| 204 | + "cell_type": "markdown", |
| 205 | + "id": "43a06fd4-2c7b-448d-83c9-456f2d817446", |
| 206 | + "metadata": {}, |
| 207 | + "source": [ |
| 208 | + "### Go over the OpenAI response" |
| 209 | + ] |
| 210 | + }, |
| 211 | + { |
| 212 | + "cell_type": "code", |
| 213 | + "execution_count": 9, |
| 214 | + "id": "dfa478ca-f70c-48c3-a75b-9aa6f36375e9", |
| 215 | + "metadata": {}, |
| 216 | + "outputs": [], |
| 217 | + "source": [ |
| 218 | + "embedding = response.json()[\"data\"][0][\"embedding\"]\n", |
| 219 | + "\n", |
| 220 | + "#print if you want to see the embedding\n", |
| 221 | + "#print(embedding) " |
| 222 | + ] |
| 223 | + }, |
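| | + { |
| | + "cell_type": "markdown", |
| | + "id": "d3e4f5a6-2c3d-4e5f-8a9b-7c8d9e0f1a2b", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "As a quick sanity check, the sketch below compares two embeddings with cosine similarity. It assumes numpy is available and requests a second embedding through the same proxy call pattern used above; the second input sentence is just an illustrative example." |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "id": "e4f5a6b7-3d4e-4f5a-9b0c-8d9e0f1a2b3c", |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "# Request an embedding for a second, related sentence (illustrative input)\n", |
| | + "response2 = openai_obj.openai_proxy_app.invoke(\n", |
| | + "    path=\"/v1/embeddings\",\n", |
| | + "    body={\"model\": \"text-embedding-3-small\", \"input\": \"Pods listen quietly in the cluster\"},\n", |
| | + "    method=\"POST\",\n", |
| | + ")\n", |
| | + "embedding2 = response2.json()[\"data\"][0][\"embedding\"]\n", |
| | + "\n", |
| | + "# Cosine similarity: dot product divided by the product of the vector norms\n", |
| | + "a, b = np.array(embedding), np.array(embedding2)\n", |
| | + "print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))" |
| | + ] |
| | + }, |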
| 224 | + { |
| 225 | + "cell_type": "markdown", |
| 226 | + "id": "9a92fca0-c579-47f4-afa5-31b0f8bb484e", |
| 227 | + "metadata": {}, |
| 228 | + "source": [ |
| 229 | + "### Request a Text Response and Extract the Output\n", |
| 230 | + "The proxy also supports the unified responses endpoint. \n", |
| 231 | + "Here we send a compact request for a short joke and then extract the generated text from the structured output. " |
| 232 | + ] |
| 233 | + }, |
| 234 | + { |
| 235 | + "cell_type": "code", |
| 236 | + "execution_count": 10, |
| 237 | + "id": "f343f347-75bf-440a-bdcf-5950d80fd706", |
46 | 238 | "metadata": {}, |
47 | 239 | "outputs": [], |
48 | | - "source": "app.OpenAIModule.deploy()" |
| 240 | + "source": [ |
| 241 | + "response = openai_obj.openai_proxy_app.invoke(\n", |
| 242 | + " path=\"/v1/responses\",\n", |
| 243 | + " body={\n", |
| 244 | + " \"model\": \"gpt-4o-mini\",\n", |
| 245 | + " \"input\": \"Give me a short joke about high tech workers\",\n", |
| 246 | + " \"max_output_tokens\": 30\n", |
| 247 | + " },\n", |
| 248 | + " method=\"POST\",\n", |
| 249 | + ")" |
| 250 | + ] |
| 251 | + }, |
| 252 | + { |
| 253 | + "cell_type": "markdown", |
| 254 | + "id": "73456c7b-80e6-4ac9-a94c-8258e7efad60", |
| 255 | + "metadata": {}, |
| 256 | + "source": [ |
| 257 | + "### Go over the OpenAI response" |
| 258 | + ] |
| 259 | + }, |
| 260 | + { |
| 261 | + "cell_type": "code", |
| 262 | + "execution_count": 11, |
| 263 | + "id": "54e3ac7b-842a-4b02-bd76-3baa16941b36", |
| 264 | + "metadata": {}, |
| 265 | + "outputs": [ |
| 266 | + { |
| 267 | + "name": "stdout", |
| 268 | + "output_type": "stream", |
| 269 | + "text": [ |
| 270 | + "Why did the high-tech worker bring a ladder to work?\n", |
| 271 | + "\n", |
| 272 | + "Because they wanted to reach new heights in their career!\n" |
| 273 | + ] |
| 274 | + } |
| 275 | + ], |
| 276 | + "source": [ |
| 277 | + "data = response.json()\n", |
| 278 | + "text = data[\"output\"][0][\"content\"][0][\"text\"]\n", |
| 279 | + "print(text)" |
| 280 | + ] |
49 | 281 | } |
50 | 282 | ], |
51 | 283 | "metadata": { |
52 | 284 | "kernelspec": { |
53 | | - "display_name": "Python 3 (ipykernel)", |
| 285 | + "display_name": "mlrun-base", |
54 | 286 | "language": "python", |
55 | | - "name": "python3" |
| 287 | + "name": "conda-env-mlrun-base-py" |
56 | 288 | }, |
57 | 289 | "language_info": { |
58 | 290 | "codemirror_mode": { |
|
64 | 296 | "name": "python", |
65 | 297 | "nbconvert_exporter": "python", |
66 | 298 | "pygments_lexer": "ipython3", |
67 | | - "version": "3.11.10" |
| 299 | + "version": "3.9.22" |
68 | 300 | } |
69 | 301 | }, |
70 | 302 | "nbformat": 4, |
|