ogx-ai · MichaelClifford · Apr 9, 2025 · Apr 4, 2025 · Apr 4, 2025 · Apr 8, 2025
diff --git a/demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb b/demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb
@@ -0,0 +1,252 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "45fc9086-93aa-4645-8ba2-380c3acbbed9",
+   "metadata": {},
+   "source": [
+    "# Level 1: Foundational RAG\n",
+    "\n",
+    "This tutorial presents an example of executing queries with foundational (i.e., non-agentic) RAG in Llama Stack. It shows how the APIs provided by Llama Stack can be used to directly control and invoke all RAG stages, including indexing, retrieval and inference. \n",
+    "For an agentic RAG tutorial, please refer to [Level3_agentic_RAG.ipynb](demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb).\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "This tutorial covers the following steps:\n",
+    "1. Connecting to a llama-stack server.\n",
+    "2. Indexing a collection of documents in a vector DB for later retrieval.\n",
+    "3. Executing the built-in RAG tool to retrieve the document chunks relevant to a given query.\n",
+    "4. Using the retrieved context to answer user queries during the inference step.\n",
+    "\n",
+    "\n",
+    "## Prerequisites\n",
+    "\n",
+    "Before starting, ensure you have a running instance of the Llama Stack server (local or remote) with at least one preconfigured vector DB. For more information, please refer to the corresponding [Llama Stack tutorials](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6db34e4b-ed29-4007-b760-59543d4caca1",
+   "metadata": {},
+   "source": [
+    "## 1. Setting Up the Environment\n",
+    "- Import the necessary libraries.\n",
+    "- Define the settings for the RAG pipeline, including the Llama Stack server URL, inference and document ingestion parameters.\n",
+    "- Initialize the connection to the server."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "854e7cb4-aed9-4098-adc1-a66f4c9e6ce3",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import uuid\n",
+    "\n",
+    "from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient\n",
+    "\n",
+    "# the server endpoint\n",
+    "LLAMA_STACK_SERVER_URL = \"http://localhost:8321\"\n",
+    "\n",
+    "# inference settings\n",
+    "MODEL_ID = \"ibm-granite/granite-3.2-8b-instruct\"\n",
+    "SYSTEM_PROMPT = \"You are a helpful assistant. \"\n",
+    "TEMPERATURE = 0.0\n",
+    "TOP_P = 0.95\n",
+    "\n",
+    "# RAG settings\n",
+    "VECTOR_DB_EMBEDDING_MODEL = \"all-MiniLM-L6-v2\"\n",
+    "VECTOR_DB_EMBEDDING_DIMENSION = 384\n",
+    "VECTOR_DB_CHUNK_SIZE = 512\n",
+    "\n",
+    "# For this demo, we are using Milvus Lite, which is our preferred solution. Any other Vector DB supported by Llama Stack can be used.\n",
+    "VECTOR_DB_PROVIDER_ID = 'milvus'\n",
+    "\n",
+    "# initialize the inference strategy\n",
+    "if TEMPERATURE > 0.0:\n",
+    "    strategy = {\"type\": \"top_p\", \"temperature\": TEMPERATURE, \"top_p\": TOP_P}\n",
+    "else:\n",
+    "    strategy = {\"type\": \"greedy\"}\n",
+    "    \n",
+    "# initialize the document collection to be used for RAG\n",
+    "vector_db_id = f\"test_vector_db_{uuid.uuid4()}\"\n",
+    "    \n",
+    "# initialize the server connection\n",
+    "client = LlamaStackClient(base_url=os.environ.get(\"LLAMA_STACK_ENDPOINT\", LLAMA_STACK_SERVER_URL))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9203de51-f570-44ab-8130-36333a54888b",
+   "metadata": {},
+   "source": [
+    "## 2. Indexing the Documents\n",
+    "- Initialize a new document collection in the target vector DB. All parameters related to the vector DB, such as the embedding model and dimension, must be specified here.\n",
+    "- Provide a list of document URLs to the RAG tool. Llama Stack will handle fetching, conversion and chunking of the documents' content."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "8d81ffb2-2089-4cb8-adae-f32965f206c7",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# define and register the document collection to be used\n",
+    "client.vector_dbs.register(\n",
+    "    vector_db_id=vector_db_id,\n",
+    "    embedding_model=VECTOR_DB_EMBEDDING_MODEL,\n",
+    "    embedding_dimension=VECTOR_DB_EMBEDDING_DIMENSION,\n",
+    "    provider_id=VECTOR_DB_PROVIDER_ID,\n",
+    ")\n",
+    "\n",
+    "# ingest the documents into the newly created document collection\n",
+    "urls = [\n",
+    "    (\"https://www.openshift.guide/openshift-guide-screen.pdf\", \"application/pdf\"),\n",
+    "    (\"https://www.cdflaborlaw.com/_images/content/2023_OCBJ_GC_Awards_Article.pdf\", \"application/pdf\"),\n",
+    "]\n",
+    "documents = [\n",
+    "    RAGDocument(\n",
+    "        document_id=f\"num-{i}\",\n",
+    "        content=url,\n",
+    "        mime_type=url_type,\n",
+    "        metadata={},\n",
+    "    )\n",
+    "    for i, (url, url_type) in enumerate(urls)\n",
+    "]\n",
+    "client.tool_runtime.rag_tool.insert(\n",
+    "    documents=documents,\n",
+    "    vector_db_id=vector_db_id,\n",
+    "    chunk_size_in_tokens=VECTOR_DB_CHUNK_SIZE,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d5639413-90d6-42ae-add4-6c89da0297e2",
+   "metadata": {},
+   "source": [
+    "## 3. Executing Queries via the Built-in RAG Tool\n",
+    "- Directly invoke the RAG tool to query the vector DB we ingested into at the previous stage.\n",
+    "- Construct an extended prompt using the retrieved chunks.\n",
+    "- Query the model with the extended prompt.\n",
+    "- Output the reply received from the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "0d39ab00-2a65-4b72-b5ed-4dd61f1204a2",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "User> How to install OpenShift?\n",
+      "inference> To install OpenShift, you can follow these steps:\n",
+      "\n",
+      "1. Download and install the OpenShift CLI (Command Line Interface) tool from the official Red Hat website.\n",
+      "2. Create a new project in your OpenShift cluster using the `oc new-project` command.\n",
+      "3. Install the OpenShift client tools on your system by running the `oc setup` command.\n",
+      "4. Log in to your OpenShift cluster using the `oc login` command with your credentials.\n",
+      "5. Create a new application in your project using the `oc new-app` command, specifying the image URL of the container you want to deploy.\n",
+      "6. Once the application is created, you can access it through the OpenShift web console or by running the `oc expose` command.\n",
+      "\n",
+      "Alternatively, you can also use the odo tool provided by Red Hat to create and deploy applications on OpenShift. The odo tool allows you to create applications using \"Devfiles,\" which are YAML files that contain information about your application's programming language, dependencies, and other essential details.\n",
+      "\n",
+      "You can download the odo tool from the \"Help\" menu on the OpenShift web console through the \"Command line tools\" entry. Once installed, you can use the `odo init` command to create a new application using various programming languages, and then run the `odo push` command to build and deploy the container to the OpenShift container registry.\n",
+      "\n",
+      "Note: The above steps are general instructions and may vary depending on your specific environment and configuration. It's recommended to refer to the official Red Hat documentation for more detailed instructions and troubleshooting guides.\n",
+      "User> Are employees based in California eligible for remote work?\n",
+      "inference> Based on the provided knowledge search tool results, it appears that there are specific regulations and considerations for employers in California regarding remote work. According to the results, California has laws that require employers to reimburse employees for necessary business expenditures or losses incurred by the employee in direct consequence of the discharge of their duties, including internet usage, home electricity, computer equipment, and cell phone usage.\n",
+      "\n",
+      "Additionally, there are specific requirements for data security and privacy, as California has strong data protection laws such as the California Consumer Privacy Act (CCPA) and the California Privacy Rights Act (CPRA). Employers must safeguard sensitive information and comply with these regulations, which apply regardless of where the employee works.\n",
+      "\n",
+      "However, it is not explicitly stated in the provided results whether employees based in California are eligible for remote work. The results seem to focus more on the responsibilities and considerations for employers rather than the eligibility of employees for remote work.\n",
+      "\n",
+      "Therefore, without further information or clarification, I would say that the answer to the question \"Are employees based in California eligible for remote work?\" is not explicitly stated in the provided knowledge search tool results."
+     ]
+    }
+   ],
+   "source": [
+    "queries = [\n",
+    "    \"How to install OpenShift?\",\n",
+    "    \"Are employees based in California eligible for remote work?\",\n",
+    "]\n",
+    "\n",
+    "for prompt in queries:\n",
+    "    print(f\"\\nUser> {prompt}\")\n",
+    "    \n",
+    "    # RAG retrieval call\n",
+    "    rag_response = client.tool_runtime.rag_tool.query(content=prompt, vector_db_ids=[vector_db_id])\n",
+    "\n",
+    "    # the list of messages to be sent to the model must start with the system prompt\n",
+    "    messages = [{\"role\": \"system\", \"content\": SYSTEM_PROMPT}]\n",
+    "\n",
+    "    # construct the actual prompt to be executed, incorporating the original query and the retrieved content\n",
+    "    prompt_context = rag_response.content\n",
+    "    extended_prompt = f\"Please answer the given query using the context below.\\n\\nCONTEXT:\\n{prompt_context}\\n\\nQUERY:\\n{prompt}\"\n",
+    "    messages.append({\"role\": \"user\", \"content\": extended_prompt})\n",
+    "\n",
+    "    # use Llama Stack inference API to directly communicate with the desired model\n",
+    "    response = client.inference.chat_completion(\n",
+    "        messages=messages,\n",
+    "        model_id=MODEL_ID,\n",
+    "        sampling_params={\n",
+    "            \"strategy\": strategy,\n",
+    "        },\n",
+    "        stream=True,\n",
+    "    )\n",
+    "    \n",
+    "    # print the response\n",
+    "    print(\"inference> \", end='')\n",
+    "    for chunk in response:\n",
+    "        response_delta = chunk.event.delta\n",
+    "        if isinstance(response_delta, TextDelta):\n",
+    "            print(response_delta.text, end='')\n",
+    "        elif isinstance(response_delta, ToolCallDelta):\n",
+    "            print(response_delta.tool_call, end='')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df6937a3-3efa-4b66-aaf0-85d96b6d43db",
+   "metadata": {},
+   "source": [
+    "## Key Takeaways\n",
+    "This tutorial demonstrates how to set up and use the built-in RAG tool for ingesting user-provided documents in a vector DB and later utilizing them during inference via direct retrieval. Please check out our [complementary tutorial]([Level3_agentic_RAG.ipynb](demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb) for an agentic RAG example."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}