-
Notifications
You must be signed in to change notification settings - Fork 76
RAG notebooks #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
RAG notebooks #53
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
4c2e174
Add the first draft of the direct + agentic RAG demo.
ilya-kolchinsky 07bf55d
Merge branch 'opendatahub-io:main' into main
ilya-kolchinsky 7f32e01
Merge branch 'opendatahub-io:main' into main
ilya-kolchinsky b4bed93
Merge branch 'opendatahub-io:main' into main
ilya-kolchinsky fb76b4c
Merge branch 'opendatahub-io:main' into main
ilya-kolchinsky 82d7ac6
Added Jupyter notebooks for the RAG demos.
ilya-kolchinsky ec4451d
Update demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb
ilya-kolchinsky aa254e0
Update demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb
ilya-kolchinsky befb3e5
Update demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb
ilya-kolchinsky 69860f0
Update demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb
ilya-kolchinsky 11e580c
Changed the model to Granite for consistency.
ilya-kolchinsky 4d6787a
Merge remote-tracking branch 'origin/main'
ilya-kolchinsky 578e33e
Removed unneeded imports.
ilya-kolchinsky 3258d3b
Highlighted Milvus as our preferred vector DB choice.
ilya-kolchinsky 3d908e5
Update demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb
ilya-kolchinsky File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
252 changes: 252 additions & 0 deletions
252
demos/rag_agentic/notebooks/Level1_foundational_RAG.ipynb
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "45fc9086-93aa-4645-8ba2-380c3acbbed9", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Level 1: Foundational RAG\n", | ||
| "\n", | ||
| "This tutorial presents an example of executing queries with foundational (i.e., non-agentic) RAG in Llama Stack. It shows how the APIs provided by Llama Stack can be used to directly control and invoke all RAG stages, including indexing, retrieval and inference. \n", | ||
| "For an agentic RAG tutorial, please refer to [Level3_agentic_RAG.ipynb](demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb).\n", | ||
| "\n", | ||
| "## Overview\n", | ||
| "\n", | ||
| "This tutorial covers the following steps:\n", | ||
| "1. Connecting to a llama-stack server.\n", | ||
| "2. Indexing a collection of documents in a vector DB for later retrieval.\n", | ||
| "3. Executing the built-in RAG tool to retrieve the document chunks relevant to a given query.\n", | ||
| "4. Using the retrieved context to answer user queries during the inference step.\n", | ||
| "\n", | ||
| "\n", | ||
| "## Prerequisites\n", | ||
| "\n", | ||
| "Before starting, ensure you have a running instance of the Llama Stack server (local or remote) with at least one preconfigured vector DB. For more information, please refer to the corresponding [Llama Stack tutorials](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html)." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "6db34e4b-ed29-4007-b760-59543d4caca1", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 1. Setting Up the Environment\n", | ||
| "- Import the necessary libraries.\n", | ||
| "- Define the settings for the RAG pipeline, including the Llama Stack server URL, inference and document ingestion parameters.\n", | ||
| "- Initialize the connection to the server." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 3, | ||
| "id": "854e7cb4-aed9-4098-adc1-a66f4c9e6ce3", | ||
| "metadata": { | ||
| "tags": [] | ||
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "import os\n", | ||
| "import uuid\n", | ||
| "\n", | ||
| "from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient\n", | ||
| "\n", | ||
| "# the server endpoint\n", | ||
| "LLAMA_STACK_SERVER_URL = \"http://localhost:8321\"\n", | ||
| "\n", | ||
| "# inference settings\n", | ||
| "MODEL_ID = \"ibm-granite/granite-3.2-8b-instruct\"\n", | ||
| "SYSTEM_PROMPT = \"You are a helpful assistant. \"\n", | ||
| "TEMPERATURE = 0.0\n", | ||
| "TOP_P = 0.95\n", | ||
| "\n", | ||
| "# RAG settings\n", | ||
| "VECTOR_DB_EMBEDDING_MODEL = \"all-MiniLM-L6-v2\"\n", | ||
| "VECTOR_DB_EMBEDDING_DIMENSION = 384\n", | ||
| "VECTOR_DB_CHUNK_SIZE = 512\n", | ||
| "\n", | ||
| "# For this demo, we are using Milvus Lite, which is our preferred solution. Any other Vector DB supported by Llama Stack can be used.\n", | ||
| "VECTOR_DB_PROVIDER_ID = 'milvus'\n", | ||
| "\n", | ||
| "# initialize the inference strategy\n", | ||
| "if TEMPERATURE > 0.0:\n", | ||
| " strategy = {\"type\": \"top_p\", \"temperature\": TEMPERATURE, \"top_p\": TOP_P}\n", | ||
| "else:\n", | ||
| " strategy = {\"type\": \"greedy\"}\n", | ||
| " \n", | ||
| "# initialize the document collection to be used for RAG\n", | ||
| "vector_db_id = f\"test_vector_db_{uuid.uuid4()}\"\n", | ||
| " \n", | ||
| "# initialize the server connection\n", | ||
| "client = LlamaStackClient(base_url=os.environ.get(\"LLAMA_STACK_ENDPOINT\", LLAMA_STACK_SERVER_URL))" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "9203de51-f570-44ab-8130-36333a54888b", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 2. Indexing the Documents\n", | ||
| "- Initialize a new document collection in the target vector DB. All parameters related to the vector DB, such as the embedding model and dimension, must be specified here.\n", | ||
| "- Provide a list of document URLs to the RAG tool. Llama Stack will handle fetching, conversion and chunking of the documents' content." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 6, | ||
| "id": "8d81ffb2-2089-4cb8-adae-f32965f206c7", | ||
| "metadata": { | ||
| "tags": [] | ||
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# define and register the document collection to be used\n", | ||
| "client.vector_dbs.register(\n", | ||
| " vector_db_id=vector_db_id,\n", | ||
| " embedding_model=VECTOR_DB_EMBEDDING_MODEL,\n", | ||
| " embedding_dimension=VECTOR_DB_EMBEDDING_DIMENSION,\n", | ||
| " provider_id=VECTOR_DB_PROVIDER_ID,\n", | ||
| ")\n", | ||
| "\n", | ||
| "# ingest the documents into the newly created document collection\n", | ||
| "urls = [\n", | ||
| " (\"https://www.openshift.guide/openshift-guide-screen.pdf\", \"application/pdf\"),\n", | ||
| " (\"https://www.cdflaborlaw.com/_images/content/2023_OCBJ_GC_Awards_Article.pdf\", \"application/pdf\"),\n", | ||
| "]\n", | ||
| "documents = [\n", | ||
| " RAGDocument(\n", | ||
| " document_id=f\"num-{i}\",\n", | ||
| " content=url,\n", | ||
| " mime_type=url_type,\n", | ||
| " metadata={},\n", | ||
| " )\n", | ||
| " for i, (url, url_type) in enumerate(urls)\n", | ||
| "]\n", | ||
| "client.tool_runtime.rag_tool.insert(\n", | ||
| " documents=documents,\n", | ||
| " vector_db_id=vector_db_id,\n", | ||
| " chunk_size_in_tokens=VECTOR_DB_CHUNK_SIZE,\n", | ||
| ")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "d5639413-90d6-42ae-add4-6c89da0297e2", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 3. Executing Queries via the Built-in RAG Tool\n", | ||
| "- Directly invoke the RAG tool to query the vector DB we ingested into at the previous stage.\n", | ||
| "- Construct an extended prompt using the retrieved chunks.\n", | ||
| "- Query the model with the extended prompt.\n", | ||
| "- Output the reply received from the model." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 24, | ||
| "id": "0d39ab00-2a65-4b72-b5ed-4dd61f1204a2", | ||
| "metadata": { | ||
| "tags": [] | ||
| }, | ||
| "outputs": [ | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "\n", | ||
| "User> How to install OpenShift?\n", | ||
| "inference> To install OpenShift, you can follow these steps:\n", | ||
| "\n", | ||
| "1. Download and install the OpenShift CLI (Command Line Interface) tool from the official Red Hat website.\n", | ||
| "2. Create a new project in your OpenShift cluster using the `oc new-project` command.\n", | ||
| "3. Install the OpenShift client tools on your system by running the `oc setup` command.\n", | ||
| "4. Log in to your OpenShift cluster using the `oc login` command with your credentials.\n", | ||
| "5. Create a new application in your project using the `oc new-app` command, specifying the image URL of the container you want to deploy.\n", | ||
| "6. Once the application is created, you can access it through the OpenShift web console or by running the `oc expose` command.\n", | ||
| "\n", | ||
| "Alternatively, you can also use the odo tool provided by Red Hat to create and deploy applications on OpenShift. The odo tool allows you to create applications using \"Devfiles,\" which are YAML files that contain information about your application's programming language, dependencies, and other essential details.\n", | ||
| "\n", | ||
| "You can download the odo tool from the \"Help\" menu on the OpenShift web console through the \"Command line tools\" entry. Once installed, you can use the `odo init` command to create a new application using various programming languages, and then run the `odo push` command to build and deploy the container to the OpenShift container registry.\n", | ||
| "\n", | ||
| "Note: The above steps are general instructions and may vary depending on your specific environment and configuration. It's recommended to refer to the official Red Hat documentation for more detailed instructions and troubleshooting guides.\n", | ||
| "User> Are employees based in California eligible for remote work?\n", | ||
| "inference> Based on the provided knowledge search tool results, it appears that there are specific regulations and considerations for employers in California regarding remote work. According to the results, California has laws that require employers to reimburse employees for necessary business expenditures or losses incurred by the employee in direct consequence of the discharge of their duties, including internet usage, home electricity, computer equipment, and cell phone usage.\n", | ||
| "\n", | ||
| "Additionally, there are specific requirements for data security and privacy, as California has strong data protection laws such as the California Consumer Privacy Act (CCPA) and the California Privacy Rights Act (CPRA). Employers must safeguard sensitive information and comply with these regulations, which apply regardless of where the employee works.\n", | ||
| "\n", | ||
| "However, it is not explicitly stated in the provided results whether employees based in California are eligible for remote work. The results seem to focus more on the responsibilities and considerations for employers rather than the eligibility of employees for remote work.\n", | ||
| "\n", | ||
| "Therefore, without further information or clarification, I would say that the answer to the question \"Are employees based in California eligible for remote work?\" is not explicitly stated in the provided knowledge search tool results." | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "queries = [\n", | ||
| " \"How to install OpenShift?\",\n", | ||
| " \"Are employees based in California eligible for remote work?\",\n", | ||
| "]\n", | ||
| "\n", | ||
| "for prompt in queries:\n", | ||
| " print(f\"\\nUser> {prompt}\")\n", | ||
| " \n", | ||
| " # RAG retrieval call\n", | ||
| " rag_response = client.tool_runtime.rag_tool.query(content=prompt, vector_db_ids=[vector_db_id])\n", | ||
| "\n", | ||
| " # the list of messages to be sent to the model must start with the system prompt\n", | ||
| " messages = [{\"role\": \"system\", \"content\": SYSTEM_PROMPT}]\n", | ||
| "\n", | ||
| " # construct the actual prompt to be executed, incorporating the original query and the retrieved content\n", | ||
| " prompt_context = rag_response.content\n", | ||
| " extended_prompt = f\"Please answer the given query using the context below.\\n\\nCONTEXT:\\n{prompt_context}\\n\\nQUERY:\\n{prompt}\"\n", | ||
| " messages.append({\"role\": \"user\", \"content\": extended_prompt})\n", | ||
| "\n", | ||
| " # use Llama Stack inference API to directly communicate with the desired model\n", | ||
| " response = client.inference.chat_completion(\n", | ||
| " messages=messages,\n", | ||
| " model_id=MODEL_ID,\n", | ||
| " sampling_params={\n", | ||
| " \"strategy\": strategy,\n", | ||
| " },\n", | ||
| " stream=True,\n", | ||
| " )\n", | ||
| " \n", | ||
| " # print the response\n", | ||
| " print(\"inference> \", end='')\n", | ||
| " for chunk in response:\n", | ||
| " response_delta = chunk.event.delta\n", | ||
| " if isinstance(response_delta, TextDelta):\n", | ||
| " print(response_delta.text, end='')\n", | ||
| " elif isinstance(response_delta, ToolCallDelta):\n", | ||
| " print(response_delta.tool_call, end='')" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "df6937a3-3efa-4b66-aaf0-85d96b6d43db", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Key Takeaways\n", | ||
| "This tutorial demonstrates how to set up and use the built-in RAG tool for ingesting user-provided documents in a vector DB and later utilizing them during inference via direct retrieval. Please check out our [complementary tutorial]([Level3_agentic_RAG.ipynb](demos/rag_agentic/notebooks/Level3_agentic_RAG.ipynb) for an agentic RAG example." | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 3 (ipykernel)", | ||
| "language": "python", | ||
| "name": "python3" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 3 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3", | ||
| "version": "3.10.9" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.