diff --git a/README.md b/README.md
index 802df689..54be23f3 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ Try out Playground in Kibana with the following notebooks:
 - [`Document Chunking with Ingest Pipelines`](./notebooks/document-chunking/with-index-pipelines.ipynb)
 - [`Document Chunking with LangChain Splitters`](./notebooks/document-chunking/with-langchain-splitters.ipynb)
 - [`Calculating tokens for Semantic Search (ELSER and E5)`](./notebooks/document-chunking/tokenization.ipynb)
-- [`Fetch surrounding chucks`](./supporting-blog-content/fetch-surrounding-chunks/fetch-surrounding-chunks.ipynb)
+- [`Fetch surrounding chunks`](./supporting-blog-content/fetch-surrounding-chunks/fetch-surrounding-chunks.ipynb)
 
 ### Search
 
diff --git a/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/README.md b/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/README.md
new file mode 100644
index 00000000..596eb02b
--- /dev/null
+++ b/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/README.md
@@ -0,0 +1,34 @@
+
+# PDF Parsing - Table Extraction
+
+This Python notebook demonstrates an alternative approach to parsing PDFs, focusing on extracting tables and converting them into a format suitable for search applications such as Retrieval-Augmented Generation (RAG). The notebook leverages Azure OpenAI to convert table data from PDFs into plain text for better searchability and indexing.
+
+## Features
+- **PDF Table Extraction**: The notebook identifies and parses tables from PDFs.
+- **LLM Integration**: Calls Azure OpenAI models to provide a text representation of the extracted tables.
+- **Search Optimization**: The parsed table data is processed into a format that can be more easily indexed and searched in Elasticsearch or other vector-based search systems.
+
+## Getting Started
+
+### Prerequisites
+- Python 3.x
+- An output directory, e.g. `/tmp`
+- A parsed output file name, e.g. `parsed_file.txt`
+- An Azure account with an OpenAI deployment
+  - API key, e.g. `a330xxxxxxxde9xxxxxx`
+  - Completions endpoint (such as a GPT-4o deployment), e.g. `https://exampledeploy.openai.azure.com/openai/deployments/gpt-35-turbo-16k/chat/completions?api-version=2024-08-01-preview`
+  - For more information on getting started with Azure OpenAI, check out the official [Azure OpenAI ChatGPT Quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Ctypescript%2Cpython-new&pivots=programming-language-studio).
+
+## Example Use Case
+This notebook is ideal for use cases where PDFs contain structured tables that need to be converted into plain text for indexing and search in Elasticsearch or similar search systems.
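+
+## How It Works (Sketch)
+The snippet below is a minimal sketch of the core idea: tab-join an extracted table and ask an Azure OpenAI chat deployment to verbalize it. The `ENDPOINT` and `API_KEY` values are placeholders, and the sample `table` stands in for the output of pdfplumber's `page.extract_tables()`. (The notebook itself skips the header row; the whole table is joined here for simplicity.)
+
+```python
+import requests
+
+# Placeholder deployment values; substitute your own.
+ENDPOINT = "https://<deployment>.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview"
+API_KEY = "<azure-openai-api-key>"
+
+# A table as pdfplumber returns it: rows of cells, possibly containing None.
+table = [["Quarter", "Revenue"], ["Q1", "1.2M"], ["Q2", None]]
+
+# Tab-join rows, replacing None cells with empty strings.
+table_text = "\n".join(
+    "\t".join(str(cell) if cell is not None else "" for cell in row)
+    for row in table
+)
+
+payload = {
+    "messages": [
+        {"role": "system", "content": "You convert tables into human-readable text."},
+        {"role": "user", "content": f"Convert this table to a readable text format:\n{table_text}"},
+    ],
+    "max_tokens": 1024,
+}
+response = requests.post(ENDPOINT, headers={"api-key": API_KEY}, json=payload)
+response.raise_for_status()
+print(response.json()["choices"][0]["message"]["content"].strip())
+```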
+
diff --git a/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/alternative-approach-for-parsing-pdfs-in-rag.ipynb b/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/alternative-approach-for-parsing-pdfs-in-rag.ipynb
new file mode 100644
index 00000000..3229a7ae
--- /dev/null
+++ b/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/alternative-approach-for-parsing-pdfs-in-rag.ipynb
@@ -0,0 +1,216 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "e9-GuDRKCz_1"
+   },
+   "source": [
+    "# PDF Parsing - Table Extraction\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "MBdflc9G0ICc"
+   },
+   "source": [
+    "## Objective\n",
+    "This Python script extracts text and tables from a PDF file, converts the tables into a human-readable text format using Azure OpenAI, and writes the processed content to a text file. The script uses pdfplumber to extract text and table data from each page of the PDF. For tables, it sends a cleaned version (handling any missing or None values) to Azure OpenAI, which generates a natural language summary of the table. The extracted non-table text and the summarized table text are then saved to a text file for easy search and readability."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "QBwz0_VNL1p6"
+   },
+   "outputs": [],
+   "source": [
+    "!pip install pdfplumber pandas"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "QC37eVM90few"
+   },
+   "source": [
+    "This code imports the libraries needed for PDF extraction, data processing, and calling Azure OpenAI via API requests. It prompts for the Azure OpenAI API key and endpoint, sets up the required headers, and prepares for sending requests to the Azure OpenAI service."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "X3vuHZTjK6l7"
+   },
+   "outputs": [],
+   "source": [
+    "import pdfplumber\n",
+    "import pandas as pd\n",
+    "import requests\n",
+    "import base64\n",
+    "import json\n",
+    "from getpass import getpass\n",
+    "import io  # To create an in-memory file-like object\n",
+    "import os\n",
+    "\n",
+    "# Endpoint example\n",
+    "# https://my-deployment.openai.azure.com/openai/deployments/gpt-4o-global/chat/completions?api-version=2024-08-01-preview\n",
+    "ENDPOINT = getpass(\"Azure OpenAI Completions Endpoint: \")\n",
+    "\n",
+    "API_KEY = getpass(\"Azure OpenAI API Key: \")\n",
+    "\n",
+    "# Directory where the parsed output file will be written\n",
+    "PARSED_PDF_DIRECTORY = getpass(\"Output directory for parsed PDF: \")\n",
+    "\n",
+    "# Name of the parsed output file\n",
+    "PARSED_PDF_FILE_NAME = getpass(\"Parsed PDF file name: \")\n",
+    "\n",
+    "headers = {\n",
+    "    \"Content-Type\": \"application/json\",\n",
+    "    \"api-key\": API_KEY,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "79VOKKam0leA"
+   },
+   "source": [
+    "This code defines two functions: extract_table_text_from_openai and parse_pdf_from_url. The extract_table_text_from_openai function sends a table's plain text to Azure OpenAI for conversion into a human-readable description, building the request payload and handling the response. The parse_pdf_from_url function downloads a PDF from a URL and processes it page by page, extracting both text and tables; each extracted table is sent to Azure OpenAI for summarization, and all content (including the summarized tables) is saved to a text file.\n",
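+    "\n",
+    "For example, a table extracted by pdfplumber arrives as a list of rows (lists of cells, possibly containing None), and the cleaning step tab-joins it into plain text before it is sent to Azure OpenAI. A hypothetical two-row table:\n",
+    "\n",
+    "```python\n",
+    "table = [[\"Quarter\", \"Revenue\"], [\"Q1\", \"1.2M\"], [\"Q2\", None]]\n",
+    "table_text = \"\\n\".join(\n",
+    "    \"\\t\".join(str(cell) if cell is not None else \"\" for cell in row)\n",
+    "    for row in table[1:]  # skips the header row, as in parse_pdf_from_url below\n",
+    ")\n",
+    "# table_text == 'Q1\\t1.2M\\nQ2\\t'\n",
+    "```"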
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "CdMm1AKJLKbA"
+   },
+   "outputs": [],
+   "source": [
+    "def extract_table_text_from_openai(table_text):\n",
+    "    # Payload for the Azure OpenAI request\n",
+    "    payload = {\n",
+    "        \"messages\": [\n",
+    "            {\n",
+    "                \"role\": \"system\",\n",
+    "                \"content\": [\n",
+    "                    {\n",
+    "                        \"type\": \"text\",\n",
+    "                        \"text\": \"You are an AI assistant that helps convert tables into a human-readable text.\",\n",
+    "                    }\n",
+    "                ],\n",
+    "            },\n",
+    "            {\n",
+    "                \"role\": \"user\",\n",
+    "                \"content\": f\"Convert this table to a readable text format:\\n{table_text}\",\n",
+    "            },\n",
+    "        ],\n",
+    "        \"temperature\": 0.7,\n",
+    "        \"top_p\": 0.95,\n",
+    "        \"max_tokens\": 4096,\n",
+    "    }\n",
+    "\n",
+    "    # Send the request to Azure OpenAI\n",
+    "    try:\n",
+    "        response = requests.post(ENDPOINT, headers=headers, json=payload)\n",
+    "        response.raise_for_status()  # Raise error if the request fails\n",
+    "    except requests.RequestException as e:\n",
+    "        raise SystemExit(f\"Failed to make the request. Error: {e}\")\n",
+    "\n",
+    "    # Process the response\n",
+    "    return (\n",
+    "        response.json()\n",
+    "        .get(\"choices\", [{}])[0]\n",
+    "        .get(\"message\", {})\n",
+    "        .get(\"content\", \"\")\n",
+    "        .strip()\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "def parse_pdf_from_url(file_url):\n",
+    "    # Download the PDF file from the URL\n",
+    "    response = requests.get(file_url)\n",
+    "    response.raise_for_status()  # Ensure the request was successful\n",
+    "\n",
+    "    # Open the PDF content with pdfplumber using io.BytesIO\n",
+    "    pdf_content = io.BytesIO(response.content)\n",
+    "\n",
+    "    # Ensure the directory exists and has write permissions\n",
+    "    os.makedirs(PARSED_PDF_DIRECTORY, mode=0o755, exist_ok=True)\n",
+    "\n",
+    "    with pdfplumber.open(pdf_content) as pdf, open(\n",
+    "        os.path.join(PARSED_PDF_DIRECTORY, PARSED_PDF_FILE_NAME), \"w\"\n",
+    "    ) as output_file:\n",
+    "        for page_num, page in enumerate(pdf.pages, 1):\n",
+    "            print(f\"Processing page {page_num}\")\n",
+    "\n",
+    "            # Extract text content\n",
+    "            text = page.extract_text()\n",
+    "            if text:\n",
+    "                output_file.write(f\"Page {page_num} Text:\\n\")\n",
+    "                output_file.write(text + \"\\n\\n\")\n",
+    "                print(\"Text extracted:\", text)\n",
+    "\n",
+    "            # Extract tables\n",
+    "            tables = page.extract_tables()\n",
+    "            for idx, table in enumerate(tables):\n",
+    "                print(f\"Table {idx + 1} found on page {page_num}\")\n",
+    "\n",
+    "                # Convert the table into plain text, skipping the header row\n",
+    "                # and replacing None cells with empty strings\n",
+    "                table_text = \"\\n\".join(\n",
+    "                    [\n",
+    "                        \"\\t\".join(\n",
+    "                            [str(cell) if cell is not None else \"\" for cell in row]\n",
+    "                        )\n",
+    "                        for row in table[1:]\n",
+    "                    ]\n",
+    "                )\n",
+    "\n",
+    "                # Call Azure OpenAI to convert the table into a text representation\n",
+    "                table_description = extract_table_text_from_openai(table_text)\n",
+    "\n",
+    "                # Write the text representation to the file\n",
+    "                output_file.write(\n",
+    "                    f\"Table {idx + 1} (Page {page_num}) Text Representation:\\n\"\n",
+    "                )\n",
+    "                output_file.write(table_description + \"\\n\\n\")\n",
+    "                print(\"Text representation of the table:\", table_description)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "7ig9NSSnLMGt"
+   },
+   "outputs": [],
+   "source": [
+    "# URL of the PDF file\n",
+    "file_url = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/refs/heads/sunman/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/quarterly_report.pdf\"\n",
+    "\n",
+    "# Call the function to parse the PDF from the URL\n",
+    "parse_pdf_from_url(file_url)"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/quarterly_report.pdf b/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/quarterly_report.pdf
new file mode 100644
index 00000000..48776e9c
Binary files /dev/null and b/supporting-blog-content/alternative-approach-for-parsing-pdfs-in-rag/quarterly_report.pdf differ
diff --git a/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/README.md b/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/README.md
new file mode 100644
index 00000000..e82dbb01
--- /dev/null
+++ b/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/README.md
@@ -0,0 +1,38 @@
+
+# Unifying Elastic Vector Database and LLMs for Intelligent Retrieval
+
+## Overview
+This notebook demonstrates how to integrate Elasticsearch as a vector database (VectorDB) with search templates and LLM functions to build an intelligent query layer. By leveraging vector search, dynamic query templates, and natural language processing, this approach enhances search precision, adaptability, and efficiency.
+
+## Features
+- **Elasticsearch as a VectorDB**: Efficiently stores and retrieves dense vector embeddings for advanced search capabilities.
+- **Search Templates**: Dynamically structure queries by mapping user inputs to the appropriate index parameters.
+- **LLM Functions**: Extract key search parameters from natural language queries and inject them into search templates.
+- **Hybrid Search**: Combines structured filtering with semantic search to improve search accuracy and relevance.
+
+## Components
+- **Geocode Location Function**: Converts location names into latitude and longitude for geospatial queries.
+- **Handle Extract Hotel Search Parameters Function**: Processes extracted search parameters, ensuring essential values like distance are correctly assigned.
+- **Call Elasticsearch Function**: Executes structured search queries using dynamically populated search templates.
+- **Format and Print Messages Functions**: Enhance query debugging by formatting and displaying responses in a structured format.
+- **Find a Hotel Function**: Orchestrates interactions between user queries, LLM functions, and Elasticsearch search execution.
+- **Search Template Management Functions**: Define, create, and delete search templates to optimize query processing.
+
+## Usage
+1. Set up an Elasticsearch cluster and ensure vector search capabilities are enabled.
+2. Define search templates to map query parameters to the index schema.
+3. Use LLM functions to extract and refine search parameters from user queries.
+4. Run queries using the intelligent query layer to retrieve more relevant and accurate results.
+
+### Prerequisites
+- Elastic Cloud instance
+  - With ML nodes
+- Azure OpenAI
+  - Completions endpoint such as GPT-4o
+  - For more information on getting started with Azure OpenAI, check out the official [Azure OpenAI ChatGPT Quickstart](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Ctypescript%2Cpython-new&pivots=programming-language-studio).
+ - Azure OpenAI Key +- Google Maps API Key + - https://developers.google.com/maps/documentation/embed/get-api-key + diff --git a/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb b/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb new file mode 100644 index 00000000..c91364de --- /dev/null +++ b/supporting-blog-content/unifying-elastic-vector-database-and-llms-for-intelligent-query/Unifying_Elastic_Vector_Database_and_LLMs_for_Intelligent_Query.ipynb @@ -0,0 +1,1151 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Cl2LilHSwHsi" + }, + "source": [ + "\n", + "# Objective\n", + "This notebook demonstrates how blending the capabilities of Elasticsearch as a vector database (VectorDB), search templates, and LLM functions can provide an intelligent query layer.\n", + "\n", + "\"Open\n", + "\n", + "\n", + "- **Elasticsearch as the VectorDB**: Acts as the core search engine, storing and retrieving dense vector embeddings efficiently.\n", + "- **Search Templates**: Marry index capabilities to query parameters, enabling dynamic query generation and structured search execution.\n", + "- **LLM Functions**: Parse the possible available parameters within a query and inject them into the search template for a more intelligent and context-aware retrieval process.\n", + "\n", + "This combination enables a more sophisticated search experience, leveraging both structured and unstructured data retrieval methods.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "kply8eYngIAL", + "outputId": "d303bfa2-ce7e-4211-d425-1f618141a293" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: elasticsearch in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (8.17.1)\n", + "Collecting openai\n", + " Downloading openai-1.63.2-py3-none-any.whl.metadata (27 kB)\n", + "Requirement already satisfied: elastic-transport<9,>=8.15.1 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from elasticsearch) (8.17.0)\n", + "Requirement already satisfied: anyio<5,>=3.5.0 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from openai) (4.6.2.post1)\n", + "Collecting distro<2,>=1.7.0 (from openai)\n", + " Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)\n", + "Requirement already satisfied: httpx<1,>=0.23.0 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from openai) (0.27.2)\n", + "Collecting jiter<1,>=0.4.0 (from openai)\n", + " Downloading jiter-0.8.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (5.2 kB)\n", + "Collecting pydantic<3,>=1.9.0 (from openai)\n", + " Downloading pydantic-2.10.6-py3-none-any.whl.metadata (30 kB)\n", + "Requirement already satisfied: sniffio in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from openai) (1.3.1)\n", + "Requirement already satisfied: tqdm>4 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from openai) (4.66.5)\n", + "Requirement already satisfied: typing-extensions<5,>=4.11 in 
/Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from openai) (4.12.2)\n", + "Requirement already satisfied: idna>=2.8 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from anyio<5,>=3.5.0->openai) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.26.2 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from elastic-transport<9,>=8.15.1->elasticsearch) (2.2.3)\n", + "Requirement already satisfied: certifi in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from elastic-transport<9,>=8.15.1->elasticsearch) (2024.8.30)\n", + "Requirement already satisfied: httpcore==1.* in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from httpx<1,>=0.23.0->openai) (1.0.6)\n", + "Requirement already satisfied: h11<0.15,>=0.13 in /Users/sunilemanjee/Documents/GitHub/elasticsearch-labs/.venv/lib/python3.12/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai) (0.14.0)\n", + "Collecting annotated-types>=0.6.0 (from pydantic<3,>=1.9.0->openai)\n", + " Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)\n", + "Collecting pydantic-core==2.27.2 (from pydantic<3,>=1.9.0->openai)\n", + " Downloading pydantic_core-2.27.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (6.6 kB)\n", + "Downloading openai-1.63.2-py3-none-any.whl (472 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m472.3/472.3 kB\u001b[0m \u001b[31m11.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n", + "\u001b[?25hUsing cached distro-1.9.0-py3-none-any.whl (20 kB)\n", + "Downloading jiter-0.8.2-cp312-cp312-macosx_11_0_arm64.whl (310 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m310.3/310.3 kB\u001b[0m \u001b[31m25.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pydantic-2.10.6-py3-none-any.whl (431 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m431.7/431.7 kB\u001b[0m \u001b[31m29.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading pydantic_core-2.27.2-cp312-cp312-macosx_11_0_arm64.whl (1.8 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.8/1.8 MB\u001b[0m \u001b[31m33.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n", + "\u001b[?25hUsing cached annotated_types-0.7.0-py3-none-any.whl (13 kB)\n", + "Installing collected packages: pydantic-core, jiter, distro, annotated-types, pydantic, openai\n", + "Successfully installed annotated-types-0.7.0 distro-1.9.0 jiter-0.8.2 openai-1.63.2 pydantic-2.10.6 pydantic-core-2.27.2\n", + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.0.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" + ] + } + ], + "source": [ + "!pip install elasticsearch openai" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "wcPv-D6lwHsk" + }, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "from openai import AzureOpenAI\n", + "from datetime import datetime, timedelta\n", + "from zoneinfo import 
ZoneInfo\n",
+    "from elasticsearch import Elasticsearch, helpers, NotFoundError\n",
+    "import requests\n",
+    "from IPython.display import Markdown, display\n",
+    "from getpass import getpass"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "tWSsNKGOzA8m"
+   },
+   "source": [
+    "## Variables"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "9VE-PjkvzESZ"
+   },
+   "source": [
+    "## Elasticsearch Client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "U7bQisDzgThO",
+    "outputId": "8271b49f-7860-4429-9606-4914eeb7e935"
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "ObjectApiResponse({'name': 'instance-0000000001', 'cluster_name': 'c2ab8f2bea544fa98be6ab7deaa857f7', 'cluster_uuid': 'ymWuH36TQy-03wb-MIx27w', 'version': {'number': '8.17.2', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '747663ddda3421467150de0e4301e8d4bc636b0c', 'build_date': '2025-02-05T22:10:57.067596412Z', 'build_snapshot': False, 'lucene_version': '9.12.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})"
+      ]
+     },
+     "execution_count": 29,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "es_cloud_id = getpass(prompt=\"Enter your Elasticsearch Cloud ID: \")\n",
+    "es_api_key = getpass(prompt=\"Enter your Elasticsearch API key: \")\n",
+    "\n",
+    "es = Elasticsearch(cloud_id=es_cloud_id, api_key=es_api_key, request_timeout=300)\n",
+    "\n",
+    "es.info()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "CTQuTPVuwHsk"
+   },
+   "source": [
+    "### Initialize the Azure OpenAI Client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {
+    "id": "wbwpo08qzn4f"
+   },
+   "outputs": [],
+   "source": [
+    "# Initialize the Azure OpenAI client\n",
+    "\n",
+    "# Azure endpoint: Azure AI Services (Azure OpenAI) --> Keys and Endpoint --> Endpoint\n",
+    "# Azure API key: Azure AI Services (Azure OpenAI) --> Keys and Endpoint --> KEY 1 or KEY 2\n",
+    "# Azure OpenAI deployment name: Azure AI Services (Azure OpenAI) --> Shared resources --> Deployments\n",
+    "# Azure API version: Azure AI Services (Azure OpenAI) --> Chat playground --> View Code,\n",
+    "# then look for the following:\n",
+    "# client = AzureOpenAI(\n",
+    "#     azure_endpoint=endpoint,\n",
+    "#     azure_ad_token_provider=token_provider,\n",
+    "#     api_version=\"2024-05-01-preview\",\n",
+    "# )\n",
+    "\n",
+    "ENDPOINT = getpass(\"Azure OpenAI Completions Endpoint: \")\n",
+    "\n",
+    "AZURE_API_KEY = getpass(\"Azure OpenAI API Key: \")\n",
+    "\n",
+    "API_VERSION = getpass(\"Completions Endpoint API Version: \")\n",
+    "\n",
+    "DEPLOYMENT_NAME = getpass(\"Azure OpenAI Deployment Name: \")\n",
+    "\n",
+    "client = AzureOpenAI(\n",
+    "    azure_endpoint=ENDPOINT, api_key=AZURE_API_KEY, api_version=API_VERSION\n",
+    ")\n",
+    "\n",
+    "# Provide the model deployment name you want to use for this example\n",
+    "deployment_name = DEPLOYMENT_NAME"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "KRwV4R_dd2FA"
+   },
+   "source": [
+    "## Google Maps API Key Required"
+   ]
+  },
+  {
+   "cell_type": "code",
"execution_count": 32, + "metadata": { + "id": "dgpGnZQbb7U_" + }, + "outputs": [], + "source": [ + "##create google maps api key here: https://developers.google.com/maps/documentation/embed/get-api-key\n", + "GMAPS_API_KEY = getpass(prompt=\"Enter Google Maps API Key: \")\n", + "google_maps_api_key = GMAPS_API_KEY" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "id": "fiVFDWOYUgvH" + }, + "outputs": [], + "source": [ + "# Elastic index\n", + "ES_INDEX = \"hotels\"\n", + "TEMPLATE_ID = \"hotel_search_template\"\n", + "\n", + "# JSON dataset URL\n", + "DATASET_URL = \"https://ela.st/hotels-dataset\"\n", + "\n", + "ELSER_ENDPOINT_NAME = \"my-e5-endpoint\"\n", + "E5_ENDPOINT_NAME = \"my-e5-endpoint\"\n", + "\n", + "\n", + "# Define the index mapping\n", + "INDEX_MAPPING = {\n", + " \"mappings\": {\n", + " \"properties\": {\n", + " \"Address\": {\"type\": \"text\"},\n", + " \"Attractions\": {\"type\": \"text\"},\n", + " \"Description\": {\"type\": \"text\"},\n", + " \"FaxNumber\": {\"type\": \"text\"},\n", + " \"HotelCode\": {\"type\": \"long\"},\n", + " \"HotelFacilities\": {\"type\": \"text\"},\n", + " \"HotelName\": {\"type\": \"text\"},\n", + " \"HotelRating\": {\"type\": \"long\"},\n", + " \"HotelWebsiteUrl\": {\"type\": \"keyword\"},\n", + " \"Map\": {\"type\": \"keyword\"},\n", + " \"PhoneNumber\": {\"type\": \"text\"},\n", + " \"PinCode\": {\"type\": \"keyword\"},\n", + " \"cityCode\": {\"type\": \"long\"},\n", + " \"cityName\": {\"type\": \"text\"},\n", + " \"combined_fields\": {\n", + " \"type\": \"text\",\n", + " \"copy_to\": [\"semantic_description_elser\", \"semantic_description_e5\"],\n", + " },\n", + " \"countryCode\": {\"type\": \"keyword\"},\n", + " \"countryName\": {\"type\": \"keyword\"},\n", + " \"latitude\": {\"type\": \"double\"},\n", + " \"location\": {\"type\": \"geo_point\"},\n", + " \"longitude\": {\"type\": \"double\"},\n", + " \"semantic_description_e5\": {\n", + " \"type\": \"semantic_text\",\n", + " \"inference_id\": E5_ENDPOINT_NAME,\n", + " },\n", + " \"semantic_description_elser\": {\n", + " \"type\": \"semantic_text\",\n", + " \"inference_id\": ELSER_ENDPOINT_NAME,\n", + " },\n", + " }\n", + " }\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "V3mK_FcFy2TA" + }, + "source": [ + "##Inferencing Endpoint Methods" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "id": "RMZoAAq5y0YA" + }, + "outputs": [], + "source": [ + "def create_inferencing_endpoints():\n", + " endpoints = [\n", + " {\n", + " \"inference_id\": ELSER_ENDPOINT_NAME,\n", + " \"task_type\": \"sparse_embedding\",\n", + " \"body\": {\n", + " \"service\": \"elasticsearch\",\n", + " \"service_settings\": {\n", + " \"num_allocations\": 2,\n", + " \"num_threads\": 1,\n", + " \"model_id\": \".elser_model_2_linux-x86_64\",\n", + " },\n", + " \"chunking_settings\": {\n", + " \"strategy\": \"sentence\",\n", + " \"max_chunk_size\": 250,\n", + " \"sentence_overlap\": 1,\n", + " },\n", + " },\n", + " },\n", + " {\n", + " \"inference_id\": E5_ENDPOINT_NAME,\n", + " \"task_type\": \"text_embedding\",\n", + " \"body\": {\n", + " \"service\": \"elasticsearch\",\n", + " \"service_settings\": {\n", + " \"num_allocations\": 2,\n", + " \"num_threads\": 1,\n", + " \"model_id\": \".multilingual-e5-small\",\n", + " },\n", + " \"chunking_settings\": {\n", + " \"strategy\": \"sentence\",\n", + " \"max_chunk_size\": 250,\n", + " \"sentence_overlap\": 1,\n", + " },\n", + " },\n", + " },\n", + " ]\n", + "\n", + " for endpoint in 
+    "        try:\n",
+    "            es.inference.delete(inference_id=endpoint[\"inference_id\"])\n",
+    "            print(f\"Deleted endpoint '{endpoint['inference_id']}'\")\n",
+    "        except NotFoundError:\n",
+    "            print(\n",
+    "                f\"Endpoint '{endpoint['inference_id']}' does not exist. Skipping deletion.\"\n",
+    "            )\n",
+    "\n",
+    "        response = es.inference.put(\n",
+    "            inference_id=endpoint[\"inference_id\"],\n",
+    "            task_type=endpoint[\"task_type\"],\n",
+    "            body=endpoint[\"body\"],\n",
+    "        )\n",
+    "        print(f\"Created endpoint '{endpoint['inference_id']}': {response}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "QBnpy1HuzK48"
+   },
+   "source": [
+    "## Indexing and Ingestion Methods"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {
+    "id": "H3xh9gbezHfO"
+   },
+   "outputs": [],
+   "source": [
+    "# Step 1: Create the index with mapping\n",
+    "def create_index():\n",
+    "    try:\n",
+    "        if es.indices.exists(index=ES_INDEX):\n",
+    "            print(f\"Index '{ES_INDEX}' already exists. Deleting and recreating...\")\n",
+    "            es.indices.delete(index=ES_INDEX)\n",
+    "\n",
+    "        es.indices.create(index=ES_INDEX, body=INDEX_MAPPING)\n",
+    "        print(f\"Index '{ES_INDEX}' created successfully.\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"Error creating index: {e}\")\n",
+    "        raise\n",
+    "\n",
+    "\n",
+    "# Step 2: Download the JSON file\n",
+    "def download_json():\n",
+    "    print(\"Downloading dataset...\")\n",
+    "    print(f\"Using URL: {DATASET_URL}\")\n",
+    "\n",
+    "    # Start the request\n",
+    "    response = requests.get(DATASET_URL, stream=True)\n",
+    "    print(f\"Received response with status code: {response.status_code}\")\n",
+    "\n",
+    "    # Check for errors\n",
+    "    try:\n",
+    "        response.raise_for_status()\n",
+    "        print(\"Response status is OK.\")\n",
+    "    except requests.HTTPError as e:\n",
+    "        print(f\"HTTP error occurred: {e}\")\n",
+    "        raise\n",
+    "\n",
+    "    # Optionally, show some headers (use carefully in production)\n",
+    "    print(\"Response headers:\")\n",
+    "    for key, value in response.headers.items():\n",
+    "        print(f\"  {key}: {value}\")\n",
+    "\n",
+    "    # Now return an iterator for the response lines\n",
+    "    print(\"Returning line iterator for the response content.\")\n",
+    "    return response.iter_lines()\n",
+    "\n",
+    "\n",
+    "# Step 3: Ingest JSON records into Elasticsearch\n",
+    "def ingest_data():\n",
+    "    print(\"Ingesting data into Elasticsearch...\")\n",
+    "    actions = []\n",
+    "\n",
+    "    for line in download_json():\n",
+    "        if line:\n",
+    "            record = json.loads(line)\n",
+    "            # Convert latitude/longitude to geo_point format\n",
+    "            if \"latitude\" in record and \"longitude\" in record:\n",
+    "                record[\"location\"] = {\n",
+    "                    \"lat\": record[\"latitude\"],\n",
+    "                    \"lon\": record[\"longitude\"],\n",
+    "                }\n",
+    "\n",
+    "            actions.append({\"_index\": ES_INDEX, \"_source\": record})\n",
+    "\n",
+    "            # Bulk index in batches of 50\n",
+    "            if len(actions) >= 50:\n",
+    "                helpers.bulk(es, actions)\n",
+    "                print(f\"Ingested {len(actions)} records...\")\n",
+    "                actions = []\n",
+    "\n",
+    "    # Ingest any remaining records\n",
+    "    if actions:\n",
+    "        helpers.bulk(es, actions)\n",
+    "        print(f\"Ingested {len(actions)} remaining records.\")\n",
+    "\n",
+    "    print(\"Data ingestion complete.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "U9XaRgU8zRDy"
+   },
+   "source": [
+    "## Search Template\n",
+    "Removes the existing hotel_search_template if present and replaces it with an updated version. This ensures the template is always current and correctly structured for search operations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "yiiRn9qdzQBB"
+   },
+   "outputs": [],
+   "source": [
+    "# Search template content\n",
+    "search_template_content = {\n",
+    "    \"script\": {\n",
+    "        \"lang\": \"mustache\",\n",
+    "        \"source\": \"\"\"{\n",
+    "            \"_source\": false,\n",
+    "            \"fields\": [\"HotelName\", \"HotelRating\", \"countryName\", \"cityName\", \"countryCode\", \"Attractions\"],\n",
+    "            \"retriever\": {\n",
+    "                \"standard\": {\n",
+    "                    \"query\": {\n",
+    "                        \"semantic\": {\n",
+    "                            \"field\": \"semantic_description_elser\",\n",
+    "                            \"query\": \"{{query}}\"\n",
+    "                        }\n",
+    "                    },\n",
+    "                    \"filter\": {\n",
+    "                        \"bool\": {\n",
+    "                            \"must\": [\n",
+    "                                {{#distance}}{\n",
+    "                                    \"geo_distance\": {\n",
+    "                                        \"distance\": \"{{distance}}\",\n",
+    "                                        \"location\": {\n",
+    "                                            \"lat\": {{latitude}},\n",
+    "                                            \"lon\": {{longitude}}\n",
+    "                                        }\n",
+    "                                    }\n",
+    "                                }{{/distance}}\n",
+    "                                {{#rating}}{{#distance}},{{/distance}}{\n",
+    "                                    \"range\": {\n",
+    "                                        \"HotelRating\": {\n",
+    "                                            \"gte\": {{rating}}\n",
+    "                                        }\n",
+    "                                    }\n",
+    "                                }{{/rating}}\n",
+    "                                {{#countryName}}{{#distance}}{{^rating}},{{/rating}}{{/distance}}{{#rating}},{{/rating}}{\n",
+    "                                    \"term\": {\n",
+    "                                        \"countryName\": \"{{countryName}}\"\n",
+    "                                    }\n",
+    "                                }{{/countryName}}\n",
+    "                                {{#city}}{{#distance}}{{^rating}},{{/rating}}{{/distance}}{{#rating}},{{/rating}}{\n",
+    "                                    \"match\": {\n",
+    "                                        \"cityName\": \"{{city}}\"\n",
+    "                                    }\n",
+    "                                }{{/city}}\n",
+    "                                {{#countryCode}}{{#distance}}{{^rating}},{{/rating}}{{/distance}}{{#rating}},{{/rating}}{\n",
+    "                                    \"term\": {\n",
+    "                                        \"countryCode\": \"{{countryCode}}\"\n",
+    "                                    }\n",
+    "                                }{{/countryCode}}\n",
+    "                                {{#distance}}{{^rating}}{{/rating}}{{/distance}}{{#rating}}{{/rating}}\n",
+    "                            ],\n",
+    "                            \"should\": [\n",
+    "                                {{#attraction}}{\n",
+    "                                    \"wildcard\": {\n",
+    "                                        \"Attractions\": {\n",
+    "                                            \"value\": \"*{{attraction}}*\",\n",
+    "                                            \"case_insensitive\": true\n",
+    "                                        }\n",
+    "                                    }\n",
+    "                                }{{/attraction}}\n",
+    "                            ]\n",
+    "                        }\n",
+    "                    }\n",
+    "                }\n",
+    "            }\n",
+    "        }\"\"\",\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "\n",
+    "def delete_search_template(template_id):\n",
+    "    \"\"\"Deletes the search template if it exists\"\"\"\n",
+    "    try:\n",
+    "        es.delete_script(id=template_id)\n",
+    "        print(f\"Deleted existing search template: {template_id}\")\n",
+    "    except Exception as e:\n",
+    "        if \"not_found\" in str(e):\n",
+    "            print(f\"Search template '{template_id}' not found, skipping delete.\")\n",
+    "        else:\n",
+    "            print(f\"Error deleting template '{template_id}': {e}\")\n",
+    "\n",
+    "\n",
+    "def create_search_template(template_id, template_content):\n",
+    "    \"\"\"Creates a new search template\"\"\"\n",
+    "    try:\n",
+    "        es.put_script(id=template_id, body=template_content)\n",
+    "        print(f\"Created search template: {template_id}\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"Error creating template '{template_id}': {e}\")"
+   ]
+  },
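+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick illustration, the mustache sections expand only for the parameters that are supplied. With a hypothetical parameter set containing a distance, coordinates, and a rating, the `{{#distance}}` block renders a `geo_distance` filter and the `{{#rating}}` block renders a `HotelRating` range filter, while the unset `{{#countryName}}`, `{{#city}}`, `{{#countryCode}}`, and `{{#attraction}}` sections render to nothing. Once the template has been stored (see the setup cell further below), you can preview the expansion with the render search template API:\n",
+    "\n",
+    "```python\n",
+    "# Preview how the stored template expands for a given set of parameters\n",
+    "rendered = es.render_search_template(\n",
+    "    body={\n",
+    "        \"id\": TEMPLATE_ID,\n",
+    "        \"params\": {\n",
+    "            \"query\": \"hotels near the beach\",\n",
+    "            \"distance\": \"500m\",\n",
+    "            \"latitude\": -28.63,\n",
+    "            \"longitude\": 153.6,\n",
+    "            \"rating\": 4,\n",
+    "        },\n",
+    "    }\n",
+    ")\n",
+    "print(json.dumps(dict(rendered[\"template_output\"]), indent=2))\n",
+    "```"
+   ]
+  },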
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "qvigzp5w4fTE"
+   },
+   "source": [
+    "## Find A Hotel Method\n",
+    "Manages interactions between user queries, LLM functions, and Elasticsearch, orchestrating tool calls to extract search parameters and execute queries."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {
+    "id": "L6E7DFWcTEnb"
+   },
+   "outputs": [],
+   "source": [
+    "def find_a_hotel(content):\n",
+    "    messages = [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": (\n",
+    "                \"You are an assistant that only provides recommendations \"\n",
+    "                \"based on the search results retrieved from Elasticsearch. \"\n",
+    "                \"Do not make up information or answer based on assumptions. \"\n",
+    "                \"Only use the provided data to respond to the user's queries. \"\n",
+    "                \"Don't make assumptions about what values to use with functions. \"\n",
+    "                \"Ask for clarification if a user request is ambiguous.\"\n",
+    "            ),\n",
+    "        },\n",
+    "        {\"role\": \"user\", \"content\": content},\n",
+    "    ]\n",
+    "\n",
+    "    tools = [\n",
+    "        {\n",
+    "            \"type\": \"function\",\n",
+    "            \"function\": {\n",
+    "                \"name\": \"extract_hotel_search_parameters\",\n",
+    "                \"description\": \"Extract search parameters for finding hotels from the input query.\",\n",
+    "                \"parameters\": {\n",
+    "                    \"type\": \"object\",\n",
+    "                    \"properties\": {\n",
+    "                        \"query\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"the full input query\",\n",
+    "                        },\n",
+    "                        \"distance\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"The search radius (e.g., 500m, 1000m).\",\n",
+    "                        },\n",
+    "                        \"rating\": {\n",
+    "                            \"type\": \"number\",\n",
+    "                            \"description\": \"The minimum hotel rating (e.g., 3, 4, or 5 stars).\",\n",
+    "                        },\n",
+    "                        \"location\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"Location mentioned in the query (e.g., Belongil Beach, Byron Bay).\",\n",
+    "                        },\n",
+    "                        \"countryName\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"Name of the country (e.g., Australia, Germany).\",\n",
+    "                        },\n",
+    "                        \"city\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"City name (e.g., Byron Bay, Chicago, Houston).\",\n",
+    "                        },\n",
+    "                        \"State\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"State or province (e.g., Texas, Alaska, Alberta).\",\n",
+    "                        },\n",
+    "                        \"countryCode\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"The country code (e.g., AU for Australia).\",\n",
+    "                        },\n",
+    "                        \"attraction\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"Hotel attractions, amenities, or descriptive terms (e.g., Beach, Museum, gym, modern, luxurious). This can include multiple options.\",\n",
+    "                        },\n",
+    "                    },\n",
+    "                    \"required\": [\"query\", \"attraction\"],\n",
+    "                },\n",
+    "            },\n",
+    "        },\n",
+    "        {\n",
+    "            \"type\": \"function\",\n",
+    "            \"function\": {\n",
+    "                \"name\": \"geocode_location\",\n",
+    "                \"description\": \"Resolve a location to its latitude and longitude.\",\n",
+    "                \"parameters\": {\n",
+    "                    \"type\": \"object\",\n",
+    "                    \"properties\": {\n",
+    "                        \"location\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"The name of the location, e.g., Belongil Beach.\",\n",
+    "                        }\n",
+    "                    },\n",
+    "                    \"required\": [\"location\"],\n",
+    "                },\n",
+    "            },\n",
+    "        },\n",
+    "        {\n",
+    "            \"type\": \"function\",\n",
+    "            \"function\": {\n",
+    "                \"name\": \"query_elasticsearch\",\n",
+    "                \"description\": \"Query Elasticsearch for accommodations based on the parameters provided by extract_hotel_search_parameters. extract_hotel_search_parameters must be called before this function.\",\n",
+    "                \"parameters\": {\n",
+    "                    \"type\": \"object\",\n",
+    "                    \"properties\": {\n",
+    "                        \"query\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"The original search query (e.g., 'hotels near Belongil Beach').\",\n",
+    "                        },\n",
+    "                        \"latitude\": {\n",
+    "                            \"type\": \"number\",\n",
+    "                            \"description\": \"Latitude of the location.\",\n",
+    "                        },\n",
+    "                        \"longitude\": {\n",
+    "                            \"type\": \"number\",\n",
+    "                            \"description\": \"Longitude of the location.\",\n",
+    "                        },\n",
+    "                        \"distance\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"Search radius (e.g., '5000m', '10km').\",\n",
+    "                        },\n",
+    "                        \"rating\": {\n",
+    "                            \"type\": \"number\",\n",
+    "                            \"description\": \"Minimum hotel rating (e.g., 3, 4, 5 stars).\",\n",
+    "                        },\n",
+    "                        \"countryName\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"The country name (e.g., 'Australia', 'United States').\",\n",
+    "                        },\n",
+    "                        \"countryCode\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"The country code (e.g., 'AU', 'US').\",\n",
+    "                        },\n",
+    "                        \"attraction\": {\n",
+    "                            \"type\": \"string\",\n",
+    "                            \"description\": \"Hotel attractions or amenities (e.g., Beach, Museum, gym, coffee shop, pool). This can be multiple options; any feature of a hotel can be used here, so this can be a comprehensive list.\",\n",
+    "                        },\n",
+    "                    },\n",
+    "                    \"required\": [\"query\"],\n",
+    "                },\n",
+    "            },\n",
+    "        },\n",
+    "    ]\n",
+    "\n",
+    "    parameters = {}\n",
+    "    while True:\n",
+    "        # Call the LLM with tools\n",
+    "        response = client.chat.completions.create(\n",
+    "            model=deployment_name,\n",
+    "            messages=messages,\n",
+    "            tools=tools,\n",
+    "            tool_choice=\"auto\",\n",
+    "        )\n",
+    "\n",
+    "        response_message = response.choices[0].message\n",
+    "        messages.append(response_message)\n",
+    "\n",
+    "        # Print formatted messages for debugging\n",
+    "        print_messages([response_message])\n",
+    "\n",
+    "        # Check for tool calls\n",
+    "        if response_message.tool_calls:\n",
+    "            for tool_call in response_message.tool_calls:\n",
+    "                function_name = tool_call.function.name\n",
+    "                function_args = json.loads(tool_call.function.arguments)\n",
+    "\n",
+    "                if function_name == \"extract_hotel_search_parameters\":\n",
+    "                    # Debug: Print function_args\n",
+    "                    print(\"Function Arguments for extract_hotel_search_parameters:\")\n",
+    "                    print(function_args)\n",
+    "\n",
+    "                    # Extract required and optional parameters\n",
+    "                    function_response = handle_extract_hotel_search_parameters(\n",
+    "                        function_args\n",
+    "                    )\n",
+    "\n",
+    "                    # Debug: Print function_response\n",
+    "                    print(\"Response from handle_extract_hotel_search_parameters:\")\n",
+    "                    print(function_response)\n",
+    "\n",
+    "                    parameters.update(json.loads(function_response))\n",
+    "\n",
+    "                    # Debug: Print updated parameters\n",
+    "                    print(\"Updated parameters after extract_hotel_search_parameters:\")\n",
+    "                    print(parameters)\n",
+    "\n",
+    "                elif function_name == \"query_elasticsearch\":\n",
+    "                    # Ensure 'query' is present\n",
+    "                    if \"query\" not in parameters:\n",
+    "                        print(\"Error: 'query' is required for Elasticsearch queries.\")\n",
+    "                        return None\n",
+    "\n",
+    "                    print(\"Function Arguments for query_elasticsearch:\")\n",
+    "                    print(function_args)\n",
+    "\n",
+    "                    # Pass extracted parameters to Elasticsearch\n",
+    "                    function_response = call_elasticsearch(\n",
+    "                        query=function_args.get(\"query\"),\n",
+    "                        latitude=function_args.get(\"latitude\"),\n",
+    "                        attraction=function_args.get(\"attraction\"),\n",
+    "                        longitude=function_args.get(\"longitude\"),\n",
+    "                        distance=function_args.get(\"distance\"),\n",
+    "                        rating=function_args.get(\"rating\"),\n",
+    "                        country_name=function_args.get(\"countryName\"),\n",
+    "                        country_code=function_args.get(\"countryCode\"),\n",
+    "                    )\n",
+    "\n",
+    "                elif function_name == \"geocode_location\":\n",
+    "                    function_response = geocode_location(\n",
+    "                        location=function_args.get(\"location\")\n",
+    "                    )\n",
+    "                    geo_response = json.loads(function_response)\n",
+    "                    parameters.update(geo_response)\n",
+    "\n",
+    "                    # Debug: Print updated parameters\n",
+    "                    print(\"Updated parameters after geocode_location:\")\n",
+    "                    print(parameters)\n",
+    "                else:\n",
+    "                    function_response = json.dumps({\"error\": \"Unknown function\"})\n",
+    "\n",
+    "                # Append the tool response to the conversation\n",
+    "                messages.append(\n",
+    "                    {\n",
+    "                        \"tool_call_id\": tool_call.id,\n",
+    "                        \"role\": \"tool\",\n",
+    "                        \"name\": function_name,\n",
+    "                        \"content\": json.dumps(function_response),\n",
+    "                    }\n",
+    "                )\n",
+    "        else:\n",
+    "            # If no further tools are requested, break the loop\n",
+    "            break"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "6sazS56d4LCT"
+   },
+   "source": [
+    "## Format and Print Messages Functions\n",
+    "Formats and prints ChatCompletionMessages for better readability, displaying roles, content, function calls, and tool interactions in a structured way.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {
+    "id": "S28E_3t5PEib"
+   },
+   "outputs": [],
+   "source": [
+    "def format_message(message):\n",
+    "    \"\"\"\n",
+    "    Format a ChatCompletionMessage for easier readability.\n",
+    "    \"\"\"\n",
+    "    formatted_message = f\"Role: {message.role}\\n\"\n",
+    "    if message.content:\n",
+    "        formatted_message += f\"Content: {message.content}\\n\"\n",
+    "    if message.function_call:\n",
+    "        formatted_message += (\n",
+    "            f\"Function Call:\\n\"\n",
+    "            f\"  Name: {message.function_call.name}\\n\"\n",
+    "            f\"  Arguments: {message.function_call.arguments}\\n\"\n",
+    "        )\n",
+    "    if message.tool_calls:\n",
+    "        formatted_message += \"Tool Calls:\\n\"\n",
+    "        for tool_call in message.tool_calls:\n",
+    "            formatted_message += (\n",
+    "                f\"  Tool Call ID: {tool_call.id}\\n\"\n",
+    "                f\"  Function Name: {tool_call.function.name}\\n\"\n",
+    "                f\"  Arguments: {tool_call.function.arguments}\\n\"\n",
+    "            )\n",
+    "    return formatted_message\n",
+    "\n",
+    "\n",
+    "def print_messages(messages):\n",
+    "    \"\"\"\n",
+    "    Print all ChatCompletionMessages in a nicely formatted way.\n",
+    "    \"\"\"\n",
+    "    print(\"\\nFormatted Messages:\")\n",
+    "    for i, message in enumerate(messages, 1):\n",
+    "        print(f\"Message {i}:\")\n",
+    "        print(format_message(message))\n",
+    "        print(\"-\" * 50)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "gyz8L_EL4EHR"
+   },
+   "source": [
+    "## Call Elasticsearch Function\n",
+    "Executes a search query using Elasticsearch with structured parameters, leveraging a search template for dynamic query generation.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {
+    "id": "tVWlVx0cNWUd"
+   },
+   "outputs": [],
+   "source": [
+    "def call_elasticsearch(\n",
+    "    query,\n",
+    "    latitude=None,\n",
+    "    longitude=None,\n",
+    "    distance=None,\n",
+    "    rating=None,\n",
+    "    country_name=None,\n",
+    "    country_code=None,\n",
+    "    attraction=None,\n",
+    "):\n",
+    "    \"\"\"\n",
+    "    Call Elasticsearch with the provided parameters and return a JSON-serializable response.\n",
+    "    \"\"\"\n",
+    "    try:\n",
+    "        # Construct the params dictionary\n",
+    "        params = {\n",
+    "            \"query\": query,\n",
+    "            \"latitude\": latitude,\n",
+    "            \"longitude\": longitude,\n",
+    "            \"distance\": distance,\n",
+    "            \"rating\": rating,\n",
+    "            \"countryName\": country_name,\n",
+    "            \"countryCode\": country_code,\n",
+    "            \"attraction\": attraction,\n",
+    "        }\n",
+    "\n",
+    "        # Remove None values\n",
+    "        cleaned_params = {\n",
+    "            key: value for key, value in params.items() if value is not None\n",
+    "        }\n",
+    "\n",
+    "        # Debug: Print the parameters for Elasticsearch\n",
+    "        print(\"Parameters for Elasticsearch:\")\n",
+    "        print(cleaned_params)\n",
+    "\n",
+    "        # Construct the query body using the search template stored by this notebook\n",
+    "        query_body = {\n",
+    "            \"id\": TEMPLATE_ID,\n",
+    "            \"params\": cleaned_params,\n",
+    "        }\n",
+    "\n",
+    "        # Debug: Print query for Elasticsearch\n",
+    "        print(\"Elasticsearch Query:\")\n",
+    "        print(json.dumps(query_body, indent=2))\n",
+    "\n",
+    "        # Call Elasticsearch\n",
+    "        response = es.search_template(index=ES_INDEX, body=query_body)\n",
+    "        print(\"Elasticsearch query successful.\")\n",
+    "\n",
+    "        # Convert response to a JSON-serializable dictionary\n",
+    "        response_body = response.body\n",
+    "\n",
+    "        # Extract and print the number of results\n",
+    "        total_results = response_body.get(\"hits\", {}).get(\"total\", {}).get(\"value\", 0)\n",
+    "        print(f\"Number of results found: {total_results}\")\n",
+    "\n",
+    "        return response_body\n",
+    "    except Exception as e:\n",
+    "        print(f\"Error while querying Elasticsearch: {e}\")\n",
+    "        return {\"error\": str(e)}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "F96sMuQO3_UG"
+   },
+   "source": [
+    "## Handle Extract Hotel Search Parameters Function\n",
+    "Validates and processes extracted search parameters, ensuring required values like distance are set for location-based hotel searches.\n",
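+    "\n",
+    "For example, given hypothetical arguments that already include coordinates but no radius, the function fills in the default search distance:\n",
+    "\n",
+    "```python\n",
+    "args = {\"query\": \"hotels near Belongil Beach\", \"latitude\": -28.63, \"longitude\": 153.6}\n",
+    "print(handle_extract_hotel_search_parameters(args))\n",
+    "# {\"query\": \"hotels near Belongil Beach\", \"latitude\": -28.63, \"longitude\": 153.6, \"distance\": \"5000m\"}\n",
+    "```"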
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {
+    "id": "t2E7YSk-LH_J"
+   },
+   "outputs": [],
+   "source": [
+    "def handle_extract_hotel_search_parameters(args):\n",
+    "    \"\"\"\n",
+    "    Validate and handle parameters extracted by the LLM.\n",
+    "    \"\"\"\n",
+    "    if \"latitude\" in args and \"longitude\" in args:\n",
+    "        if \"distance\" not in args:\n",
+    "            args[\"distance\"] = \"5000m\"  # Default distance\n",
+    "\n",
+    "    return json.dumps(args)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "CabxStb63fdK"
+   },
+   "source": [
+    "## Geocode Location\n",
+    "Resolves a location name into latitude and longitude using the Google Geocoding API for geospatial search integration.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {
+    "id": "1EXrB0KdLKH1"
+   },
+   "outputs": [],
+   "source": [
+    "def geocode_location(location):\n",
+    "    \"\"\"\n",
+    "    Resolve a location to latitude and longitude using Google Geocoding API.\n",
+    "    \"\"\"\n",
+    "    GEOCODING_API_URL = \"https://maps.googleapis.com/maps/api/geocode/json\"\n",
+    "    params = {\"address\": location, \"key\": google_maps_api_key}\n",
+    "    response = requests.get(GEOCODING_API_URL, params=params)\n",
+    "    if response.status_code == 200:\n",
+    "        data = response.json()\n",
+    "        if data[\"status\"] == \"OK\":\n",
+    "            result = data[\"results\"][0][\"geometry\"][\"location\"]\n",
+    "            return json.dumps({\"latitude\": result[\"lat\"], \"longitude\": result[\"lng\"]})\n",
+    "    return json.dumps({\"error\": \"Geocoding failed\"})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "lYUVqJUS30ij"
+   },
+   "source": [
+    "## Create Inferencing Endpoints, Index, Ingestion, and Search Template"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "jNDaGcTJ2g7g"
+   },
+   "outputs": [],
+   "source": [
+    "print(\"Creating inferencing endpoints...\")\n",
+    "create_inferencing_endpoints()\n",
+    "print(\"Creating hotels index...\")\n",
+    "create_index()\n",
+    "print(\"Ingesting hotels data...\")\n",
+    "ingest_data()\n",
+    "print(\"Creating search template...\")\n",
+    "# Create (or refresh) the search template used by call_elasticsearch\n",
+    "delete_search_template(TEMPLATE_ID)\n",
+    "create_search_template(TEMPLATE_ID, search_template_content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "5Pze4f50gmkn"
+   },
+   "source": [
+    "## Find a Hotel\n",
+    "Here, issue a query and notice how the query understanding layer matches possible attributes in the query to the attributes supported within the hotels index.\n",
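+    "\n",
+    "For instance, the query used below, *\"recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym\"*, gets decomposed by the LLM into search template parameters along these lines (illustrative, matching the run recorded below):\n",
+    "\n",
+    "```python\n",
+    "params = {\n",
+    "    \"query\": \"recently renovated accommodations 250m from Belongil Beach ...\",\n",
+    "    \"distance\": \"250m\",\n",
+    "    \"rating\": 4,\n",
+    "    \"location\": \"Belongil Beach\",  # geocoded to latitude/longitude in a second tool call\n",
+    "    \"attraction\": \"recently renovated,pool,gym\",\n",
+    "}\n",
+    "```"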
+ ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "OyZKPEPEl4v5", + "outputId": "2021e133-6828-4c28-d6a9-bc233092efa2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Formatted Messages:\n", + "Message 1:\n", + "Role: assistant\n", + "Tool Calls:\n", + " Tool Call ID: call_MzuWSxe6HtXp3RhfHEmJ6Z85\n", + " Function Name: extract_hotel_search_parameters\n", + " Arguments: {\"query\":\"recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym\",\"distance\":\"250m\",\"rating\":4,\"location\":\"Belongil Beach\",\"attraction\":\"recently renovated,pool,gym\"}\n", + "\n", + "--------------------------------------------------\n", + "Function Arguments for extract_hotel_search_parameters:\n", + "{'query': 'recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym', 'distance': '250m', 'rating': 4, 'location': 'Belongil Beach', 'attraction': 'recently renovated,pool,gym'}\n", + "Response from handle_extract_hotel_search_parameters:\n", + "{\"query\": \"recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym\", \"distance\": \"250m\", \"rating\": 4, \"location\": \"Belongil Beach\", \"attraction\": \"recently renovated,pool,gym\"}\n", + "Updated parameters after extract_hotel_search_parameters:\n", + "{'query': 'recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym', 'distance': '250m', 'rating': 4, 'location': 'Belongil Beach', 'attraction': 'recently renovated,pool,gym'}\n", + "\n", + "Formatted Messages:\n", + "Message 1:\n", + "Role: assistant\n", + "Tool Calls:\n", + " Tool Call ID: call_STsRLGP8GcBhgWzRuJYnKw0k\n", + " Function Name: geocode_location\n", + " Arguments: {\"location\":\"Belongil Beach\"}\n", + "\n", + "--------------------------------------------------\n", + "Updated parameters after geocode_location:\n", + "{'query': 'recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym', 'distance': '250m', 'rating': 4, 'location': 'Belongil Beach', 'attraction': 'recently renovated,pool,gym', 'latitude': -28.6337328, 'longitude': 153.6003455}\n", + "\n", + "Formatted Messages:\n", + "Message 1:\n", + "Role: assistant\n", + "Tool Calls:\n", + " Tool Call ID: call_m2XTVPxzXT0mmu0QueYO5YLI\n", + " Function Name: query_elasticsearch\n", + " Arguments: {\"query\":\"recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym\",\"latitude\":-28.6337328,\"longitude\":153.6003455,\"distance\":\"250m\",\"rating\":4,\"attraction\":\"recently renovated,pool,gym\"}\n", + "\n", + "--------------------------------------------------\n", + "Function Arguments for extract_hotel_search_parameters:\n", + "{'query': 'recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym', 'latitude': -28.6337328, 'longitude': 153.6003455, 'distance': '250m', 'rating': 4, 'attraction': 'recently renovated,pool,gym'}\n", + "Parameters for Elasticsearch:\n", + "{'query': 'recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym', 'latitude': -28.6337328, 'longitude': 153.6003455, 'distance': '250m', 'rating': 4, 'attraction': 'recently renovated,pool,gym'}\n", + "Elasticsearch Query:\n", + "{\n", + " \"id\": 
\"hotel_search_template\",\n", + " \"params\": {\n", + " \"query\": \"recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym\",\n", + " \"latitude\": -28.6337328,\n", + " \"longitude\": 153.6003455,\n", + " \"distance\": \"250m\",\n", + " \"rating\": 4,\n", + " \"attraction\": \"recently renovated,pool,gym\"\n", + " }\n", + "}\n", + "Elasticsearch query successful.\n", + "Number of results found: 1\n", + "\n", + "Formatted Messages:\n", + "Message 1:\n", + "Role: assistant\n", + "Content: I found the following recently renovated accommodation 250m from Belongil Beach with at least 4 stars, a pool, and a gym:\n", + "\n", + "- **Hotel Name:** Belongil Beach Apartment\n", + "- **Rating:** 4 stars\n", + "- **City:** Byron Bay, New South Wales\n", + "- **Country:** Australia\n", + "\n", + "Would you like more details or assistance with anything else?\n", + "\n", + "--------------------------------------------------\n" + ] + } + ], + "source": [ + "find_a_hotel(\n", + " \"recently renovated accommodations 250m from Belongil Beach with at least 4 stars and with a pool and gym\"\n", + ")" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}