diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb new file mode 100644 index 000000000..df378ea06 --- /dev/null +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -0,0 +1,1129 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "7VBkjcqNNxEd" + }, + "source": [ + "# Intro to GraphRAG with Google Cloud Spanner Graph and LangChain\n", + "\n", + "Spanner Graph now [integrates seamlessly with LangChain](https://cloud.google.com/python/docs/reference/langchain-google-spanner/latest#spanner-graph-store-usage), making it easier to build GraphRAG applications.\n", + "\n", + "Instead of simply retrieving relevant text snippets based on text similarity, GraphRAG takes a more structured approach to Retrieval Augmented Generation. It involves creating a knowledge graph from the text, organizing it hierarchically, summarizing key concepts, and then using this structured information to enhance the accuracy and depth of responses.\n", + "\n", + "\n", + "### Objectives\n", + "\n", + "In this tutorial, you will see a complete walkthrough of building a question-answering system using the GraphRAG method. You'll learn how to create a knowledge graph from scratch, store it in Spanner Graph, and use it to build a working question-answering chain with LangChain.\n", + "\n", + "Google Cloud [Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, and offers up to 99.999% availability.\n", + "\n", + "This notebook shows how to use `Spanner Graph` for GraphRAG with the custom retriever `SpannerGraphVectorContextRetriever` and compares the GraphRAG response with that of conventional RAG."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zfIhwIryOls1" + }, + "source": "### Library imports" + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:31.764070Z", + "iopub.status.busy": "2026-02-02T15:57:31.763024Z", + "iopub.status.idle": "2026-02-02T15:57:31.775442Z", + "shell.execute_reply": "2026-02-02T15:57:31.773617Z", + "shell.execute_reply.started": "2026-02-02T15:57:31.764002Z" + }, + "id": "EWOkHI7XOna2" + }, + "source": [ + "import copy\n", + "import json\n", + "import os\n", + "import pprint\n", + "import textwrap\n", + "import uuid\n", + "import warnings\n", + "from google.cloud import spanner\n", + "from google.cloud.spanner_admin_database_v1.types import spanner_database_admin\n", + "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", + "from langchain_core.documents import Document\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.prompts.prompt import PromptTemplate\n", + "from langchain_core.runnables import RunnablePassthrough\n", + "from langchain_experimental.graph_transformers import LLMGraphTransformer\n", + "from langchain_google_spanner import (\n", + " SpannerGraphStore,\n", + " SpannerGraphVectorContextRetriever,\n", + " SpannerVectorStore,\n", + ")\n", + "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", + "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", + "\n", + "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n", + "\n", + "from IPython.display import Markdown" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Specify the embedding and generative language models" + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:32.939959Z", + "iopub.status.busy": "2026-02-02T15:57:32.938520Z", + "iopub.status.idle": 
"2026-02-02T15:57:32.946397Z", + "shell.execute_reply": "2026-02-02T15:57:32.944562Z", + "shell.execute_reply.started": "2026-02-02T15:57:32.939820Z" + }, + "tags": [] + }, + "source": [ + "EMBEDDING_MODEL = \"text-embedding-004\"\n", + "GENERATIVE_MODEL = \"gemini-2.5-flash\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "#### Set Google Cloud project ID environment variable" + }, + { + "cell_type": "code", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-02-02T15:57:33.304054Z", + "iopub.status.busy": "2026-02-02T15:57:33.303304Z", + "iopub.status.idle": "2026-02-02T15:57:34.374542Z", + "shell.execute_reply": "2026-02-02T15:57:34.373028Z", + "shell.execute_reply.started": "2026-02-02T15:57:33.304005Z" + }, + "id": "hF0481BGOsS8", + "tags": [] + }, + "source": [ + "PROJECT_ID = !gcloud config get-value project\n", + "PROJECT_ID = PROJECT_ID[0]\n", + "REGION = \"us-central1\"\n", + "%env GOOGLE_CLOUD_PROJECT={PROJECT_ID}" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4TiC0RbhOwUu" + }, + "source": [ + "### Spanner API Enablement\n", + "The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:34.945131Z", + "iopub.status.busy": "2026-02-02T15:57:34.944766Z", + "iopub.status.idle": "2026-02-02T15:57:36.047063Z", + "shell.execute_reply": "2026-02-02T15:57:36.045364Z", + "shell.execute_reply.started": "2026-02-02T15:57:34.945098Z" + }, + "id": "9f3fJd5eOyRr", + "tags": [] + }, + "source": [ + "!gcloud services enable spanner.googleapis.com" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k5pxMMiMOzt7" + }, + "source": [ + "## Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mtDbLU5sO2iA" + }, + "source": [ + "### Set Spanner database values\n", + "Find your database values on the [Spanner Instances page](https://console.cloud.google.com/spanner?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)." + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:37.067571Z", + "iopub.status.busy": "2026-02-02T15:57:37.067094Z", + "iopub.status.idle": "2026-02-02T15:57:37.074919Z", + "shell.execute_reply": "2026-02-02T15:57:37.073646Z", + "shell.execute_reply.started": "2026-02-02T15:57:37.067527Z" + }, + "id": "C-I8VTIcO442", + "tags": [] + }, + "source": [ + "INSTANCE = \"graphrag-instance-v1\"\n", + "DATABASE = \"graphrag\"\n", + "GRAPH_NAME = \"retail_demo_graph\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "To create a Cloud Spanner instance with gcloud, use the `gcloud spanner instances create` command, specifying a unique instance ID, a configuration (regional or multi-region), a description, and the compute capacity (nodes or processing units)."
+ }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:38.916502Z", + "iopub.status.busy": "2026-02-02T15:57:38.915326Z", + "iopub.status.idle": "2026-02-02T15:57:40.187189Z", + "shell.execute_reply": "2026-02-02T15:57:40.185580Z", + "shell.execute_reply.started": "2026-02-02T15:57:38.916432Z" + }, + "tags": [] + }, + "source": [ + "!gcloud spanner instances create {INSTANCE} --config=regional-us-central1 --description=\"Graph RAG Instance\" --processing-units=100 --edition=ENTERPRISE" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Helper method to create the Spanner database that will store the graph's nodes and edges:" + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:40.190640Z", + "iopub.status.busy": "2026-02-02T15:57:40.190146Z", + "iopub.status.idle": "2026-02-02T15:57:40.200445Z", + "shell.execute_reply": "2026-02-02T15:57:40.199024Z", + "shell.execute_reply.started": "2026-02-02T15:57:40.190595Z" + }, + "tags": [] + }, + "source": [ + "def create_database(project_id, instance_id, database_id):\n", + " \"\"\"Creates an empty database on the given Spanner instance.\"\"\"\n", + "\n", + " spanner_client = spanner.Client(project_id)\n", + " database_admin_api = spanner_client.database_admin_api\n", + "\n", + " request = spanner_database_admin.CreateDatabaseRequest(\n", + " parent=database_admin_api.instance_path(\n", + " spanner_client.project, instance_id\n", + " ),\n", + " create_statement=f\"CREATE DATABASE `{database_id}`\",\n", + " extra_statements=[],\n", + " )\n", + "\n", + " operation = database_admin_api.create_database(request=request)\n", + "\n", + " print(\"Waiting for operation to complete...\")\n", + " OPERATION_TIMEOUT_SECONDS = 60\n", + " database = operation.result(OPERATION_TIMEOUT_SECONDS)\n", + "\n", + " print(\n", + " \"Created database {} on instance
{}\".format(\n", + " database.name,\n", + " database_admin_api.instance_path(\n", + " spanner_client.project, instance_id\n", + " ),\n", + " )\n", + " )" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Create the Spanner database using the helper method:" + }, + { + "metadata": {}, + "cell_type": "code", + "source": "create_database(PROJECT_ID, INSTANCE, DATABASE)", + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kpAv-tpcO_iL" + }, + "source": [ + "### SpannerGraphStore\n", + "\n", + "To initialize the `SpannerGraphStore` class, you need to provide three required arguments; the other arguments are optional and only need to be passed when they differ from the defaults. The required arguments are:\n", + "\n", + "1. a Spanner instance id;\n", + "2. a Spanner database id that belongs to the above instance;\n", + "3. a Spanner graph name used to create a graph in the above database." + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:58.161077Z", + "iopub.status.busy": "2026-02-02T15:57:58.159840Z", + "iopub.status.idle": "2026-02-02T15:57:59.091242Z", + "shell.execute_reply": "2026-02-02T15:57:59.090045Z", + "shell.execute_reply.started": "2026-02-02T15:57:58.161026Z" + }, + "id": "u589YapWQFb8", + "tags": [] + }, + "source": [ + "graph_store = SpannerGraphStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " graph_name=GRAPH_NAME,\n", + ")" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G7-Pe2ADQlNJ" + }, + "source": [ + "#### Add Graph Documents\n", + "To add graph documents to the graph store, first download and load the sample retail data."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:58:00.454428Z", + "iopub.status.busy": "2026-02-02T15:58:00.453025Z", + "iopub.status.idle": "2026-02-02T15:58:00.729323Z", + "shell.execute_reply": "2026-02-02T15:58:00.727578Z", + "shell.execute_reply.started": "2026-02-02T15:58:00.454372Z" + }, + "tags": [] + }, + "source": [ + "!wget https://raw.githubusercontent.com/googleapis/langchain-google-spanner-python/main/samples/retaildata.zip\n", + "!unzip -o \"retaildata.zip\"" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:58:01.628427Z", + "iopub.status.busy": "2026-02-02T15:58:01.627271Z", + "iopub.status.idle": "2026-02-02T15:58:01.648343Z", + "shell.execute_reply": "2026-02-02T15:58:01.647003Z", + "shell.execute_reply.started": "2026-02-02T15:58:01.628354Z" + }, + "tags": [] + }, + "source": [ + "path = \"retaildata/\"\n", + "directories = [\n", + " item for item in os.listdir(path) if os.path.isdir(os.path.join(path, item))\n", + "]\n", + "\n", + "document_lists = []\n", + "for directory in directories:\n", + " loader = DirectoryLoader(\n", + " os.path.join(path, directory), glob=\"**/*.txt\", loader_cls=TextLoader\n", + " )\n", + " document_lists.append(loader.load())" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Extract Nodes and Edges" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:58:02.747573Z", + "iopub.status.busy": "2026-02-02T15:58:02.746435Z", + "iopub.status.idle": "2026-02-02T16:05:57.066573Z", + "shell.execute_reply": "2026-02-02T16:05:57.065164Z", + "shell.execute_reply.started": "2026-02-02T15:58:02.747523Z" + }, + "id": "fP7XNu3aPl5c", + "tags": [] + }, + "source": [ + "def print_graph(graph_documents):\n", + " for doc in graph_documents:\n", + 
" print(doc.source.page_content[:100])\n", + " nodes = copy.deepcopy(doc.nodes)\n", + " for node in nodes:\n", + " if \"embedding\" in node.properties:\n", + " node.properties[\"embedding\"] = \"...\"\n", + " print(nodes)\n", + " print(doc.relationships)\n", + " print()\n", + "\n", + "\n", + "llm = ChatVertexAI(model=GENERATIVE_MODEL, temperature=0)\n", + "llm_transformer = LLMGraphTransformer(\n", + " llm=llm,\n", + " allowed_nodes=[\"Category\", \"Segment\", \"Tag\", \"Product\", \"Bundle\", \"Deal\"],\n", + " allowed_relationships=[\n", + " \"In_Category\",\n", + " \"Tagged_With\",\n", + " \"In_Segment\",\n", + " \"In_Bundle\",\n", + " \"Is_Accessory_Of\",\n", + " \"Is_Upgrade_Of\",\n", + " \"Has_Deal\",\n", + " ],\n", + " node_properties=[\n", + " \"name\",\n", + " \"price\",\n", + " \"weight\",\n", + " \"deal_end_date\",\n", + " \"features\",\n", + " ],\n", + ")\n", + "\n", + "graph_documents = []\n", + "for document_list in document_lists:\n", + " graph_documents.extend(\n", + " llm_transformer.convert_to_graph_documents(document_list)\n", + " )\n", + "\n", + "# Add embeddings to the graph documents for Product nodes\n", + "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "for graph_document in graph_documents:\n", + " for node in graph_document.nodes:\n", + " if \"features\" in node.properties:\n", + " node.properties[\"embedding\"] = embedding_service.embed_query(\n", + " node.properties[\"features\"]\n", + " )\n", + "\n", + "print_graph(graph_documents)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Post process extracted nodes and edges\n", + "Apply your domain knowledge to clean up and make desired fixes to the\n", + "generated graph in the earlier step." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:05:57.069626Z", + "iopub.status.busy": "2026-02-02T16:05:57.069097Z", + "iopub.status.idle": "2026-02-02T16:05:57.150925Z", + "shell.execute_reply": "2026-02-02T16:05:57.149467Z", + "shell.execute_reply.started": "2026-02-02T16:05:57.069576Z" + }, + "id": "G9OAYF0spDTF", + "tags": [] + }, + "source": [ + "# set of all valid products\n", + "products = set()\n", + "\n", + "def prune_invalid_products():\n", + " for graph_document in graph_documents:\n", + " nodes_to_remove = []\n", + " relationships_to_remove = []\n", + " for node in graph_document.nodes:\n", + " if node.type == \"Product\" and \"features\" not in node.properties:\n", + " nodes_to_remove.append(node)\n", + " elif node.type == \"Product\":\n", + " products.add(node.id)\n", + " for node in nodes_to_remove:\n", + " graph_document.nodes.remove(node)\n", + "\n", + "\n", + "def prune_invalid_segments(valid_segments):\n", + " for graph_document in graph_documents:\n", + " nodes_to_remove = []\n", + " for node in graph_document.nodes:\n", + " if node.type == \"Segment\" and node.id not in valid_segments:\n", + " nodes_to_remove.append(node)\n", + " for node in nodes_to_remove:\n", + " graph_document.nodes.remove(node)\n", + "\n", + "\n", + "def is_not_a_listed_product(node):\n", + " if node.type == \"Product\" and node.id not in products:\n", + " return True\n", + " return False\n", + "\n", + "\n", + "def fix_directions(relation_name, wrong_source_type):\n", + " for graph_document in graph_documents:\n", + " for relationship in graph_document.relationships:\n", + " if relationship.type == relation_name:\n", + " if relationship.source.type == wrong_source_type:\n", + " source = relationship.source\n", + " target = relationship.target\n", + " relationship.source = target\n", + " relationship.target = source\n", + "\n", + "\n", + "def prune_dangling_relationships():\n", + " # now remove all dangling relationships\n", + " for
graph_document in graph_documents:\n", + " relationships_to_remove = []\n", + " for relationship in graph_document.relationships:\n", + " if is_not_a_listed_product(\n", + " relationship.source\n", + " ) or is_not_a_listed_product(relationship.target):\n", + " relationships_to_remove.append(relationship)\n", + " for relationship in relationships_to_remove:\n", + " graph_document.relationships.remove(relationship)\n", + "\n", + "\n", + "def prune_unwanted_relationships(relation_name, source, target):\n", + " node_types = {source, target}\n", + " for graph_document in graph_documents:\n", + " relationships_to_remove = []\n", + " for relationship in graph_document.relationships:\n", + " if (\n", + " relationship.type == relation_name\n", + " and {relationship.source.type, relationship.target.type}\n", + " == node_types\n", + " ):\n", + " relationships_to_remove.append(relationship)\n", + " for relationship in relationships_to_remove:\n", + " graph_document.relationships.remove(relationship)\n", + "\n", + "\n", + "prune_invalid_products()\n", + "prune_invalid_segments({\"Home\", \"Office\", \"Fitness\"})\n", + "prune_unwanted_relationships(\"IN_CATEGORY\", \"Bundle\", \"Category\")\n", + "prune_unwanted_relationships(\"IN_CATEGORY\", \"Deal\", \"Category\")\n", + "prune_unwanted_relationships(\"IN_SEGMENT\", \"Bundle\", \"Segment\")\n", + "prune_unwanted_relationships(\"IN_SEGMENT\", \"Deal\", \"Segment\")\n", + "prune_dangling_relationships()\n", + "fix_directions(\"HAS_DEAL\", \"Deal\")\n", + "fix_directions(\"IN_BUNDLE\", \"Bundle\")\n", + "print_graph(graph_documents)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Load data to Spanner Graph\n", + "Cleanup database from previous iterations.\n", + "!!! THIS COULD REMOVE DATA FROM YOUR DATABASE !!!" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:05:57.154006Z", + "iopub.status.busy": "2026-02-02T16:05:57.153505Z", + "iopub.status.idle": "2026-02-02T16:06:05.931687Z", + "shell.execute_reply": "2026-02-02T16:06:05.930391Z", + "shell.execute_reply.started": "2026-02-02T16:05:57.153954Z" + }, + "id": "lMXvOpRbZdau", + "tags": [] + }, + "source": [ + "graph_store.cleanup()\n", + "graph_store.add_graph_documents(graph_documents)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "28aMSrBbEvps" + }, + "source": [ + "### Visualization" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Visualize the graph data: the `%%spanner_graph` cell magic, provided by the `spanner_graphs` extension, renders the nodes and edges of the graph." + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:02.253590Z", + "iopub.status.busy": "2026-02-02T16:08:02.253063Z", + "iopub.status.idle": "2026-02-02T16:08:03.523435Z", + "shell.execute_reply": "2026-02-02T16:08:03.521667Z", + "shell.execute_reply.started": "2026-02-02T16:08:02.253540Z" + }, + "tags": [] + }, + "source": [ + "%load_ext spanner_graphs" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:05.579025Z", + "iopub.status.busy": "2026-02-02T16:08:05.576289Z", + "iopub.status.idle": "2026-02-02T16:08:05.640054Z", + "shell.execute_reply": "2026-02-02T16:08:05.638587Z", + "shell.execute_reply.started": "2026-02-02T16:08:05.578935Z" + }, + "tags": [] + }, + "source": [ + "%%spanner_graph --project {PROJECT_ID} --instance {INSTANCE} --database {DATABASE}\n", + "\n", + "GRAPH retail_demo_graph\n", + "MATCH p = ()->()\n", + "RETURN TO_JSON(p) AS path_json" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id":
"zsPxCSzXhuoB" + }, + "source": [ + "## GraphRAG flow using Spanner Graph" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Define helper method to format and print content of retrieved chunks:" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "def format_docs(docs):\n", + " print(\"Context Retrieved: \\n\")\n", + " for doc in docs:\n", + " print(\"-\" * 80)\n", + " pprint.pprint(json.loads(doc.page_content)[0], width=80, indent=4)\n", + " print(\"-\" * 80)\n", + " print(\"\\n\")\n", + "\n", + " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", + " return context" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Define prompt template for the LLM:" + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:14.567238Z", + "iopub.status.busy": "2026-02-02T16:08:14.566827Z", + "iopub.status.idle": "2026-02-02T16:08:14.579776Z", + "shell.execute_reply": "2026-02-02T16:08:14.578253Z", + "shell.execute_reply.started": "2026-02-02T16:08:14.567203Z" + }, + "id": "19Amjrge71Ue", + "tags": [] + }, + "source": [ + "SPANNERGRAPH_QA_TEMPLATE = \"\"\"\n", + "You are a helpful and friendly AI assistant for question answering tasks for an electronics\n", + "retail online store.\n", + "Create a human readable answer for the for the question.\n", + "You should only use the information provided in the context and not use your internal knowledge.\n", + "Don't add any information.\n", + "Here is an example:\n", + "\n", + "Question: Which funds own assets over 10M?\n", + "Context:[name:ABC Fund, name:Star fund]\"\n", + "Helpful Answer: ABC Fund and Star fund have assets over 10M.\n", + "\n", + "Follow this example when generating answers.\n", + "You are given the following information:\n", + "- `Question`: the natural language question from the user\n", + "- `Graph Schema`: contains the schema of the graph database\n", + 
"- `Graph Query`: A Spanner Graph GQL query equivalent of the question from the user used to extract context from the graph database\n", + "- `Context`: The response from the graph database as context. The context has nodes and edges. Use the relationships.\n", + "Information:\n", + "Question: {question}\n", + "Graph Schema: {graph_schema}\n", + "Context: {context}\n", + "\n", + "Format your answer to be human readable.\n", + "Use the relationships in the context to answer the question.\n", + "Only include information that is relevant to a customer.\n", + "Helpful Answer:\"\"\"\n", + "\n", + "prompt = PromptTemplate(\n", + " template=SPANNERGRAPH_QA_TEMPLATE,\n", + " input_variables=[\"question\", \"graph_schema\", \"context\"],\n", + ")\n", + "\n", + "llm = ChatVertexAI(model=GENERATIVE_MODEL, temperature=0)\n", + "\n", + "chain = prompt | llm | StrOutputParser()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify user query:" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:15.983161Z", + "iopub.status.busy": "2026-02-02T16:08:15.982299Z", + "iopub.status.idle": "2026-02-02T16:08:15.987533Z", + "shell.execute_reply": "2026-02-02T16:08:15.986469Z", + "shell.execute_reply.started": "2026-02-02T16:08:15.983113Z" + } + }, + "source": [ + "USER_QUERY = \"Give me recommendations for a beginner drone with a good battery and camera\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "GraphRAG using Vector Search and Graph Expansion" + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "#### Define utility method for graph rag" + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2026-02-02T16:08:16.789507Z", + "iopub.status.busy": 
"2026-02-02T16:08:16.788140Z", + "iopub.status.idle": "2026-02-02T16:08:43.451105Z", + "shell.execute_reply": "2026-02-02T16:08:43.449528Z", + "shell.execute_reply.started": "2026-02-02T16:08:16.789442Z" + }, + "id": "VK8e9vTK-nJz", + "outputId": "e7946a49-b470-4ded-8d77-28270a33d90f", + "tags": [] + }, + "source": [ + "def use_node_vector_retriever(\n", + " question, graph_store, embedding_service, label_expr, expand_by_hops\n", + "):\n", + " retriever = SpannerGraphVectorContextRetriever.from_params(\n", + " graph_store=graph_store,\n", + " embedding_service=embedding_service,\n", + " label_expr=label_expr,\n", + " expand_by_hops=expand_by_hops,\n", + " top_k=3,\n", + " # k=10,\n", + " )\n", + " context = format_docs(retriever.invoke(question))\n", + " return context" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "#### Query retriever to get context for grounded answer generation:" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "\n", + "question = USER_QUERY\n", + "\n", + "context = use_node_vector_retriever(\n", + " question,\n", + " graph_store,\n", + " embedding_service,\n", + " label_expr=\"Product\",\n", + " expand_by_hops=1,\n", + ")" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Now lets test GraphRAG to explore output results:" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "answer = chain.invoke(\n", + " {\n", + " \"question\": question,\n", + " \"graph_schema\": graph_store.get_schema,\n", + " \"context\": context,\n", + " }\n", + ")\n", + "\n", + "print(\"\\n\\nAnswer:\\n\")\n", + "print(textwrap.fill(answer, width=80))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-FDDqobRE-8D" + }, + "source": [ + "## Compare with Conventional RAG" + ] + 
}, + { + "cell_type": "code", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-02-02T16:08:43.455057Z", + "iopub.status.busy": "2026-02-02T16:08:43.454631Z", + "iopub.status.idle": "2026-02-02T16:08:43.460439Z", + "shell.execute_reply": "2026-02-02T16:08:43.458920Z", + "shell.execute_reply.started": "2026-02-02T16:08:43.455022Z" + }, + "id": "YfRxVV8_PCSh", + "tags": [] + }, + "source": [ + "TABLE_NAME = \"rag_table\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Setup and load data for vector search" + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2026-02-02T16:08:43.462457Z", + "iopub.status.busy": "2026-02-02T16:08:43.462068Z", + "iopub.status.idle": "2026-02-02T16:08:56.388932Z", + "shell.execute_reply": "2026-02-02T16:08:56.387590Z", + "shell.execute_reply.started": "2026-02-02T16:08:43.462403Z" + }, + "id": "urR34wZTFDr_", + "outputId": "13a8976e-92ec-4289-cf95-83b041b0e1a3", + "tags": [] + }, + "source": [ + "def load_data_for_vector_search(splits):\n", + " embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "\n", + " SpannerVectorStore.init_vector_store_table(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " )\n", + " db = SpannerVectorStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embeddings,\n", + " )\n", + " # Add the chunks to Spanner Vector Store with embeddings\n", + " ids = [str(uuid.uuid4()) for _ in range(len(splits))]\n", + " row_ids = db.add_documents(splits, ids)\n", + "\n", + "\n", + "# Create splits for documents\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=250, chunk_overlap=100\n", + ")\n", + "splits = text_splitter.split_documents(\n", + " [document for document_list in 
document_lists for document in document_list]\n", + ")\n", + "\n", + "# Initialize table and load data\n", + "embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "load_data_for_vector_search(splits)" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Retrieve and generate using the vanilla RAG approach:" + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:56.391504Z", + "iopub.status.busy": "2026-02-02T16:08:56.391157Z", + "iopub.status.idle": "2026-02-02T16:08:57.073707Z", + "shell.execute_reply": "2026-02-02T16:08:57.072387Z", + "shell.execute_reply.started": "2026-02-02T16:08:56.391471Z" + }, + "id": "vs9-lZiUIC07" + }, + "source": [ + "def format_docs(docs):\n", + " print(\"Context Retrieved: \\n\")\n", + " for doc in docs:\n", + " print(\"-\" * 80)\n", + " print(textwrap.fill(doc.page_content, width=80))\n", + " print(\"-\" * 80)\n", + " print(\"\\n\")\n", + "\n", + " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", + " return context\n", + "\n", + "prompt = PromptTemplate(\n", + " template=\"\"\"\n", + " You are a friendly digital shopping assistant.\n", + " Use the following pieces of retrieved context to answer the question.\n", + " If you don't know the answer, just say that you don't know.\n", + " Question: {question}\n", + " Context: {context}\n", + " Answer:\n", + " \"\"\",\n", + " input_variables=[\"context\", \"question\"],\n", + ")" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Define embeddings model and vector database to use it for RAG" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "\n", + "db = SpannerVectorStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " 
table_name=TABLE_NAME,\n", + " embedding_service=embeddings,\n", + ")\n", + "vector_retriever = db.as_retriever(search_kwargs={\"k\": 3})" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Create a complete pipeline for Retrieval-Augmented Generation (RAG)" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "rag_chain = (\n", + " {\n", + " \"context\": vector_retriever | format_docs,\n", + " \"question\": RunnablePassthrough(),\n", + " }\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Now let's test the traditional RAG approach and compare the results:" + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2026-02-02T16:08:57.075679Z", + "iopub.status.busy": "2026-02-02T16:08:57.075311Z", + "iopub.status.idle": "2026-02-02T16:08:59.731678Z", + "shell.execute_reply": "2026-02-02T16:08:59.730461Z", + "shell.execute_reply.started": "2026-02-02T16:08:57.075607Z" + }, + "id": "lkYxWBvvIn4K", + "outputId": "6e0b259f-4a45-4662-bbf8-476294d85e0f" + }, + "source": [ + "resp = rag_chain.invoke(USER_QUERY)\n", + "print(\"\\n\\nRAG Response:\\n\")\n", + "print(textwrap.fill(resp, width=80))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pM7TmfI0TEFy" + }, + "source": [ + "## Clean up the graph\n", + "\n", + "> USE WITH CAUTION!\n", + "\n", + "**This removes all the nodes and edges in your graph and deletes your graph definition.**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UQWq4-sITOgl" + }, + "source": [ + "graph_store.cleanup()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "Copyright 2025 Google LLC\n", + "\n", + "Licensed under the Apache
License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " https://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." + ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "environment": { + "kernel": "conda-base-py", + "name": "workbench-notebooks.m138", + "type": "gcloud", + "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m138" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel) (Local)", + "language": "python", + "name": "conda-base-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.19" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/asl_genai/requirements.txt b/asl_genai/requirements.txt index 9861abe81..cff445966 100644 --- a/asl_genai/requirements.txt +++ b/asl_genai/requirements.txt @@ -8,6 +8,8 @@ google-cloud-bigquery cloudml-hypertune google-genai==1.57.0 google-adk==1.22.1 +google-cloud-spanner==3.57.0 +spanner-graph-notebook==1.1.8 # Langchain dependencies langchain==0.3.14 @@ -20,6 +22,10 @@ langchain-community==0.3.14 langchain-core==0.3.47 langchain-google-vertexai==2.0.15 langchain-text-splitters==0.3.5 +langchain-experimental==0.3.4 +langchain-google-spanner==0.9.0 +networkx==3.5 + # Utilities pyyaml==6.0.2