From c5c0bec5a9c31cd4edfbc7c1a3a39fa5a2df2d43 Mon Sep 17 00:00:00 2001 From: olex-snk Date: Fri, 6 Feb 2026 01:26:32 +0000 Subject: [PATCH 1/9] added Graph RAG solution --- .../solutions/gemini_spanner_graph_rag.ipynb | 1048 +++++++++++++++++ 1 file changed, 1048 insertions(+) create mode 100644 asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb new file mode 100644 index 000000000..37d44cdbb --- /dev/null +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -0,0 +1,1048 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "7VBkjcqNNxEd" + }, + "source": [ + "# Intro to GraphRAG with Google Cloud Spanner Graph and LangChain\n", + "\n", + "Spanner Graph now [integrates seamlessly with LangChain](https://cloud.google.com/python/docs/reference/langchain-google-spanner/latest#spanner-graph-store-usage), making it easier to build GraphRAG applications.\n", + "\n", + "Instead of simply retrieving relevant text snippets based on keyword similarity, GraphRAG takes a more sophisticated, structured approach to Retrieval Augmented Generation. It involves creating a knowledge graph from the text, organizing it hierarchically, summarizing key concepts, and then using this structured information to enhance the accuracy and depth of responses.\n", + "\n", + "\n", + "### Objectives\n", + "\n", + "In this tutorial, you will see a complete walkthrough of building a question-answering system using the GraphRAG method. 
You'll learn how to create a knowledge graph from scratch, store it efficiently in Spanner Graph, and build a functional FAQ system with a LangChain agent.\n", + "\n", + "Google Cloud [Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, providing 99.999% availability in one easy solution.\n", + "\n", + "This notebook goes over how to use `Spanner Graph` for GraphRAG with the custom retriever `SpannerGraphVectorContextRetriever` and compares the response of GraphRAG with conventional RAG." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zfIhwIryOls1" + }, + "source": [ + "### Configure warnings" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:31.764070Z", + "iopub.status.busy": "2026-02-02T15:57:31.763024Z", + "iopub.status.idle": "2026-02-02T15:57:31.775442Z", + "shell.execute_reply": "2026-02-02T15:57:31.773617Z", + "shell.execute_reply.started": "2026-02-02T15:57:31.764002Z" + }, + "id": "EWOkHI7XOna2" + }, + "source": [ + "import warnings\n", + "warnings.filterwarnings(\"ignore\", category=DeprecationWarning) " + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:32.939959Z", + "iopub.status.busy": "2026-02-02T15:57:32.938520Z", + "iopub.status.idle": "2026-02-02T15:57:32.946397Z", + "shell.execute_reply": "2026-02-02T15:57:32.944562Z", + "shell.execute_reply.started": "2026-02-02T15:57:32.939820Z" + }, + "tags": [] + }, + "source": [ + "EMBEDDING_MODEL = \"text-embedding-004\"\n", + "GENERATIVE_MODEL = \"gemini-2.5-flash\"" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-02-02T15:57:33.304054Z", + "iopub.status.busy": 
"2026-02-02T15:57:33.303304Z", + "iopub.status.idle": "2026-02-02T15:57:34.374542Z", + "shell.execute_reply": "2026-02-02T15:57:34.373028Z", + "shell.execute_reply.started": "2026-02-02T15:57:33.304005Z" + }, + "id": "hF0481BGOsS8", + "tags": [] + }, + "source": [ + "PROJECT_ID = !gcloud config list --format 'value(core.project)'\n", + "PROJECT_ID = PROJECT_ID[0]\n", + "REGION = \"us-central1\"\n", + "%env GOOGLE_CLOUD_PROJECT={PROJECT_ID}" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4TiC0RbhOwUu" + }, + "source": [ + "### Spanner API Enablement\n", + "The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project." + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:34.945131Z", + "iopub.status.busy": "2026-02-02T15:57:34.944766Z", + "iopub.status.idle": "2026-02-02T15:57:36.047063Z", + "shell.execute_reply": "2026-02-02T15:57:36.045364Z", + "shell.execute_reply.started": "2026-02-02T15:57:34.945098Z" + }, + "id": "9f3fJd5eOyRr", + "tags": [] + }, + "source": [ + "!gcloud services enable spanner.googleapis.com" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k5pxMMiMOzt7" + }, + "source": [ + "## Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mtDbLU5sO2iA" + }, + "source": [ + "### Set Spanner database values\n", + "Find your database values in the [Spanner Instances page](https://console.cloud.google.com/spanner?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:37.067571Z", + "iopub.status.busy": "2026-02-02T15:57:37.067094Z", + "iopub.status.idle": "2026-02-02T15:57:37.074919Z", + "shell.execute_reply": "2026-02-02T15:57:37.073646Z", + "shell.execute_reply.started": "2026-02-02T15:57:37.067527Z" + }, + "id": "C-I8VTIcO442", + "tags": [] + }, + "source": [ + "INSTANCE = \"graphrag-instance-v1\"\n", + "DATABASE = \"graphrag\"\n", + "GRAPH_NAME = \"retail_demo_graph\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "To create a Cloud Spanner instance with gcloud, use the `gcloud spanner instances create` command, specifying a unique instance ID, a configuration (regional or multi-region), a description, and compute capacity (nodes or processing units)." + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:38.916502Z", + "iopub.status.busy": "2026-02-02T15:57:38.915326Z", + "iopub.status.idle": "2026-02-02T15:57:40.187189Z", + "shell.execute_reply": "2026-02-02T15:57:40.185580Z", + "shell.execute_reply.started": "2026-02-02T15:57:38.916432Z" + }, + "tags": [] + }, + "source": "!gcloud spanner instances create {INSTANCE} --config=regional-us-central1 --description=\"Graph RAG Instance\" --processing-units=100 --edition=ENTERPRISE", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "A helper method to create a Spanner database with tables that store the nodes and edges of the graph:" + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:40.190640Z", + "iopub.status.busy": "2026-02-02T15:57:40.190146Z", + "iopub.status.idle": "2026-02-02T15:57:40.200445Z", + "shell.execute_reply": "2026-02-02T15:57:40.199024Z", + "shell.execute_reply.started": "2026-02-02T15:57:40.190595Z" + }, + "tags": 
[] + }, + "source": [ + "def create_database(project_id, instance_id, database_id):\n", + " \"\"\"Creates a database and tables for sample data.\"\"\"\n", + " from google.cloud.spanner_admin_database_v1.types import spanner_database_admin\n", + " from google.cloud import spanner\n", + " spanner_client = spanner.Client(project_id)\n", + " database_admin_api = spanner_client.database_admin_api\n", + "\n", + " request = spanner_database_admin.CreateDatabaseRequest(\n", + " parent=database_admin_api.instance_path(spanner_client.project, instance_id),\n", + " create_statement=f\"CREATE DATABASE `{database_id}`\",\n", + " extra_statements= [])\n", + "\n", + " operation = database_admin_api.create_database(request=request)\n", + "\n", + " print(\"Waiting for operation to complete...\")\n", + " OPERATION_TIMEOUT_SECONDS=60\n", + " database = operation.result(OPERATION_TIMEOUT_SECONDS)\n", + "\n", + " print(\n", + " \"Created database {} on instance {}\".format(\n", + " database.name,\n", + " database_admin_api.instance_path(spanner_client.project, instance_id)\n", + " )\n", + " )\n", + "\n" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kpAv-tpcO_iL" + }, + "source": [ + "### SpannerGraphStore\n", + "\n", + "To initialize the `SpannerGraphStore` class, you need to provide three required arguments; the remaining arguments are optional and only need to be passed if they differ from the defaults:\n", + "\n", + "1. a Spanner instance id;\n", + "2. a Spanner database id that belongs to the above instance id;\n", + "3. a Spanner graph name used to create a graph in the above database." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:58.161077Z", + "iopub.status.busy": "2026-02-02T15:57:58.159840Z", + "iopub.status.idle": "2026-02-02T15:57:59.091242Z", + "shell.execute_reply": "2026-02-02T15:57:59.090045Z", + "shell.execute_reply.started": "2026-02-02T15:57:58.161026Z" + }, + "id": "u589YapWQFb8", + "tags": [] + }, + "source": [ + "from langchain_google_spanner import SpannerGraphStore\n", + "\n", + "graph_store = SpannerGraphStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " graph_name=GRAPH_NAME,\n", + ")" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G7-Pe2ADQlNJ" + }, + "source": [ + "#### Add Graph Documents\n", + "Load the source documents and add the resulting graph documents to the graph store." + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:57:59.866080Z", + "iopub.status.busy": "2026-02-02T15:57:59.865664Z", + "iopub.status.idle": "2026-02-02T15:58:00.146793Z", + "shell.execute_reply": "2026-02-02T15:58:00.145046Z", + "shell.execute_reply.started": "2026-02-02T15:57:59.866030Z" + }, + "id": "Do7bCkyJagok", + "tags": [] + }, + "source": [ + "import os\n", + "from langchain_community.document_loaders import DirectoryLoader\n", + "from langchain_community.document_loaders import TextLoader\n", + "from langchain_core.documents import Document" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:58:00.454428Z", + "iopub.status.busy": "2026-02-02T15:58:00.453025Z", + "iopub.status.idle": "2026-02-02T15:58:00.729323Z", + "shell.execute_reply": "2026-02-02T15:58:00.727578Z", + "shell.execute_reply.started": "2026-02-02T15:58:00.454372Z" + }, + "tags": [] + }, + "source": [ + "!wget 
https://raw.githubusercontent.com/googleapis/langchain-google-spanner-python/main/samples/retaildata.zip\n", + "!mkdir content\n", + "!unzip -o \"retaildata.zip\"" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:58:01.628427Z", + "iopub.status.busy": "2026-02-02T15:58:01.627271Z", + "iopub.status.idle": "2026-02-02T15:58:01.648343Z", + "shell.execute_reply": "2026-02-02T15:58:01.647003Z", + "shell.execute_reply.started": "2026-02-02T15:58:01.628354Z" + }, + "tags": [] + }, + "source": [ + "path = \"retaildata/\"\n", + "directories = [\n", + " item for item in os.listdir(path) if os.path.isdir(os.path.join(path, item))\n", + "]\n", + "\n", + "document_lists = []\n", + "for directory in directories:\n", + " loader = DirectoryLoader(\n", + " os.path.join(path, directory), glob=\"**/*.txt\", loader_cls=TextLoader\n", + " )\n", + " document_lists.append(loader.load())" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Extract Nodes and Edges" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T15:58:02.747573Z", + "iopub.status.busy": "2026-02-02T15:58:02.746435Z", + "iopub.status.idle": "2026-02-02T16:05:57.066573Z", + "shell.execute_reply": "2026-02-02T16:05:57.065164Z", + "shell.execute_reply.started": "2026-02-02T15:58:02.747523Z" + }, + "id": "fP7XNu3aPl5c", + "tags": [] + }, + "source": [ + "import copy\n", + "from langchain_experimental.graph_transformers import LLMGraphTransformer\n", + "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", + "\n", + "\n", + "def print_graph(graph_documents):\n", + " for doc in graph_documents:\n", + " print(doc.source.page_content[:100])\n", + " nodes = copy.deepcopy(doc.nodes)\n", + " for node in nodes:\n", + " if \"embedding\" in node.properties:\n", + " 
node.properties[\"embedding\"] = \"...\"\n", + " print(nodes)\n", + " print(doc.relationships)\n", + " print()\n", + "\n", + "\n", + "llm = ChatVertexAI(model=GENERATIVE_MODEL, temperature=0)\n", + "llm_transformer = LLMGraphTransformer(\n", + " llm=llm,\n", + " allowed_nodes=[\"Category\", \"Segment\", \"Tag\", \"Product\", \"Bundle\", \"Deal\"],\n", + " allowed_relationships=[\n", + " \"In_Category\",\n", + " \"Tagged_With\",\n", + " \"In_Segment\",\n", + " \"In_Bundle\",\n", + " \"Is_Accessory_Of\",\n", + " \"Is_Upgrade_Of\",\n", + " \"Has_Deal\",\n", + " ],\n", + " node_properties=[\n", + " \"name\",\n", + " \"price\",\n", + " \"weight\",\n", + " \"deal_end_date\",\n", + " \"features\",\n", + " ],\n", + ")\n", + "\n", + "graph_documents = []\n", + "for document_list in document_lists:\n", + " graph_documents.extend(llm_transformer.convert_to_graph_documents(document_list))\n", + "\n", + "# Add embeddings to the graph documents for Product nodes\n", + "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "for graph_document in graph_documents:\n", + " for node in graph_document.nodes:\n", + " if \"features\" in node.properties:\n", + " node.properties[\"embedding\"] = embedding_service.embed_query(\n", + " node.properties[\"features\"]\n", + " )\n", + "\n", + "print_graph(graph_documents)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Post process extracted nodes and edges\n", + "Apply your domain knowledge to clean up and make desired fixes to the\n", + "generated graph in the earlier step." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:05:57.069626Z", + "iopub.status.busy": "2026-02-02T16:05:57.069097Z", + "iopub.status.idle": "2026-02-02T16:05:57.150925Z", + "shell.execute_reply": "2026-02-02T16:05:57.149467Z", + "shell.execute_reply.started": "2026-02-02T16:05:57.069576Z" + }, + "id": "G9OAYF0spDTF", + "tags": [] + }, + "source": [ + "# set of all valid products\n", + "products = set()\n", + "\n", + "\n", + "def prune_invalid_products():\n", + " for graph_document in graph_documents:\n", + " nodes_to_remove = []\n", + " relationships_to_remove = []\n", + " for node in graph_document.nodes:\n", + " if node.type == \"Product\" and \"features\" not in node.properties:\n", + " nodes_to_remove.append(node)\n", + " else:\n", + " products.add(node.id)\n", + " for node in nodes_to_remove:\n", + " graph_document.nodes.remove(node)\n", + "\n", + "\n", + "def prune_invalid_segments(valid_segments):\n", + " for graph_document in graph_documents:\n", + " nodes_to_remove = []\n", + " for node in graph_document.nodes:\n", + " if node.type == \"Segment\" and node.id not in valid_segments:\n", + " nodes_to_remove.append(node)\n", + " for node in nodes_to_remove:\n", + " graph_document.nodes.remove(node)\n", + "\n", + "\n", + "def is_not_a_listed_product(node):\n", + " if node.type == \"Product\" and node.id not in products:\n", + " return True\n", + " return False\n", + "\n", + "\n", + "def fix_directions(relation_name, wrong_source_type):\n", + " for graph_document in graph_documents:\n", + " for relationship in graph_document.relationships:\n", + " if relationship.type == relation_name:\n", + " if relationship.source.type == wrong_source_type:\n", + " source = relationship.source\n", + " target = relationship.target\n", + " relationship.source = target\n", + " relationship.target = source\n", + "\n", + "\n", + "def prune_dangling_relationships():\n", + " # now remove all dangling relationships\n", + " 
for graph_document in graph_documents:\n", + " relationships_to_remove = []\n", + " for relationship in graph_document.relationships:\n", + " if is_not_a_listed_product(relationship.source) or is_not_a_listed_product(\n", + " relationship.target\n", + " ):\n", + " relationships_to_remove.append(relationship)\n", + " for relationship in relationships_to_remove:\n", + " graph_document.relationships.remove(relationship)\n", + "\n", + "\n", + "def prune_unwanted_relationships(relation_name, source, target):\n", + " node_types = set([source, target])\n", + " for graph_document in graph_documents:\n", + " relationships_to_remove = []\n", + " for relationship in graph_document.relationships:\n", + " if (\n", + " relationship.type == relation_name\n", + " and set([relationship.source.type, relationship.target.type])\n", + " == node_types\n", + " ):\n", + " relationships_to_remove.append(relationship)\n", + " for relationship in relationships_to_remove:\n", + " graph_document.relationships.remove(relationship)\n", + "\n", + "\n", + "prune_invalid_products()\n", + "prune_invalid_segments(set([\"Home\", \"Office\", \"Fitness\"]))\n", + "prune_unwanted_relationships(\"IN_CATEGORY\", \"Bundle\", \"Category\")\n", + "prune_unwanted_relationships(\"IN_CATEGORY\", \"Deal\", \"Category\")\n", + "prune_unwanted_relationships(\"IN_SEGMENT\", \"Bundle\", \"Segment\")\n", + "prune_unwanted_relationships(\"IN_SEGMENT\", \"Deal\", \"Segment\")\n", + "prune_dangling_relationships()\n", + "fix_directions(\"HAS_DEAL\", \"Deal\")\n", + "fix_directions(\"IN_BUNDLE\", \"Bundle\")\n", + "print_graph(graph_documents)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Load data to Spanner Graph\n", + "Clean up the database from previous iterations.\n", + "!!! THIS COULD REMOVE DATA FROM YOUR DATABASE !!!" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:05:57.154006Z", + "iopub.status.busy": "2026-02-02T16:05:57.153505Z", + "iopub.status.idle": "2026-02-02T16:06:05.931687Z", + "shell.execute_reply": "2026-02-02T16:06:05.930391Z", + "shell.execute_reply.started": "2026-02-02T16:05:57.153954Z" + }, + "id": "lMXvOpRbZdau", + "tags": [] + }, + "source": [ + "graph_store.cleanup()\n", + "graph_store.add_graph_documents(graph_documents)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "28aMSrBbEvps" + }, + "source": [ + "### Visualization" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:02.253590Z", + "iopub.status.busy": "2026-02-02T16:08:02.253063Z", + "iopub.status.idle": "2026-02-02T16:08:03.523435Z", + "shell.execute_reply": "2026-02-02T16:08:03.521667Z", + "shell.execute_reply.started": "2026-02-02T16:08:02.253540Z" + }, + "tags": [] + }, + "source": [ + "%load_ext spanner_graphs" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:05.579025Z", + "iopub.status.busy": "2026-02-02T16:08:05.576289Z", + "iopub.status.idle": "2026-02-02T16:08:05.640054Z", + "shell.execute_reply": "2026-02-02T16:08:05.638587Z", + "shell.execute_reply.started": "2026-02-02T16:08:05.578935Z" + }, + "tags": [] + }, + "source": [ + "%%spanner_graph --project {PROJECT_ID} --instance {INSTANCE} --database {DATABASE}\n", + "\n", + "GRAPH retail_demo_graph\n", + "MATCH p = ()->()\n", + "RETURN TO_JSON(p) AS path_json" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zsPxCSzXhuoB" + }, + "source": [ + "## GraphRAG flow using Spanner Graph" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": 
"2026-02-02T16:08:14.567238Z", + "iopub.status.busy": "2026-02-02T16:08:14.566827Z", + "iopub.status.idle": "2026-02-02T16:08:14.579776Z", + "shell.execute_reply": "2026-02-02T16:08:14.578253Z", + "shell.execute_reply.started": "2026-02-02T16:08:14.567203Z" + }, + "id": "19Amjrge71Ue", + "tags": [] + }, + "source": [ + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.prompts.prompt import PromptTemplate\n", + "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", + "\n", + "from IPython.display import Markdown\n", + "import textwrap\n", + "import json\n", + "import pprint\n", + "\n", + "# Format the retrieved documents into a context string for the prompt.\n", + "def format_docs(docs):\n", + " print(\"Context Retrieved: \\n\")\n", + " for doc in docs:\n", + " print(\"-\"*80)\n", + " pprint.pprint(json.loads(doc.page_content)[0], width=80, indent=4)\n", + " #print(json.dumps(json.loads(doc.page_content)[0], indent=4))\n", + " print(\"-\"*80)\n", + " print(\"\\n\")\n", + "\n", + " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", + " return context\n", + "\n", + "\n", + "SPANNERGRAPH_QA_TEMPLATE = \"\"\"\n", + "You are a helpful and friendly AI assistant for question answering tasks for an electronics\n", + "retail online store.\n", + "Create a human-readable answer for the question.\n", + "You should only use the information provided in the context and not use your internal knowledge.\n", + "Don't add any information.\n", + "Here is an example:\n", + "\n", + "Question: Which funds own assets over 10M?\n", + "Context: [name:ABC Fund, name:Star fund]\n", + "Helpful Answer: ABC Fund and Star fund have assets over 10M.\n", + "\n", + "Follow this example when generating answers.\n", + "You are given the following information:\n", + "- `Question`: the natural language question from the user\n", + "- `Graph Schema`: contains the schema of the graph database\n", + "- `Graph Query`: A Spanner Graph 
GQL query equivalent of the question from the user used to extract context from the graph database\n", + "- `Context`: The response from the graph database as context. The context has nodes and edges. Use the relationships.\n", + "Information:\n", + "Question: {question}\n", + "Graph Schema: {graph_schema}\n", + "Context: {context}\n", + "\n", + "Format your answer to be human readable.\n", + "Use the relationships in the context to answer the question.\n", + "Only include information that is relevant to a customer.\n", + "Helpful Answer:\"\"\"\n", + "\n", + "prompt = PromptTemplate(\n", + " template=SPANNERGRAPH_QA_TEMPLATE,\n", + " input_variables=[\"question\", \"graph_schema\", \"context\"],\n", + ")\n", + "\n", + "llm = ChatVertexAI(model=GENERATIVE_MODEL, temperature=0)\n", + "\n", + "chain = prompt | llm | StrOutputParser()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify user query:" + ] + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:15.983161Z", + "iopub.status.busy": "2026-02-02T16:08:15.982299Z", + "iopub.status.idle": "2026-02-02T16:08:15.987533Z", + "shell.execute_reply": "2026-02-02T16:08:15.986469Z", + "shell.execute_reply.started": "2026-02-02T16:08:15.983113Z" + } + }, + "source": [ + "USER_QUERY = \"Give me recommendations for a beginner drone with a good battery and camera\"" + ], + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "GraphRAG using Vector Search and Graph Expansion" + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2026-02-02T16:08:16.789507Z", + "iopub.status.busy": "2026-02-02T16:08:16.788140Z", + "iopub.status.idle": "2026-02-02T16:08:43.451105Z", + "shell.execute_reply": "2026-02-02T16:08:43.449528Z", + 
"shell.execute_reply.started": "2026-02-02T16:08:16.789442Z" + }, + "id": "VK8e9vTK-nJz", + "outputId": "e7946a49-b470-4ded-8d77-28270a33d90f", + "tags": [] + }, + "source": [ + "import textwrap\n", + "from langchain_google_spanner import SpannerGraphVectorContextRetriever\n", + "from langchain_google_vertexai import VertexAIEmbeddings\n", + "\n", + "\n", + "def use_node_vector_retriever(\n", + " question, graph_store, embedding_service, label_expr, expand_by_hops\n", + "):\n", + " retriever = SpannerGraphVectorContextRetriever.from_params(\n", + " graph_store=graph_store,\n", + " embedding_service=embedding_service,\n", + " label_expr=label_expr,\n", + " expand_by_hops=expand_by_hops,\n", + " top_k=3,\n", + " #k=10,\n", + " )\n", + " context = format_docs(retriever.invoke(question))\n", + " return context\n", + "\n", + "\n", + "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "\n", + "question = USER_QUERY\n", + "\n", + "context = use_node_vector_retriever(\n", + " question, graph_store, embedding_service, label_expr=\"Product\", expand_by_hops=1\n", + ")\n", + "\n", + "answer = chain.invoke(\n", + " {\"question\": question, \"graph_schema\": graph_store.get_schema, \"context\": context}\n", + ")\n", + "\n", + "print(\"\\n\\nAnswer:\\n\")\n", + "print(textwrap.fill(answer, width=80))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-FDDqobRE-8D" + }, + "source": [ + "## Compare with Conventional RAG" + ] + }, + { + "cell_type": "code", + "metadata": { + "cellView": "form", + "execution": { + "iopub.execute_input": "2026-02-02T16:08:43.455057Z", + "iopub.status.busy": "2026-02-02T16:08:43.454631Z", + "iopub.status.idle": "2026-02-02T16:08:43.460439Z", + "shell.execute_reply": "2026-02-02T16:08:43.458920Z", + "shell.execute_reply.started": "2026-02-02T16:08:43.455022Z" + }, + "id": "YfRxVV8_PCSh", + "tags": [] + }, + "source": [ + "TABLE_NAME = \"rag_table\"" + ], + "outputs": [], + 
"execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Setup and load data for vector search" + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2026-02-02T16:08:43.462457Z", + "iopub.status.busy": "2026-02-02T16:08:43.462068Z", + "iopub.status.idle": "2026-02-02T16:08:56.388932Z", + "shell.execute_reply": "2026-02-02T16:08:56.387590Z", + "shell.execute_reply.started": "2026-02-02T16:08:43.462403Z" + }, + "id": "urR34wZTFDr_", + "outputId": "13a8976e-92ec-4289-cf95-83b041b0e1a3", + "tags": [] + }, + "source": [ + "from langchain_google_spanner import SpannerVectorStore\n", + "from langchain_google_vertexai import VertexAIEmbeddings\n", + "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", + "\n", + "import uuid\n", + "\n", + "\n", + "def load_data_for_vector_search(splits):\n", + " embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "\n", + " SpannerVectorStore.init_vector_store_table(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " )\n", + " db = SpannerVectorStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embeddings,\n", + " )\n", + " # Add the chunks to Spanner Vector Store with embeddings\n", + " ids = [str(uuid.uuid4()) for _ in range(len(splits))]\n", + " row_ids = db.add_documents(splits, ids)\n", + "\n", + "\n", + "# Create splits for documents\n", + "text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=100)\n", + "splits = text_splitter.split_documents(\n", + " [document for document_list in document_lists for document in document_list]\n", + ")\n", + "\n", + "# Initialize table and load data\n", + "embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "load_data_for_vector_search(splits)" + ], + "outputs": [], + 
"execution_count": null + }, + { + "cell_type": "code", + "metadata": { + "execution": { + "iopub.execute_input": "2026-02-02T16:08:56.391504Z", + "iopub.status.busy": "2026-02-02T16:08:56.391157Z", + "iopub.status.idle": "2026-02-02T16:08:57.073707Z", + "shell.execute_reply": "2026-02-02T16:08:57.072387Z", + "shell.execute_reply.started": "2026-02-02T16:08:56.391471Z" + }, + "id": "vs9-lZiUIC07" + }, + "source": [ + "from langchain_core.runnables import RunnablePassthrough\n", + "from langchain_google_spanner import SpannerVectorStore\n", + "import textwrap\n", + "\n", + "\n", + "# Retrieve and generate using the relevant snippets of the blog.\n", + "def format_docs(docs):\n", + " print(\"Context Retrieved: \\n\")\n", + " for doc in docs:\n", + " print(\"-\"*80)\n", + " print(textwrap.fill(doc.page_content, width=80))\n", + " print(\"-\"*80)\n", + " print(\"\\n\")\n", + "\n", + " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", + " return context\n", + "\n", + "\n", + "prompt = PromptTemplate(\n", + " template=\"\"\"\n", + " You are a friendly digital shopping assistant.\n", + " Use the following pieces of retrieved context to answer the question.\n", + " If you don't know the answer, just say that you don't know.\n", + " Question: {question}\n", + " Context: {context}\n", + " Answer:\n", + " \"\"\",\n", + " input_variables=[\"context\", \"question\"],\n", + ")\n", + "\n", + "# Create a rag chain\n", + "embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", + "\n", + "db = SpannerVectorStore(\n", + " instance_id=INSTANCE,\n", + " database_id=DATABASE,\n", + " table_name=TABLE_NAME,\n", + " embedding_service=embeddings,\n", + ")\n", + "vector_retriever = db.as_retriever(search_kwargs={\"k\": 3})\n", + "rag_chain = (\n", + " {\n", + " \"context\": vector_retriever | format_docs,\n", + " \"question\": RunnablePassthrough(),\n", + " }\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", + ")" + ], + "outputs": [], + "execution_count": 
null + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "execution": { + "iopub.execute_input": "2026-02-02T16:08:57.075679Z", + "iopub.status.busy": "2026-02-02T16:08:57.075311Z", + "iopub.status.idle": "2026-02-02T16:08:59.731678Z", + "shell.execute_reply": "2026-02-02T16:08:59.730461Z", + "shell.execute_reply.started": "2026-02-02T16:08:57.075607Z" + }, + "id": "lkYxWBvvIn4K", + "outputId": "6e0b259f-4a45-4662-bbf8-476294d85e0f" + }, + "source": [ + "import textwrap\n", + "\n", + "resp = rag_chain.invoke(USER_QUERY)\n", + "print(\"\\n\\nRag Response:\\n\")\n", + "print(textwrap.fill(resp, width=80))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pM7TmfI0TEFy" + }, + "source": [ + "## Clean up the graph\n", + "\n", + "> USE IT WITH CAUTION!\n", + "\n", + "**Clean up all the nodes/edges in your graph and remove your graph definition.**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UQWq4-sITOgl" + }, + "source": [ + "graph_store.cleanup()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "Copyright 2025 Google LLC\n", + "\n", + "Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "you may not use this file except in compliance with the License.\n", + "You may obtain a copy of the License at\n", + "\n", + " https://www.apache.org/licenses/LICENSE-2.0\n", + "\n", + "Unless required by applicable law or agreed to in writing, software\n", + "distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "See the License for the specific language governing permissions and\n", + "limitations under the License." 
+ ] + } + ], + "metadata": { + "colab": { + "provenance": [], + "toc_visible": true + }, + "environment": { + "kernel": "conda-base-py", + "name": "workbench-notebooks.m138", + "type": "gcloud", + "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m138" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel) (Local)", + "language": "python", + "name": "conda-base-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.19" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 8248af31e308d789bf0d041c03f8a3701b8e2d5b Mon Sep 17 00:00:00 2001 From: olex-snk Date: Fri, 6 Feb 2026 01:27:53 +0000 Subject: [PATCH 2/9] style fix2 --- .../solutions/gemini_spanner_graph_rag.ipynb | 97 ++++++++++++------- 1 file changed, 62 insertions(+), 35 deletions(-) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index 37d44cdbb..17acd2dad 100644 --- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -45,7 +45,8 @@ }, "source": [ "import warnings\n", - "warnings.filterwarnings(\"ignore\", category=DeprecationWarning) " + "\n", + "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)" ], "outputs": [], "execution_count": null @@ -178,7 +179,9 @@ }, "tags": [] }, - "source": "!gcloud spanner instances create {INSTANCE} --config=regional-us-central1 --description=\"Graph RAG Instance\" --processing-units=100 --edition=ENTERPRISE", + "source": [ + "!gcloud spanner instances create {INSTANCE} --config=regional-us-central1 --description=\"Graph RAG Instance\" --processing-units=100 --edition=ENTERPRISE" + ], "outputs": [], 
"execution_count": null }, @@ -202,29 +205,36 @@ "source": [ "def create_database(project_id, instance_id, database_id):\n", " \"\"\"Creates a database and tables for sample data.\"\"\"\n", - " from google.cloud.spanner_admin_database_v1.types import spanner_database_admin\n", " from google.cloud import spanner\n", + " from google.cloud.spanner_admin_database_v1.types import (\n", + " spanner_database_admin,\n", + " )\n", + "\n", " spanner_client = spanner.Client(project_id)\n", " database_admin_api = spanner_client.database_admin_api\n", "\n", " request = spanner_database_admin.CreateDatabaseRequest(\n", - " parent=database_admin_api.instance_path(spanner_client.project, instance_id),\n", + " parent=database_admin_api.instance_path(\n", + " spanner_client.project, instance_id\n", + " ),\n", " create_statement=f\"CREATE DATABASE `{database_id}`\",\n", - " extra_statements= [])\n", + " extra_statements=[],\n", + " )\n", "\n", " operation = database_admin_api.create_database(request=request)\n", "\n", " print(\"Waiting for operation to complete...\")\n", - " OPERATION_TIMEOUT_SECONDS=60\n", + " OPERATION_TIMEOUT_SECONDS = 60\n", " database = operation.result(OPERATION_TIMEOUT_SECONDS)\n", "\n", " print(\n", " \"Created database {} on instance {}\".format(\n", " database.name,\n", - " database_admin_api.instance_path(spanner_client.project, instance_id)\n", + " database_admin_api.instance_path(\n", + " spanner_client.project, instance_id\n", + " ),\n", " )\n", - " )\n", - "\n" + " )" ], "outputs": [], "execution_count": null @@ -294,8 +304,8 @@ }, "source": [ "import os\n", - "from langchain_community.document_loaders import DirectoryLoader\n", - "from langchain_community.document_loaders import TextLoader\n", + "\n", + "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", "from langchain_core.documents import Document" ], "outputs": [], @@ -371,6 +381,7 @@ }, "source": [ "import copy\n", + "\n", "from 
langchain_experimental.graph_transformers import LLMGraphTransformer\n", "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", "\n", @@ -411,7 +422,9 @@ "\n", "graph_documents = []\n", "for document_list in document_lists:\n", - " graph_documents.extend(llm_transformer.convert_to_graph_documents(document_list))\n", + " graph_documents.extend(\n", + " llm_transformer.convert_to_graph_documents(document_list)\n", + " )\n", "\n", "# Add embeddings to the graph documents for Product nodes\n", "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", @@ -499,22 +512,22 @@ " for graph_document in graph_documents:\n", " relationships_to_remove = []\n", " for relationship in graph_document.relationships:\n", - " if is_not_a_listed_product(relationship.source) or is_not_a_listed_product(\n", - " relationship.target\n", - " ):\n", + " if is_not_a_listed_product(\n", + " relationship.source\n", + " ) or is_not_a_listed_product(relationship.target):\n", " relationships_to_remove.append(relationship)\n", " for relationship in relationships_to_remove:\n", " graph_document.relationships.remove(relationship)\n", "\n", "\n", "def prune_unwanted_relationships(relation_name, source, target):\n", - " node_types = set([source, target])\n", + " node_types = {source, target}\n", " for graph_document in graph_documents:\n", " relationships_to_remove = []\n", " for relationship in graph_document.relationships:\n", " if (\n", " relationship.type == relation_name\n", - " and set([relationship.source.type, relationship.target.type])\n", + " and {relationship.source.type, relationship.target.type}\n", " == node_types\n", " ):\n", " relationships_to_remove.append(relationship)\n", @@ -523,7 +536,7 @@ "\n", "\n", "prune_invalid_products()\n", - "prune_invalid_segments(set([\"Home\", \"Office\", \"Fitness\"]))\n", + "prune_invalid_segments({\"Home\", \"Office\", \"Fitness\"})\n", "prune_unwanted_relationships(\"IN_CATEGORY\", \"Bundle\", \"Category\")\n", 
"prune_unwanted_relationships(\"IN_CATEGORY\", \"Deal\", \"Category\")\n", "prune_unwanted_relationships(\"IN_SEGMENT\", \"Bundle\", \"Segment\")\n", @@ -637,23 +650,25 @@ "tags": [] }, "source": [ + "import json\n", + "import pprint\n", + "import textwrap\n", + "\n", + "from IPython.display import Markdown\n", "from langchain_core.output_parsers import StrOutputParser\n", "from langchain_core.prompts.prompt import PromptTemplate\n", "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", "\n", - "from IPython.display import Markdown\n", - "import textwrap\n", - "import json\n", - "import pprint\n", - "\n", "# Retrieve and generate using the relevant snippets of the blog.\n", + "\n", + "\n", "def format_docs(docs):\n", " print(\"Context Retrieved: \\n\")\n", " for doc in docs:\n", - " print(\"-\"*80)\n", + " print(\"-\" * 80)\n", " pprint.pprint(json.loads(doc.page_content)[0], width=80, indent=4)\n", - " #print(json.dumps(json.loads(doc.page_content)[0], indent=4))\n", - " print(\"-\"*80)\n", + " # print(json.dumps(json.loads(doc.page_content)[0], indent=4))\n", + " print(\"-\" * 80)\n", " print(\"\\n\")\n", "\n", " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", @@ -748,6 +763,7 @@ }, "source": [ "import textwrap\n", + "\n", "from langchain_google_spanner import SpannerGraphVectorContextRetriever\n", "from langchain_google_vertexai import VertexAIEmbeddings\n", "\n", @@ -761,7 +777,7 @@ " label_expr=label_expr,\n", " expand_by_hops=expand_by_hops,\n", " top_k=3,\n", - " #k=10,\n", + " # k=10,\n", " )\n", " context = format_docs(retriever.invoke(question))\n", " return context\n", @@ -772,11 +788,19 @@ "question = USER_QUERY\n", "\n", "context = use_node_vector_retriever(\n", - " question, graph_store, embedding_service, label_expr=\"Product\", expand_by_hops=1\n", + " question,\n", + " graph_store,\n", + " embedding_service,\n", + " label_expr=\"Product\",\n", + " expand_by_hops=1,\n", ")\n", "\n", "answer = chain.invoke(\n", - 
" {\"question\": question, \"graph_schema\": graph_store.get_schema, \"context\": context}\n", + " {\n", + " \"question\": question,\n", + " \"graph_schema\": graph_store.get_schema,\n", + " \"context\": context,\n", + " }\n", ")\n", "\n", "print(\"\\n\\nAnswer:\\n\")\n", @@ -837,12 +861,12 @@ "tags": [] }, "source": [ + "import uuid\n", + "\n", "from langchain_google_spanner import SpannerVectorStore\n", "from langchain_google_vertexai import VertexAIEmbeddings\n", "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", "\n", - "import uuid\n", - "\n", "\n", "def load_data_for_vector_search(splits):\n", " embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", @@ -864,7 +888,9 @@ "\n", "\n", "# Create splits for documents\n", - "text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=100)\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=250, chunk_overlap=100\n", + ")\n", "splits = text_splitter.split_documents(\n", " [document for document_list in document_lists for document in document_list]\n", ")\n", @@ -889,18 +915,19 @@ "id": "vs9-lZiUIC07" }, "source": [ + "import textwrap\n", + "\n", "from langchain_core.runnables import RunnablePassthrough\n", "from langchain_google_spanner import SpannerVectorStore\n", - "import textwrap\n", "\n", "\n", "# Retrieve and generate using the relevant snippets of the blog.\n", "def format_docs(docs):\n", " print(\"Context Retrieved: \\n\")\n", " for doc in docs:\n", - " print(\"-\"*80)\n", + " print(\"-\" * 80)\n", " print(textwrap.fill(doc.page_content, width=80))\n", - " print(\"-\"*80)\n", + " print(\"-\" * 80)\n", " print(\"\\n\")\n", "\n", " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", From 5310b3247b5e375f1c9b3aaa7744894895c28d65 Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Fri, 6 Feb 2026 04:22:29 +0000 Subject: [PATCH 3/9] update requirements.txt in asl_genai --- asl_genai/requirements.txt | 6 ++++++ 1 file changed, 6 
insertions(+) diff --git a/asl_genai/requirements.txt b/asl_genai/requirements.txt index 9861abe81..cff445966 100644 --- a/asl_genai/requirements.txt +++ b/asl_genai/requirements.txt @@ -8,6 +8,8 @@ google-cloud-bigquery cloudml-hypertune google-genai==1.57.0 google-adk==1.22.1 +google-cloud-spanner==3.57.0 +spanner-graph-notebook==1.1.8 # Langchain dependencies langchain==0.3.14 @@ -20,6 +22,10 @@ langchain-community==0.3.14 langchain-core==0.3.47 langchain-google-vertexai==2.0.15 langchain-text-splitters==0.3.5 +langchain-experimental==0.3.4 +langchain-google-spanner==0.9.0 +networkx==3.5 + # Utilities pyyaml==6.0.2 From aaf74f912e7f9457dee5a7bf16f503feea349111 Mon Sep 17 00:00:00 2001 From: Takumi Ohyama Date: Fri, 6 Feb 2026 06:21:34 +0000 Subject: [PATCH 4/9] fix command --- .../vertex_genai/solutions/gemini_spanner_graph_rag.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index 17acd2dad..9a39c6d64 100644 --- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -85,7 +85,7 @@ "tags": [] }, "source": [ - "PROJECT_ID = !gcloud config list --format 'value(core.project)'\n", + "PROJECT_ID = !gcloud config get-value project\n", "PROJECT_ID = PROJECT_ID[0]\n", "REGION = \"us-central1\"\n", "%env GOOGLE_CLOUD_PROJECT={PROJECT_ID}" From d1b86bac29d513b24d20a37e833758d3e0fc94c6 Mon Sep 17 00:00:00 2001 From: olex-snk Date: Sun, 8 Feb 2026 00:48:21 +0000 Subject: [PATCH 5/9] fixed db not found error --- .../solutions/gemini_spanner_graph_rag.ipynb | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index 9a39c6d64..3b3c41619 100644 
--- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -239,6 +239,22 @@ "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Create a Spanner database using the helper method:" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "from google.cloud import spanner\n", + "\n", + "create_database(PROJECT_ID, INSTANCE, DATABASE)" + ] + }, { "cell_type": "markdown", "metadata": { From 2af36bd17dc481489780ae88c8ab3243cdb6f844 Mon Sep 17 00:00:00 2001 From: olex-snk Date: Tue, 24 Feb 2026 23:28:45 +0000 Subject: [PATCH 6/9] reorganized imports --- .../solutions/gemini_spanner_graph_rag.ipynb | 106 +++++------------- 1 file changed, 27 insertions(+), 79 deletions(-) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index 3b3c41619..a0010b62f 100644 --- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -27,9 +27,7 @@ "metadata": { "id": "zfIhwIryOls1" }, - "source": [ - "### Configure warnings" - ] + "source": "### Imports" }, { "cell_type": "code", @@ -44,9 +42,33 @@ "id": "EWOkHI7XOna2" }, "source": [ + "# Standard library imports\n", + "import copy\n", + "import json\n", + "import os\n", + "import pprint\n", + "import textwrap\n", + "import uuid\n", + "import warnings\n", + "from google.cloud import spanner\n", + "from google.cloud.spanner_admin_database_v1.types import spanner_database_admin\n", + "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", + "from langchain_core.documents import Document\n", + "from langchain_core.output_parsers import StrOutputParser\n", + "from langchain_core.prompts.prompt import
PromptTemplate\n", + "from langchain_core.runnables import RunnablePassthrough\n", + "from langchain_experimental.graph_transformers import LLMGraphTransformer\n", + "from langchain_google_spanner import (\n", + " SpannerGraphStore,\n", + " SpannerGraphVectorContextRetriever,\n", + " SpannerVectorStore,\n", + ")\n", + "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", + "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", "\n", - "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)" + "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n", + "\n", + "from IPython.display import Markdown" ], "outputs": [], "execution_count": null @@ -205,10 +227,6 @@ "source": [ "def create_database(project_id, instance_id, database_id):\n", " \"\"\"Creates a database and tables for sample data.\"\"\"\n", - " from google.cloud import spanner\n", - " from google.cloud.spanner_admin_database_v1.types import (\n", - " spanner_database_admin,\n", - " )\n", "\n", " spanner_client = spanner.Client(project_id)\n", " database_admin_api = spanner_client.database_admin_api\n", @@ -249,11 +267,7 @@ "cell_type": "code", "outputs": [], "execution_count": null, - "source": [ - "from google.cloud import spanner\n", - "\n", - "create_database(PROJECT_ID, INSTANCE, DATABASE)" - ] + "source": "create_database(PROJECT_ID, INSTANCE, DATABASE)" }, { "cell_type": "markdown", @@ -284,8 +298,6 @@ "tags": [] }, "source": [ - "from langchain_google_spanner import SpannerGraphStore\n", - "\n", "graph_store = SpannerGraphStore(\n", " instance_id=INSTANCE,\n", " database_id=DATABASE,\n", @@ -305,28 +317,6 @@ "To add graph documents in the graph store." 
] }, - { - "cell_type": "code", - "metadata": { - "execution": { - "iopub.execute_input": "2026-02-02T15:57:59.866080Z", - "iopub.status.busy": "2026-02-02T15:57:59.865664Z", - "iopub.status.idle": "2026-02-02T15:58:00.146793Z", - "shell.execute_reply": "2026-02-02T15:58:00.145046Z", - "shell.execute_reply.started": "2026-02-02T15:57:59.866030Z" - }, - "id": "Do7bCkyJagok", - "tags": [] - }, - "source": [ - "import os\n", - "\n", - "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", - "from langchain_core.documents import Document" - ], - "outputs": [], - "execution_count": null - }, { "cell_type": "code", "metadata": { @@ -396,12 +386,6 @@ "tags": [] }, "source": [ - "import copy\n", - "\n", - "from langchain_experimental.graph_transformers import LLMGraphTransformer\n", - "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", - "\n", - "\n", "def print_graph(graph_documents):\n", " for doc in graph_documents:\n", " print(doc.source.page_content[:100])\n", @@ -482,7 +466,6 @@ "# set of all valid products\n", "products = set()\n", "\n", - "\n", "def prune_invalid_products():\n", " for graph_document in graph_documents:\n", " nodes_to_remove = []\n", @@ -666,24 +649,11 @@ "tags": [] }, "source": [ - "import json\n", - "import pprint\n", - "import textwrap\n", - "\n", - "from IPython.display import Markdown\n", - "from langchain_core.output_parsers import StrOutputParser\n", - "from langchain_core.prompts.prompt import PromptTemplate\n", - "from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings\n", - "\n", - "# Retrieve and generate using the relevant snippets of the blog.\n", - "\n", - "\n", "def format_docs(docs):\n", " print(\"Context Retrieved: \\n\")\n", " for doc in docs:\n", " print(\"-\" * 80)\n", " pprint.pprint(json.loads(doc.page_content)[0], width=80, indent=4)\n", - " # print(json.dumps(json.loads(doc.page_content)[0], indent=4))\n", " print(\"-\" * 80)\n", " print(\"\\n\")\n", "\n", @@ 
-778,12 +748,6 @@ "tags": [] }, "source": [ - "import textwrap\n", - "\n", - "from langchain_google_spanner import SpannerGraphVectorContextRetriever\n", - "from langchain_google_vertexai import VertexAIEmbeddings\n", - "\n", - "\n", "def use_node_vector_retriever(\n", " question, graph_store, embedding_service, label_expr, expand_by_hops\n", "):\n", @@ -877,13 +841,6 @@ "tags": [] }, "source": [ - "import uuid\n", - "\n", - "from langchain_google_spanner import SpannerVectorStore\n", - "from langchain_google_vertexai import VertexAIEmbeddings\n", - "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", - "\n", - "\n", "def load_data_for_vector_search(splits):\n", " embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", "\n", @@ -931,12 +888,6 @@ "id": "vs9-lZiUIC07" }, "source": [ - "import textwrap\n", - "\n", - "from langchain_core.runnables import RunnablePassthrough\n", - "from langchain_google_spanner import SpannerVectorStore\n", - "\n", - "\n", "# Retrieve and generate using the relevant snippets of the blog.\n", "def format_docs(docs):\n", " print(\"Context Retrieved: \\n\")\n", @@ -949,7 +900,6 @@ " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", " return context\n", "\n", - "\n", "prompt = PromptTemplate(\n", " template=\"\"\"\n", " You are a friendly digital shopping assistant.\n", @@ -1002,8 +952,6 @@ "outputId": "6e0b259f-4a45-4662-bbf8-476294d85e0f" }, "source": [ - "import textwrap\n", - "\n", "resp = rag_chain.invoke(USER_QUERY)\n", "print(\"\\n\\nRag Response:\\n\")\n", "print(textwrap.fill(resp, width=80))" From 1180c0fa3791d98d93cc3468da4bccef5cfa8be9 Mon Sep 17 00:00:00 2001 From: olex-snk Date: Tue, 24 Feb 2026 23:45:06 +0000 Subject: [PATCH 7/9] removed redundant mkdir --- .../vertex_genai/solutions/gemini_spanner_graph_rag.ipynb | 1 - 1 file changed, 1 deletion(-) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb 
b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index a0010b62f..eb8615c25 100644 --- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -331,7 +331,6 @@ }, "source": [ "!wget https://raw.githubusercontent.com/googleapis/langchain-google-spanner-python/main/samples/retaildata.zip\n", - "!mkdir content\n", "!unzip -o \"retaildata.zip\"" ], "outputs": [], From 8b2f20173b6589f03a0115c82ef21ace33172cb1 Mon Sep 17 00:00:00 2001 From: olex-snk Date: Tue, 24 Feb 2026 23:52:53 +0000 Subject: [PATCH 8/9] split RAG code --- .../solutions/gemini_spanner_graph_rag.ipynb | 28 +++++++++++++++---- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index eb8615c25..090e8f3ef 100644 --- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -729,6 +729,11 @@ "cell_type": "markdown", "source": "GraphRAG using Vector Search and Graph Expansion" }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "#### Define a utility method for GraphRAG" + }, { "cell_type": "code", "metadata": { @@ -759,9 +764,22 @@ " # k=10,\n", " )\n", " context = format_docs(retriever.invoke(question))\n", - " return context\n", - "\n", - "\n", + " return context" ], "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "#### Query the retriever to get context for grounded answer generation:" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", "\n", "question = USER_QUERY\n", @@ -784,9 +802,7 @@ "\n",
"print(\"\\n\\nAnswer:\\n\")\n", "print(textwrap.fill(answer, width=80))" - ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "markdown", From d58487423ff8aaa68ec79d326a308fb231384825 Mon Sep 17 00:00:00 2001 From: olex-snk Date: Wed, 25 Feb 2026 14:20:57 +0000 Subject: [PATCH 9/9] added more description --- .../solutions/gemini_spanner_graph_rag.ipynb | 145 +++++++++++++----- 1 file changed, 110 insertions(+), 35 deletions(-) diff --git a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb index 090e8f3ef..df378ea06 100644 --- a/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb +++ b/asl_genai/notebooks/vertex_genai/solutions/gemini_spanner_graph_rag.ipynb @@ -27,7 +27,7 @@ "metadata": { "id": "zfIhwIryOls1" }, - "source": "### Imports" + "source": "### Library imports" }, { "cell_type": "code", @@ -42,7 +42,6 @@ "id": "EWOkHI7XOna2" }, "source": [ - "# Standard library imports\n", "import copy\n", "import json\n", "import os\n", @@ -73,6 +72,11 @@ "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Specify the embedding and generative language models" + }, { "cell_type": "code", "metadata": { @@ -92,6 +96,11 @@ "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "#### Set Google Cloud project ID environment variable" + }, { "cell_type": "code", "metadata": { @@ -265,9 +274,9 @@ { "metadata": {}, "cell_type": "code", + "source": "create_database(PROJECT_ID, INSTANCE, DATABASE)", "outputs": [], - "execution_count": null, - "source": "create_database(PROJECT_ID, INSTANCE, DATABASE)" + "execution_count": null }, { "cell_type": "markdown", @@ -585,6 +594,11 @@ "### Visualization" ] }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Visualizes Graph Data: %%spanner_graph extension renders a visual topology of 
the graph's nodes and edges." + }, { "cell_type": "code", "metadata": { @@ -634,6 +648,34 @@ "## GraphRAG flow using Spanner Graph" ] }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Define a helper method to format and print the content of the retrieved chunks:" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "def format_docs(docs):\n", + " print(\"Context Retrieved: \\n\")\n", + " for doc in docs:\n", + " print(\"-\" * 80)\n", + " pprint.pprint(json.loads(doc.page_content)[0], width=80, indent=4)\n", + " print(\"-\" * 80)\n", + " print(\"\\n\")\n", + "\n", + " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", + " return context" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "Define the prompt template for the LLM:" + }, { "cell_type": "code", "metadata": { @@ -648,18 +690,6 @@ "tags": [] }, "source": [ - "def format_docs(docs):\n", - " print(\"Context Retrieved: \\n\")\n", - " for doc in docs:\n", - " print(\"-\" * 80)\n", - " pprint.pprint(json.loads(doc.page_content)[0], width=80, indent=4)\n", - " print(\"-\" * 80)\n", - " print(\"\\n\")\n", - "\n", - " context = \"\\n\\n\".join(doc.page_content for doc in docs)\n", - " return context\n", - "\n", - "\n", "SPANNERGRAPH_QA_TEMPLATE = \"\"\"\n", "You are a helpful and friendly AI assistant for question answering tasks for an electronics\n", "retail online store.\n", @@ -777,8 +807,6 @@ { "metadata": {}, "cell_type": "code", - "outputs": [], - "execution_count": null, "source": [ "embedding_service = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", "\n", "question = USER_QUERY\n", @@ -790,8 +818,22 @@ " embedding_service,\n", " label_expr=\"Product\",\n", " expand_by_hops=1,\n", - ")\n", - "\n", + ")" ], "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Now let's test GraphRAG and explore the results:" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": 
[], + "execution_count": null, + "source": [ + "answer = chain.invoke(\n", " {\n", " \"question\": question,\n", @@ -890,6 +932,11 @@ "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Retrieve and generate using the vanilla RAG approach:" + }, { "cell_type": "code", "metadata": { @@ -903,7 +950,6 @@ "id": "vs9-lZiUIC07" }, "source": [ - "# Retrieve and generate using the relevant snippets of the blog.\n", "def format_docs(docs):\n", " print(\"Context Retrieved: \\n\")\n", " for doc in docs:\n", @@ -925,9 +971,22 @@ " Answer:\n", " \"\"\",\n", " input_variables=[\"context\", \"question\"],\n", - ")\n", - "\n", - "# Create a rag chain\n", + ")" ], "outputs": [], "execution_count": null }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Define the embedding model and the vector database for RAG" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "embeddings = VertexAIEmbeddings(model_name=EMBEDDING_MODEL)\n", "\n", "db = SpannerVectorStore(\n", @@ -936,19 +995,35 @@ " instance_id=INSTANCE,\n", " database_id=DATABASE,\n", " table_name=TABLE_NAME,\n", " embedding_service=embeddings,\n", ")\n", - "vector_retriever = db.as_retriever(search_kwargs={\"k\": 3})\n", + "vector_retriever = db.as_retriever(search_kwargs={\"k\": 3})" + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Create a complete pipeline for Retrieval-Augmented Generation (RAG)" + }, + { + "metadata": {}, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [ + "rag_chain = (\n", - " {\n", - " \"context\": vector_retriever | format_docs,\n", - " \"question\": RunnablePassthrough(),\n", - " }\n", - " | prompt\n", - " | llm\n", - " | StrOutputParser()\n", + " {\n", + " \"context\": vector_retriever | format_docs,\n", + " \"question\": RunnablePassthrough(),\n", + " }\n", + " | prompt\n", + " | llm\n", + " | StrOutputParser()\n", ")" - ], - "outputs": [], - "execution_count": 
null + ] + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": "### Now let's test the traditional RAG approach and compare the results:" }, { "cell_type": "code",