Lab 7 - Semantic Search for Aircraft Maintenance

In this lab, you'll add semantic search capabilities to your aircraft knowledge graph. Building on the aircraft topology loaded in Lab 5, you'll create a Document-Chunk structure for the A320-200 Maintenance Manual and enable AI-powered retrieval of maintenance procedures.

Infrastructure: This lab uses your personal Aura instance. You'll load maintenance manual chunks, generate an embedding for each chunk, and store both in the graph you built in Lab 5.

Prerequisites

Before starting, make sure you have:

Completed Lab 5 (Databricks ETL) to load the aircraft graph (Aircraft, System, Component nodes)
Neo4j Aura credentials from Lab 1 (URI, username, password)
A Databricks notebook environment to run in (for Foundation Model API access)
The maintenance manual uploaded to the Unity Catalog Volume (see lab_setup/README.md)

Lab Overview

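This lab centers on two notebooks that add semantic search to your existing knowledge graph. A third notebook, 05_hybrid_retrievers.ipynb, extends them with hybrid search that combines vector and keyword retrieval.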

03_data_and_embeddings.ipynb - Data Preparation

Build the foundation for semantic search over maintenance documentation:

Understand the Document -> Chunk graph structure
Load the A320-200 Maintenance Manual into Neo4j
Create Document and Chunk nodes with relationships
Generate embeddings using Databricks Foundation Model APIs (BGE-large)
Create a vector index in Neo4j
Perform similarity search to find relevant maintenance procedures
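
The last two steps can be sketched as Cypher statements built in Python. This is an illustrative sketch, not the notebook's code: the index name `maintenance_chunk_embeddings` and the `embedding` and `text` property names are assumptions, while the dimension of 1024 matches BGE-large.

```python
# Hypothetical names: the notebook may use a different index name or properties.
INDEX_NAME = "maintenance_chunk_embeddings"
EMBEDDING_PROPERTY = "embedding"
EMBEDDING_DIMENSIONS = 1024  # output size of databricks-bge-large-en


def create_vector_index_statement(index_name: str, label: str, prop: str, dimensions: int) -> str:
    """Build a Neo4j 5 CREATE VECTOR INDEX statement using cosine similarity.

    Identifiers cannot be passed as Cypher parameters, so they are
    interpolated here; only use trusted, hard-coded names.
    """
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        "`vector.similarity_function`: 'cosine'}}"
    )


# Top-k similarity search; $index_name, $k and $query_embedding are
# supplied as query parameters at run time.
SIMILARITY_QUERY = (
    "CALL db.index.vector.queryNodes($index_name, $k, $query_embedding) "
    "YIELD node, score "
    "RETURN node.text AS text, score "
    "ORDER BY score DESC"
)

CREATE_INDEX = create_vector_index_statement(
    INDEX_NAME, "Chunk", EMBEDDING_PROPERTY, EMBEDDING_DIMENSIONS
)
print(CREATE_INDEX)
```

With the official Neo4j Python driver, statements like these could be run with `driver.execute_query(...)`, passing the query embedding as a parameter.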

04_graphrag_retrievers.ipynb - Retrieval Strategies

Learn retrieval patterns from simple to graph-enhanced:

Set up a VectorRetriever using Neo4j's vector index
Use GraphRAG to combine vector search with LLM-generated answers
Create custom Cypher queries with VectorCypherRetriever
Connect maintenance documentation to your aircraft topology
Compare standard vs. graph-enhanced retrieval results
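
As a rough illustration of graph-enhanced retrieval, the sketch below shows what a custom retrieval query might look like. It assumes the `neo4j-graphrag` Python package, whose `VectorCypherRetriever` runs the supplied Cypher after the vector search with `node` and `score` already bound. The index name and the `text` and `title` properties are hypothetical; the notebook's actual query may differ.

```python
# Cypher appended after the vector search; `node` and `score` are provided
# by the retriever. The `text` and `title` properties are assumptions.
RETRIEVAL_QUERY = """
MATCH (node)-[:FROM_DOCUMENT]->(doc:Document)
OPTIONAL MATCH (prev:Chunk)-[:NEXT_CHUNK]->(node)
OPTIONAL MATCH (node)-[:NEXT_CHUNK]->(next:Chunk)
RETURN coalesce(prev.text, '') + ' ' + node.text + ' ' + coalesce(next.text, '') AS context,
       doc.title AS source,
       score
"""


def build_retriever(driver, embedder, index_name="maintenance_chunk_embeddings"):
    """Create a VectorCypherRetriever (assumes the neo4j-graphrag package)."""
    # Imported lazily so this module still loads where the package is absent.
    from neo4j_graphrag.retrievers import VectorCypherRetriever

    return VectorCypherRetriever(
        driver,
        index_name=index_name,
        retrieval_query=RETRIEVAL_QUERY,
        embedder=embedder,
    )
```

Returning the neighbouring chunks alongside each match gives the LLM more surrounding context than a plain vector hit. A call such as `build_retriever(driver, embedder).search(query_text="How do I troubleshoot engine vibration?", top_k=3)` would then return the expanded context.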

Knowledge Graph Structure

After completing this lab, your knowledge graph will combine:

From Lab 5 (Structured Data):

(:Aircraft)-[:HAS_SYSTEM]->(:System)-[:HAS_COMPONENT]->(:Component)

From Lab 7 (Unstructured Data):

(:Document) <-[:FROM_DOCUMENT]- (:Chunk) -[:NEXT_CHUNK]-> (:Chunk)
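
One way to produce this structure is sketched below, under stated assumptions: fixed-size character windows with a small overlap, an `id` and `position` property on each chunk, and two Cypher statements to load and link them. The chunk size, overlap, and property names are illustrative only; the helpers in data_utils.py may split and load the manual differently.

```python
from typing import Dict, List


def split_into_chunks(text: str, chunk_size: int = 400, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def build_chunk_rows(document_id: str, chunks: List[str]) -> List[Dict]:
    """One row per chunk, keeping its position so NEXT_CHUNK links can be built."""
    return [
        {"id": f"{document_id}-{i}", "position": i, "text": chunk}
        for i, chunk in enumerate(chunks)
    ]


# Creates the Document node, the Chunk nodes, and FROM_DOCUMENT relationships.
LOAD_CHUNKS = """
MERGE (d:Document {id: $document_id})
WITH d
UNWIND $rows AS row
MERGE (c:Chunk {id: row.id})
SET c.text = row.text, c.position = row.position
MERGE (c)-[:FROM_DOCUMENT]->(d)
"""

# Links each chunk to the one that follows it in the same document.
LINK_CHUNKS = """
MATCH (a:Chunk)-[:FROM_DOCUMENT]->(d:Document {id: $document_id})
MATCH (b:Chunk)-[:FROM_DOCUMENT]->(d)
WHERE b.position = a.position + 1
MERGE (a)-[:NEXT_CHUNK]->(b)
"""

rows = build_chunk_rows("a320-manual", split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
print(rows)
```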

Databricks Foundation Model APIs

This lab uses Databricks-hosted embedding and LLM models:

Embedding Models

Model	Dimensions	Context	Best For
`databricks-bge-large-en`	1024	512 tokens	Short text, fast
`databricks-gte-large-en`	1024	8192 tokens	Long documents

LLM Models

Model	Description
`databricks-meta-llama-3-3-70b-instruct`	Llama 3.3 70B (default)
`databricks-dbrx-instruct`	DBRX Instruct
`databricks-mixtral-8x7b-instruct`	Mixtral 8x7B

These models are pre-deployed and ready to use via the MLflow deployments client.

Maintenance Manual Content

The A320-200 Maintenance and Troubleshooting Manual is loaded from the Unity Catalog Volume at:

/Volumes/aws-databricks-neo4j-lab/lab-schema/lab-volume/MAINTENANCE_A320.md

This comprehensive manual includes:

Aircraft Overview: Fleet configuration (5 aircraft), specifications
System Architecture: Engine (V2500-A1), Avionics, Hydraulics systems
Troubleshooting Procedures: EGT overheat, vibration exceedance, fuel starvation, bearing wear
Fault Codes: Complete reference for Engine, Avionics, and Hydraulics faults
Decision Trees: Diagnostic flows for common issues
Scheduled Maintenance: Inspection intervals and task cards

Sample Queries

After completing this lab, you can ask questions like:

"How do I troubleshoot engine vibration?"
"What are the EGT limits during takeoff?"
"What causes hydraulic pressure loss?"
"When should I replace the fuel filter?"
"What oil analysis levels indicate bearing wear?"

Configuration

Each notebook has a Configuration cell at the top where you enter your Neo4j credentials:

NEO4J_URI = ""  # e.g., "neo4j+s://xxxxxxxx.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = ""  # Your password from Lab 1

The embedding and LLM models use Databricks Foundation Model APIs, which are pre-deployed and require no additional configuration. When the notebooks run in Databricks, the MLflow deployments client handles authentication automatically.
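
Before running the remaining cells, it can help to catch an empty or malformed credential early. The helper below is a hypothetical convenience, not part of the lab notebooks; it checks only the shape of the values and does not contact the database.

```python
from typing import List
from urllib.parse import urlparse

# URI schemes accepted by the Neo4j drivers.
VALID_SCHEMES = {"neo4j", "neo4j+s", "neo4j+ssc", "bolt", "bolt+s", "bolt+ssc"}


def check_neo4j_config(uri: str, username: str, password: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme not in VALID_SCHEMES:
        problems.append(f"unsupported URI scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("URI has no host")
    if not username:
        problems.append("username is empty")
    if not password:
        problems.append("password is empty")
    return problems


# An unfilled Configuration cell is reported before any connection attempt.
print(check_neo4j_config("", "neo4j", ""))
```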

Key Concepts

Chunks: Smaller pieces of text split from the maintenance manual for efficient retrieval
Embeddings: 1024-dimensional vectors (BGE-large) that capture semantic meaning
Vector Index: Enables fast similarity search across embeddings
VectorRetriever: Simple semantic search over embedded chunks
VectorCypherRetriever: Graph-enhanced retrieval using custom Cypher queries
GraphRAG: Combining retrieval with LLM generation for context-aware answers
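
To make the similarity idea concrete, here is a brute-force sketch of cosine similarity and top-k ranking in plain Python. It is for intuition only: a vector index answers the same kind of question with an approximate nearest-neighbour structure rather than by scanning every vector, and the tiny 2-dimensional vectors stand in for real 1024-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every (text, vector) pair against the query and keep the best k."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors, made up for illustration.
toy_chunks = [
    ("engine vibration limits", [1.0, 0.0]),
    ("hydraulic pressure loss", [0.0, 1.0]),
]
print(top_k([0.9, 0.1], toy_chunks, k=1))
```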

Getting Started

Ensure Lab 5 is complete (aircraft topology loaded)

Verify the maintenance manual is uploaded to the Volume:

/Volumes/aws-databricks-neo4j-lab/lab-schema/lab-volume/MAINTENANCE_A320.md

Upload the notebook files and data_utils.py to your Databricks workspace
Open 03_data_and_embeddings.ipynb
Enter your Neo4j credentials in the Configuration cell
Run cells sequentially to load the maintenance manual and create embeddings
Continue to 04_graphrag_retrievers.ipynb for retrieval strategies

Files

File	Description
`03_data_and_embeddings.ipynb`	Data loading and embedding generation
`04_graphrag_retrievers.ipynb`	Retrieval strategies and GraphRAG
`05_hybrid_retrievers.ipynb`	Hybrid search combining vector + keyword retrieval
`data_utils.py`	Utility functions for Neo4j and Databricks
`README.md`	This file

Note: The MAINTENANCE_A320.md file from lab_setup/aircraft_digital_twin_data/ must be uploaded to the Unity Catalog Volume before running the notebooks. See lab_setup/README.md for upload instructions.

Technical Details

Embedding Generation

The DatabricksEmbeddings class uses the MLflow deployments client:

import mlflow.deployments
client = mlflow.deployments.get_deploy_client("databricks")
response = client.predict(
    endpoint="databricks-bge-large-en",
    inputs={"input": ["text to embed"]},
)
embedding = response["data"][0]["embedding"]  # 1024-dim vector

API Format

The Databricks embedding endpoints use an OpenAI-compatible request and response format:

Input: {"input": ["text1", "text2"]}
Output: {"data": [{"embedding": [0.1, ...]}, ...]}
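
Because the input is a list, several chunks can be embedded per request. The sketch below batches texts and unpacks the response format shown above. The function name, the batch size of 16, and the assumption that embeddings come back in input order are illustrative and not taken from the lab's `DatabricksEmbeddings` class. The client is passed in, so any object with a compatible `predict` method works.

```python
from typing import Any, List


def embed_texts(client: Any,
                texts: List[str],
                endpoint: str = "databricks-bge-large-en",
                batch_size: int = 16) -> List[List[float]]:
    """Embed texts in batches and return one vector per input, in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.predict(endpoint=endpoint, inputs={"input": batch})
        vectors = [item["embedding"] for item in response["data"]]
        if len(vectors) != len(batch):
            raise ValueError(
                f"expected {len(batch)} embeddings, got {len(vectors)}"
            )
        embeddings.extend(vectors)
    return embeddings
```

In a Databricks notebook, `client` would be the object returned by `mlflow.deployments.get_deploy_client("databricks")`, for example `embed_texts(client, ["text1", "text2"])`.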

Next Steps

Continue to Lab 8 - Aura Agents to build a no-code conversational agent using the Neo4j Aura Agents console.
