In this lab, you'll add semantic search capabilities to your aircraft knowledge graph. Building on the aircraft topology loaded in Lab 5, you'll create a Document-Chunk structure for the A320-200 Maintenance Manual and enable AI-powered retrieval of maintenance procedures.
Infrastructure: This lab uses your personal Aura instance. You'll load maintenance manual chunks and generate embeddings into the graph you built in Lab 5.
Before starting, make sure you have:
- Completed Lab 5 (Databricks ETL) to load the aircraft graph (Aircraft, System, Component nodes)
- Neo4j Aura credentials from Lab 1 (URI, username, password)
- Access to a Databricks notebook environment (for Foundation Model API access)
- Maintenance manual uploaded to the Unity Catalog Volume (see lab_setup/README.md)
This lab consists of two notebooks that add semantic search to your existing knowledge graph:
In 03_data_and_embeddings.ipynb, you'll build the foundation for semantic search over maintenance documentation:
- Understand the Document -> Chunk graph structure
- Load the A320-200 Maintenance Manual into Neo4j
- Create Document and Chunk nodes with relationships
- Generate embeddings using Databricks Foundation Model APIs (BGE-large)
- Create a vector index in Neo4j
- Perform similarity search to find relevant maintenance procedures
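The chunking step above can be sketched in plain Python. The character-based splitting and the 500/50 sizes here are illustrative assumptions, not the notebook's exact implementation (the lab may split by tokens instead, given BGE-large's 512-token context window):

```python
def chunk_text(text, max_chars=500, overlap=50):
    """Split a document into overlapping chunks for embedding.

    Sizes are illustrative; overlap preserves context across
    chunk boundaries so a procedure isn't cut mid-sentence.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks share an overlap window
    return chunks

manual = "EGT overheat procedure. " * 100  # stand-in for MAINTENANCE_A320.md
chunks = chunk_text(manual)
print(len(chunks), "chunks of up to", max(len(c) for c in chunks), "chars")
```

Each chunk then becomes one `(:Chunk)` node, embedded independently and linked to its neighbors with `NEXT_CHUNK`.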
In 04_graphrag_retrievers.ipynb, you'll learn retrieval patterns from simple to graph-enhanced:
- Set up a VectorRetriever using Neo4j's vector index
- Use GraphRAG to combine vector search with LLM-generated answers
- Create custom Cypher queries with VectorCypherRetriever
- Connect maintenance documentation to your aircraft topology
- Compare standard vs. graph-enhanced retrieval results
After completing this lab, your knowledge graph will combine:
From Lab 5 (Structured Data):
(:Aircraft)-[:HAS_SYSTEM]->(:System)-[:HAS_COMPONENT]->(:Component)
From Lab 7 (Unstructured Data):
(:Document)<-[:FROM_DOCUMENT]-(:Chunk)-[:NEXT_CHUNK]->(:Chunk)
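As a sketch of how this chunk structure can be traversed, the following Cypher (held in a Python string; the `title` and `text` property names are assumptions for illustration) walks a document's chunks in reading order:

```python
# Cypher sketch: find a document's first chunk (no incoming NEXT_CHUNK),
# then follow NEXT_CHUNK edges to visit chunks in reading order.
# Property names (title, text) are illustrative assumptions.
ordered_chunks_query = """
MATCH (d:Document {title: $title})<-[:FROM_DOCUMENT]-(first:Chunk)
WHERE NOT ()-[:NEXT_CHUNK]->(first)
MATCH (first)-[:NEXT_CHUNK*0..]->(c:Chunk)
RETURN c.text AS text
"""
print(ordered_chunks_query.strip().splitlines()[0])
```

The same pattern is what lets graph-enhanced retrievers pull in a matched chunk's neighbors for extra context, rather than returning an isolated snippet.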
This lab uses Databricks-hosted embedding and LLM models:
| Model | Dimensions | Context | Best For |
|---|---|---|---|
| databricks-bge-large-en | 1024 | 512 tokens | Short text, fast |
| databricks-gte-large-en | 1024 | 8192 tokens | Long documents |
| Model | Description |
|---|---|
| databricks-meta-llama-3-3-70b-instruct | Llama 3.3 70B (default) |
| databricks-dbrx-instruct | DBRX Instruct |
| databricks-mixtral-8x7b-instruct | Mixtral 8x7B |
These models are pre-deployed and ready to use via the MLflow deployments client.
The A320-200 Maintenance and Troubleshooting Manual is loaded from the Unity Catalog Volume at:
/Volumes/aws-databricks-neo4j-lab/lab-schema/lab-volume/MAINTENANCE_A320.md
This comprehensive manual includes:
- Aircraft Overview: Fleet configuration (5 aircraft), specifications
- System Architecture: Engine (V2500-A1), Avionics, Hydraulics systems
- Troubleshooting Procedures: EGT overheat, vibration exceedance, fuel starvation, bearing wear
- Fault Codes: Complete reference for Engine, Avionics, and Hydraulics faults
- Decision Trees: Diagnostic flows for common issues
- Scheduled Maintenance: Inspection intervals and task cards
After completing this lab, you can ask questions like:
- "How do I troubleshoot engine vibration?"
- "What are the EGT limits during takeoff?"
- "What causes hydraulic pressure loss?"
- "When should I replace the fuel filter?"
- "What oil analysis levels indicate bearing wear?"
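Under the hood, these questions are answered by comparing embedding vectors with cosine similarity. A toy sketch with 3-dimensional stand-ins for the 1024-dimensional BGE-large vectors (the chunk titles and numbers are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim vectors standing in for 1024-dim BGE-large embeddings.
chunk_vectors = {
    "Engine vibration troubleshooting": [0.9, 0.1, 0.0],
    "Hydraulic pressure loss causes":   [0.1, 0.9, 0.0],
    "Fuel filter replacement interval": [0.0, 0.2, 0.9],
}
# Pretend embedding of "How do I troubleshoot engine vibration?"
query_vector = [0.8, 0.2, 0.1]

best = max(chunk_vectors, key=lambda k: cosine(query_vector, chunk_vectors[k]))
print(best)
```

Neo4j's vector index performs this same comparison at scale, returning the top-k chunks by similarity score instead of scanning every vector.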
Each notebook has a Configuration cell at the top where you enter your Neo4j credentials:
NEO4J_URI = "" # e.g., "neo4j+s://xxxxxxxx.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = ""  # Your password from Lab 1

The embedding and LLM models use Databricks Foundation Model APIs, which are pre-deployed and require no additional configuration. When running in Databricks, the MLflow deployments client handles authentication automatically.
- Chunks: Smaller pieces of text split from the maintenance manual for efficient retrieval
- Embeddings: 1024-dimensional vectors (BGE-large) that capture semantic meaning
- Vector Index: Enables fast similarity search across embeddings
- VectorRetriever: Simple semantic search over embedded chunks
- VectorCypherRetriever: Graph-enhanced retrieval using custom Cypher queries
- GraphRAG: Combining retrieval with LLM generation for context-aware answers
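The vector index and similarity search concepts above can be sketched as Cypher (held in Python strings). The index name `chunk_embeddings` and the `embedding` property are assumptions for illustration; the notebook may use different names:

```python
# Neo4j 5.x vector index over Chunk.embedding, sized for BGE-large (1024 dims).
# Index and property names are illustrative assumptions.
create_index_query = """
CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1024,
  `vector.similarity_function`: 'cosine'
}}
"""

# db.index.vector.queryNodes returns the top-k nodes most similar
# to the supplied query embedding, with a similarity score.
search_query = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $query_embedding)
YIELD node, score
RETURN node.text AS text, score
"""
```

The dimension count must match the embedding model exactly: an index built for 1024-dim BGE-large vectors cannot answer queries embedded with a differently sized model.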
- Ensure Lab 5 is complete (aircraft topology loaded)
- Verify the maintenance manual is uploaded to the Volume: /Volumes/aws-databricks-neo4j-lab/lab-schema/lab-volume/MAINTENANCE_A320.md
- Upload the notebook files and data_utils.py to your Databricks workspace
- Open 03_data_and_embeddings.ipynb
- Enter your Neo4j credentials in the Configuration cell
- Run cells sequentially to load the maintenance manual and create embeddings
- Continue to 04_graphrag_retrievers.ipynb for retrieval strategies
| File | Description |
|---|---|
| 03_data_and_embeddings.ipynb | Data loading and embedding generation |
| 04_graphrag_retrievers.ipynb | Retrieval strategies and GraphRAG |
| 05_hybrid_retrievers.ipynb | Hybrid search combining vector + keyword retrieval |
| data_utils.py | Utility functions for Neo4j and Databricks |
| README.md | This file |
Note: The MAINTENANCE_A320.md file from lab_setup/aircraft_digital_twin_data/ must be uploaded to the Unity Catalog Volume before running the notebooks. See lab_setup/README.md for upload instructions.
The DatabricksEmbeddings class uses the MLflow deployments client:
import mlflow.deployments
client = mlflow.deployments.get_deploy_client("databricks")
response = client.predict(
endpoint="databricks-bge-large-en",
inputs={"input": ["text to embed"]},
)
embedding = response["data"][0]["embedding"]  # 1024-dim vector

Databricks Foundation Models use an OpenAI-compatible format:
- Input: {"input": ["text1", "text2"]}
- Output: {"data": [{"embedding": [0.1, ...]}, ...]}
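To make the parsing concrete, a mocked response in this shape (the numeric values are invented for illustration) shows that the API returns one embedding per input, in the same order as the inputs:

```python
# Mocked OpenAI-compatible embedding response; real vectors are 1024-dim.
mock_response = {
    "data": [
        {"embedding": [0.1, 0.2, 0.3]},  # embedding of "text1"
        {"embedding": [0.4, 0.5, 0.6]},  # embedding of "text2"
    ]
}

# One embedding per input, preserving input order.
embeddings = [row["embedding"] for row in mock_response["data"]]
print(len(embeddings), "embeddings returned")
```

Because the order is preserved, a batch of chunks can be embedded in one call and zipped back onto the chunk list by position.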
Continue to Lab 8 - Aura Agents to build a no-code conversational agent using the Neo4j Aura Agents console.