Graph-Augmented Intelligence — Databricks + Neo4j Aura

A four-notebook walkthrough showing how to enrich a Databricks fraud-detection model with graph features computed in Neo4j Aura's Graph Data Science (GDS) library. Data moves bidirectionally between Delta Lake and Neo4j via the Neo4j Spark Connector; GDS algorithms (PageRank, Louvain, Node Similarity) produce features that a baseline tabular model cannot.

The final notebook, 03_pull_and_model, closes the loop by training two GradientBoostingClassifier models on the same fraud label: a baseline fit on tabular features only (balance, transaction aggregates, P2P counts, encoded categoricals) and a graph-augmented model that adds the three GDS-derived features (risk_score, community_id, similarity_score). Both runs are logged to MLflow with AUC, precision, recall, and F1, then compared head-to-head via ROC curves, confusion matrices, and feature importance — translating the lift into an estimated dollar impact from the additional fraud caught.

Notebooks

#	Notebook	Runs In	Purpose
00	`00_setup_and_data.py`	Databricks	Generate a synthetic financial-transaction graph (accounts, merchants, transactions, P2P transfers) as Delta tables. A small fraction of accounts are planted as fraud rings.
01	`01_neo4j_ingest.py`	Databricks	Push Delta tables into Neo4j Aura as a typed property graph using the Neo4j Spark Connector.
02	`02_aura_gds_guide.py`	Neo4j Aura Workspace	Reference guide of Cypher + GDS commands to run in Aura: project the graph, run PageRank / Louvain / Node Similarity, write results back as node properties. Also available as plain markdown in `aura_gds_guide.md`.
03	`03_pull_and_model.py`	Databricks	Read enriched Account nodes back via the Spark Connector, register graph features in Unity Catalog Feature Store, train a baseline vs graph-augmented model, and quantify the lift.

Quick Start

1. Prerequisites

Databricks workspace (Unity Catalog enabled, serverless or a cluster with the Neo4j Spark Connector installed)
A running Neo4j Aura instance with GDS enabled (AuraDS or Aura Plugin)
Databricks CLI installed and authenticated locally

2. Configure secrets

Copy the sample env file and fill in your Aura credentials:

cp .env.sample .env
# edit .env with your NEO4J_URI / NEO4J_USER / NEO4J_PASSWORD

Push them into the Databricks secret scope neo4j-graph-engineering:

./setup_secrets.sh

The notebooks read credentials via:

dbutils.secrets.get("neo4j-graph-engineering", "uri")
dbutils.secrets.get("neo4j-graph-engineering", "username")
dbutils.secrets.get("neo4j-graph-engineering", "password")

3. Generate data and push to Neo4j (Databricks)

Import the four .py files into Databricks — they are saved in Databricks notebook source format — then run the first two in order:

00_setup_and_data — builds the synthetic Delta tables (accounts, merchants, transactions, P2P transfers) with fraud rings planted.
01_neo4j_ingest — pushes those Delta tables into your Neo4j Aura instance as a typed property graph via the Neo4j Spark Connector.

4. Compute graph features in Aura

Switch to the Neo4j Aura Workspace → Query tab and walk through 02_aura_gds_guide to project the graph and run PageRank, Louvain, and Node Similarity. The Cypher steps are also available as plain markdown in aura_gds_guide.md if you prefer to follow along outside a Databricks notebook.

When you're done, every Account node carries risk_score, community_id, and similarity_score properties.

5. Pull features and train models (Databricks)

Back in Databricks, run 03_pull_and_model to read the enriched Account nodes via the Spark Connector, register the graph features in Unity Catalog Feature Store, and train the baseline vs graph-augmented GradientBoostingClassifier models — ending with a head-to-head lift comparison and estimated fraud-savings impact.

Unity Catalog Defaults

The notebooks create and use:

Catalog: graph_feature_engineering_demo
Schema: neo4j_webinar

Change the CATALOG / SCHEMA constants at the top of each notebook if you prefer a different location.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph-Augmented Intelligence — Databricks + Neo4j Aura

Notebooks

Quick Start

1. Prerequisites

2. Configure secrets

3. Generate data and push to Neo4j (Databricks)

4. Compute graph features in Aura

5. Pull features and train models (Databricks)

Unity Catalog Defaults

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.env.sample		.env.sample
.gitignore		.gitignore
00_setup_and_data.py		00_setup_and_data.py
01_neo4j_ingest.py		01_neo4j_ingest.py
02_aura_gds_guide.py		02_aura_gds_guide.py
03_pull_and_model.py		03_pull_and_model.py
README.md		README.md
aura_gds_guide.md		aura_gds_guide.md
setup_secrets.sh		setup_secrets.sh

Folders and files

Latest commit

History

Repository files navigation

Graph-Augmented Intelligence — Databricks + Neo4j Aura

Notebooks

Quick Start

1. Prerequisites

2. Configure secrets

3. Generate data and push to Neo4j (Databricks)

4. Compute graph features in Aura

5. Pull features and train models (Databricks)

Unity Catalog Defaults

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages