The Customer 360 AI Solution Accelerator is a ready-to-deploy reference implementation that demonstrates how to build a modern, intelligent customer data platform using Azure DocumentDB (with MongoDB compatibility). It is delivered as a self-contained Jupyter notebook (`C360_Mongo_vCore.ipynb`) that walks through the full lifecycle — from synthetic data generation to AI-powered vector search — against a live DocumentDB cluster. It is aimed at:
- Banks and financial institutions modernizing legacy systems
- Teams migrating from on-premises MongoDB or MongoDB Atlas
- Architects and developers exploring AI, RAG, and real-time analytics use cases on Azure DocumentDB
| Component | Details |
|---|---|
| Synthetic data generation | 100 customer profiles + 500 bank transactions + 500 credit card transactions, generated with the Faker library and persisted as CSV files under data/ |
| Data ingestion | Loads the CSVs into three DocumentDB collections: customers, customer_bank_trans, customer_card_trans |
| Aggregation pipelines | Six ready-to-run MongoDB aggregation examples (see below) |
| $graphLookup traversal | Graph-based retrieval of all transactions linked to a single customer |
| Graph visualization | Customer → Transaction → Merchant/Category graphs rendered with networkx and matplotlib |
| Customer segmentation | Power BI-style dashboards (spending distribution, segment counts, source breakdown) built with matplotlib and seaborn |
| Vector search + RAG | Embeddings generated via Azure OpenAI, stored in a customer_embeddings collection, and queried with IVF, HNSW, or DiskANN vector indexes |
Credentials are loaded from `config/config.env` (a loading sketch follows the list). The notebook expects the following keys:

- `DOCUMENTDB_CONN_STRING` – Azure DocumentDB (MongoDB-compatible) connection string
- `OPENAI_API_ENDPOINT` – Azure OpenAI endpoint URL
- `OPENAI_API_TYPE` – `"azure"`
- `OPENAI_API_VERSION` – API version (e.g. `2024-02-01`)
- `OPENAI_EMBEDDINGS_DEPLOYMENT` – Deployment name for the embeddings model (e.g. `text-embedding-ada-002`)
- `OPENAI_COMPLETIONS_DEPLOYMENT` – Deployment name for the completions model
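A minimal loading sketch, assuming the `python-dotenv` package (the repository may read the file differently):

```python
# Sketch only: load config/config.env into the environment with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv("config/config.env")

conn_string = os.environ["DOCUMENTDB_CONN_STRING"]
endpoint = os.environ["OPENAI_API_ENDPOINT"]
embeddings_deployment = os.environ["OPENAI_EMBEDDINGS_DEPLOYMENT"]
```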
Uses Faker to create realistic customer profiles (name, email, phone, address, DOB, account open date) and two transaction datasets — bank (deposit / withdrawal / transfer) and credit card (six spend categories: groceries, travel, electronics, restaurants, clothing, utilities). All records are saved to data/ as CSV files before ingestion.
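The snippet below sketches what this step might look like; the column names, file names, and value ranges are illustrative assumptions, not the notebook's exact schema:

```python
# Illustrative sketch only: generate synthetic profiles and card transactions.
import random
import pandas as pd
from faker import Faker

fake = Faker()
CATEGORIES = ["groceries", "travel", "electronics", "restaurants", "clothing", "utilities"]

# 100 customer profiles
customers = pd.DataFrame([{
    "customer_id": i,
    "name": fake.name(),
    "email": fake.email(),
    "phone": fake.phone_number(),
    "address": fake.address().replace("\n", ", "),
    "dob": fake.date_of_birth(minimum_age=18, maximum_age=85),
    "account_open_date": fake.date_between(start_date="-10y"),
} for i in range(1, 101)])

# 500 credit card transactions (the bank set follows the same pattern)
card_trans = pd.DataFrame([{
    "customer_id": random.randint(1, 100),
    "category": random.choice(CATEGORIES),
    "amount": round(random.uniform(5, 1500), 2),
    "date": fake.date_between(start_date="-1y"),
} for _ in range(500)])

customers.to_csv("data/customers.csv", index=False)
card_trans.to_csv("data/customer_card_trans.csv", index=False)
```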
Reads the CSV files back with pandas and bulk-inserts them into DocumentDB using pymongo. The database is named customer360 and the three collections are created automatically on first insert.
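A minimal ingestion sketch; the CSV file names are assumptions based on the collection names:

```python
# Sketch: bulk-insert each CSV into its DocumentDB collection.
import os
import pandas as pd
from pymongo import MongoClient

db = MongoClient(os.environ["DOCUMENTDB_CONN_STRING"])["customer360"]

for csv_path, coll_name in [
    ("data/customers.csv", "customers"),
    ("data/customer_bank_trans.csv", "customer_bank_trans"),
    ("data/customer_card_trans.csv", "customer_card_trans"),
]:
    records = pd.read_csv(csv_path).to_dict(orient="records")
    db[coll_name].insert_many(records)  # collections are created on first insert
```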
Six example pipelines demonstrate common Customer 360 queries (one is sketched in pymongo syntax after the list):

- Accounts older than 5 years — date arithmetic with `$addFields` and `$dateFromString`
- Credit card grocery transactions over $300 — filtered `$match` on category and amount
- Bank withdrawals over $200 — type and amount filter
- Transactions grouped by customer — `$group` with `$sum` for count and total amount
- Monthly credit card spending summary — `$addFields` month/year extraction + `$group`
- Customer–transaction join — `$lookup` to enrich customer documents with both transaction collections
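As an illustration, the group-by-customer pipeline might look like this (it assumes the `db` handle from the ingestion sketch and field names such as `customer_id` and `amount`):

```python
# Sketch: count and total each customer's bank transactions with $group/$sum.
pipeline = [
    {"$group": {
        "_id": "$customer_id",
        "txn_count": {"$sum": 1},
        "total_amount": {"$sum": "$amount"},
    }},
    {"$sort": {"total_amount": -1}},
]
for doc in db["customer_bank_trans"].aggregate(pipeline):
    print(doc["_id"], doc["txn_count"], round(doc["total_amount"], 2))
```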
Uses MongoDB's $graphLookup stage to traverse relationships between a customer and all of their bank and credit card transactions in a single query. Results are then visualized as node graphs (customer → transactions → merchants → categories) using networkx and matplotlib.
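A sketch of such a traversal, with assumed field names and an assumed customer id; for a flat customer-to-transaction link, `$graphLookup` behaves much like a `$lookup` but can follow deeper relationship chains:

```python
# Sketch: start from one customer document and collect linked bank transactions.
pipeline = [
    {"$match": {"customer_id": 42}},
    {"$graphLookup": {
        "from": "customer_bank_trans",
        "startWith": "$customer_id",
        "connectFromField": "customer_id",
        "connectToField": "customer_id",
        "as": "bank_transactions",
        "maxDepth": 0,  # one hop: customer -> transactions
    }},
]
customer = next(db["customers"].aggregate(pipeline))
print(len(customer["bank_transactions"]), "linked transactions")
```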
Combines bank and credit card data to calculate total per-customer spending, then segments customers into four tiers (the tiering logic is sketched in code after the chart summary):
| Segment | Total Spending |
|---|---|
| Low | < $500 |
| Medium | $500 – $2,000 |
| High | $2,000 – $5,000 |
| Premium | > $5,000 |
Three seaborn/matplotlib charts are produced: a spending distribution box plot by segment, a customer count bar chart by segment, and a total spending comparison by transaction source (bank vs. credit card).
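The tiering itself is straightforward to express with `pandas.cut`; the sketch below uses toy data and the boundaries from the table above:

```python
# Sketch: bucket per-customer spending into the four tiers, then plot counts.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

spending = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "total_spending": [320.0, 1450.0, 3900.0, 7200.0],  # toy values
})

labels = ["Low", "Medium", "High", "Premium"]
spending["segment"] = pd.cut(
    spending["total_spending"],
    bins=[0, 500, 2000, 5000, np.inf],
    labels=labels,
)

sns.countplot(data=spending, x="segment", order=labels)
plt.title("Customers per segment")
plt.show()
```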
Customer profiles and individual transactions are serialized to natural-language strings, embedded with Azure OpenAI, and stored in the `customer_embeddings` collection. Three vector index types are demonstrated (index creation is sketched after the table):
| Index | Best for | Recommended tier |
|---|---|---|
| IVF (vector-ivf) | < 10,000 vectors, fast build times | M10 / M20 |
| HNSW (vector-hnsw) | Up to 50,000 vectors, higher recall | M30+ |
| DiskANN (vector-diskann) | 500,000+ vectors, high throughput | M30+ |
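Index creation on DocumentDB goes through a `createIndexes` command with `cosmosSearchOptions`. The sketch below shows the HNSW variant with illustrative parameter values; for IVF, swap `kind` to `vector-ivf` with a `numLists` option, and for DiskANN to `vector-diskann`:

```python
# Sketch: create an HNSW vector index on the embedding field.
db.command({
    "createIndexes": "customer_embeddings",
    "indexes": [{
        "name": "vector_index_hnsw",
        "key": {"embedding": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-hnsw",
            "m": 16,               # max connections per graph layer
            "efConstruction": 64,  # candidate list size at build time
            "similarity": "COS",   # cosine similarity
            "dimensions": 1536,    # text-embedding-ada-002 output size
        },
    }],
})
```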
Sample RAG queries (e.g. "Which customers have frequent withdrawals?", "Find customers with high-value transactions across both bank and credit card") are executed against the vector index and results are returned with cosine similarity scores.
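An end-to-end query might look like the following. The `OPENAI_API_KEY` variable and the `text` field are assumptions (the key is not among the config keys listed above), so adjust to however the notebook authenticates and stores documents:

```python
# Sketch: embed the question, then run a cosmosSearch vector query.
import os
from openai import AzureOpenAI
from pymongo import MongoClient

aoai = AzureOpenAI(
    azure_endpoint=os.environ["OPENAI_API_ENDPOINT"],
    api_key=os.environ["OPENAI_API_KEY"],  # assumed; not in the config key list
    api_version=os.environ["OPENAI_API_VERSION"],
)
db = MongoClient(os.environ["DOCUMENTDB_CONN_STRING"])["customer360"]

question = "Which customers have frequent withdrawals?"
query_vector = aoai.embeddings.create(
    model=os.environ["OPENAI_EMBEDDINGS_DEPLOYMENT"],
    input=question,
).data[0].embedding

results = db["customer_embeddings"].aggregate([
    {"$search": {"cosmosSearch": {"vector": query_vector, "path": "embedding", "k": 5}}},
    {"$project": {"similarity": {"$meta": "searchScore"}, "text": 1, "_id": 0}},
])
for doc in results:
    print(f"{doc['similarity']:.3f}  {doc['text'][:80]}")
```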
- Azure DocumentDB cluster (MongoDB-compatible vCore)
- Azure OpenAI resource with an embeddings deployment (e.g. `text-embedding-ada-002`)
- Python 3.9+ with Jupyter Notebook, or VS Code with the Jupyter extension
- Write permissions to create files under the local `data/` directory
- Clone this repository.
- Install dependencies: `pip install -r requirements.txt`
- Edit `config/config.env` and fill in your DocumentDB connection string and Azure OpenAI credentials.
- Open `C360_Mongo_vCore.ipynb` in VS Code or Jupyter and run the cells sequentially.
- Try before you build — Explore DocumentDB capabilities in a sandboxed environment with realistic, auto-generated data
- Accelerate time-to-value — Pre-built pipelines, graph lookups, dashboards, and RAG components are ready to adapt to your schema
- Showcase to stakeholders — Demonstrates real-world use cases including customer segmentation, spend analysis, graph traversal, and AI-powered intelligent search