AzureCosmosDB/azure-documentdb-mongo-customer360-ai-accelerator

Azure DocumentDB Customer 360 AI Accelerator

What Is It?

The Customer 360 AI Solution Accelerator is a ready-to-deploy reference implementation that demonstrates how to build a modern, intelligent customer data platform using Azure DocumentDB (with MongoDB compatibility). It is delivered as a self-contained Jupyter Notebook (C360_Mongo_vCore.ipynb) that walks through the full lifecycle — from synthetic data generation to AI-powered vector search — against a live DocumentDB cluster.

Who Is It For?

  • Banks and financial institutions modernizing legacy systems
  • Teams migrating from on-premises MongoDB or MongoDB Atlas
  • Architects and developers exploring AI, RAG, and real-time analytics use cases on Azure DocumentDB

What's Included?

| Component | Details |
| --- | --- |
| Synthetic data generation | 100 customer profiles + 500 bank transactions + 500 credit card transactions, generated with the Faker library and persisted as CSV files under data/ |
| Data ingestion | Loads the CSVs into three DocumentDB collections: customers, customer_bank_trans, customer_card_trans |
| Aggregation pipelines | Six ready-to-run MongoDB aggregation examples (see below) |
| $graphLookup traversal | Graph-based retrieval of all transactions linked to a single customer |
| Graph visualization | Customer → Transaction → Merchant/Category graphs rendered with networkx and matplotlib |
| Customer segmentation | Power BI-style dashboards (spending distribution, segment counts, source breakdown) built with matplotlib and seaborn |
| Vector search + RAG | Embeddings generated via Azure OpenAI, stored in a customer_embeddings collection, and queried with IVF, HNSW, or DiskANN vector indexes |

Notebook Walkthrough

1. Setup & Configuration

Credentials are loaded from config/config.env. The notebook expects the following keys:

DOCUMENTDB_CONN_STRING        – Azure DocumentDB (MongoDB-compatible) connection string
OPENAI_API_ENDPOINT           – Azure OpenAI endpoint URL
OPENAI_API_TYPE               – "azure"
OPENAI_API_VERSION            – API version (e.g. 2024-02-01)
OPENAI_EMBEDDINGS_DEPLOYMENT  – Deployment name for the embeddings model (e.g. text-embedding-ada-002)
OPENAI_COMPLETIONS_DEPLOYMENT – Deployment name for the completions model
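
As a minimal sketch of how these keys could be read and validated before any cell touches the cluster, the snippet below parses a KEY=VALUE file with the standard library only. It assumes config/config.env uses plain KEY=VALUE lines; the helper names parse_env_text, missing_keys, and load_config are hypothetical, and the notebook itself may load the file differently (for example with a dotenv package).

```python
from pathlib import Path

# Keys listed in the section above.
REQUIRED_KEYS = [
    "DOCUMENTDB_CONN_STRING",
    "OPENAI_API_ENDPOINT",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "OPENAI_EMBEDDINGS_DEPLOYMENT",
    "OPENAI_COMPLETIONS_DEPLOYMENT",
]


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: connection strings contain "=" too.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def load_config(path="config/config.env"):
    """Read the env file and fail early if a required key is missing."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    missing = missing_keys(values)
    if missing:
        raise KeyError("Missing config keys: " + ", ".join(missing))
    return values
```

Failing early on a missing key tends to be easier to debug than a connection error several cells later.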

2. Synthetic Data Generation

Uses Faker to create realistic customer profiles (name, email, phone, address, DOB, account open date) and two transaction datasets — bank (deposit / withdrawal / transfer) and credit card (six spend categories: groceries, travel, electronics, restaurants, clothing, utilities). All records are saved to data/ as CSV files before ingestion.
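
The shape of the two transaction datasets can be sketched as follows. To stay dependency-free, this example swaps Faker for the standard library's random module, so it is not the notebook's generator. The column names (transaction_id, customer_id, type, category, amount), the amount range, and the helper names are assumptions chosen for illustration.

```python
import csv
import io
import random

BANK_TYPES = ["deposit", "withdrawal", "transfer"]
CARD_CATEGORIES = ["groceries", "travel", "electronics",
                   "restaurants", "clothing", "utilities"]


def make_transactions(count, n_customers, label_field, labels, prefix, seed=0):
    """Build `count` transaction rows linked to random customer ids."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    return [
        {
            "transaction_id": f"{prefix}{i:05d}",
            "customer_id": rng.randint(1, n_customers),
            label_field: rng.choice(labels),
            "amount": round(rng.uniform(5.0, 1000.0), 2),
        }
        for i in range(1, count + 1)
    ]


def to_csv_text(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# 500 bank and 500 credit card transactions across 100 customers.
bank_rows = make_transactions(500, 100, "type", BANK_TYPES, "BK", seed=1)
card_rows = make_transactions(500, 100, "category", CARD_CATEGORIES, "CC", seed=2)
```

Writing to_csv_text(bank_rows) to a file under data/ would mirror the save-then-ingest flow described above.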

3. Data Ingestion into DocumentDB

Reads the CSV files back with pandas and bulk-inserts them into DocumentDB using pymongo. The database is named customer360 and the three collections are created automatically on first insert.
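
A batched insert along these lines might look like the sketch below. It uses the csv module instead of pandas to keep the example self-contained, and it accepts any object exposing insert_many, which is the method a pymongo collection provides. The function names, the amount column, and the batch size are assumptions, not the notebook's code.

```python
import csv
import io


def read_rows(csv_file):
    """Yield one dict per CSV row, converting the amount column to float."""
    for row in csv.DictReader(csv_file):
        if "amount" in row:
            row["amount"] = float(row["amount"])
        yield row


def bulk_insert(collection, rows, batch_size=100):
    """Insert rows in batches through collection.insert_many; return the count."""
    inserted = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            inserted += len(batch)
            batch = []  # start a fresh list for the next batch
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

With pymongo installed, the collection argument could be something like MongoClient(conn_string)["customer360"]["customer_bank_trans"], where conn_string comes from DOCUMENTDB_CONN_STRING.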

4. Aggregation Pipelines

Six example pipelines demonstrate common Customer 360 queries:

  1. Accounts older than 5 years: date arithmetic with $addFields and $dateFromString
  2. Credit card grocery transactions over $300: filtered $match on category and amount
  3. Bank withdrawals over $200: type and amount filter
  4. Transactions grouped by customer: $group with $sum for count and total amount
  5. Monthly credit card spending summary: $addFields month/year extraction + $group
  6. Customer–transaction join: $lookup to enrich customer documents with both transaction collections
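
Three of the pipelines listed above can be sketched as plain Python lists, which is the form pymongo's aggregate method accepts. The stage operators are standard MongoDB, but the field names (category, amount, customer_id) and the output array names are assumptions about the notebook's schema.

```python
import json

# Pipeline 2: credit card grocery transactions over $300.
grocery_over_300 = [
    {"$match": {"category": "groceries", "amount": {"$gt": 300}}},
]

# Pipeline 4: transaction count and total amount per customer.
totals_by_customer = [
    {"$group": {
        "_id": "$customer_id",
        "transaction_count": {"$sum": 1},
        "total_amount": {"$sum": "$amount"},
    }},
]

# Pipeline 6: enrich each customer with both transaction collections.
customer_with_transactions = [
    {"$lookup": {
        "from": "customer_bank_trans",
        "localField": "customer_id",
        "foreignField": "customer_id",
        "as": "bank_transactions",
    }},
    {"$lookup": {
        "from": "customer_card_trans",
        "localField": "customer_id",
        "foreignField": "customer_id",
        "as": "card_transactions",
    }},
]

# Pipelines are plain data, so they can be printed or logged for review.
pipeline_json = json.dumps(totals_by_customer, indent=2)
```

Each list would be passed to the matching collection, for example db["customer_card_trans"].aggregate(grocery_over_300).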

5. Graph Lookups

Uses MongoDB's $graphLookup stage to traverse relationships between a customer and all of their bank and credit card transactions in a single query. Results are then visualized as node graphs (customer → transactions → merchants → categories) using networkx and matplotlib.
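
The traversal and the edge list behind the node graphs could be sketched like this. The $graphLookup option names are standard MongoDB; the field names (customer_id, transaction_id, merchant, category), the linked_transactions output array, and both helper functions are hypothetical.

```python
def transactions_graph_lookup(customer_id, from_collection="customer_bank_trans"):
    """Pipeline to run on the customers collection for one customer."""
    return [
        {"$match": {"customer_id": customer_id}},
        {"$graphLookup": {
            "from": from_collection,
            "startWith": "$customer_id",
            "connectFromField": "customer_id",
            "connectToField": "customer_id",
            "as": "linked_transactions",
            "maxDepth": 0,  # direct links only, no further recursion
        }},
    ]


def build_edges(customer_doc):
    """Turn one result document into customer -> txn -> merchant -> category edges."""
    edges = []
    customer_node = f"customer:{customer_doc['customer_id']}"
    for txn in customer_doc.get("linked_transactions", []):
        txn_node = f"txn:{txn['transaction_id']}"
        edges.append((customer_node, txn_node))
        merchant = txn.get("merchant")
        if merchant:
            merchant_node = f"merchant:{merchant}"
            edges.append((txn_node, merchant_node))
            if txn.get("category"):
                edges.append((merchant_node, f"category:{txn['category']}"))
    return edges
```

The resulting tuples can be handed to networkx with DiGraph().add_edges_from(edges) before drawing with matplotlib.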

6. Customer Segmentation Dashboards

Combines bank and credit card data to calculate total per-customer spending, then segments customers into four tiers:

| Segment | Total Spending |
| --- | --- |
| Low | < $500 |
| Medium | $500 – $2,000 |
| High | $2,000 – $5,000 |
| Premium | > $5,000 |
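
The tiering above can be expressed as a small function. The table leaves the exact boundaries ambiguous, so this sketch assumes that exactly $2,000 counts as Medium and exactly $5,000 counts as High. It also assumes that every transaction amount counts toward spending; the notebook may exclude deposits or handle them differently. The function and field names are hypothetical.

```python
from collections import defaultdict


def total_spending(bank_rows, card_rows):
    """Sum amounts per customer across both transaction sources."""
    totals = defaultdict(float)
    for row in list(bank_rows) + list(card_rows):
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


def segment_for(total):
    """Map a total spending figure to one of the four tiers."""
    if total < 500:
        return "Low"
    if total <= 2000:   # assumption: the upper bound is inclusive
        return "Medium"
    if total <= 5000:   # Premium is strictly greater than 5,000
        return "High"
    return "Premium"


def segment_customers(bank_rows, card_rows):
    """Return {customer_id: segment} for every customer with transactions."""
    totals = total_spending(bank_rows, card_rows)
    return {cid: segment_for(total) for cid, total in totals.items()}
```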

Three seaborn/matplotlib charts are produced: a spending distribution box plot by segment, a customer count bar chart by segment, and a total spending comparison by transaction source (bank vs. credit card).

7. Vector Search & RAG with Azure OpenAI

Customer profiles and individual transactions are serialized to natural-language strings, embedded with Azure OpenAI, and stored in the customer_embeddings collection. Three vector index types are demonstrated:

| Index | Best for | Recommended tier |
| --- | --- | --- |
| IVF (vector-ivf) | < 10,000 vectors, fast build times | M10 / M20 |
| HNSW (vector-hnsw) | Up to 50,000 vectors, higher recall | M30+ |
| DiskANN (vector-diskann) | 500,000+ vectors, high throughput | M30+ |
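
As a hedged sketch, the three index kinds could be created with a createIndexes command shaped like the one below, following the cosmosSearch convention as I understand it from the service documentation. The embedding field name, the index name, and the tuning values (numLists, m, efConstruction, maxDegree, lBuild) are illustrative assumptions. The 1536 dimensions match text-embedding-ada-002; a different embeddings model would need a different value.

```python
def vector_index_command(kind, collection="customer_embeddings",
                         path="embedding", dimensions=1536):
    """Build a createIndexes command document for one vector index kind."""
    options = {"kind": kind, "similarity": "COS", "dimensions": dimensions}
    if kind == "vector-ivf":
        options["numLists"] = 1  # illustrative value for a small dataset
    elif kind == "vector-hnsw":
        options.update({"m": 16, "efConstruction": 64})
    elif kind == "vector-diskann":
        options.update({"maxDegree": 32, "lBuild": 64})
    else:
        raise ValueError(f"Unsupported index kind: {kind}")
    return {
        "createIndexes": collection,
        "indexes": [{
            "name": f"{kind}-index",
            "key": {path: "cosmosSearch"},
            "cosmosSearchOptions": options,
        }],
    }
```

With pymongo, the returned document would be passed to db.command(...) on the customer360 database. Verify the option names against the current service documentation before relying on them.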

Sample RAG queries (e.g. "Which customers have frequent withdrawals?", "Find customers with high-value transactions across both bank and credit card") are executed against the vector index and results are returned with cosine similarity scores.
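
To make the scoring concrete, the sketch below pairs a pure-Python cosine similarity with a query pipeline in the $search / cosmosSearch shape as I understand it from the service documentation. The embedding path, the value of k, and the function names are assumptions; in the notebook, the query vector would come from the Azure OpenAI embeddings deployment rather than being supplied by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def vector_search_pipeline(query_vector, path="embedding", k=5):
    """Aggregation pipeline returning the k nearest documents with scores."""
    return [
        {"$search": {
            "cosmosSearch": {"vector": query_vector, "path": path, "k": k},
            "returnStoredSource": True,
        }},
        {"$project": {
            "similarityScore": {"$meta": "searchScore"},
            "document": "$$ROOT",
        }},
    ]
```

The pipeline would be run with db["customer_embeddings"].aggregate(...). The local cosine_similarity function is only there to show what the returned score represents when the index uses COS similarity.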

Prerequisites

  • Azure DocumentDB cluster (MongoDB-compatible vCore)
  • Azure OpenAI resource with an embeddings deployment (e.g. text-embedding-ada-002)
  • Python 3.9+ with Jupyter Notebook or VS Code with the Jupyter extension
  • Write permissions to create files under the local data/ directory

Setup

  1. Clone this repository.
  2. Install dependencies:
    pip install -r requirements.txt
  3. Edit config/config.env and fill in your DocumentDB connection string and Azure OpenAI credentials.
  4. Open C360_Mongo_vCore.ipynb in VS Code or Jupyter and run cells sequentially.

Why Use This Accelerator?

  • Try before you build — Explore DocumentDB capabilities in a sandboxed environment with realistic, auto-generated data
  • Accelerate time-to-value — Pre-built pipelines, graph lookups, dashboards, and RAG components are ready to adapt to your schema
  • Showcase to stakeholders — Demonstrates real-world use cases including customer segmentation, spend analysis, graph traversal, and AI-powered intelligent search
