The Customer 360 AI Solution Accelerator is a ready-to-deploy reference implementation that demonstrates how to build a modern, intelligent customer data platform using Azure DocumentDB (with MongoDB compatibility). It is delivered as a self-contained Jupyter notebook (`C360_Mongo_vCore.ipynb`) that walks through the full lifecycle — from synthetic data generation to AI-powered vector search — against a live DocumentDB cluster. It is aimed at:
- Banks and financial institutions modernizing legacy systems
- Teams migrating from on-premises MongoDB or MongoDB Atlas
- Architects and developers exploring AI, RAG, and real-time analytics use cases on Azure DocumentDB
| Component | Details |
|---|---|
| Synthetic data generation | 100 customer profiles + 500 bank transactions + 500 credit card transactions, generated with the Faker library and persisted as CSV files under data/ |
| Data ingestion | Loads the CSVs into three DocumentDB collections: customers, customer_bank_trans, customer_card_trans |
| Aggregation pipelines | Six ready-to-run MongoDB aggregation examples (see below) |
| $graphLookup traversal | Graph-based retrieval of all transactions linked to a single customer |
| Graph visualization | Customer → Transaction → Merchant/Category graphs rendered with networkx and matplotlib |
| Customer segmentation | Power BI-style dashboards (spending distribution, segment counts, source breakdown) built with matplotlib and seaborn |
| Vector search + RAG | Embeddings generated via Azure OpenAI, stored in a customer_embeddings collection, and queried with IVF, HNSW, or DiskANN vector indexes |
Credentials are loaded from `config/config.env` (a loading sketch follows the list). The notebook expects the following keys:

- `DOCUMENTDB_CONN_STRING` – Azure DocumentDB (MongoDB-compatible) connection string
- `OPENAI_API_ENDPOINT` – Azure OpenAI endpoint URL
- `OPENAI_API_TYPE` – `"azure"`
- `OPENAI_API_VERSION` – API version (e.g. `2024-02-01`)
- `OPENAI_EMBEDDINGS_DEPLOYMENT` – Deployment name for the embeddings model (e.g. `text-embedding-ada-002`)
- `OPENAI_COMPLETIONS_DEPLOYMENT` – Deployment name for the completions model
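A minimal loading sketch, assuming the `python-dotenv` package (the repository may read the file differently):

```python
# Sketch only: load config/config.env into the environment with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv("config/config.env")

conn_string = os.environ["DOCUMENTDB_CONN_STRING"]
endpoint = os.environ["OPENAI_API_ENDPOINT"]
embeddings_deployment = os.environ["OPENAI_EMBEDDINGS_DEPLOYMENT"]
```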
Uses Faker to create realistic customer profiles (name, email, phone, address, DOB, account open date) and two transaction datasets — bank (deposit / withdrawal / transfer) and credit card (six spend categories: groceries, travel, electronics, restaurants, clothing, utilities). All records are saved to data/ as CSV files before ingestion.
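The snippet below sketches what this step might look like; the column names, file names, and value ranges are illustrative assumptions, not the notebook's exact schema:

```python
# Illustrative sketch only: generate synthetic profiles and card transactions.
import random
import pandas as pd
from faker import Faker

fake = Faker()
CATEGORIES = ["groceries", "travel", "electronics", "restaurants", "clothing", "utilities"]

# 100 customer profiles
customers = pd.DataFrame([{
    "customer_id": i,
    "name": fake.name(),
    "email": fake.email(),
    "phone": fake.phone_number(),
    "address": fake.address().replace("\n", ", "),
    "dob": fake.date_of_birth(minimum_age=18, maximum_age=85),
    "account_open_date": fake.date_between(start_date="-10y"),
} for i in range(1, 101)])

# 500 credit card transactions (the bank set follows the same pattern)
card_trans = pd.DataFrame([{
    "customer_id": random.randint(1, 100),
    "category": random.choice(CATEGORIES),
    "amount": round(random.uniform(5, 1500), 2),
    "date": fake.date_between(start_date="-1y"),
} for _ in range(500)])

customers.to_csv("data/customers.csv", index=False)
card_trans.to_csv("data/customer_card_trans.csv", index=False)
```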
Reads the CSV files back with pandas and bulk-inserts them into DocumentDB using pymongo. The database is named customer360 and the three collections are created automatically on first insert.
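A minimal ingestion sketch; the CSV file names are assumptions based on the collection names:

```python
# Sketch: bulk-insert each CSV into its DocumentDB collection.
import os
import pandas as pd
from pymongo import MongoClient

db = MongoClient(os.environ["DOCUMENTDB_CONN_STRING"])["customer360"]

for csv_path, coll_name in [
    ("data/customers.csv", "customers"),
    ("data/customer_bank_trans.csv", "customer_bank_trans"),
    ("data/customer_card_trans.csv", "customer_card_trans"),
]:
    records = pd.read_csv(csv_path).to_dict(orient="records")
    db[coll_name].insert_many(records)  # collections are created on first insert
```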
Six example pipelines demonstrate common Customer 360 queries (one is sketched in pymongo syntax after the list):

- Accounts older than 5 years — date arithmetic with `$addFields` and `$dateFromString`
- Credit card grocery transactions over $300 — filtered `$match` on category and amount
- Bank withdrawals over $200 — type and amount filter
- Transactions grouped by customer — `$group` with `$sum` for count and total amount
- Monthly credit card spending summary — `$addFields` month/year extraction + `$group`
- Customer–transaction join — `$lookup` to enrich customer documents with both transaction collections
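As an illustration, the group-by-customer pipeline might look like this (it assumes the `db` handle from the ingestion sketch and field names such as `customer_id` and `amount`):

```python
# Sketch: count and total each customer's bank transactions with $group/$sum.
pipeline = [
    {"$group": {
        "_id": "$customer_id",
        "txn_count": {"$sum": 1},
        "total_amount": {"$sum": "$amount"},
    }},
    {"$sort": {"total_amount": -1}},
]
for doc in db["customer_bank_trans"].aggregate(pipeline):
    print(doc["_id"], doc["txn_count"], round(doc["total_amount"], 2))
```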
Uses MongoDB's $graphLookup stage to traverse relationships between a customer and all of their bank and credit card transactions in a single query. Results are then visualized as node graphs (customer → transactions → merchants → categories) using networkx and matplotlib.
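A sketch of such a traversal, with assumed field names and an assumed customer id; for a flat customer-to-transaction link, `$graphLookup` behaves much like a `$lookup` but can follow deeper relationship chains:

```python
# Sketch: start from one customer document and collect linked bank transactions.
pipeline = [
    {"$match": {"customer_id": 42}},
    {"$graphLookup": {
        "from": "customer_bank_trans",
        "startWith": "$customer_id",
        "connectFromField": "customer_id",
        "connectToField": "customer_id",
        "as": "bank_transactions",
        "maxDepth": 0,  # one hop: customer -> transactions
    }},
]
customer = next(db["customers"].aggregate(pipeline))
print(len(customer["bank_transactions"]), "linked transactions")
```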
Combines bank and credit card data to calculate total per-customer spending, then segments customers into four tiers (the tiering logic is sketched in code after the chart summary):
| Segment | Total Spending |
|---|---|
| Low | < $500 |
| Medium | $500 – $2,000 |
| High | $2,000 – $5,000 |
| Premium | > $5,000 |
Three seaborn/matplotlib charts are produced: a spending distribution box plot by segment, a customer count bar chart by segment, and a total spending comparison by transaction source (bank vs. credit card).
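The tiering itself is straightforward to express with `pandas.cut`; the sketch below uses toy data and the boundaries from the table above:

```python
# Sketch: bucket per-customer spending into the four tiers, then plot counts.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

spending = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "total_spending": [320.0, 1450.0, 3900.0, 7200.0],  # toy values
})

labels = ["Low", "Medium", "High", "Premium"]
spending["segment"] = pd.cut(
    spending["total_spending"],
    bins=[0, 500, 2000, 5000, np.inf],
    labels=labels,
)

sns.countplot(data=spending, x="segment", order=labels)
plt.title("Customers per segment")
plt.show()
```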
Customer profiles and individual transactions are serialized to natural-language strings, embedded with Azure OpenAI, and stored in the `customer_embeddings` collection. Three vector index types are demonstrated (index creation is sketched after the table):
| Index | Best for | Recommended tier |
|---|---|---|
| IVF (vector-ivf) | < 10,000 vectors, fast build times | M10 / M20 |
| HNSW (vector-hnsw) | Up to 50,000 vectors, higher recall | M30+ |
| DiskANN (vector-diskann) | 500,000+ vectors, high throughput | M30+ |
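Index creation on DocumentDB goes through a `createIndexes` command with `cosmosSearchOptions`. The sketch below shows the HNSW variant with illustrative parameter values; for IVF, swap `kind` to `vector-ivf` with a `numLists` option, and for DiskANN to `vector-diskann`:

```python
# Sketch: create an HNSW vector index on the embedding field.
db.command({
    "createIndexes": "customer_embeddings",
    "indexes": [{
        "name": "vector_index_hnsw",
        "key": {"embedding": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-hnsw",
            "m": 16,               # max connections per graph layer
            "efConstruction": 64,  # candidate list size at build time
            "similarity": "COS",   # cosine similarity
            "dimensions": 1536,    # text-embedding-ada-002 output size
        },
    }],
})
```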
Sample RAG queries (e.g. "Which customers have frequent withdrawals?", "Find customers with high-value transactions across both bank and credit card") are executed against the vector index and results are returned with cosine similarity scores.
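An end-to-end query might look like the following. The `OPENAI_API_KEY` variable and the `text` field are assumptions (the key is not among the config keys listed above), so adjust to however the notebook authenticates and stores documents:

```python
# Sketch: embed the question, then run a cosmosSearch vector query.
import os
from openai import AzureOpenAI
from pymongo import MongoClient

aoai = AzureOpenAI(
    azure_endpoint=os.environ["OPENAI_API_ENDPOINT"],
    api_key=os.environ["OPENAI_API_KEY"],  # assumed; not in the config key list
    api_version=os.environ["OPENAI_API_VERSION"],
)
db = MongoClient(os.environ["DOCUMENTDB_CONN_STRING"])["customer360"]

question = "Which customers have frequent withdrawals?"
query_vector = aoai.embeddings.create(
    model=os.environ["OPENAI_EMBEDDINGS_DEPLOYMENT"],
    input=question,
).data[0].embedding

results = db["customer_embeddings"].aggregate([
    {"$search": {"cosmosSearch": {"vector": query_vector, "path": "embedding", "k": 5}}},
    {"$project": {"similarity": {"$meta": "searchScore"}, "text": 1, "_id": 0}},
])
for doc in results:
    print(f"{doc['similarity']:.3f}  {doc['text'][:80]}")
```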
- Azure DocumentDB cluster (MongoDB-compatible vCore)
- Azure OpenAI resource with an embeddings deployment (e.g. `text-embedding-ada-002`)
- Python 3.9+ with Jupyter Notebook, or VS Code with the Jupyter extension
- Write permissions to create files under the local `data/` directory
- Clone this repository.
- Install dependencies: `pip install -r requirements.txt`
- Edit `config/config.env` and fill in your DocumentDB connection string and Azure OpenAI credentials.
- Open `C360_Mongo_vCore.ipynb` in VS Code or Jupyter and run the cells sequentially.
- Try before you build — Explore DocumentDB capabilities in a sandboxed environment with realistic, auto-generated data
- Accelerate time-to-value — Pre-built pipelines, graph lookups, dashboards, and RAG components are ready to adapt to your schema
- Showcase to stakeholders — Demonstrates real-world use cases including customer segmentation, spend analysis, graph traversal, and AI-powered intelligent search