A multi-agent AI system that ingests AML transaction data, detects money laundering typologies, generates FinCEN-compliant SAR narrative drafts, and provides a human-in-the-loop investigator review interface. Inspired by the Co-Investigator AI research paper.
- Multi-Agent Orchestration — 14 specialized agents coordinated by a LangGraph StateGraph with supervisor routing and parallel execution via Send API
- SAR Narrative Generation — DSPy-optimized Chain-of-Thought prompts produce FinCEN 5W1H-compliant narratives with per-section confidence scores
- Human-in-the-Loop Review — LangGraph interrupt checkpoints pause the pipeline for investigator approval, with iterative feedback loops (up to 3 revision cycles)
- PII Protection — Microsoft Presidio masks all sensitive data before LLM inference; reversible anonymization with PostgreSQL-backed mapping store and full audit trail
- False Positive Bypass — Supervisor agent exits the pipeline early when the maximum confidence is below 25%, so low-confidence cases do not trigger unnecessary SAR narratives
- Programmatic Prompt Optimization — DSPy's MIPROv2 optimizer tunes prompt templates against a golden evaluation dataset
- Compliance Validation (Agent-as-Judge) — Rule-based checks and an LLM judge score narratives on 5W1H completeness, factual grounding, regulatory keywords, and objective tone
- Full Audit Trail — Every agent action, LLM call, confidence score, and human decision is logged to PostgreSQL for SR 11-7 model risk management compliance
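
The reversible anonymization described under PII Protection can be illustrated with a small stand-in. The sketch below does not use Presidio: it relies on one regular expression for account-like identifiers and an in-memory dict in place of the PostgreSQL mapping store. The class name `ReversibleMasker`, the placeholder format, and the regex are all assumptions made for illustration.

```python
import re


class ReversibleMasker:
    """Toy stand-in for a Presidio-backed privacy guard (hypothetical)."""

    # Assumed pattern: 10-character hex-like account IDs such as 8000ECA410.
    ACCOUNT_RE = re.compile(r"\b[0-9A-F]{10}\b")

    def __init__(self) -> None:
        # In-memory mapping; the project describes a PostgreSQL-backed store.
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def mask(self, text: str) -> str:
        """Replace each distinct account ID with a stable placeholder."""

        def _replace(match: re.Match) -> str:
            original = match.group(0)
            if original not in self._forward:
                placeholder = f"[ACCOUNT-{len(self._forward) + 1}]"
                self._forward[original] = placeholder
                self._reverse[placeholder] = original
            return self._forward[original]

        return self.ACCOUNT_RE.sub(_replace, text)

    def unmask(self, text: str) -> str:
        """Restore original values, e.g. in the final approved narrative."""
        for placeholder, original in self._reverse.items():
            text = text.replace(placeholder, original)
        return text


masker = ReversibleMasker()
raw = "Transfer from 8000ECA410 to 8000ED0210, then back to 8000ECA410."
masked = masker.mask(raw)      # safe to send to the LLM
restored = masker.unmask(masked)
```

The same identifier always maps to the same placeholder, so the LLM can still reason about repeated counterparties without seeing the real values.
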
Raw IBM AML CSV → Data Ingestion → PII Masking (Presidio) → Crime Type Detection
→ Planning Agent (Supervisor) → [4 Typology Agents in parallel] → External Intelligence (Mock MCP)
→ Narrative Generation (DSPy + CoT) → Compliance Validation (Agent-as-Judge)
→ Human Review (Streamlit) → Feedback Agent → Final SAR → Audit Log
                    ┌─────────────────────┐
                    │   Data Ingestion    │
                    │   & Structuring     │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │  AI-Privacy Guard   │
                    │     (Presidio)      │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │   Crime Type        │
                    │   Detection Agent   │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │   Planning Agent    │◄──── Dynamic Memory (FAISS)
                    │   (Supervisor)      │      ├── Regulatory Memory
                    └────────┬────────────┘      ├── Historical Narrative Memory
                             │                   └── Typology Pattern Memory
              ┌──────────────┼──────────────┐
              │              │              │
    ┌─────────▼──┐   ┌───────▼────┐  ┌──────▼───────┐
    │ Typology   │   │ Typology   │  │ External     │
    │ Agents x4  │   │ Agents     │  │ Intelligence │
    │ (parallel) │   │ (contd)    │  │ Agent (Mock) │
    └─────────┬──┘   └───────┬────┘  └──────┬───────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                    ┌────────▼────────────┐
                    │   Narrative         │
                    │   Generation Agent  │
                    │   (DSPy + CoT)      │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │   Compliance        │
                    │   Validation Agent  │
                    │   (Agent-as-Judge)  │
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐      ┌──────────────────┐
                    │   Human Review      │◄────►│  Feedback Agent  │
                    │   (Streamlit UI)    │      │  (Iterative)     │
                    │  [INTERRUPT POINT]  │      └──────────────────┘
                    └────────┬────────────┘
                             │
                    ┌────────▼────────────┐
                    │   Final SAR Output  │
                    │   + Audit Trail     │
                    └─────────────────────┘
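
The parallel typology step in the diagram is a fan-out/fan-in pattern. The project runs it through LangGraph's Send API; the sketch below shows the same shape using only the standard library's `ThreadPoolExecutor`, so it is a stand-in rather than the project's code. The four detector functions, their scoring rules, and the case fields are placeholders invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Case = dict
Detector = Callable[[Case], float]


def detect_structuring(case: Case) -> float:
    # Placeholder rule: share of amounts sitting just under 10,000.
    amounts = case.get("amounts", [])
    near = [a for a in amounts if 8_000 <= a < 10_000]
    return len(near) / len(amounts) if amounts else 0.0


def detect_layering(case: Case) -> float:
    # Placeholder rule: more intermediate hops means higher confidence.
    return min(1.0, case.get("hops", 0) / 5)


def detect_round_tripping(case: Case) -> float:
    return 1.0 if case.get("returns_to_origin") else 0.0


def detect_sanctions(case: Case) -> float:
    return 1.0 if case.get("sanctioned_counterparty") else 0.0


DETECTORS: dict[str, Detector] = {
    "structuring": detect_structuring,
    "layering": detect_layering,
    "round_tripping": detect_round_tripping,
    "sanctions": detect_sanctions,
}


def run_typology_agents(case: Case) -> dict[str, float]:
    """Fan out to every detector, then gather confidences by name."""
    with ThreadPoolExecutor(max_workers=len(DETECTORS)) as pool:
        futures = {name: pool.submit(fn, case) for name, fn in DETECTORS.items()}
        return {name: future.result() for name, future in futures.items()}


scores = run_typology_agents(
    {"amounts": [9_500, 9_000, 12_000, 8_500], "hops": 2, "returns_to_origin": False}
)
```

Each detector sees the same case and returns independently, which is what lets the downstream agents treat the merged scores as a single state update.
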
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Primary language |
| Agent Orchestration | LangGraph | Multi-agent StateGraph with supervisor routing |
| Prompt Optimization | DSPy | Programmatic prompt tuning with MIPROv2 |
| LLM (Primary) | Groq — Llama 3.3 70B | Reasoning, narrative generation |
| LLM (Fast) | Groq — Llama 3.1 8B | DSPy optimization, lightweight tasks |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Vector embeddings for RAG |
| Vector Store | FAISS | 3-tier in-memory vector search |
| Database | PostgreSQL 16+ | Structured data, audit logs, checkpoints |
| PII Masking | Microsoft Presidio + spaCy | Anonymize/de-anonymize PII |
| Experiment Tracking | MLflow | Prompt versioning, evaluation metrics |
| API Framework | FastAPI | REST API with SSE streaming |
| Frontend | Streamlit | Multi-page investigator review UI |
| Data Processing | Pandas + Pydantic | Schema validation, feature engineering |
| ORM / Migrations | SQLAlchemy + Alembic | Database abstraction, schema versioning |
# 1. Clone
git clone https://github.com/yourusername/FIU_CoAgents.git
cd FIU_CoAgents
# 2. Environment
cp .env.example .env
# Edit .env — add your GROQ_API_KEY
# 3. Install
pip install -e ".[dev]"
python -m spacy download en_core_web_lg
# 4. Database
brew install postgresql@16 && brew services start postgresql@16
createdb fiu_coagents
make setup
# 5. Run
make run-api # FastAPI on :8000
make run-ui # Streamlit on :8501

curl -X POST http://localhost:8000/api/investigations/CASE-001/run \
-H "Content-Type: application/json" \
-d '{
"case_id": "CASE-001",
"enriched_transactions": [...],
"case_summary": {"total_amount": 95000, "risk_level": "HIGH"},
"account_profiles": {}
}'

curl -N http://localhost:8000/api/investigations/CASE-001/run/stream \
-H "Content-Type: application/json" \
-d '{"case_id": "CASE-001", "enriched_transactions": [...]}'

streamlit run ui/app.py --server.port 8501

SUSPICIOUS ACTIVITY REPORT — NARRATIVE
FILING INSTITUTION: First National Bank
REPORT PERIOD: 2022-09-01 to 2022-09-15
CASE REFERENCE: CASE-001
SUMMARY
Multiple structured cash deposits totaling $94,500 detected across 12
transactions, systematically kept below the $10,000 CTR threshold.
SUBJECT IDENTIFICATION (WHO)
Account holder [REDACTED-1], operating accounts 8000ECA410 and
8000ED0210 at banks 11 and 15...
NATURE OF SUSPICIOUS ACTIVITY (WHAT)
Structuring / smurfing pattern detected with 92% confidence...
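
The summary in the sample narrative rests on a simple rule: many cash deposits, each below the $10,000 CTR threshold, that together add up to far more than the threshold. A minimal sketch of that check follows. The function name, the `min_count` parameter, and the individual deposit amounts are illustrative assumptions; the amounts were chosen so that 12 deposits total $94,500 as in the sample, and are not taken from the dataset.

```python
CTR_THRESHOLD = 10_000  # USD threshold cited in the sample narrative


def looks_structured(deposits: list[float], min_count: int = 3) -> bool:
    """Flag deposits that each stay under the threshold but exceed it in total."""
    if len(deposits) < min_count:
        return False
    all_below = all(amount < CTR_THRESHOLD for amount in deposits)
    return all_below and sum(deposits) > CTR_THRESHOLD


# Hypothetical deposits: 12 transactions totaling 94,500.
sample_deposits = [
    9_500, 9_000, 8_500, 8_000, 7_500, 7_000,
    9_500, 9_000, 8_500, 8_000, 7_500, 2_500,
]
flagged = looks_structured(sample_deposits)
```

A real typology agent would also weigh timing, branch, and counterparty features; this rule only captures the amount pattern described in the summary.
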
FIU_CoAgents/
├── src/
│ ├── config.py # pydantic-settings configuration
│ ├── agents/ # LangGraph agent nodes
│ │ ├── state.py # SARInvestigationState TypedDict
│ │ ├── graph.py # StateGraph definition + build_graph()
│ │ ├── supervisor.py # Planning Agent (supervisor routing)
│ │ ├── crime_detection/ # Crime type detection (rule + LLM)
│ │ ├── typology/ # 4 specialized typology agents
│ │ ├── intelligence/ # External intelligence (mock MCP)
│ │ ├── narrative/ # DSPy narrative generation
│ │ ├── validation/ # Compliance validation (Agent-as-Judge)
│ │ └── feedback/ # Iterative feedback agent
│ ├── data/ # Data ingestion, schemas, case builder
│ ├── privacy/ # Presidio guard, PII mapping, audit
│ ├── memory/ # FAISS 3-tier RAG memory
│ ├── api/ # FastAPI routes + middleware
│ └── db/ # SQLAlchemy models, Alembic migrations
├── evaluation/ # Golden dataset + 5 custom scorers
├── ui/ # Streamlit multi-page app
├── tests/ # 300 unit + integration tests
├── data/ # Raw, processed, golden, regulatory
├── scripts/ # Setup, load, build, optimize
└── docs/ # Architecture, API, evaluation docs
The supervisor routes through 8 phases, with a false positive bypass:
| Phase | Agent | Output |
|---|---|---|
| 1 | Crime Type Detection | detected_crime_types, confidence scores |
| 2 | 4 Typology Agents (parallel) | Structuring, Layering, Round-Tripping, Sanctions |
| 2a | False Positive Bypass | If max confidence < 25%, pipeline exits early |
| 3 | External Intelligence | Mock sanctions, PEP, adverse media checks |
| 4 | Narrative Generation (DSPy) | FinCEN 5W1H SAR narrative draft |
| 5 | Compliance Validation | Rule-based + LLM judge scoring |
| 6 | Human Review (interrupt) | Investigator approval/edit/reject |
| 7 | Feedback Agent | Structured revision instructions |
| 8 | Final Output | Approved SAR + audit trail |
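
The routing rules in the table reduce to a few comparisons. The sketch below is a plain-Python stand-in for the supervisor's decisions, not the project's LangGraph code: it applies the 25% false positive bypass from phase 2a and the cap of 3 revision cycles mentioned under Human-in-the-Loop Review. The state keys, the node names, and the behavior once the cap is reached are assumptions.

```python
FALSE_POSITIVE_THRESHOLD = 0.25  # phase 2a: exit if max confidence is below 25%
MAX_REVISION_CYCLES = 3          # cap on investigator feedback loops


def route_after_typology(state: dict) -> str:
    """Pick the next node once the typology agents have reported."""
    scores = state.get("typology_confidence", {})  # hypothetical state key
    if max(scores.values(), default=0.0) < FALSE_POSITIVE_THRESHOLD:
        return "false_positive_exit"
    return "external_intelligence"


def route_after_review(state: dict) -> str:
    """Pick the next node after the investigator's decision."""
    if state.get("decision") == "approve":
        return "final_output"
    if state.get("revision_count", 0) < MAX_REVISION_CYCLES:
        return "feedback_agent"
    # Assumed behavior once the cap is reached: stop revising and escalate.
    return "manual_escalation"
```

In a graph framework these functions would typically be registered as conditional edges, with the returned strings mapped to node names.
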
Nine scoring dimensions are evaluated against a golden dataset of 50 labeled cases:
| Scorer | Type | Threshold |
|---|---|---|
| Typology Detection Accuracy | Rule + ML | >= 0.85 |
| Confidence Calibration (ECE) | Statistical | <= 0.10 |
| Narrative Completeness (5W1H) | Rule-based | >= 0.90 |
| Factual Grounding | Rule + LLM | >= 0.95 |
| Regulatory Compliance | Rule-based | >= 0.90 |
| Narrative Quality (LLM Judge) | LLM | >= 3.5 / 5 |
| PII Leakage Rate | Rule-based | 0.00 |
| End-to-End Latency | Timer | <= 120s |
| False Positive Rate | Statistical | <= 0.15 |
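
Of the scorers above, Confidence Calibration is the least self-explanatory. Expected Calibration Error (ECE) groups predictions into confidence bins and averages the gap between mean confidence and observed accuracy in each bin, weighted by bin size. A minimal sketch with equal-width bins follows; the function name, the choice of 10 bins, and the sample inputs are assumptions, and the project's scorer may bin differently.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Weighted mean of |avg confidence - accuracy| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(1 for _, hit in items if hit) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical predictions: two at 0.95 (both right), two at 0.55 (one right).
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [True, True, True, False])
passes = ece <= 0.10  # threshold from the table
```

Lower is better: a score of 0 means stated confidence matches observed accuracy in every bin.
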
make eval # Run full evaluation suite, logs to MLflow

- Architecture — System design, data flow, key decisions
- Agent Specifications — Per-agent contracts and state fields
- API Reference — Endpoint catalog with request/response models
- Evaluation Methodology — Golden dataset, scoring, MLflow
- Regulatory Alignment — FinCEN, BSA, SR 11-7 mapping
- Deployment Guide — Setup, environment, running, testing
make test # pytest with coverage
make lint # ruff check + format + mypy
make run-api # FastAPI dev server
make run-ui # Streamlit dev server
make eval # Run evaluation suite
make clean # Remove caches

Yash Patel — GitHub
MIT License. See LICENSE for details.
- Co-Investigator AI — Research paper inspiring the multi-agent architecture
- IBM AML Dataset — Synthetic transaction data for development and evaluation
- FinCEN SAR Guidelines — Regulatory framework for narrative structure