AegisAI is an agentic AI system for Account Takeover (ATO) detection in digital banking and fintech environments. Unlike traditional fraud systems that optimize for detection accuracy, AegisAI treats trust as a system-level property rather than a model metric.
AegisAI is a governed decision-making system designed to protect high-stakes digital environments from account takeover. It uses a multi-agent architecture to separate risk prediction from decision authority, ensuring that AI only makes decisions when confidence is high and policy allows. The system features a human-in-the-loop escalation mechanism for ambiguous cases and an immutable, hash-chained audit trail for regulatory compliance and transparency.
Key Value Propositions:
- Controlled Autonomy: AI only decides when it's certain; humans handle the rest.
- Traceable Decisions: Every action is logged with its full reasoning and lineage.
- Policy-First Governance: Hard constraints override model outputs to ensure safety.
- Enterprise-Ready: Designed with modularity, scalability, and auditability at its core.
Traditional fraud detection systems focus on accuracy metrics (Precision/Recall), which often fails in high-stakes production environments:
- The False Positive Paradox: A 99% accurate model that blocks 1% of legitimate users can destroy customer trust and increase support costs.
- The Transparency Gap: Black-box models cannot explain why a user was blocked, making regulatory compliance and manual review difficult.
- The Authority Risk: Purely automated systems can be "tricked" by novel attacks if they don't have a mechanism for restraint.
AegisAI solves this by prioritizing Trust and Reliability over raw accuracy, explicitly modeling uncertainty and escalating to humans when necessary.
AegisAI follows a multi-agent, parallel-execution architecture where no single model has absolute authority.
graph TD
subgraph API_LAYER[" API Layer "]
GW[FastAPI Gateway - POST /evaluate-login]
HLT[GET /health and GET /ready]
end
subgraph ORCH_LAYER[" Orchestration Layer "]
IV[Input Validation]
DF[DecisionFlow]
AR[AgentRouter - parallel fan-out]
end
subgraph AGENT_LAYER[" Agent Layer "]
DA[Detection Agent - XGBoost]
BA[Behavior Agent - Isolation Forest]
NA[Network Agent - Graph Neural Network]
CA[Confidence Agent - Uncertainty Gating]
EA[Explanation Agent - LLM / Template]
end
subgraph POLICY_LAYER[" Policy and Governance Layer "]
PE[PolicyEngine - policy_rules.yaml]
EQ[Escalation Queue]
HO[Human Override - Review UI]
end
subgraph STORAGE_LAYER[" Storage Layer "]
AL[Audit Logger - hash-chained JSONL]
S3[AWS S3 - Immutable WORM Logs]
DY[AWS DynamoDB - Metadata Index]
end
subgraph MON_LAYER[" Monitoring Layer "]
PR[Prometheus]
GR[Grafana]
ML[MLflow - Model Registry]
end
subgraph INFRA_LAYER[" Infra Layer - Terraform and ECS Fargate "]
ECS[AWS ECS Fargate - Auto-scaling]
TF[Terraform IaC - ecs / iam / storage]
end
GW --> IV
IV --> DF
DF --> AR
AR --> DA
AR --> BA
AR --> NA
DA --> CA
BA --> CA
NA --> CA
CA -->|high confidence| EA
CA -->|low confidence| EQ
EA --> PE
EQ --> HO
HO --> PE
PE --> AL
AL --> S3
AL --> DY
PE -.->|metrics| PR
PR --> GR
EA -.->|artifacts| ML
INFRA_LAYER -.->|hosts| API_LAYER
| Agent | Purpose | Primary Signal | Constraint |
|---|---|---|---|
| Detection | Anomaly Detection | XGBoost risk scores | Cannot make decisions |
| Behavior | Pattern Consistency | Isolation Forest patterns | No network context |
| Network | Relational Risk | Graph Neural Network (GNN) | Evidence only, no verdicts |
| Confidence | Gatekeeper | Uncertainty/Disagreement | Cannot generate actions |
| Explanation | Action & Reasoning | LLM/Template-based | Must obey policy rules |
- Separation of Prediction and Decision: Agents produce signals, but only the orchestrator (gated by confidence) produces decisions. This prevents single-model bias from cascading.
- Immutability over Performance: We use a frozen
DecisionContextand hash-chained audit logs. This adds a slight latency overhead but guarantees that no decision can be tampered with post-hoc. - Conservative Action Defaults: When in doubt, the system defaults to
CHALLENGEorESCALATErather thanALLOWorBLOCK, prioritizing security and user trust. - Vanilla CSS & Standard Frameworks: For the UI (if applicable) and API, we use standard, flexible tools like FastAPI and Vanilla CSS to avoid vendor lock-in and ensure long-term maintainability.
Detailed architectural decisions are recorded in docs/adr/.
AegisAI is designed to fail gracefully. Known failure modes include:
- The Travel Trap: A legitimate user logging in from a new country might trigger high risk. Mitigation: Confidence agent detects novelty and escalates to human review.
- Low-and-Slow Attacks: Sophisticated attackers might mimic behavior perfectly. Mitigation: Network Agent identifies shared infrastructure and relational anomalies.
- Systemic Bias: Human reviewers might consistently override certain types of correct AI decisions. Mitigation: Human Override Rate monitoring and periodic policy/model retraining.
- Infrastructure Latency: High-traffic bursts might slow down parallel execution. Mitigation: Timeouts and fallback "Safe Mode" actions.
We track metrics that reflect real-world trust, not just model performance:
| Metric | Goal | Rationale |
|---|---|---|
| False Positive Rate (FPR) | < 5% | Minimizes harm to legitimate users. |
| Escalation Rate | 5-30% | Measures how often the AI exercises healthy restraint. |
| Human Override Rate | < 20% | Indicates AI-Human alignment and model drift. |
| Calibration Error | < 0.10 | Ensures the AI is "honest" about its uncertainty. |
| Policy Violation Count | 0 | Proves the governance layer is enforcing hard limits. |
We maintain a high-performance decision pipeline validated at two levels:
| Metric | Goal | Benchmark |
|---|---|---|
| Mean Latency | < 1.0 ms | benchmarks/latency_benchmark.py |
| Throughput | > 2000 req/s | benchmarks/latency_benchmark.py |
export PYTHONPATH=$PYTHONPATH:$(pwd)/src
python benchmarks/latency_benchmark.pyExercises the full HTTP stack end-to-end: FastAPI gateway → DecisionFlow → PolicyEngine → AuditLogger.
Under a 200-user concurrent load test on a single-node environment, AegisAI sustained ~309 RPS with 0% error rate and P95 latency of 500ms at ~92% CPU utilization, indicating compute-bound scaling behavior.
| Metric | Result (200 Users) | How measured |
|---|---|---|
| P50 HTTP Latency | 300 ms | Locust built-in stats |
| P95 HTTP Latency | 500 ms | Locust built-in stats |
| P99 HTTP Latency | 590 ms | Locust built-in stats |
| Steady State RPS | 309.3 | Locust total RPS |
| Error Rate | 0.00% | Locust failure counter |
| Mean CPU (Local Docker) | 92.5% | psutil sampler |
| Peak CPU (Local Docker) | 99.1% | psutil sampler |
Monitoring Tools:
- Prometheus/Grafana: For real-time metrics (latency, throughput, escalation spikes).
- MLflow: For model versioning and artifact tracking.
- S3 Audit Store: For historical analysis and compliance reporting.
AegisAI is built on a serverless, horizontally-scalable architecture designed to handle high-concurrency fraud detection.
- Audit Layer: Uses S3 for immutable storage (virtually infinite scale) and DynamoDB for fast metadata lookups (low-latency, auto-scaling).
- Compute: Deployment on AWS ECS (Fargate) ensures the compute layer scales horizontally based on request volume.
- Parallel Agent Execution: The
AgentRouterexecutes independent agents in parallel, keeping end-to-end latency constant even as new agents are added. - Key Considerations: Includes data partitioning (Hot Key mitigation), resource contention management, and future streaming ingestion via Kafka/Kinesis.
Detailed scaling strategies and bottleneck mitigations are documented in docs/SCALING.md.
Hard Policy Constraints (Managed in config/policy_rules.yaml):
- AI cannot permanently block accounts.
- AI cannot terminate user sessions without 90%+ confidence.
- Any decision with >30% agent disagreement must escalate.
Tamper Detection:
- Audit logs use hash-chain integrity. Any change to a past record breaks the chain and alerts the security team.
- Object Lock (WORM) is enabled on S3 audit buckets to prevent deletion.
AegisAI is deployed using a modern, containerized infrastructure-as-code (IaC) approach.
- Infrastructure: Terraform manages AWS resources (ECS, S3, DynamoDB, IAM).
- Containerization: Docker-based builds for reproducible environments.
- Orchestration: AWS ECS (Fargate) for serverless container management.
- CI/CD: GitHub Actions for automated testing, linting, and deployment.
See docs/DEPLOYMENT.md for detailed setup instructions.
- Near-Term: Streaming ingestion via Kafka, real-time Grafana dashboards.
- Medium-Term: Typing patterns and mouse dynamics for richer behavioral signals.
- Long-Term: Federated learning for cross-institution fraud patterns and causal inference for fraud dynamics.
# Clone repository
git clone https://github.com/abi-commits/AegisAI.git
cd AegisAI
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# Install dependencies
pip install -e ".[dev,ml]"python demo.pypytest tests/ -vFor questions about this system design, please open an issue on GitHub.