benjaminsunliu/FraudFrog

FraudFrog

FraudFrog is an explainable fraud triage tool for payment-company reviewers. It combines a modular Python scoring engine with a reviewer UI for CSV upload, evidence review, approve/dismiss/escalate decisions, undo, and audit history.

The goal is not to automatically block every suspicious transaction. The goal is to rank likely fraud, explain each flag, and help a human reviewer make fast, confident decisions.

How To Run

Backend scoring:

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python fraud_detector.py transactions.csv scored_transactions.csv Balanced

Frontend reviewer app:

cd flagly
npm install
npm run dev

For a production compile check:

cd flagly
npm run build

Detection Strategy

Each transaction is scored using:

  • per-card amount and behavior baselines
  • rare category, country, channel, device, and IP signals
  • velocity windows for card activity
  • card-testing patterns, including a symmetric (±1h) burst counter that flags every probe in a testing burst, not only the trailing probes that a backward-only window would catch
  • merchant and category burst detection (catches the cross-card "QuickPay Online" ring)
  • shared device/IP activity across cards
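To make the symmetric burst counter concrete, here is a minimal sketch of one way to count same-card activity within ±1h of each transaction. It is not the code in fraud_detector.py: the function name, the (card_id, timestamp) tuple shape, and the use of a sorted list with binary search are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime, timedelta


def symmetric_burst_counts(transactions, window=timedelta(hours=1)):
    """Count, for each transaction, the other same-card transactions
    that fall within +/- `window` of its timestamp.

    `transactions` is a list of (card_id, timestamp) tuples
    (a hypothetical shape, not the project's real schema).
    """
    # Group timestamps by card and sort them once.
    times_by_card = defaultdict(list)
    for card_id, ts in transactions:
        times_by_card[card_id].append(ts)
    for times in times_by_card.values():
        times.sort()

    counts = []
    for card_id, ts in transactions:
        times = times_by_card[card_id]
        lo = bisect_left(times, ts - window)   # first timestamp >= ts - window
        hi = bisect_right(times, ts + window)  # one past last timestamp <= ts + window
        counts.append(hi - lo - 1)             # subtract the transaction itself
    return counts


start = datetime(2024, 1, 1, 12, 0)
probes = [
    ("card_a", start),
    ("card_a", start + timedelta(minutes=10)),
    ("card_a", start + timedelta(minutes=20)),
    ("card_b", start),
    ("card_a", start + timedelta(hours=5)),
]
burst_counts = symmetric_burst_counts(probes)
```

In this example the three card_a probes inside the same hour each see two neighbours, whereas a backward-only window would give the first probe a count of zero. Because the window looks forward in time, this approach only fits batch scoring of an uploaded file.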

Flag thresholds are calibrated against the dataset's four fraud patterns. Three threshold modes are available: Conservative (score ≥ 70, highest precision), Balanced (score ≥ 60, F1-optimal), and Aggressive (score ≥ 50, highest recall). See docs/HYPOTHESIS_LOG.md for the precision/recall/F1 table.
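As a rough illustration, the three modes can be treated as a lookup from mode name to score cutoff. The cutoffs come from the paragraph above; the names THRESHOLDS and is_flagged are hypothetical and may not match fraud_detector.py.

```python
# Score cutoffs per mode, as listed above.
THRESHOLDS = {
    "Conservative": 70,  # highest precision
    "Balanced": 60,      # F1-optimal
    "Aggressive": 50,    # highest recall
}


def is_flagged(risk_score, mode="Balanced"):
    """Return True when the score meets or exceeds the cutoff for the mode."""
    if mode not in THRESHOLDS:
        raise ValueError(f"Unknown mode: {mode!r}")
    return risk_score >= THRESHOLDS[mode]
```

A transaction scoring 65 would be flagged under Balanced and Aggressive but not under Conservative.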

Reliability Safeguards

  • Low-value dampening prevents normal subscriptions from becoming High/Critical without a strong pattern.
  • CA to US low-value online purchases are treated as weak cross-border signals.
  • Rare IP, device, and category signals are contextual instead of blindly additive.
  • High and Critical levels require strong fraud patterns, not just stacked weak signals.
  • Threshold calibration reports show how many transactions cross score cutoffs.
  • Regression tests cover normal transactions, known false-positive patterns, and strong fraud scenarios.
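The first and fourth safeguards can be sketched as a small gating step applied after raw scoring. Everything numeric here is an assumption chosen for illustration (the low-value cutoff, the damping factor, the severity bands, and the ceiling), and the function name is hypothetical; the real engine may structure this differently.

```python
# Illustrative constants only; none of these values come from the project.
LOW_VALUE_CUTOFF = 25.0   # amounts below this count as "low value"
LOW_VALUE_DAMPING = 0.5   # multiplier applied to low-value scores
MEDIUM_CEILING = 69       # highest score allowed without a strong pattern
BANDS = [("Critical", 85), ("High", 70), ("Medium", 40)]


def assign_severity(raw_score, amount, has_strong_pattern):
    """Map a raw score to a severity label, applying two safeguards:
    low-value dampening, and a ceiling for transactions that lack a
    strong fraud pattern."""
    score = raw_score
    if not has_strong_pattern:
        if amount < LOW_VALUE_CUTOFF:
            score *= LOW_VALUE_DAMPING       # dampen small, pattern-free charges
        score = min(score, MEDIUM_CEILING)   # stacked weak signals stop at Medium
    for label, floor in BANDS:
        if score >= floor:
            return label
    return "Low"
```

Under these assumed numbers, a 12.99 subscription with a raw score of 90 and no strong pattern lands at Medium, while a 500.00 purchase with the same score and a strong pattern is Critical.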

Reviewer Workflow

Reviewers can:

  • review one flagged transaction at a time
  • inspect score, severity, reasons, baseline comparison, related activity, and timeline
  • approve as legitimate, dismiss the flag, or escalate as likely fraud
  • use keyboard shortcuts and quick-review arrows
  • undo the last decision
  • inspect an audit log
  • benefit from an in-session feedback loop: when the reviewer dismisses the same fraud pattern repeatedly (default: twice), the queue learns it is a likely false positive, de-prioritizes remaining flags of that pattern, and surfaces a banner explaining what it learned
  • export an updated CSV (<file>_reviewed.csv) containing every original column plus risk_score, severity, flagged, is_fraud, review_status, detected_patterns, and reasons — escalated transactions are marked is_fraud=TRUE
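The in-session feedback loop can be pictured as a per-pattern dismissal counter that pushes learned patterns to the back of the queue. This is a sketch in Python for consistency with the scoring engine; the reviewer app itself is a Next.js project, and the class name, the single "pattern" key per transaction, and the sort order are assumptions rather than the app's actual logic.

```python
from collections import Counter


class DismissalFeedback:
    """Track dismissals per fraud pattern within one review session."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # dismissals needed before a pattern is "learned"
        self.dismissals = Counter()

    def record_dismissal(self, pattern):
        self.dismissals[pattern] += 1

    def learned_patterns(self):
        """Patterns dismissed at least `threshold` times."""
        return {p for p, n in self.dismissals.items() if n >= self.threshold}

    def reorder(self, queue):
        """Return the queue with learned patterns moved to the back,
        keeping highest risk_score first within each group."""
        learned = self.learned_patterns()
        return sorted(
            queue,
            key=lambda tx: (tx["pattern"] in learned, -tx["risk_score"]),
        )


feedback = DismissalFeedback()
queue = [
    {"id": 1, "pattern": "cross_border_low_value", "risk_score": 90},
    {"id": 2, "pattern": "shared_ip", "risk_score": 70},
    {"id": 3, "pattern": "cross_border_low_value", "risk_score": 65},
]
feedback.record_dismissal("cross_border_low_value")
order_after_one = [tx["id"] for tx in feedback.reorder(queue)]
feedback.record_dismissal("cross_border_low_value")
order_after_two = [tx["id"] for tx in feedback.reorder(queue)]
```

After one dismissal the order is unchanged; after the second, the shared_ip flag moves ahead of both cross_border_low_value flags. The set returned by learned_patterns could drive the banner text.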

Repository Layout

  • fraud_detector.py, transactions.csv, tests/, docs/ — Python scoring engine, data, tests, and product docs (repo root)
  • flagly/ — the Next.js reviewer app (run from here); it calls the root detector via /api/score

Why Not Pure ML?

The dataset does not provide labels, so a supervised model would either be trained on guessed labels or overfit to our own assumptions. FraudFrog instead uses explainable rules and behavioral features as the source of truth. Unsupervised anomaly detection could be added later as a supporting signal, but every flag should still carry concrete, reviewer-facing reasons.

Tests

.venv/bin/python -m unittest discover -s tests -v

Current tests verify:

  • low-value Disney+ subscriptions are not High risk
  • normal grocery transactions stay Low risk
  • high-value gift cards from new identity signals become Critical
  • shared IP activity across cards is flagged
  • card-testing patterns are escalated
  • flagged transactions include explanations

What We Would Do Next

  • train a supervised model once confirmed reviewer labels exist
  • add graph-based entity detection across cards, IPs, devices, and merchants
  • make all production features strictly backward-looking with persisted historical baselines
  • add drift monitoring and threshold recalibration
  • expand role-based audit trails and exportable review notes

About

🏆 Valsoft Challenge Winner at MPC Hacks - FraudFrog combines deterministic fraud scoring, per-card anomaly detection, cross-card fraud signals, AI-assisted case summaries, and a Tinder-inspired swipe review flow for fast transaction triage.
