FraudFrog

FraudFrog is an explainable fraud triage tool for payment-company reviewers. It combines a modular Python scoring engine with a reviewer UI for CSV upload, evidence review, approve/dismiss/escalate decisions, undo, and audit history.

The goal is not to automatically block every suspicious transaction. The goal is to rank likely fraud, explain each flag, and help a human reviewer make fast, confident decisions.

How To Run

Backend scoring (arguments: input CSV, output CSV, threshold mode):

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python fraud_detector.py transactions.csv scored_transactions.csv Balanced

Frontend reviewer app:

cd flagly
npm install
npm run dev

For a production compile check:

cd flagly
npm run build

Detection Strategy

Each transaction is scored using:

per-card amount and behavior baselines
rare category, country, channel, device, and IP signals
velocity windows for card activity
card-testing patterns, including a symmetric (±1h) burst counter that flags every probe in a testing burst, not only the trailing probes that a backward-only window would catch
merchant and category burst detection (catches the cross-card "QuickPay Online" ring)
shared device/IP activity across cards
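
The symmetric burst counter listed above can be illustrated with a minimal sketch. This is not FraudFrog's actual implementation: the function names and the plain list of per-card timestamps are assumptions made for illustration, and only the ±1h window comes from the description above.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # the +/-1h window from the strategy list


def backward_counts(timestamps, window=WINDOW):
    """Count earlier-or-simultaneous events in [ts - window, ts], excluding the event itself."""
    ordered = sorted(timestamps)
    return [
        bisect_right(ordered, ts) - bisect_left(ordered, ts - window) - 1
        for ts in timestamps
    ]


def symmetric_counts(timestamps, window=WINDOW):
    """Count events in [ts - window, ts + window], excluding the event itself."""
    ordered = sorted(timestamps)
    return [
        bisect_right(ordered, ts + window) - bisect_left(ordered, ts - window) - 1
        for ts in timestamps
    ]


# Five hypothetical probes on one card, one minute apart.
start = datetime(2024, 1, 1, 12, 0)
probes = [start + timedelta(minutes=i) for i in range(5)]

backward = backward_counts(probes)
symmetric = symmetric_counts(probes)
```

In this example the backward-only window sees no neighbors for the first probe and only reaches four for the last one, while the symmetric window reports four neighbors for every probe, so the whole burst can be flagged rather than just its tail.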

Flag thresholds are calibrated against the dataset's four fraud patterns and exposed as three threshold modes: Conservative (score ≥ 70, highest precision), Balanced (score ≥ 60, F1-optimal), and Aggressive (score ≥ 50, highest recall). See docs/HYPOTHESIS_LOG.md for the precision/recall/F1 table.
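
As a rough illustration of how the three cutoffs behave, the sketch below maps each mode to its score threshold and counts how many scores cross each one. The cutoffs are the ones stated above; the names `THRESHOLDS`, `is_flagged`, and `calibration_report`, and the sample scores, are hypothetical and not taken from fraud_detector.py.

```python
# Score cutoffs per mode, as documented above.
THRESHOLDS = {"Conservative": 70, "Balanced": 60, "Aggressive": 50}


def is_flagged(score, mode="Balanced"):
    """Return True when a risk score meets the cutoff for the chosen mode."""
    return score >= THRESHOLDS[mode]


def calibration_report(scores):
    """Count how many scores would be flagged under each mode."""
    return {
        mode: sum(1 for s in scores if s >= cutoff)
        for mode, cutoff in THRESHOLDS.items()
    }


# Made-up scores, for illustration only.
sample_scores = [45, 55, 60, 72, 90]
report = calibration_report(sample_scores)
```

With these sample scores, Conservative flags two transactions, Balanced three, and Aggressive four, which is the kind of count a threshold calibration report would surface.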

Reliability Safeguards

Low-value dampening prevents normal subscriptions from becoming High/Critical without a strong pattern.
CA to US low-value online purchases are treated as weak cross-border signals.
Rare IP, device, and category signals are contextual instead of blindly additive.
High and Critical levels require strong fraud patterns, not just stacked weak signals.
Threshold calibration reports show how many transactions cross score cutoffs.
Regression tests cover normal transactions, known false-positive patterns, and strong fraud scenarios.
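
One way to picture the first and fourth safeguards is the sketch below. Everything numeric in it is an assumption chosen for illustration: the low-value cutoff, the damping factor, and the severity bands are not documented here, and the function names are hypothetical. Only the two rules themselves come from the list above.

```python
# All constants below are illustrative assumptions, not FraudFrog's real values.
LOW_VALUE_CUTOFF = 20.0   # amounts below this count as "low value"
DAMPING = 0.5             # multiplier applied to low-value scores
SEVERITY_BANDS = [(85, "Critical"), (70, "High"), (50, "Medium")]


def dampened_score(raw_score, amount, has_strong_pattern):
    """Reduce the score of a low-value transaction unless a strong pattern is present."""
    if amount < LOW_VALUE_CUTOFF and not has_strong_pattern:
        return raw_score * DAMPING
    return raw_score


def severity(score, has_strong_pattern):
    """Map a score to a severity level, capping at Medium without a strong pattern."""
    level = "Low"
    for cutoff, name in SEVERITY_BANDS:
        if score >= cutoff:
            level = name
            break
    if level in ("High", "Critical") and not has_strong_pattern:
        level = "Medium"  # stacked weak signals alone never reach High/Critical
    return level


# A cheap subscription with several weak signals and no strong pattern.
subscription = severity(dampened_score(80, 9.99, False), False)
# A larger purchase with the same weak signals.
weak_stack = severity(dampened_score(80, 500.0, False), False)
# A larger purchase backed by a strong fraud pattern.
strong_case = severity(dampened_score(90, 500.0, True), True)
```

Under these assumed numbers the subscription ends up Low, the stack of weak signals is capped at Medium, and only the transaction with a strong pattern reaches Critical.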

Reviewer Workflow

Reviewers can:

review one flagged transaction at a time
inspect score, severity, reasons, baseline comparison, related activity, and timeline
approve as legitimate, dismiss the flag, or escalate as likely fraud
use keyboard shortcuts and quick-review arrows
undo the last decision
inspect an audit log
benefit from an in-session feedback loop: when the reviewer repeatedly dismisses flags of the same fraud pattern (default: twice), the queue treats that pattern as a likely false positive, de-prioritizes its remaining flags, and shows a banner explaining what it learned
export an updated CSV (<file>_reviewed.csv) containing every original column plus risk_score, severity, flagged, is_fraud, review_status, detected_patterns, and reasons; escalated transactions are marked is_fraud=TRUE
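
The in-session feedback loop in the list above can be sketched as follows. The reviewer app itself is a Next.js project, so this Python version only illustrates the logic: the `FeedbackQueue` class, its method names, the flag dictionaries, and the banner wording are all hypothetical. The default of two dismissals is the one stated above.

```python
from collections import Counter


class FeedbackQueue:
    """Review queue that de-prioritizes patterns the reviewer keeps dismissing."""

    def __init__(self, flags, dismiss_limit=2):
        self.flags = list(flags)          # each flag: {"id", "pattern", "score"}
        self.dismiss_limit = dismiss_limit
        self.dismissals = Counter()       # dismissals per pattern, this session only

    def learned_patterns(self):
        """Patterns dismissed at least `dismiss_limit` times."""
        return {p for p, n in self.dismissals.items() if n >= self.dismiss_limit}

    def dismiss(self, flag_id):
        """Remove a flag; return a banner message when its pattern is newly learned."""
        flag = next(f for f in self.flags if f["id"] == flag_id)
        self.flags.remove(flag)
        self.dismissals[flag["pattern"]] += 1
        if self.dismissals[flag["pattern"]] == self.dismiss_limit:
            return f"Learned: '{flag['pattern']}' flags are likely false positives."
        return None

    def ordered(self):
        """Highest score first, with learned patterns pushed to the end."""
        learned = self.learned_patterns()
        return sorted(self.flags, key=lambda f: (f["pattern"] in learned, -f["score"]))


queue = FeedbackQueue([
    {"id": "a", "pattern": "velocity", "score": 90},
    {"id": "b", "pattern": "velocity", "score": 85},
    {"id": "c", "pattern": "velocity", "score": 80},
    {"id": "d", "pattern": "shared_ip", "score": 70},
])
order_before = [f["id"] for f in queue.ordered()]
first_banner = queue.dismiss("a")
second_banner = queue.dismiss("b")
order_after = [f["id"] for f in queue.ordered()]
```

After two dismissals of the same pattern, the remaining flag of that pattern drops below the lower-scored flag of a different pattern, and the second dismissal returns the banner text. Because the counts live only on the queue object, nothing carries over between sessions.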

Repository Layout

fraud_detector.py, transactions.csv, tests/, docs/ — Python scoring engine, data, tests, and product docs (repo root)
flagly/ — the Next.js reviewer app (run from here); it calls the root detector via /api/score

Why Not Pure ML?

The dataset does not provide labels, so a supervised model would either be trained on guesses or overfit to our assumptions. FraudFrog uses explainable rules and behavioral features as the source of truth. Unsupervised anomaly detection could be added as a supporting signal later, but every flag should still have concrete reviewer-facing reasons.

Tests

.venv/bin/python -m unittest discover -s tests -v

Current tests verify:

low-value Disney+ subscriptions are not High risk
normal grocery transactions stay Low risk
high-value gift cards from new identity signals become Critical
shared IP activity across cards is flagged
card-testing patterns are escalated
flagged transactions include explanations

What We Would Do Next

train a supervised model once confirmed reviewer labels exist
add graph-based entity detection across cards, IPs, devices, and merchants
make all production features strictly backward-looking with persisted historical baselines
add drift monitoring and threshold recalibration
expand role-based audit trails and exportable review notes
