Biometric Refutation Module — AI-Driven Risk Intelligence for 2026 INTERPOL Fugitives

Note: This repository contains my individual contribution to a larger 5-person group project completed for CS610 Applied Machine Learning at Singapore Management University (SMU). The full project spans multiple ML pillars; this repo focuses exclusively on the Synthetic Data Generation and Biometric Prediction components that I built.

Project Context

The full group project built an ML-driven compliance screening pipeline that screens banking client data against INTERPOL 2026 fugitives. It addresses the limitations of traditional blacklist matching, which is highly manual, one-dimensional, and time-consuming, by replacing it with a multi-pillar ML architecture. The aim is to reduce analyst fatigue and improve operational efficiency, which in turn improves customer satisfaction.

ML Architecture Proposed:

Full pipeline architecture (Group 9):

Pillar	Weight	Component	Owner
Identity Resolution	40%	TF-IDF Semantic Name Matching	Teammate
Crime Severity	30%	RoBERTa Crime Classification	Teammate
Hidden Linkage	20%	GraphSAGE Link Prediction	Teammate
Visual Recognition	10%	Fine-tuned Face-MAE (CV)	Teammate
Biometric Refutation	Gate	Ensemble ML Classifier	Me

The biometric module acts as a conditional veto gate: if the ensemble model confirms a biometric match (confidence ≥ 0.5), the client is immediately escalated to CRITICAL risk (score = 1.0), bypassing the weighted scoring pipeline entirely.
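
The gate logic described above can be sketched as follows. This is a minimal illustration, not the project's integration code: the function name `final_risk_score`, the dictionary keys, and the assumption that every pillar emits a score in [0, 1] are hypothetical. Only the pillar weights and the 0.5 threshold come from this README.

```python
# Pillar weights taken from the architecture table; the key names are hypothetical.
PILLAR_WEIGHTS = {
    "identity": 0.40,
    "crime": 0.30,
    "linkage": 0.20,
    "visual": 0.10,
}
GATE_THRESHOLD = 0.5  # biometric confidence at or above this triggers the veto


def final_risk_score(biometric_confidence, pillar_scores):
    """Return (score, route) for one client.

    Assumes every pillar score is already normalised to [0, 1].
    """
    if biometric_confidence >= GATE_THRESHOLD:
        # Veto: skip the weighted pipeline and escalate immediately.
        return 1.0, "CRITICAL"
    weighted = sum(
        PILLAR_WEIGHTS[name] * pillar_scores[name] for name in PILLAR_WEIGHTS
    )
    return weighted, "WEIGHTED"
```

For example, a client with biometric confidence 0.2 and pillar scores of 1.0 (identity), 0.5 (crime), 0.0 (linkage), and 0.0 (visual) would receive a weighted score of 0.55 under these assumptions.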

My Contributions

1. Synthetic Data Generation

The problem: Real client-fugitive match data doesn't exist publicly due to privacy constraints. Without labelled training data, no classifier can be built.

My solution: I designed and generated a synthetic dataset of 100,000 client-fugitive comparison pairs (10% matches, 90% non-matches) from the INTERPOL Red Notices dataset (6,479 records via OpenSanctions API).

Generation logic:

~6,000 true bad actors (no variation): Direct copies of INTERPOL records seeded as confirmed matches.
~4,000 synthetic bad actors (with variation): Realistic perturbations to simulate real-world data quality issues:
- Name: random swap, shuffle, slice, drop, or duplicate of characters
- Age: shifted ±1–3 years, 1–6 months, or 1–20 days
- Height: modified by up to ±0.03 m
- Hair & eye colour: randomly substituted from INTERPOL's available colour set
50% of good actors (complete variation): Biometric features randomly sampled from INTERPOL distributions; names constructed by randomly combining first and last names from the pool, with no overlap with real fugitive names.
50% of good actors (same name, different bio): Names identical to those of fugitives, but biometrics independently sampled. These are hard negatives designed to stress-test the model.
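
The perturbation step for synthetic bad actors could look roughly like the sketch below. It covers only the name and height perturbations. The helper names `perturb_name` and `perturb_height`, and the exact behaviour of each character operation (for instance, swapping adjacent characters or shuffling a three-character window), are assumptions made for illustration; the README only names the operations.

```python
import random


def perturb_name(name, rng):
    """Apply one random character-level edit: swap, shuffle, slice, drop, or duplicate."""
    chars = list(name)
    if len(chars) < 2:
        return name
    op = rng.choice(["swap", "shuffle", "slice", "drop", "duplicate"])
    i = rng.randrange(len(chars) - 1)
    if op == "swap":        # exchange two adjacent characters
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    elif op == "shuffle":   # shuffle a short window of characters
        window = chars[i:i + 3]
        rng.shuffle(window)
        chars[i:i + 3] = window
    elif op == "slice":     # truncate one or two characters from the tail
        chars = chars[:max(2, len(chars) - rng.randint(1, 2))]
    elif op == "drop":      # delete one character
        del chars[i]
    else:                   # duplicate one character
        chars.insert(i, chars[i])
    return "".join(chars)


def perturb_height(height_m, rng):
    """Shift a height in metres by up to 0.03 m in either direction."""
    return round(height_m + rng.uniform(-0.03, 0.03), 2)


rng = random.Random(42)  # fixed seed so a generated dataset is reproducible
noisy_name = perturb_name("JOHN SMITH", rng)
noisy_height = perturb_height(1.75, rng)
```

Passing an explicit `random.Random` instance rather than using the module-level functions keeps the generation reproducible, which matters when the same synthetic dataset has to be regenerated for later experiments.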

Final dataset: 100,000 rows | 10,000 bad actors | 8 columns (7 engineered features plus the ground-truth label)

Feature	Type	Description
`name_similarity`	Continuous (0–1)	SequenceMatcher ratio between client and fugitive name
`age_difference`	Discrete	Absolute age gap in years
`same_gender`	Binary	1 if gender matches
`height_difference`	Continuous	Absolute height gap in metres
`weight_difference`	Continuous	Absolute weight gap in kg
`same_hair_colour`	Binary	1 if hair colour matches
`same_eye_colour`	Binary	1 if eye colour matches
`client_match_label`	Binary	Ground truth: 1 = fugitive match
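
A minimal sketch of how the seven pairwise features in the table could be computed for one client-fugitive pair. `SequenceMatcher` is the standard-library class in `difflib`; the function name `build_features` and the record keys (`name`, `age`, `height_m`, and so on) are hypothetical, and no name normalisation such as lower-casing is applied because the README does not say whether one was used.

```python
from difflib import SequenceMatcher


def build_features(client, fugitive):
    """Compute the seven pairwise features for one client-fugitive pair.

    Both arguments are dicts; the key names are hypothetical.
    """
    return {
        "name_similarity": SequenceMatcher(
            None, client["name"], fugitive["name"]
        ).ratio(),
        "age_difference": abs(client["age"] - fugitive["age"]),
        "same_gender": int(client["gender"] == fugitive["gender"]),
        "height_difference": abs(client["height_m"] - fugitive["height_m"]),
        "weight_difference": abs(client["weight_kg"] - fugitive["weight_kg"]),
        "same_hair_colour": int(client["hair"] == fugitive["hair"]),
        "same_eye_colour": int(client["eyes"] == fugitive["eyes"]),
    }
```

The label `client_match_label` is not computed here; in the synthetic dataset it is known from how each pair was generated.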

2. Biometric Prediction (Classical ML + Ensemble)

The problem: Most fugitive databases record only secondary attributes, such as hair and eye colour. Hence, we lack the reliable primary biometrics (e.g. fingerprints) needed to verify identity.

Approach: I trained and evaluated classification models on soft-biometric features to detect whether a client matches one of the fugitives on the list. I used a 75/25 train-test split with K-fold cross-validation, and experimented with four classical models and two ensembles to assess which approach minimises False Negatives and ensures robustness. Scaling was intentionally omitted to preserve feature sparsity.

Classical Model Results

Model	F2	F1	Accuracy	Precision	Recall
Logistic Regression	0.8891	0.8694	0.9729	0.8385	0.9027
Decision Tree	0.8756	0.8415	0.9661	0.7902	0.8999
Random Forest	0.9708	0.9385	0.9870	0.8892	0.9936
XGBoost	0.9702	0.9376	0.9868	0.8879	0.9932

The primary metric is the F2-score, which weights Recall higher than Precision, because a missed fugitive (False Negative) is catastrophic in AML and far worse than a false alarm.
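
To make the metric concrete, the sketch below computes the general F-beta score from precision and recall and checks it against the Logistic Regression row of the table above. The function name `f_beta` is illustrative; the formula is the standard F-beta definition.

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Logistic Regression row from the table: Precision 0.8385, Recall 0.9027
f1 = f_beta(0.8385, 0.9027, beta=1)
f2 = f_beta(0.8385, 0.9027, beta=2)
print(round(f1, 4), round(f2, 4))  # 0.8694 0.8891
```

With beta = 2, recall counts four times as much as precision in the denominator, which is why F2 sits closer to the recall value than F1 does.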

Key finding from False Negative analysis:

Random Forest struggled to flag confirmed matches when name_similarity = 1.0, age_difference = 1, and gender matched — specifically because mismatched hair/eye colour (intentionally introduced as noise) caused the model to over-penalise the match. This revealed a feature importance imbalance.

Random Forest also achieved high Precision-Recall AUC scores. However, since the model was trained on synthetic labels rather than real-world data, these high scores may indicate data leakage. This raises concerns about the model's robustness and generalisability when applied to authentic client data.

Top 2 features across all models:

age_difference: acts as a hard biological constraint; a large age gap indicates with high confidence that the pair is not a true match
name_similarity — provides strong discriminative power even with minor character variation

Ensemble Approach

To reduce False Negatives and improve robustness, I implemented two ensemble strategies using Logistic Regression, Random Forest, and XGBoost as diverse base learners:

Soft-Voting Ensemble: Averages predicted probability scores across models rather than majority voting, accounting for each model's confidence.
Stacking Ensemble: Trains a meta-model via K-fold cross-validation on base model predictions to learn optimal combination weights.
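
The difference between soft voting and majority voting can be shown with a small pure-Python sketch. The probabilities are made-up values for a single client-fugitive pair, and the function names are illustrative. scikit-learn offers the same idea as `VotingClassifier(voting="soft")`, though this README does not state which implementation the project used.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over each model's thresholded prediction."""
    votes = [p >= threshold for p in probabilities]
    return int(sum(votes) > len(votes) / 2)


def soft_vote(probabilities, threshold=0.5):
    """Average the predicted probabilities, then threshold once."""
    mean_probability = sum(probabilities) / len(probabilities)
    return int(mean_probability >= threshold)


# Hypothetical match probabilities from three base learners for one pair.
probs = [0.45, 0.40, 0.95]
print(hard_vote(probs), soft_vote(probs))  # 0 1
```

In this example two models fall just below the threshold while one is very confident. Majority voting discards that confidence and predicts a non-match; soft voting averages to 0.6 and flags the pair, which illustrates how averaging can recover borderline matches that would otherwise become False Negatives.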

Ensemble Results

Model	F2	F1	Accuracy	Recall
Random Forest (baseline)	0.9708	0.9385	0.9870	0.9936
Ensemble Soft-Voting	0.9708	0.9380	0.9869	0.9940
Ensemble Stacking	0.9705	0.9385	0.9870	0.9932

Confusion matrix improvement (False Negatives):

Model	False Negatives
Random Forest	20
Ensemble Soft-Voting	11 ✅ (as of 7 Apr 2026, the number was 13)
Ensemble Stacking	15

The Soft-Voting Ensemble reduced False Negatives from 20 → 11, a 45% reduction, making it the selected model for the integration pipeline.

How This Fits Into the Full System

Client Input (Name, Photo, Gender, Biometrics)
        │
        ▼
[BIOMETRIC REFUTATION] ◄── My module
        │
        ├── Confidence ≥ 0.5 → CRITICAL (Final Risk = 1.0) → Freeze + Escalate
        │
        └── Confidence < 0.5 → Pass to weighted scoring pipeline
                                (Identity + Crime + Linkage + Visual)

The biometric gate is intentionally conservative — in AML, over-flagging is by design. Compliance teams prefer false positives over missed fugitives.

Tech Stack

Python, scikit-learn
Pandas, NumPy
XGBoost, Pickle
OpenSanctions INTERPOL Red Notices dataset (6,479 records)

Limitations & Future Work

Synthetic data dependency: The model is trained on generated pairs, not real compliance officer-verified labels. High Precision-Recall AUC may partially reflect data leakage from the synthetic generation process.
Soft biometric uniqueness: Hair colour, eye colour, and height can be altered. A hybrid approach incorporating primary biometrics (fingerprints, iris) when available would improve robustness.
Fixed threshold: The 0.5 biometric gate threshold was set based on domain knowledge. Production deployment should tune this per crime severity category using a validation set.
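
One possible way to tune the gate threshold on a validation set is sketched below: evaluate F2 at a handful of candidate thresholds and keep the best one. This is an illustration of the future-work idea, not something implemented in this repository; the function names, the candidate grid, and the toy labels and probabilities are all hypothetical.

```python
def f2_at_threshold(y_true, y_prob, threshold):
    """F2-score when every probability >= threshold is flagged as a match."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    denominator = 5 * tp + 4 * fn + fp
    return 5 * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest validation F2."""
    return max(candidates, key=lambda t: f2_at_threshold(y_true, y_prob, t))


# Toy validation data: 1 = fugitive match, 0 = non-match.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1]
print(best_threshold(y_true, y_prob, [0.3, 0.5, 0.7]))  # 0.3
```

Running the same search separately on the validation rows of each crime severity category would yield one threshold per category, at the cost of needing enough labelled matches in every category to make the estimate stable.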

About

Part of CS610 Applied Machine Learning, SMU MITB programme.
Full group project: AI-Driven Risk Intelligence for 2026 INTERPOL Fugitives — Group 9.
