🚀 Invoice Fraud Detector Service

An end-to-end Machine Learning service that detects fraudulent invoices using XGBoost. This project features a full pipeline: synthetic data generation, model training with SMOTE (oversampling), and a Flask-based web dashboard with a real-time risk speedometer.

🛠️ Setup Instructions

1. Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

3. Run the Pipeline (Sequence is Important!)

Run these steps in order. Each one produces the files the next one needs, ending with the trained model that acts as the app's "brain":

Generate Data: python core/generator.py (creates data/raw/fake_invoices.csv with 100 rows)
Train AI: python core/trainer.py (trains the model on that CSV and saves the artifacts as .pkl files)
Start Service: python app.py (launches the dashboard at http://127.0.0.1:5000)
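The ordering above can also be scripted. The sketch below is a hypothetical helper (`run_build` is not part of the repository) that runs the two build steps with the current Python interpreter and stops at the first failure, so the trainer never runs on a missing or stale CSV. It assumes it is called from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# The two build steps from the list above, in the order they must run:
# the trainer consumes the CSV that the generator writes.
BUILD_STEPS = ("core/generator.py", "core/trainer.py")


def run_build(steps=BUILD_STEPS, python=sys.executable):
    """Run each script in order and stop at the first failure.

    Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in steps:
        if not Path(script).is_file():
            # Most likely cause: the helper is not being run from the repo root.
            raise FileNotFoundError(f"{script} not found; run from the repository root")
        # check=True raises CalledProcessError on a non-zero exit code, so a
        # failed generator never hands a stale CSV to the trainer.
        subprocess.run([python, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_build()` and then `python app.py` reproduces the three steps. The server step is left out of the helper because it keeps running until you stop it.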

🕵️‍♂️ How to Use the Dashboard

The model is trained to recognize specific risk patterns, such as suspicious vendor names and unusually large amounts. To see the "Speedometer" in action, try these test cases:

✅ Scenario 1: The Trusted Partner (Low Risk)

Vendor: Small Ltd
Amount: 250
Verdict: The needle should stay in the green (Low Risk) zone.

⚠️ Scenario 2: High-Value Fraud (High Risk)

Vendor: QuickPay UK
Amount: 45000
Verdict: The needle should swing into the red (High Risk) zone, because the model recognizes both the suspicious vendor name and the unusually high amount.
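The dashboard's JavaScript sends each invoice to the Flask backend with the Fetch API, so the two scenarios can also be replayed from a script. This is a minimal sketch using only the standard library. The `/predict` route, the `vendor` and `amount` field names, and the JSON reply are all assumptions, so check `app.py` for the real endpoint before relying on it.

```python
import json
import urllib.request

# Hypothetical endpoint: the real route and payload fields are defined in app.py.
API_URL = "http://127.0.0.1:5000/predict"


def build_request(vendor: str, amount: float, url: str = API_URL) -> urllib.request.Request:
    """Package one invoice as a JSON POST, like the dashboard's fetch() call would."""
    body = json.dumps({"vendor": vendor, "amount": amount}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_invoice(vendor: str, amount: float, url: str = API_URL) -> dict:
    """Send the invoice to the running service and return its decoded JSON reply."""
    with urllib.request.urlopen(build_request(vendor, amount, url), timeout=5) as resp:
        return json.load(resp)
```

With the service running, `score_invoice("Small Ltd", 250)` and `score_invoice("QuickPay UK", 45000)` replay Scenarios 1 and 2.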

🧪 Pro Tip: Find Your Own Test Cases

Open data/raw/fake_invoices.csv. Any row where is_fraud is 1 should trigger a high risk score, and any row where is_fraud is 0 should come back clear!
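To pull these rows out programmatically, a short standard-library sketch like the following splits the CSV on the `is_fraud` column. Only `is_fraud` is taken from the README; every other column is passed through unchanged, and `load_test_cases` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_test_cases(path="data/raw/fake_invoices.csv"):
    """Split the generated invoices into (fraud, legit) lists of row dicts.

    Only the is_fraud column is relied on; the other columns are kept
    as-is, whatever the generator names them.
    """
    fraud, legit = [], []
    with Path(path).open(newline="") as f:
        for row in csv.DictReader(f):
            # csv yields strings, so convert "1"/"0" to an int label
            label = int(row["is_fraud"])
            (fraud if label == 1 else legit).append(row)
    return fraud, legit
```

From the repository root, `fraud, legit = load_test_cases()` returns two lists of row dictionaries; entering a row from `fraud` into the dashboard should push the needle toward red.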

✨ Interactive Features

🎲 One-Click Demo: Use the "Load Random Sample" button to automatically pull a real record from the dataset. This allows you to quickly test both fraudulent and legitimate scenarios without manual entry.
📈 Live Risk Gauge: The dashboard features a dynamic SVG/CSS needle that reflects the AI's confidence score in real time.
📜 Audit Log: Every analysis is saved to a session log, allowing you to compare different vendors and risk profiles side-by-side.
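The gauge turns the model's score into a needle position. The real logic lives in the dashboard's HTML/CSS/JavaScript; the Python sketch below only illustrates the arithmetic, assuming a half-circle dial that sweeps from -90° (0% risk) to +90° (100% risk). The green/amber/red cutoffs are hypothetical, not values taken from the app.

```python
def needle_angle(score: float) -> float:
    """Map a fraud probability in [0, 1] to a needle rotation in degrees.

    Assumes a half-circle gauge: -90 points at 'Low Risk', +90 at 'High Risk'.
    """
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range inputs onto the dial
    return -90.0 + 180.0 * score


def risk_zone(score: float, amber_at: float = 0.4, red_at: float = 0.7) -> str:
    """Bucket a score into a colour band (illustrative cutoffs, not the app's)."""
    if score >= red_at:
        return "red"
    if score >= amber_at:
        return "amber"
    return "green"
```

The resulting angle would typically be applied to the needle element with a CSS `transform: rotate(...)`.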

💻 Tech Stack

Backend: Flask, Pydantic, Joblib, Pandas, Faker
Machine Learning: XGBoost, Scikit-learn, Imbalanced-learn (SMOTE)
Frontend: HTML5/CSS3 (Animated Gauge), JavaScript (Fetch API)
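SMOTE is in the stack because fraud rows are rare, and a classifier trained on a heavily imbalanced set tends to favour the majority class. In this project it comes from imbalanced-learn, where the usual call is `SMOTE(random_state=...).fit_resample(X, y)` on the training data. The standard-library sketch below is not that implementation; it only illustrates the core idea: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point at a random position on the line segment between them.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from the rare class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (excluding itself), Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment from base towards neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies between two real fraud examples, oversampling this way adds variety rather than exact duplicates.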

🧪 Automated Testing

This project includes a pytest suite (in tests/) that checks that the data generator and the prediction API stay in sync. Run it from the repository root with:

pytest

🤝 Contributing

Contributions are welcome! If you have ideas to improve the fraud detection logic or the dashboard UI:
Fork the Project.
Create your Feature Branch (git checkout -b feature/AmazingFeature).
Commit your Changes (git commit -m 'Add some AmazingFeature').
Push to the Branch (git push origin feature/AmazingFeature).
Open a Pull Request.

📝 Notes

Data Privacy: This project uses synthetic data generated by Faker. No real invoice data is included or required to run the demo.
Model Accuracy: The XGBoost model is trained on a small synthetic sample (100 rows by default). For a more robust model, increase the n value in core/generator.py, then regenerate the data and retrain.
CORS: Ensure Flask-CORS is active if you plan to host the frontend and backend on different ports.
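For context on what `n` controls, here is a hypothetical stand-in for the generator. The real `core/generator.py` uses Faker, and its column names and rules may differ. This sketch uses only the standard library, borrows the two vendor names from the scenarios above, and invents everything else (the other vendors, the amount ranges, `fraud_rate`) for illustration.

```python
import csv
import random

# "Small Ltd" and "QuickPay UK" come from the README scenarios;
# the other vendor names are hypothetical.
TRUSTED_VENDORS = ["Small Ltd", "Northwind Supplies", "Acme Stationery"]
SUSPICIOUS_VENDORS = ["QuickPay UK", "FastCash Services"]


def generate_invoices(n=100, fraud_rate=0.1, seed=42):
    """Return n synthetic invoice dicts, each with a binary is_fraud label."""
    rng = random.Random(seed)  # seeded so reruns produce the same dataset
    rows = []
    for i in range(n):
        is_fraud = 1 if rng.random() < fraud_rate else 0
        if is_fraud:
            vendor = rng.choice(SUSPICIOUS_VENDORS)
            amount = round(rng.uniform(10_000, 50_000), 2)  # unusually large
        else:
            vendor = rng.choice(TRUSTED_VENDORS)
            amount = round(rng.uniform(50, 2_000), 2)
        rows.append({"invoice_id": i + 1, "vendor": vendor,
                     "amount": amount, "is_fraud": is_fraud})
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    if not rows:
        raise ValueError("no rows to write")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Raising `n` (for example to 5000) gives the trainer more fraud examples to learn from; rerun `core/trainer.py` afterwards so the saved model reflects the new data.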

🗺️ Roadmap

[ ] Batch Processing: Ability to upload an entire CSV for bulk fraud scanning.

[ ] User Auth: Secure login for finance team members.

[ ] Email Alerts: Auto-notify admins when a "High Risk" invoice is detected.

❤️ Thanks

Scikit-learn & XGBoost: for the heavy lifting in the ML pipeline.
Faker: for generating the synthetic invoice data.

Built by Roy Peters 😁
