reory/Invoice-Fraud-Detector-Service

🚀 Invoice Fraud Detector Service

An end-to-end Machine Learning service that detects fraudulent invoices using XGBoost. This project features a full pipeline: synthetic data generation, model training with SMOTE (oversampling), and a Flask-based web dashboard with a real-time risk speedometer.


🛠️ Setup Instructions

1. Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install Dependencies

pip install -r requirements.txt

3. Run the Pipeline (Sequence is Important!)

You must run these in order to create the "brain" for the app:

  • Generate Data: python core/generator.py (Creates a 100-row fake_invoices.csv in data/raw/)

  • Train AI: python core/trainer.py (Trains the model and saves .pkl files)

  • Start Service: python app.py (Launches the dashboard at http://127.0.0.1:5000)
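The generation step can be pictured with a small standard-library sketch. This is not the project's core/generator.py (which uses Faker). The column names vendor, amount and is_fraud, the vendor pools, the fraud rate and the labelling rule are all assumptions chosen to match the dashboard scenarios in this README.

```python
import csv
import io
import random

# Hypothetical vendor pools; only "Small Ltd" and "QuickPay UK" appear in this README.
TRUSTED_VENDORS = ["Small Ltd"]
SUSPICIOUS_VENDORS = ["QuickPay UK"]


def generate_invoices(n=100, fraud_rate=0.1, seed=0):
    """Return n synthetic invoice rows as dicts (assumed schema)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_fraud = 1 if rng.random() < fraud_rate else 0
        if is_fraud:
            # Assumed pattern: suspicious vendor plus unusually high amount.
            vendor = rng.choice(SUSPICIOUS_VENDORS)
            amount = round(rng.uniform(20_000, 50_000), 2)
        else:
            vendor = rng.choice(TRUSTED_VENDORS)
            amount = round(rng.uniform(50, 2_000), 2)
        rows.append({"vendor": vendor, "amount": amount, "is_fraud": is_fraud})
    return rows


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["vendor", "amount", "is_fraud"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = generate_invoices()
csv_text = to_csv(rows)
print(csv_text.splitlines()[0])  # vendor,amount,is_fraud
```

Raising n gives the trainer more rows to learn from, which is the same lever the Notes section describes for generator.py.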


🕵️‍♂️ How to Use the Dashboard

The AI is trained to recognize specific patterns of risk. To see the "Speedometer" in action, try these test cases:

✅ Scenario 1: The Trusted Partner (Low Risk)

  • Vendor: Small Ltd

  • Amount: 250

  • Verdict: The needle will stay in the Green (Low Risk).

⚠️ Scenario 2: High-Value Fraud (High Risk)

  • Vendor: QuickPay UK

  • Amount: 45000

  • Verdict: The needle will swing to Red (High Risk) because the AI recognizes the suspicious vendor name and unusually high amount.

🧪 Pro Tip: Find Your Own Test Cases

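Open data/raw/fake_invoices.csv. Rows where is_fraud is 1 should produce a high risk score, and rows where is_fraud is 0 should come back clear. Because the model is trained on a small synthetic sample, an occasional row may be scored differently from its label.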
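Picking test cases by label can also be scripted. The sketch below groups rows by their is_fraud value using only the standard library. The inline sample stands in for the real file, and the vendor and amount column names are assumptions (only is_fraud is named in this README).

```python
import csv
import io

# Inline stand-in for data/raw/fake_invoices.csv; the vendor and amount
# column names are assumptions, only is_fraud is named in this README.
SAMPLE_CSV = """vendor,amount,is_fraud
Small Ltd,250,0
QuickPay UK,45000,1
"""


def split_by_label(csv_file):
    """Group rows from an open CSV file by their is_fraud label."""
    groups = {"fraud": [], "legit": []}
    for row in csv.DictReader(csv_file):
        key = "fraud" if row["is_fraud"].strip() == "1" else "legit"
        groups[key].append(row)
    return groups


groups = split_by_label(io.StringIO(SAMPLE_CSV))
print(groups["fraud"][0]["vendor"])  # QuickPay UK
print(groups["legit"][0]["vendor"])  # Small Ltd
```

To run it against the generated data, pass open("data/raw/fake_invoices.csv", newline="") instead of the io.StringIO object.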


✨ Interactive Features

  • 🎲 One-Click Demo: Use the "Load Random Sample" button to automatically pull a real record from the dataset. This allows you to quickly test both fraudulent and legitimate scenarios without manual entry.
  • 📈 Live Risk Gauge: The dashboard features a dynamic SVG/CSS needle that reflects the AI's confidence score in real-time.
  • 📜 Audit Log: Every analysis is saved to a session log, allowing you to compare different vendors and risk profiles side-by-side.
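One way a gauge like this could turn a model score into a needle position is sketched below. It assumes the needle sweeps a half-circle from -90 to +90 degrees and uses illustrative cut-offs of 0.3 and 0.7; the function name, the thresholds and the middle band are hypothetical and not taken from the project's front-end code.

```python
def gauge_state(probability, low=0.3, high=0.7):
    """Map a fraud probability in [0, 1] to a needle angle and risk band.

    Assumptions: the needle sweeps a 180-degree half-circle from -90
    (far left) to +90 (far right), and the band cut-offs are illustrative.
    """
    p = min(max(probability, 0.0), 1.0)  # clamp out-of-range scores
    angle = -90.0 + 180.0 * p
    if p < low:
        band = "Low Risk"
    elif p < high:
        band = "Medium Risk"
    else:
        band = "High Risk"
    return {"angle": angle, "band": band}


print(gauge_state(0.05)["band"])  # Low Risk
print(gauge_state(0.92)["band"])  # High Risk
```

In the browser, the returned angle would typically be applied as a CSS rotate transform on the needle element.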

💻 Tech Stack

  • Backend: Flask, Pydantic, Joblib, Pandas, Faker

  • Machine Learning: XGBoost, Scikit-learn, Imbalanced-learn (SMOTE)

  • Frontend: HTML5/CSS3 (Animated Gauge), JavaScript (Fetch API), served by Flask
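For readers new to SMOTE, the core idea is to create extra minority-class (fraud) samples by interpolating between an existing fraud sample and one of its nearest fraud neighbours. The sketch below is a simplified pure-Python illustration of that idea, not Imbalanced-learn's implementation. The function name, the two-feature layout and the sample values are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Hypothetical fraud samples as (amount, vendor_risk_score) pairs.
fraud = [(45000.0, 0.9), (38000.0, 0.8), (41000.0, 0.95)]
new_points = smote_sketch(fraud, n_new=5)
print(len(new_points))  # 5
```

Each synthetic point lies on a line segment between two real fraud samples, so oversampling adds variety without inventing values outside the observed range.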


🧪 Automated Testing

This project includes a test suite that checks the data generator and the prediction API stay in sync, so the fields the generator writes match the fields the API expects. Run the tests with:

pytest
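A sync check of this kind might look like the sketch below. The file name, the assumed API fields and the assumed CSV header are hypothetical; the project's actual tests may be organised differently.

```python
# test_sync_sketch.py (hypothetical file name)
import csv
import io

# Assumed request fields for the prediction endpoint and assumed CSV header;
# neither is confirmed by this README beyond Vendor, Amount and is_fraud.
API_FIELDS = {"vendor", "amount"}
GENERATED_HEADER = "vendor,amount,is_fraud\n"


def csv_columns(text):
    """Return the set of column names in the CSV header."""
    return set(next(csv.reader(io.StringIO(text))))


def test_generator_covers_api_fields():
    # Every field the API expects must exist in the generated data.
    assert API_FIELDS <= csv_columns(GENERATED_HEADER)


def test_label_column_present():
    # The training step needs the label column.
    assert "is_fraud" in csv_columns(GENERATED_HEADER)
```

pytest collects any function whose name starts with test_ in files named test_*.py, so no extra registration is needed.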


🤝 Contributing

Contributions are welcome! If you have ideas to improve the fraud detection logic or the dashboard UI:

  • Fork the Project.

  • Create your Feature Branch (git checkout -b feature/AmazingFeature).

  • Commit your Changes (git commit -m 'Add some AmazingFeature').

  • Push to the Branch (git push origin feature/AmazingFeature).

  • Open a Pull Request.


📝 Notes

  • Data Privacy: This project uses synthetic data generated by Faker. No real invoice data is included or required to run the demo.

  • Model Accuracy: The XGBoost model is trained on a small synthetic sample (100 rows by default). For higher accuracy in a production setting, increase the n value in generator.py and retrain.

  • CORS: Ensure Flask-CORS is active if you plan to host the frontend and backend on different ports.


🗺️ Roadmap

[ ] Batch Processing: Ability to upload an entire CSV for bulk fraud scanning.

[ ] User Auth: Secure login for finance team members.

[ ] Email Alerts: Auto-notify admins when a "High Risk" invoice is detected.


❤️ Thanks

  • Scikit-learn & XGBoost: For the heavy lifting in the ML pipeline.

  • Faker: For helping create the fake data.


Built By Roy Peters Click here for contact details😁
