An end-to-end Machine Learning service that detects fraudulent invoices using XGBoost. This project features a full pipeline: synthetic data generation, model training with SMOTE (oversampling), and a Flask-based web dashboard with a real-time risk speedometer.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
You must run these in order to create the "brain" for the app:
- Generate Data: python core/generator.py (Creates 100-row fake_invoices.csv)
- Train AI: python core/trainer.py (Trains the model and saves .pkl files)
- Start Service: python app.py (Launches the dashboard at http://127.0.0.1:5000)
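The ordering requirement can be sketched as a small runner script. This is an illustration only, not part of the project: `PIPELINE_STEPS`, `build_command`, and `run_pipeline` are hypothetical names, and only the three script paths come from the steps above. Note that the final step starts the Flask server, so a real run would block there until the server stops.

```python
import subprocess
import sys

# The three steps listed above, in the order they must run.
PIPELINE_STEPS = [
    ("Generate Data", "core/generator.py"),
    ("Train AI", "core/trainer.py"),
    ("Start Service", "app.py"),
]


def build_command(script: str) -> list:
    """Return the argv that runs `script` with the current interpreter."""
    return [sys.executable, script]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run) -> list:
    """Run each step in order and stop at the first failure.

    `runner` is injectable so the sketch can be exercised without
    actually launching the scripts.
    """
    completed = []
    for name, script in steps:
        result = runner(build_command(script), check=False)
        if result.returncode != 0:
            raise RuntimeError(f"Step '{name}' failed; later steps were skipped")
        completed.append(name)
    return completed
```

Calling `run_pipeline()` from the project root with the virtual environment active would run the steps one after another; a failed data generation then stops the run before training starts on a missing file.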
The AI is trained to recognize specific patterns of risk. To see the "Speedometer" in action, try these test cases:
- Vendor: Small Ltd
- Amount: 250
- Verdict: The needle will stay in the Green (Low Risk).

- Vendor: QuickPay UK
- Amount: 45000
- Verdict: The needle will swing to Red (High Risk) because the AI recognizes the suspicious vendor name and unusually high amount.
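The same two test cases could also be sent to the running service from a script instead of the dashboard form. The sketch below only builds the requests. The `/predict` path and the `vendor` / `amount` JSON field names are assumptions for illustration; check `app.py` for the actual route and schema.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # dashboard address from the steps above
PREDICT_PATH = "/predict"           # ASSUMPTION: hypothetical route name

# The two test cases described above.
TEST_CASES = [("Small Ltd", 250), ("QuickPay UK", 45000)]


def build_request(vendor: str, amount: float) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one invoice."""
    # ASSUMPTION: the field names "vendor" and "amount" are illustrative.
    payload = json.dumps({"vendor": vendor, "amount": amount}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + PREDICT_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


requests_to_send = [build_request(vendor, amount) for vendor, amount in TEST_CASES]
```

With the service running, each request could be sent with `urllib.request.urlopen(req)` and the JSON response inspected for the risk score.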
Open data/raw/fake_invoices.csv. Rows where is_fraud is 1 should produce a high risk score, and rows where is_fraud is 0 should come back clear.
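To pick rows for manual testing, the file can be split by its label column. This is a minimal sketch using only the standard library. The `is_fraud` column comes from the text above, while the `vendor` and `amount` column names and the two sample rows are assumptions standing in for the real file.

```python
import csv
import io


def split_by_label(csv_file):
    """Return (fraud_rows, clean_rows) based on the is_fraud column."""
    fraud_rows, clean_rows = [], []
    for row in csv.DictReader(csv_file):
        target = fraud_rows if row["is_fraud"].strip() == "1" else clean_rows
        target.append(row)
    return fraud_rows, clean_rows


# Tiny in-memory stand-in for data/raw/fake_invoices.csv.
# ASSUMPTION: the real file's other column names may differ.
SAMPLE = "vendor,amount,is_fraud\nSmall Ltd,250,0\nQuickPay UK,45000,1\n"

fraud_rows, clean_rows = split_by_label(io.StringIO(SAMPLE))
```

Against the real dataset, the call would look like `with open("data/raw/fake_invoices.csv", newline="") as f: fraud, clean = split_by_label(f)`.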
- 🎲 One-Click Demo: Use the "Load Random Sample" button to automatically pull a real record from the dataset. This allows you to quickly test both fraudulent and legitimate scenarios without manual entry.
- 📈 Live Risk Gauge: The dashboard features a dynamic SVG/CSS needle that reflects the AI's confidence score in real-time.
- 📜 Audit Log: Every analysis is saved to a session log, allowing you to compare different vendors and risk profiles side-by-side.
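One way a gauge like this can turn a confidence score into a needle position is a linear map from the score to a rotation angle. The sketch below is an assumption about how such a mapping could work, not the dashboard's actual code: the semicircular range of -90 to +90 degrees and the 0.5 threshold are hypothetical values.

```python
HIGH_RISK_THRESHOLD = 0.5  # ASSUMPTION: hypothetical cut-off between bands


def clamp_score(score: float) -> float:
    """Keep the score inside the [0, 1] range the gauge can display."""
    return max(0.0, min(1.0, score))


def needle_angle(score: float) -> float:
    """Map a score in [0, 1] to degrees: -90 (far left) to +90 (far right)."""
    return -90.0 + 180.0 * clamp_score(score)


def risk_band(score: float) -> str:
    """Return the colour band the needle falls into."""
    if clamp_score(score) >= HIGH_RISK_THRESHOLD:
        return "Red (High Risk)"
    return "Green (Low Risk)"
```

The resulting angle could then be applied on the frontend as a CSS or SVG `rotate()` transform on the needle element.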
- Backend: Pydantic, Joblib, Pandas, Faker
- Machine Learning: XGBoost, Scikit-learn, Imbalanced-learn (SMOTE)
- Frontend: Flask, HTML5/CSS3 (Animated Gauge), JavaScript (Fetch API)
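For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class (fraud) samples by interpolating between an existing sample and one of its minority-class neighbours. The pure-Python sketch below illustrates that interpolation step only. It is a simplification, not Imbalanced-learn's implementation: it always uses the single nearest neighbour, whereas SMOTE picks at random among several nearest neighbours.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote_like_sample(minority, rng):
    """Create one synthetic point between a minority sample and its
    nearest minority neighbour (simplified: k = 1)."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    neighbour = min(others, key=lambda p: squared_distance(base, p))
    lam = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(base, neighbour))


# Hypothetical 2-feature fraud samples, for illustration only.
minority_points = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
rng = random.Random(42)
synthetic = [smote_like_sample(minority_points, rng) for _ in range(5)]
```

Each synthetic point lies on the line segment between two real minority samples, which is why oversampling this way adds variety rather than exact duplicates.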
This project includes a test suite that checks the data generator and the AI API stay in sync. Run it with:
pytest
Contributions are welcome! If you have ideas to improve the fraud detection logic or the dashboard UI:
- Fork the Project.
- Create your Feature Branch (git checkout -b feature/AmazingFeature).
- Commit your Changes (git commit -m 'Add some AmazingFeature').
- Push to the Branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
- Data Privacy: This project uses synthetic data generated by Faker. No real invoice data is included or required to run the demo.
- Model Accuracy: The XGBoost model is trained on a small synthetic sample (100 rows by default). For higher accuracy in a production setting, increase the n value in generator.py and retrain.
- CORS: Ensure Flask-CORS is active if you plan to host the frontend and backend on different ports.
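As a configuration sketch for the CORS note, Flask-CORS can be enabled on the app object and limited to the origin that serves the frontend. The origin `http://127.0.0.1:3000` is a hypothetical frontend address, and the project's real `app.py` may already configure this differently.

```python
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)

# ASSUMPTION: the frontend is served from port 3000; adjust to your setup.
# Restricting origins is safer than allowing every site to call the API.
CORS(app, resources={r"/*": {"origins": "http://127.0.0.1:3000"}})
```

If the dashboard is served by the same Flask app on port 5000, as in the default setup, cross-origin requests do not occur and this configuration is not needed.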
- [ ] Batch Processing: Ability to upload an entire CSV for bulk fraud scanning.
- [ ] User Auth: Secure login for finance team members.
- [ ] Email Alerts: Auto-notify admins when a "High Risk" invoice is detected.
- Scikit-learn & XGBoost: For the heavy lifting in the ML pipeline.
- Faker: For helping create the fake data.
Built by Roy Peters.