A FastAPI backend service for a privacy-first personal finance app that extracts transaction details from bank SMS notifications using production-grade advanced algorithms and universal pattern learning.
The game-changer that revolutionizes SMS parsing!
- 🧠 Field-Agnostic Learning: Automatically learns extraction patterns for ANY field (merchant, bank_name, beneficiary, VPA, etc.)
- 🚀 Zero Code Changes: Add new fields → System auto-learns patterns
- 📊 Three Learning Modes:
- Refinement: Clean up messy extractions (remove " PA", " Avl Limit")
- Discovery: Find completely missed fields (learn bank_name from structure)
- Structural: Learn from SMS structure and context
- 🎯 Six Extraction Strategies: Structural → Position → Stop Words → Text → Removal → Validation
- ⚡ Lightning Fast: <5ms learning, ~3-5ms application per SMS
- 📈 80-95% Accuracy: After just 3-5 corrections per pattern type
- ♾️ Infinite Scalability: Works for ANY field automatically
How it works: User corrects once (e.g., "Axis Bank Card") → System learns pattern → Automatically extracts for different banks (e.g., "ICICI Bank Card") ✨
See UNIVERSAL_LEARNING_QUICKSTART.md for complete guide!
The breakthrough that makes ML models independent of JSON files!
- 🔄 Automatic Retraining: After 10 feedbacks, automatically converts JSON patterns → ML training data → Retrains models
- 📦 ML Independence: Models internalize learned patterns, reducing JSON dependency from 100% → 10-20%
- ⚡ Performance Boost: 5-10ms (JSON lookup) → 3-7ms (pure ML inference)
- 🎯 Gradual Evolution: Week 1: 90% JSON → Month 3: 80% ML → Month 6+: 95% ML
- 🚀 Production Ready: Deploy just the ML models, JSON becomes optional fallback
- 📊 Track Progress:
/retrain/statusendpoint shows retraining stats and ML adoption
How it works:
- User gives feedback → Patterns stored in JSON (immediate learning)
- After 10+ feedbacks → Auto-converts patterns to training data
- Retrains merchant classifier, field extractors, type classifier
- Updated models handle 80-90% of requests WITHOUT JSON!
Result: Best of both worlds - instant learning (JSON) + internalized knowledge (ML)! 🎉
See AUTO_RETRAIN_SYSTEM.md for complete technical guide!
Password-protected web interface for ML model management!
- 🔒 Secure Access: Environment-based password authentication
- 📊 System Monitoring: Pattern counts, health checks, retrain status
- 📤 Model Export: Download ML models + training data as ZIP files
- 📥 Model Import: Restore models from backups (solves deployment model loss!)
- 🔄 Manual Retrain: Trigger retraining anytime
- 🚀 Production Ready: Complete backup/restore workflow
Access URL: http://localhost:8000/admin.html
Key Use Case: Export models before deployment → Import after deployment → No learning lost! ✨
See ADMIN_DASHBOARD_GUIDE.md for complete usage guide!
- 6 Major Algorithms: Ensemble Learning, Bayesian Inference, Fuzzy Matching, Multi-pass Parsing, Weighted Scoring, Adaptive Learning
- 75-105% Improvement: Over basic regex parsing
- 94.8% Overall Accuracy: Up from 82% in v3.1.0
- Confidence Scores: ML-calibrated confidence for every extraction
See V3.2.0_ENHANCED_SUMMARY.md for v3.2.0 details.
- 🌟 Universal Pattern Learning: Field-agnostic learning for ANY field (merchant, bank_name, beneficiary, VPA, transfer_type, etc.)
- 🚀 Zero Code Changes: New fields automatically learned without modifying code
- 🧠 Three Learning Modes:
- Refinement: Clean messy extractions (learn to remove " PA", " Avl Limit")
- Discovery: Find missed fields (learn bank_name from SMS structure)
- Structural: Learn from context (beneficiary after "to", type after "via")
- 🎯 Six Extraction Strategies: Multi-stage fallback pipeline for maximum accuracy
- ⚡ Real-Time Learning: <5ms to learn pattern from feedback
- 📈 Generalization: Learn from ONE SMS → Apply to ALL similar SMS
- ♾️ Future-Proof: Add
beneficiary_emailtomorrow → Auto-learned!
- 🚀 6 Advanced Algorithms: Ensemble Learning, Bayesian Inference, Fuzzy Matching, Multi-pass Parsing, Weighted Scoring, Adaptive Learning
- 🎯 94.8% Overall Accuracy: Up from 82% in basic parsers
- 🔧 Enhanced Specialized Parsers: EnhancedCardParser, EnhancedUPIParser, EnhancedBankParser
- 📊 Confidence Scoring: ML-calibrated confidence for every field extraction
- 🧠 Fuzzy Matching: 167 merchant normalizations (e.g., "AMZN" → "Amazon")
- 🔄 Adaptive Learning: Self-improvement from usage patterns (pattern_stats.json)
- 📈 Feature Importance: Weighted scoring (35% card mask, 30% amount, etc.)
- 🔢 Multi-pass Parsing: 4-stage extraction (normalize → context → extract → validate)
- 🎯 Transaction Type Classification: ML model determines CARD, UPI, or BANK_TRANSFER
- 🔧 Specialized Parsers: Separate ML models for each transaction type
- 📚 Continuous Learning: Models improve from user feedback
- 💾 Local File Storage: All training data in JSON files
- 🎨 High Accuracy: 80-85% accuracy with basic parsers
- Pattern Recognition: Multi-stage context-aware extraction
- Unicode Support: Handles Union Bank and other special characters
- 80%+ Accuracy: Improved merchant and field extraction
- Simple Parsing: Basic regex-based extraction
- ~60% Accuracy: Foundation for more advanced versions
- Privacy-First: No SMS reading permissions required - users copy/paste messages
- Multi-Bank Support: Works with 20+ Indian banks
- Fraud Detection: Flag suspicious transactions
- RESTful API: Clean FastAPI endpoints for mobile app integration
SMS Input
↓
Pass 0A: Exact Cache Check (100% confidence if match)
↓
Pass 0B: Universal Pattern Learning Check ⭐ NEW!
│ (Template match? Apply learned patterns for ALL fields)
│ ↓
│ Template: "Spent INR {AMOUNT} {BANK} Card no. {CARD} at {MERCHANT}"
│ Patterns: {merchant: {removal_rules, stop_words, position_hints},
│ bank_name: {structural_pattern, position_hints}, ...}
│ ↓
│ Apply Six-Strategy Extraction:
│ 1. Structural Pattern (most specific)
│ 2. Position Hints (after/before keywords)
│ 3. Stop Words (boundary detection)
│ 4. Reasonable Chunk (1-5 words)
│ 5. Removal Rules (cleanup)
│ 6. Validation (quality check)
↓
Pass 1: Text Normalization (Unicode NFKD, special char mapping)
↓
Pass 2: Context Building (Bayesian priors, keyword detection)
↓
Pass 3: ML Type Classifier → Enhanced Specialized Parser
│ ↓
├─ CARD → EnhancedCardParser (Ensemble Learning, Fuzzy Matching)
├─ UPI → EnhancedUPIParser (Priority extraction: path > VPA → merchant)
└─ BANK_TRANSFER → EnhancedBankParser (IMPS/NEFT/RTGS detection)
↓
Pass 4: Apply Universal Learned Patterns (if not Pass 0B)
↓
Pass 5: Confidence Scoring (Weighted features, business rules)
↓
Result + Confidence (0-100%) + Source (cache/template/parser)
↓
User Feedback → Universal Learning ⭐
│ ↓
│ For EACH corrected field:
│ 1. Analyze SMS structure (amounts, cards, keywords, segments)
│ 2. Learn field pattern (removal_rules, stop_words, position_hints)
│ 3. Store in template_corrections.json
│ ↓
│ Next SMS with same template → Auto-apply patterns!
Key Features:
- 🌟 Universal Learning: Field-agnostic pattern learning (works for ANY field!)
- Ensemble Learning: Multiple patterns with weighted voting
- Fuzzy Matching: merchant_aliases.csv (167 normalizations)
- Bayesian Inference: Context-aware scoring (+10-15% boost)
- Adaptive Learning: pattern_stats.json (self-improvement)
See ADVANCED_ALGORITHMS.md and UNIVERSAL_LEARNING_QUICKSTART.md for details.
-
Install dependencies:
pip install -r requirements.txt
-
Set up environment variables:
cp .env.example .env # Edit .env with your configuration -
Run database migrations:
alembic upgrade head
-
Start the development server:
uvicorn app.main:app --reload
The API will be available at http://localhost:8000 with interactive docs at http://localhost:8000/docs.
app/
├── main.py # FastAPI application entry point
├── core/ # Core configuration and utilities
├── api/ # API routes and endpoints
├── models/ # Database models
├── schemas/ # Pydantic schemas
├── services/ # Business logic and ML services
├── ml/ # ML models and training scripts
└── database.py # Database configuration
POST /transactions/parse- Parse SMS text and extract transaction detailsGET /transactions/- List user transactionsPOST /transactions/- Create/update transactionGET /categories/- Get transaction categoriesPOST /categories/learn- Train category classification
Run tests:
pytestFormat code:
black app/
isort app/MIT License
Follow these steps to deploy this FastAPI app to Render.com:
-
Create a new Web Service on Render and connect your GitHub repository (branch: main).
-
Set the Build Command to one of the following (use the second if you get Rust/Cargo/maturin errors):
Simple (default):
pip install -r requirements.txt
If you see Cargo/maturin errors during package metadata preparation ("Read-only file system"), use the included build helper which sets writable Cargo dirs:
bash render_build.sh
-
Set the Start Command to:
gunicorn -k uvicorn.workers.UvicornWorker app.main:app -b 0.0.0.0:$PORT -
Add these environment variables in Render's dashboard:
- MODEL_PATH = ./models
- CREATE_TABLES = false
- SECRET_KEY = (generate a secure secret)
- SPACY_MODEL = en_core_web_sm
-
If your app uses a pre-trained model file (
category_classifier.joblib), make sure the file is committed tomodels/orapp/models/, and setMODEL_PATHaccordingly. -
Deploy. Render will install dependencies, build, and run the service.
Notes:
- For small apps, the free Render plan is sufficient for testing. For production, use a paid plan and secure environment variables.
- If you prefer Docker, add a
Dockerfileand deploy via a Docker service on Render.
app/ ├── main.py # FastAPI application entry point ├── core/ │ ├── config.py # Configuration settings ├── database.py # Database setup ├── models/ │ └── transaction.py # Database models ├── schemas/ │ └── transaction.py # Pydantic schemas ├── services/ │ └── sms_parser.py # SMS parsing service ├── ml/ │ ├── category_classifier.py # ML category prediction │ └── fraud_detector.py # Fraud detection └── api/ ├── transactions.py # Transaction endpoints └── categories.py # Category endpoints
Key Features Implemented: SMS Parser Service: Rule-based SMS parsing that extracts:
Transaction amount Merchant name Date/time Card mask (last 4 digits) Transaction type (debit/credit) Account balance ML Models:
Category Classifier: Uses TF-IDF + Naive Bayes for automatic categorization Fraud Detector: Rule-based suspicious transaction detection Database Models:
Transaction storage with all extracted fields Category management Merchant pattern learning API Endpoints:
POST /api/v1/transactions/parse - Parse SMS text Transaction CRUD operations Category management Next Steps: To complete the setup, you'll need to:
Install Python (if not already installed) Install dependencies: pip install -r requirements.txt Set up database: Configure your database URL in .env Run migrations: alembic upgrade head Start the server: uvicorn app.main:app --reload