Personal Finance SMS Parser - Backend

A FastAPI backend service for a privacy-first personal finance app that extracts transaction details from bank SMS notifications using production-grade advanced algorithms and universal pattern learning.

⚡ Recent Updates (v3.3.0 - Jan 2025)

🌟 NEW: Universal Learning System

The game-changer that revolutionizes SMS parsing!

🧠 Field-Agnostic Learning: Automatically learns extraction patterns for ANY field (merchant, bank_name, beneficiary, VPA, etc.)
🚀 Zero Code Changes: Add new fields → System auto-learns patterns
📊 Three Learning Modes:
- Refinement: Clean up messy extractions (remove " PA", " Avl Limit")
- Discovery: Find completely missed fields (learn bank_name from structure)
- Structural: Learn from SMS structure and context
🎯 Six Extraction Strategies: Structural → Position → Stop Words → Text → Removal → Validation
⚡ Lightning Fast: <5ms learning, ~3-5ms application per SMS
📈 80-95% Accuracy: After just 3-5 corrections per pattern type
♾️ Infinite Scalability: Works for ANY field automatically

How it works: User corrects once (e.g., "Axis Bank Card") → System learns pattern → Automatically extracts for different banks (e.g., "ICICI Bank Card") ✨
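The refinement mode described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `learn_removal_rule` and `apply_removal_rules` are hypothetical names, and the sketch assumes a correction only trims trailing noise from the raw extraction.

```python
from typing import List, Optional


def learn_removal_rule(raw: str, corrected: str) -> Optional[str]:
    """Return the trailing noise the user removed, or None when the
    correction is not a simple suffix trim (hypothetical helper)."""
    if raw != corrected and raw.startswith(corrected):
        return raw[len(corrected):]
    return None


def apply_removal_rules(value: str, rules: List[str]) -> str:
    """Strip any learned trailing noise from a freshly extracted value."""
    for rule in rules:
        if value.endswith(rule):
            value = value[: -len(rule)]
    return value.strip()


# One correction: the user fixes "AMAZON PAY PA" to "AMAZON PAY".
rules = [learn_removal_rule("AMAZON PAY PA", "AMAZON PAY")]
# The learned rule then cleans a different merchant with the same noise.
cleaned = apply_removal_rules("FLIPKART PA", rules)
```

The point of the sketch is that the rule is learned from the difference between the raw and corrected values, so it carries over to merchants the user never corrected.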

See UNIVERSAL_LEARNING_QUICKSTART.md for the complete guide!

🤖 NEW: Auto-Retraining System

The breakthrough that makes ML models independent of JSON files!

🔄 Automatic Retraining: After 10 feedback submissions, the system converts JSON patterns into ML training data and retrains the models
📦 ML Independence: Models internalize learned patterns, reducing JSON dependency from 100% to 10-20%
⚡ Performance Boost: 5-10ms (JSON lookup) → 3-7ms (pure ML inference)
🎯 Gradual Evolution: Week 1: 90% JSON → Month 3: 80% ML → Month 6+: 95% ML
🚀 Production Ready: Deploy just the ML models, JSON becomes optional fallback
📊 Track Progress: /retrain/status endpoint shows retraining stats and ML adoption

How it works:

User gives feedback → Patterns stored in JSON (immediate learning)
After 10+ feedbacks → Auto-converts patterns to training data
Retrains merchant classifier, field extractors, type classifier
Updated models handle 80-90% of requests WITHOUT JSON!

Result: Best of both worlds - instant learning (JSON) + internalized knowledge (ML)! 🎉
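The threshold-triggered flow above could look something like the sketch below. `FeedbackStore` and its methods are hypothetical names, the store is kept in memory rather than in JSON files, and the retrain step is a placeholder; only the threshold of 10 comes from the text.

```python
from dataclasses import dataclass, field

RETRAIN_THRESHOLD = 10  # from the text: retrain after 10 feedbacks


@dataclass
class FeedbackStore:
    """Hypothetical in-memory stand-in for the JSON pattern store."""
    pending: list = field(default_factory=list)
    retrain_count: int = 0

    def add_feedback(self, sms: str, corrected: dict) -> bool:
        """Record one correction; return True if it triggered a retrain."""
        self.pending.append({"text": sms, "labels": corrected})
        if len(self.pending) >= RETRAIN_THRESHOLD:
            self._retrain(self.pending)
            self.pending = []
            return True
        return False

    def _retrain(self, rows: list) -> None:
        # Placeholder: a real system would fit its classifiers on `rows` here.
        self.retrain_count += 1


store = FeedbackStore()
triggered = [store.add_feedback(f"sms {i}", {"merchant": "X"}) for i in range(10)]
```

Only the tenth call returns True, which mirrors the idea that corrections are usable immediately while retraining happens in batches.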

See AUTO_RETRAIN_SYSTEM.md for the complete technical guide!

🎛️ NEW: Admin Dashboard

Password-protected web interface for ML model management!

🔒 Secure Access: Environment-based password authentication
📊 System Monitoring: Pattern counts, health checks, retrain status
📤 Model Export: Download ML models + training data as ZIP files
📥 Model Import: Restore models from backups (solves deployment model loss!)
🔄 Manual Retrain: Trigger retraining anytime
🚀 Production Ready: Complete backup/restore workflow
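As an illustration of environment-based password checking, here is a minimal sketch. The variable name `ADMIN_PASSWORD` and the function `check_admin_password` are assumptions made for this example; the project's actual variable name and auth flow may differ.

```python
import hmac
import os


def check_admin_password(supplied: str) -> bool:
    """Compare a supplied password with the one set in the environment.

    ADMIN_PASSWORD is a hypothetical variable name; access is refused
    when it is unset or empty.
    """
    expected = os.environ.get("ADMIN_PASSWORD", "")
    if not expected:
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Refusing access when the variable is unset avoids accidentally shipping a dashboard with an empty password.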

Access URL: http://localhost:8000/admin.html

Key Use Case: Export models before deployment → Import after deployment → No learning lost! ✨
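The export and import round trip can be pictured with the standard library alone. This is a sketch under assumptions: `export_models` and `import_models` are hypothetical helpers, and the archive is assumed to live outside the model directory.

```python
import zipfile
from pathlib import Path
from typing import List


def export_models(model_dir: Path, archive: Path) -> List[str]:
    """Write every file under model_dir into a ZIP archive."""
    names = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(model_dir.rglob("*")):
            if path.is_file():
                name = path.relative_to(model_dir).as_posix()
                zf.write(path, name)
                names.append(name)
    return names


def import_models(archive: Path, model_dir: Path) -> None:
    """Restore a previously exported archive into model_dir."""
    model_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(model_dir)
```

Running `export_models` before a deployment and `import_models` afterwards restores the same files, which is the workflow the dashboard is described as supporting.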

See ADMIN_DASHBOARD_GUIDE.md for the complete usage guide!

🚀 Previous Updates (v3.2.0 - Oct 2025)

Advanced Algorithms Implementation

6 Major Algorithms: Ensemble Learning, Bayesian Inference, Fuzzy Matching, Multi-pass Parsing, Weighted Scoring, Adaptive Learning
75-105% Improvement: Over basic regex parsing
94.8% Overall Accuracy: Up from 82% in v3.1.0
Confidence Scores: ML-calibrated confidence for every extraction

See V3.2.0_ENHANCED_SUMMARY.md for v3.2.0 details.

🚀 Features

V3.3 (Latest) - Universal Learning System ⭐

🌟 Universal Pattern Learning: Field-agnostic learning for ANY field (merchant, bank_name, beneficiary, VPA, transfer_type, etc.)
🚀 Zero Code Changes: New fields automatically learned without modifying code
🧠 Three Learning Modes:
- Refinement: Clean messy extractions (learn to remove " PA", " Avl Limit")
- Discovery: Find missed fields (learn bank_name from SMS structure)
- Structural: Learn from context (beneficiary after "to", type after "via")
🎯 Six Extraction Strategies: Multi-stage fallback pipeline for maximum accuracy
⚡ Real-Time Learning: <5ms to learn pattern from feedback
📈 Generalization: Learn from ONE SMS → Apply to ALL similar SMS
♾️ Future-Proof: Add beneficiary_email tomorrow → Auto-learned!

V3.2 - Production-Grade Advanced Algorithms

🚀 6 Advanced Algorithms: Ensemble Learning, Bayesian Inference, Fuzzy Matching, Multi-pass Parsing, Weighted Scoring, Adaptive Learning
🎯 94.8% Overall Accuracy: Up from 82% in basic parsers
🔧 Enhanced Specialized Parsers: EnhancedCardParser, EnhancedUPIParser, EnhancedBankParser
📊 Confidence Scoring: ML-calibrated confidence for every field extraction
🧠 Fuzzy Matching: 167 merchant normalizations (e.g., "AMZN" → "Amazon")
🔄 Adaptive Learning: Self-improvement from usage patterns (pattern_stats.json)
📈 Feature Importance: Weighted scoring (35% card mask, 30% amount, etc.)
🔢 Multi-pass Parsing: 4-stage extraction (normalize → context → extract → validate)
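To make the fuzzy-matching idea concrete, the sketch below normalizes merchant names with `difflib` from the standard library. The alias table is a small made-up sample (only "AMZN" → "Amazon" comes from the text), `normalize_merchant` is a hypothetical name, and the project may use a different matching algorithm.

```python
import difflib

# Hypothetical alias sample; the project is described as using merchant_aliases.csv.
ALIASES = {
    "AMZN": "Amazon",
    "AMAZON PAY": "Amazon",
    "FLPKRT": "Flipkart",
    "SWIGGY": "Swiggy",
}


def normalize_merchant(raw: str, cutoff: float = 0.8) -> str:
    """Map a raw merchant string to a canonical name, tolerating typos."""
    key = raw.strip().upper()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to the closest known alias above the similarity cutoff.
    close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else raw.strip()
```

An exact alias hit is tried first; a near miss such as "SWIGY" falls through to the similarity search, and unknown merchants are returned unchanged.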

V3.1 - ML-Powered Multi-Stage Parser

🎯 Transaction Type Classification: ML model determines CARD, UPI, or BANK_TRANSFER
🔧 Specialized Parsers: Separate ML models for each transaction type
📚 Continuous Learning: Models improve from user feedback
💾 Local File Storage: All training data in JSON files
🎨 High Accuracy: 80-85% accuracy with basic parsers
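The classify-then-dispatch structure can be sketched as below. The keyword rules are a stand-in for the ML type classifier, not the project's model, and the parser functions are stubs; `classify_type`, `PARSERS`, and `parse` are hypothetical names.

```python
def classify_type(sms: str) -> str:
    """Keyword stand-in for the ML classifier; not the project's model."""
    text = sms.lower()
    if "upi" in text or "vpa" in text:
        return "UPI"
    if any(word in text for word in ("imps", "neft", "rtgs")):
        return "BANK_TRANSFER"
    return "CARD"


# Stub parsers; each would hold the type-specific extraction logic.
PARSERS = {
    "CARD": lambda sms: {"type": "CARD", "raw": sms},
    "UPI": lambda sms: {"type": "UPI", "raw": sms},
    "BANK_TRANSFER": lambda sms: {"type": "BANK_TRANSFER", "raw": sms},
}


def parse(sms: str) -> dict:
    """Route the SMS to the parser registered for its predicted type."""
    return PARSERS[classify_type(sms)](sms)
```

Keeping the parsers in a registry keyed by type means a new transaction type only needs a new entry, not a change to the routing code.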

V2 - Advanced Pattern-Based Parser

Pattern Recognition: Multi-stage context-aware extraction
Unicode Support: Handles Union Bank and other special characters
80%+ Accuracy: Improved merchant and field extraction
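Unicode handling of this kind is commonly done with NFKD normalization, which the architecture section also names. The sketch below is one possible approach; `normalize_sms` is a hypothetical name and the project's own character mapping may go further.

```python
import unicodedata


def normalize_sms(text: str) -> str:
    """Apply NFKD, drop combining marks, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())
```

After this step, full-width digits, accented letters, and non-breaking spaces all reduce to plain ASCII-like forms that simpler patterns can match.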

V1 - Basic Rule-Based Parser

Simple Parsing: Basic regex-based extraction
~60% Accuracy: Foundation for more advanced versions
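A basic regex-based extractor of the kind described might look like this. The patterns and the sample SMS are illustrative assumptions rather than the project's actual rules, and `parse_basic` is a hypothetical name.

```python
import re

# Amount after a currency marker such as "INR" or "Rs."
AMOUNT_RE = re.compile(r"(?:INR|Rs\.?)\s*([\d,]+(?:\.\d{1,2})?)", re.IGNORECASE)
# Last four digits shortly after the word "card"
CARD_RE = re.compile(r"card\D{0,12}(\d{4})\b", re.IGNORECASE)


def parse_basic(sms: str) -> dict:
    """Extract amount and card mask with plain regular expressions."""
    amount = AMOUNT_RE.search(sms)
    card = CARD_RE.search(sms)
    return {
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "card_last4": card.group(1) if card else None,
    }


result = parse_basic("Spent INR 1,250.00 Axis Bank Card no. XX1234 at AMAZON")
```

Fixed patterns like these break as soon as a bank changes its wording, which is the limitation the later versions aim to address.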

Core Features (All Versions)

Privacy-First: No SMS reading permissions required - users copy/paste messages
Multi-Bank Support: Works with 20+ Indian banks
Fraud Detection: Flag suspicious transactions
RESTful API: Clean FastAPI endpoints for mobile app integration

🏗️ V3.3 Architecture (with Universal Learning)

SMS Input
    ↓
Pass 0A: Exact Cache Check (100% confidence if match)
    ↓
Pass 0B: Universal Pattern Learning Check ⭐ NEW!
    │    (Template match? Apply learned patterns for ALL fields)
    │    ↓
    │    Template: "Spent INR {AMOUNT} {BANK} Card no. {CARD} at {MERCHANT}"
    │    Patterns: {merchant: {removal_rules, stop_words, position_hints},
    │               bank_name: {structural_pattern, position_hints}, ...}
    │    ↓
    │    Apply Six-Strategy Extraction:
    │    1. Structural Pattern (most specific)
    │    2. Position Hints (after/before keywords)
    │    3. Stop Words (boundary detection)
    │    4. Reasonable Chunk (1-5 words)
    │    5. Removal Rules (cleanup)
    │    6. Validation (quality check)
    ↓
Pass 1: Text Normalization (Unicode NFKD, special char mapping)
    ↓
Pass 2: Context Building (Bayesian priors, keyword detection)
    ↓
Pass 3: ML Type Classifier → Enhanced Specialized Parser
    │                           ↓
    ├─ CARD → EnhancedCardParser (Ensemble Learning, Fuzzy Matching)
    ├─ UPI → EnhancedUPIParser (Priority extraction: path > VPA → merchant)
    └─ BANK_TRANSFER → EnhancedBankParser (IMPS/NEFT/RTGS detection)
    ↓
Pass 4: Apply Universal Learned Patterns (if not Pass 0B)
    ↓
Pass 5: Confidence Scoring (Weighted features, business rules)
    ↓
Result + Confidence (0-100%) + Source (cache/template/parser)
    ↓
User Feedback → Universal Learning ⭐
    │    ↓
    │    For EACH corrected field:
    │    1. Analyze SMS structure (amounts, cards, keywords, segments)
    │    2. Learn field pattern (removal_rules, stop_words, position_hints)
    │    3. Store in template_corrections.json
    │    ↓
    │    Next SMS with same template → Auto-apply patterns!
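The template match in Pass 0B depends on reducing an SMS to a template. One way to sketch that step is shown below; `to_template` is a hypothetical name, only the amount and card placeholders are handled, and the learned-pattern dictionary is a made-up sample in the shape the diagram suggests.

```python
import re


def to_template(sms: str) -> str:
    """Replace variable parts of an SMS with placeholders."""
    t = re.sub(r"(INR|Rs\.?)\s*[\d,]+(?:\.\d+)?", r"\1 {AMOUNT}", sms, flags=re.IGNORECASE)
    t = re.sub(r"(card no\.?\s*)[xX*]*\d{4}", r"\1{CARD}", t, flags=re.IGNORECASE)
    return t


first = "Spent INR 500.00 Axis Bank Card no. XX1234 at AMAZON"
second = "Spent INR 1,299 Axis Bank Card no. XX9876 at AMAZON"

# Patterns learned from feedback on the first SMS, keyed by its template.
learned = {to_template(first): {"merchant": {"position_hints": ["at"]}}}
# A later SMS with different values maps to the same key.
patterns = learned.get(to_template(second))
```

Because both messages collapse to the same template string, patterns learned from one correction are found again for every later SMS of that shape.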

Key Features:

🌟 Universal Learning: Field-agnostic pattern learning (works for ANY field!)
Ensemble Learning: Multiple patterns with weighted voting
Fuzzy Matching: merchant_aliases.csv (167 normalizations)
Bayesian Inference: Context-aware scoring (+10-15% boost)
Adaptive Learning: pattern_stats.json (self-improvement)
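Weighted voting across several extraction patterns can be sketched as follows. The weights and candidate values are invented for illustration, `weighted_vote` is a hypothetical name, and the function assumes at least one candidate with positive total weight.

```python
from collections import defaultdict
from typing import List, Tuple


def weighted_vote(candidates: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Pick the value with the highest total weight and its vote share."""
    totals = defaultdict(float)
    for value, weight in candidates:
        totals[value] += weight
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Three patterns propose a merchant; two agree on "Amazon".
winner, share = weighted_vote([("Amazon", 0.5), ("Amzn Pay", 0.3), ("Amazon", 0.4)])
```

The vote share doubles as a rough agreement signal that a confidence score could build on.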

See ADVANCED_ALGORITHMS.md and UNIVERSAL_LEARNING_QUICKSTART.md for details.

🚀 Quick Start

Install dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:

cp .env.example .env
# Edit .env with your configuration

Run database migrations:
```
alembic upgrade head
```
Start the development server:
```
uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000 with interactive docs at http://localhost:8000/docs.

Project Structure

app/
├── main.py              # FastAPI application entry point
├── core/                # Core configuration and utilities
├── api/                 # API routes and endpoints
├── models/              # Database models
├── schemas/             # Pydantic schemas
├── services/           # Business logic and ML services
├── ml/                 # ML models and training scripts
└── database.py         # Database configuration

API Endpoints

POST /transactions/parse - Parse SMS text and extract transaction details
GET /transactions/ - List user transactions
POST /transactions/ - Create/update transaction
GET /categories/ - Get transaction categories
POST /categories/learn - Train category classification
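A client call to the parse endpoint could be assembled as in the sketch below, using only the standard library. The JSON field name `sms_text` and the helper `build_parse_request` are assumptions; the interactive docs at /docs are the place to confirm the real request schema.

```python
import json
import urllib.request


def build_parse_request(
    sms_text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST /transactions/parse request.

    The JSON field name "sms_text" is an assumption, not a confirmed schema.
    """
    body = json.dumps({"sms_text": sms_text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transactions/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_parse_request("Spent INR 500.00 Axis Bank Card no. XX1234 at AMAZON")
# With the server running: urllib.request.urlopen(req).read()
```

Building the request separately from sending it keeps the example usable without a running server.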

Development

Run tests:

pytest

Format code:

black app/
isort app/

License

MIT License

Deploying to Render (Quick guide)

Follow these steps to deploy this FastAPI app to Render.com:

Create a new Web Service on Render and connect your GitHub repository (branch: main).
Set the Build Command to one of the following (use the second if you get Rust/Cargo/maturin errors):

Simple (default):
```
pip install -r requirements.txt
```
If you see Cargo/maturin errors during package metadata preparation ("Read-only file system"), use the included build helper which sets writable Cargo dirs:
```
bash render_build.sh
```

Set the Start Command to:

gunicorn -k uvicorn.workers.UvicornWorker app.main:app -b 0.0.0.0:$PORT

Add these environment variables in Render's dashboard:
- MODEL_PATH = ./models
- CREATE_TABLES = false
- SECRET_KEY = (generate a secure secret)
- SPACY_MODEL = en_core_web_sm

If your app uses a pre-trained model file (category_classifier.joblib), make sure the file is committed to models/ or app/models/, and set MODEL_PATH accordingly.

Deploy. Render will install dependencies, build, and run the service.
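On the application side, the deployment variables listed above might be read as in this sketch. `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the values shown in the list; the project's own config module may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_path: str
    create_tables: bool
    secret_key: str
    spacy_model: str


def load_settings(env=os.environ) -> Settings:
    """Read deployment variables, failing fast when the secret is missing."""
    secret = env.get("SECRET_KEY")
    if not secret:
        raise RuntimeError("SECRET_KEY must be set")
    return Settings(
        model_path=env.get("MODEL_PATH", "./models"),
        # Environment values are strings, so parse the flag explicitly.
        create_tables=env.get("CREATE_TABLES", "false").strip().lower() in {"1", "true", "yes"},
        secret_key=secret,
        spacy_model=env.get("SPACY_MODEL", "en_core_web_sm"),
    )
```

Parsing CREATE_TABLES explicitly matters because the string "false" is truthy in Python if used directly.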

Notes:

For small apps, the free Render plan is sufficient for testing. For production, use a paid plan and secure environment variables.
If you prefer Docker, add a Dockerfile and deploy via a Docker service on Render.

app/
├── main.py                     # FastAPI application entry point
├── core/
│   └── config.py               # Configuration settings
├── database.py                 # Database setup
├── models/
│   └── transaction.py          # Database models
├── schemas/
│   └── transaction.py          # Pydantic schemas
├── services/
│   └── sms_parser.py           # SMS parsing service
├── ml/
│   ├── category_classifier.py  # ML category prediction
│   └── fraud_detector.py       # Fraud detection
└── api/
    ├── transactions.py         # Transaction endpoints
    └── categories.py           # Category endpoints

Key Features Implemented

SMS Parser Service: Rule-based SMS parsing that extracts:
- Transaction amount
- Merchant name
- Date/time
- Card mask (last 4 digits)
- Transaction type (debit/credit)
- Account balance

ML Models:
- Category Classifier: Uses TF-IDF + Naive Bayes for automatic categorization
- Fraud Detector: Rule-based suspicious transaction detection

Database Models:
- Transaction storage with all extracted fields
- Category management
- Merchant pattern learning

API Endpoints:
- POST /api/v1/transactions/parse - Parse SMS text
- Transaction CRUD operations
- Category management

Next Steps: To complete the setup, you'll need to:
- Install Python (if not already installed)
- Install dependencies: pip install -r requirements.txt
- Set up database: Configure your database URL in .env
- Run migrations: alembic upgrade head
- Start the server: uvicorn app.main:app --reload

arsallanShahab/finance-sms-parser
