Cashboard is a robust, extensible Python solution for automated expense classification, enrichment, and reporting from UPI-heavy bank statements. It is built to learn, adapt, and make expense tracking a breeze.
Tags:
ETL finance data-engineering SQLAlchemy CLI bank-statements classification reporting personal-finance automation
Motivation:
Bank statements today contain a large number of UPI transactions that are hard to categorize with existing tools. Cashboard addresses this by identifying commonly occurring UPI patterns in bank statements and recognizing recurring transactions of the same category. It remembers these patterns, so categorization gets easier and quicker over time.
Cashboard Classifier solves the tedious problem of manually categorizing and analyzing personal or business bank transactions. It ingests raw bank statements, applies intelligent classification (rule-based and AI-ready), enriches data with Paytm UPI lookups, supports manual correction, and exports clean, categorized data for reporting or further analysis.
- Problem: Manual expense tracking is error-prone, time-consuming, and non-scalable.
- Solution: An automated, modular pipeline for ingesting, transforming, classifying, and reporting on financial transactions.
- Impact: Saves hours of manual work, improves financial visibility, and enables data-driven decision making.
- Command-Line Interface (CLI): One-command processing of bank statements.
- Programmatic API: Use the pipeline in your own Python scripts.
- Extensible Pipeline: Modular ETL stages for easy customization and extension.
- Database Integration: SQLAlchemy ORM, Alembic migrations, and persistent storage.
- Manual Correction: Export uncategorized transactions for review and keyword enrichment.
- Paytm UPI Support: Enriches uncategorized records with Paytm data.
- Reporting: Generate clean CSVs and database tables for downstream analysis.
- Visualization-ready: (Planned) hooks for dashboards and visual analytics.
- Robust Error Handling: Validates input, handles edge cases, and logs issues.
- Highly Configurable: Supports multiple banks, custom columns, and user-defined categories.
See full feature list & usage →
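As a rough illustration of the "multiple banks, custom columns" idea, per-bank column mappings could live in a plain dictionary that is applied at ingestion time. Everything named here (`COLUMN_MAPS`, `standardize_row`, the `OTHER_BANK` code, and the raw header strings) is hypothetical and not taken from Cashboard's code; the sketch uses plain dicts instead of pandas to stay dependency-free.

```python
from typing import Dict

# Hypothetical per-bank mappings from raw statement headers to standard names.
# The raw header strings are made up for illustration.
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "SBI": {"Txn Date": "date", "Description": "narration", "Debit": "debit", "Credit": "credit"},
    "OTHER_BANK": {"Date": "date", "Narration": "narration", "Withdrawal": "debit", "Deposit": "credit"},
}


def standardize_row(bank_code: str, raw_row: Dict[str, str]) -> Dict[str, str]:
    """Rename known columns to standard names and strip whitespace from values."""
    try:
        mapping = COLUMN_MAPS[bank_code]
    except KeyError:
        raise ValueError(f"Unsupported bank code: {bank_code!r}") from None
    missing = [col for col in mapping if col not in raw_row]
    if missing:
        raise ValueError(f"Missing columns for {bank_code}: {missing}")
    return {std: raw_row[raw].strip() for raw, std in mapping.items()}


row = standardize_row(
    "SBI",
    {"Txn Date": " 01/04/2024 ", "Description": "UPI/DR/demo", "Debit": "250.00", "Credit": ""},
)
print(row)
```

Under this layout, supporting another bank would mean adding one more entry to the mapping rather than changing the ingestion logic.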
flowchart TD
A["User Input (CLI/Script)"] --> B["Ingestor"]
B --> J["Checkpoint Store (DB/File)"]
B --> C["Transformer"]
C --> J["Checkpoint Store (DB/File)"]
C --> D["Paytm Lookup (optional)"]
D --> E["Classifier"]
E --> J["Checkpoint Store (DB/File)"]
E --> F["File Correction / Manual Correction"]
F --> G["Database (SQLAlchemy)"]
F --> H["Reporting/Export"]
G --> H
H --> I["Final Output (CSV/DB/Report)"]
style A fill:#f9f,stroke:#333,stroke-width:2px
style I fill:#bfb,stroke:#333,stroke-width:2px
style J fill:#ffa500,stroke:#333,stroke-width:2px
style G fill:#ffa500,stroke:#333,stroke-width:2px
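One way to read the flow above: each stage takes the previous stage's records and returns new ones, and the output of each stage is saved to a checkpoint store. The sketch below is an assumption about the general shape, not Cashboard's actual implementation; `run_stages`, the in-memory `checkpoints` dict, and the toy stage functions are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]


def run_stages(
    records: List[Record],
    stages: List[Tuple[str, Stage]],
    checkpoints: Dict[str, List[Record]],
) -> List[Record]:
    """Run stages in order, storing a copy of each stage's output by name."""
    for name, stage in stages:
        records = stage(records)
        # Copy each record so later stages cannot mutate the checkpoint.
        checkpoints[name] = [dict(r) for r in records]
    return records


def ingest(records: List[Record]) -> List[Record]:
    # Toy ingestion: normalize column names to lowercase.
    return [{k.lower(): v for k, v in r.items()} for r in records]


def transform(records: List[Record]) -> List[Record]:
    # Toy transformation: flag UPI transactions from the narration prefix.
    return [
        {**r, "mode": "UPI" if r["narration"].startswith("UPI") else "OTHER"}
        for r in records
    ]


checkpoints: Dict[str, List[Record]] = {}
result = run_stages(
    [{"Narration": "UPI/DR/demo"}],
    [("ingest", ingest), ("transform", transform)],
    checkpoints,
)
print(result)
print(sorted(checkpoints))
```

A real checkpoint store would write to a database or file, as the diagram indicates, so that a failed run can resume from the last completed stage.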
- Ingestor: Loads and standardizes raw bank/Paytm files.
- Transformer: Extracts and normalizes transaction details.
- Paytm Lookup: Enriches uncategorized records with Paytm UPI data.
- Classifier: Assigns categories using rule-based (and optionally AI) logic.
- File/Manual Correction: Enables user-driven enrichment and learning.
- Database: Stores all stages and mappings for persistence and analytics.
- Reporting/Export: Outputs clean, categorized data for analysis or BI tools.
Deep dive: System Architecture →
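The Paytm lookup step can be pictured as a join between uncategorized bank records and a Paytm export. The sketch below assumes both sides share a UPI reference number; the function `enrich_with_paytm`, the field names (`upi_ref`, `category`, `note`, `paytm_note`), and the matching key are assumptions for illustration and may differ from what Cashboard actually matches on.

```python
from typing import Dict, List

Record = Dict[str, str]


def enrich_with_paytm(bank_rows: List[Record], paytm_rows: List[Record]) -> List[Record]:
    """Attach Paytm notes to uncategorized bank rows that share a UPI reference."""
    # Index Paytm rows by reference number for constant-time lookups.
    paytm_by_ref = {row["upi_ref"]: row for row in paytm_rows}
    enriched = []
    for row in bank_rows:
        new_row = dict(row)
        match = paytm_by_ref.get(row.get("upi_ref", ""))
        # Only touch rows that have no category yet.
        if not row.get("category") and match is not None:
            new_row["paytm_note"] = match["note"]
        enriched.append(new_row)
    return enriched


bank = [
    {"upi_ref": "1001", "category": ""},
    {"upi_ref": "1002", "category": "Groceries"},
    {"upi_ref": "1003", "category": ""},
]
paytm = [
    {"upi_ref": "1001", "note": "Paid to a cab service"},
    {"upi_ref": "1002", "note": "Paid to a store"},
]
enriched = enrich_with_paytm(bank, paytm)
print(enriched)
```

The extra note gives the classifier more text to match against for rows whose bank narration alone was not informative.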
- Ingestion: Load CSV/XLSX bank statements, standardize columns, and clean data.
- Transformation: Extract payment mode, payee, UPI ID, and derive new features.
- Paytm Lookup (optional): Match uncategorized transactions with Paytm UPI data for better classification.
- Classification: Assign categories using a keyword-driven, extensible rule engine.
- Manual/File Correction: Export uncategorized records for user review and keyword enrichment.
- Publishing: Save final, categorized data to CSV and/or database for reporting.
- Reporting & Visualization: (Planned) Generate reports and dashboards for insights.
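To make the transformation step more concrete, here is a minimal sketch of pulling payment mode, payee, and UPI ID out of a narration string. The slash-separated layout `UPI/<DR|CR>/<reference>/<payee>/<bank>/<upi id>` is an assumed format; real narrations vary by bank, and `parse_upi_narration` is a hypothetical helper, not part of Cashboard.

```python
from typing import Dict, Optional


def parse_upi_narration(narration: str) -> Optional[Dict[str, str]]:
    """Split an assumed 'UPI/<DR|CR>/<ref>/<payee>/<bank>/<upi id>' narration into fields."""
    parts = [p.strip() for p in narration.split("/")]
    if len(parts) < 6 or parts[0].upper() != "UPI":
        return None  # Not in the assumed UPI format.
    return {
        "payment_mode": "UPI",
        # Anything other than DR is treated as a credit in this sketch.
        "direction": "debit" if parts[1].upper() == "DR" else "credit",
        "reference": parts[2],
        "payee": parts[3].title(),
        "upi_id": parts[5].lower(),
    }


parsed = parse_upi_narration("UPI/DR/412345678901/FRESH MART/DEMO/freshmart@okbank/groceries")
print(parsed)
print(parse_upi_narration("ATM WDL 01-04-2024"))
```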
Technical workflow details →
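The classification and manual-correction steps can be sketched as a keyword rule engine that grows as the user corrects uncategorized records. This is a simplified stand-in under stated assumptions: the `KeywordClassifier` class, its method names, and the sample categories are hypothetical, and matching is a plain case-insensitive substring test.

```python
from typing import Dict, List


class KeywordClassifier:
    """Assign the first category whose keyword appears in the given text."""

    def __init__(self, rules: Dict[str, List[str]]):
        # Store keywords lowercased so matching is case-insensitive.
        self.rules = {cat: [k.lower() for k in kws] for cat, kws in rules.items()}

    def classify(self, text: str) -> str:
        lowered = text.lower()
        # Categories are checked in insertion order; the first match wins.
        for category, keywords in self.rules.items():
            if any(keyword in lowered for keyword in keywords):
                return category
        return "Uncategorized"

    def add_keyword(self, category: str, keyword: str) -> None:
        """Record a manual correction so similar transactions match next time."""
        self.rules.setdefault(category, []).append(keyword.lower())


clf = KeywordClassifier({"Groceries": ["mart", "grocer"], "Transport": ["cab", "metro"]})
before = clf.classify("UPI/DR/1/CITY PHARMA")
clf.add_keyword("Health", "pharma")  # Simulates a user correction.
after = clf.classify("UPI/DR/2/CITY PHARMA")
print(before, after)
```

Substring matching is simple but can over-match (for example, "cab" also appears in "cabinet"), so a production rule engine would likely need word boundaries or per-keyword priorities.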
| Layer | Technology | Rationale/Highlights |
|---|---|---|
| Language | Python 3.8+ | Modern, robust, and widely used for data engineering |
| Data Handling | pandas | Fast, flexible ETL and data manipulation |
| ORM/DB | SQLAlchemy | Scalable, production-grade database integration |
| Migrations | Alembic | Reliable schema evolution |
| CLI | argparse | Simple, user-friendly command-line interface |
| Packaging | setuptools/PEP517 | Standards-based, easy to distribute |
| Testing | (Planned) pytest | For robust, automated testing |
| Reporting | pandas, CSV | Easy export and downstream analysis |
| Visualization | (Planned) | Dashboard hooks, visual analytics |
- Modular Pipeline: Each ETL stage is a separate, testable component.
- Extensibility: Add new banks, categories, or enrichment steps with minimal code changes.
- Database Schema: Normalized, extensible, and migration-ready (Alembic).
- Error Handling: Validates input, handles missing/invalid data, and logs issues.
- Performance: Vectorized pandas operations, batch DB writes, and progress storage.
- User-Centric: Interactive CLI, manual correction, and easy customization.
- Real-World Ready: Handles messy, real-world bank/UPI data and evolving requirements.
More on design & engineering →
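The error-handling point above (validate input, handle missing or invalid data, log issues) might look roughly like the following. The required field names, the `validate_rows` helper, and the logger name are assumptions for illustration; Cashboard's real validation rules are not shown in this document.

```python
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger("cashboard_sketch")

REQUIRED_FIELDS = ("date", "narration", "amount")  # Assumed standard field names.


def validate_rows(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Return (valid_rows, problems); invalid rows are skipped and logged, not fatal."""
    valid, problems = [], []
    for index, row in enumerate(rows):
        missing = [f for f in REQUIRED_FIELDS if not row.get(f, "").strip()]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
            continue
        try:
            # Accept amounts with thousands separators such as "1,250.50".
            float(row["amount"].replace(",", ""))
        except ValueError:
            problems.append(f"row {index}: invalid amount {row['amount']!r}")
            continue
        valid.append(row)
    for problem in problems:
        logger.warning(problem)
    return valid, problems


valid, problems = validate_rows([
    {"date": "01/04/2024", "narration": "UPI/DR/demo", "amount": "1,250.50"},
    {"date": "", "narration": "UPI/DR/demo", "amount": "10"},
    {"date": "02/04/2024", "narration": "ATM WDL", "amount": "abc"},
])
print(len(valid), problems)
```

Collecting problems instead of raising on the first bad row lets a long statement be processed in one pass while still surfacing every issue.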
# Recommended: use a virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Or install in editable mode
pip install -e .
expense-classifier --path <BANK_STATEMENT_FILE> --bank-code <BANK_CODE> --account <ACCOUNT_NAME> [options]
- See FEATURES.md for all CLI options and examples.
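For readers curious how the three flags shown above might be declared, here is a minimal `argparse` sketch. Whether the real CLI marks these flags as required, and what help text it uses, are assumptions; `build_parser` and `parse_args` are hypothetical names, and FEATURES.md remains the reference for the actual options.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags; other options are omitted."""
    parser = argparse.ArgumentParser(prog="expense-classifier")
    parser.add_argument("--path", required=True, help="Path to the bank statement file")
    parser.add_argument("--bank-code", required=True, help="Bank code, e.g. SBI")
    parser.add_argument("--account", required=True, help="Account name")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Passing argv explicitly makes the parser easy to exercise without a shell.
    return build_parser().parse_args(argv)


args = parse_args(
    ["--path", "my_statement.csv", "--bank-code", "SBI", "--account", "Savings Account"]
)
print(args.path, args.bank_code, args.account)
```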
from expense_classifier.pipeline import Pipeline
pipeline = Pipeline(
bank_code="SBI",
file_path="my_statement.csv",
account_name="Savings Account",
paytm_lookup=True,
paytm_file_path="paytm.xlsx"
)
pipeline.ingest()
pipeline.transform()
pipeline.join_paytm()
pipeline.categorize()
pipeline.file_correction()
final_df, final_table, final_path = pipeline.publish_data()
Contributions are welcome! Please see CONTRIBUTING.md for guidelines, code style, and how to get started.
This project is licensed under the MIT License. See LICENSE for details.
- ARCHITECTURE.md: System architecture and design deep dive
- FEATURES.md: Full feature list and usage examples
- DATA_ENGINEERING.md: ETL/data pipeline technical details
- CONTRIBUTING.md: Contribution guidelines
- VISUALIZATION.md: Reporting and dashboarding options
