Cashboard: Expense Classifier

Cashboard is a robust, extensible Python solution for automated expense classification, enrichment, and reporting from UPI-heavy bank statements. It is built to learn and adapt, making expense tracking a breeze.

Tags:
ETL finance data-engineering SQLAlchemy CLI bank-statements classification reporting personal-finance automation

🚀 Introduction

Motivation:

Bank statements today contain a large number of UPI transactions, which existing tools struggle to categorize. Cashboard addresses this by identifying commonly occurring UPI patterns in bank statements and recognizing recurring transactions of the same category. It remembers these patterns, so categorization becomes easier and quicker over time.

Cashboard Classifier solves the tedious problem of manually categorizing and analyzing personal or business bank transactions. It ingests raw bank statements, applies intelligent classification (rule-based and AI-ready), enriches data with Paytm UPI lookups, supports manual correction, and exports clean, categorized data for reporting or further analysis.

Problem: Manual expense tracking is error-prone, time-consuming, and non-scalable.
Solution: An automated, modular pipeline for ingesting, transforming, classifying, and reporting on financial transactions.
Impact: Saves hours of manual work, improves financial visibility, and enables data-driven decision making.

✨ Features

Command-Line Interface (CLI): One-command processing of bank statements.
Programmatic API: Use the pipeline in your own Python scripts.
Extensible Pipeline: Modular ETL stages for easy customization and extension.
Database Integration: SQLAlchemy ORM, Alembic migrations, and persistent storage.
Manual Correction: Export uncategorized transactions for review and keyword enrichment.
Paytm UPI Support: Enriches uncategorized records with Paytm data.
Reporting: Generate clean CSVs and database tables for downstream analysis.
Visualization-ready: (Planned) hooks for dashboards and visual analytics.
Robust Error Handling: Validates input, handles edge cases, and logs issues.
Highly Configurable: Supports multiple banks, custom columns, and user-defined categories.
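
The manual-correction feature listed above can be pictured as a small export-and-learn loop. The following is a minimal, stdlib-only sketch and not Cashboard's actual implementation: the function names (`export_uncategorized`, `learn_keywords`), the column names, and the in-memory keyword map are assumptions made for illustration.

```python
import csv
import io

FIELDS = ["payee", "amount", "category", "keyword"]  # assumed review-file columns

def export_uncategorized(rows, out):
    """Write rows that have no category to a CSV the user can edit by hand."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    pending = [r for r in rows if not r.get("category")]
    for r in pending:
        writer.writerow(
            {"payee": r["payee"], "amount": r["amount"], "category": "", "keyword": ""}
        )
    return len(pending)

def learn_keywords(reviewed, keyword_map):
    """Read the user-edited CSV and store each filled-in keyword -> category pair."""
    learned = 0
    for row in csv.DictReader(reviewed):
        keyword = row["keyword"].strip().lower()
        category = row["category"].strip()
        if keyword and category:
            keyword_map[keyword] = category
            learned += 1
    return learned

rows = [
    {"payee": "ZOMATO ONLINE", "amount": "450.00", "category": "Food"},
    {"payee": "RAMESH KIRANA STORE", "amount": "120.00", "category": ""},
]
buffer = io.StringIO()
exported = export_uncategorized(rows, buffer)

# Simulate the user filling in the exported review file.
reviewed = io.StringIO(
    "payee,amount,category,keyword\n"
    "RAMESH KIRANA STORE,120.00,Groceries,kirana\n"
)
keyword_map = {"zomato": "Food"}
learned = learn_keywords(reviewed, keyword_map)
```

After one review pass, the hypothetical `keyword_map` contains the new `kirana` keyword, so a later statement with a similar payee could be categorized without manual input.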

See full feature list & usage →

🏗️ Project Architecture

flowchart TD
    A["User Input (CLI/Script)"] --> B["Ingestor"]
    B --> J["Checkpoint Store (DB/File)"]
    B --> C["Transformer"]
    C --> J["Checkpoint Store (DB/File)"]
    C --> D["Paytm Lookup (optional)"]
    D --> E["Classifier"]
    E --> J["Checkpoint Store (DB/File)"]
    E --> F["File Correction / Manual Correction"]
    F --> G["Database (SQLAlchemy)"]
    F --> H["Reporting/Export"]
    G --> H
    H --> I["Final Output (CSV/DB/Report)"]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style I fill:#bfb,stroke:#333,stroke-width:2px
    style J fill:#ffa500,stroke:#333,stroke-width:2px
    style G fill:#ffa500,stroke:#333,stroke-width:2px

Ingestor: Loads and standardizes raw bank/Paytm files.
Transformer: Extracts and normalizes transaction details.
Paytm Lookup: Enriches uncategorized records with Paytm UPI data.
Classifier: Assigns categories using rule-based (and optionally AI) logic.
File/Manual Correction: Enables user-driven enrichment and learning.
Database: Stores all stages and mappings for persistence and analytics.
Reporting/Export: Outputs clean, categorized data for analysis or BI tools.
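
To make the component list above concrete, here is one way staged processing with a checkpoint after each stage could be wired together. This is a stdlib-only sketch under stated assumptions: the real project uses pandas and a database-backed checkpoint store, while the stage functions, the `run_pipeline` helper, and the dict-based checkpoint store below are hypothetical stand-ins.

```python
import json
from typing import Callable, Dict, List

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]

def ingest(records: List[Record]) -> List[Record]:
    """Standardize column names and trim stray whitespace from values."""
    return [{k.strip().lower(): v.strip() for k, v in r.items()} for r in records]

def transform(records: List[Record]) -> List[Record]:
    """Derive a payment-mode field from the narration text."""
    out = []
    for r in records:
        mode = "UPI" if r["narration"].upper().startswith("UPI") else "OTHER"
        out.append({**r, "mode": mode})
    return out

def run_pipeline(
    records: List[Record], stages: List[Stage], checkpoints: Dict[str, str]
) -> List[Record]:
    """Run stages in order, saving a JSON snapshot after each one."""
    for stage in stages:
        records = stage(records)
        # Assumed checkpoint format: one JSON snapshot keyed by stage name.
        checkpoints[stage.__name__] = json.dumps(records)
    return records

raw = [{" Narration ": " UPI/1234/ZOMATO ", "Amount": "450.00"}]
checkpoints: Dict[str, str] = {}
result = run_pipeline(raw, [ingest, transform], checkpoints)
```

Because each stage is a plain function from records to records, a new enrichment step can be added by appending another function to the list.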

Deep dive: System Architecture →

⚙️ How It Works

Ingestion: Load CSV/XLSX bank statements, standardize columns, and clean data.
Transformation: Extract payment mode, payee, UPI ID, and derive new features.
Paytm Lookup (optional): Match uncategorized transactions with Paytm UPI data for better classification.
Classification: Assign categories using a keyword-driven, extensible rule engine.
Manual/File Correction: Export uncategorized records for user review and keyword enrichment.
Publishing: Save final, categorized data to CSV and/or database for reporting.
Reporting & Visualization: (Planned) Generate reports and dashboards for insights.
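
The transformation and classification steps above can be sketched as follows. This is an illustration only: the narration layout, the UPI ID regular expression, and the keyword rules are assumptions, and real bank narrations vary by bank. The actual rule engine may work differently.

```python
import re

# Assumed pattern for a UPI ID such as "zomato@hdfcbank".
UPI_ID = re.compile(r"[\w.\-]+@[A-Za-z]+")

# Hypothetical keyword -> category rules; the first matching keyword wins.
RULES = {
    "zomato": "Food",
    "swiggy": "Food",
    "irctc": "Travel",
    "electricity": "Utilities",
}

def extract_details(narration):
    """Pull the payment mode and UPI ID out of a raw narration string."""
    mode = "UPI" if narration.upper().startswith("UPI") else "OTHER"
    match = UPI_ID.search(narration)
    return {"mode": mode, "upi_id": match.group(0) if match else None}

def classify(narration, rules=RULES):
    """Return the category of the first keyword found, else 'Uncategorized'."""
    text = narration.lower()
    for keyword, category in rules.items():
        if keyword in text:
            return category
    return "Uncategorized"

narrations = [
    "UPI/DR/401234567890/ZOMATO/HDFC/zomato@hdfcbank/Payment",
    "UPI/DR/401234567891/RAMESH/SBIN/ramesh.k@oksbi/NA",
    "NEFT-IRCTC REFUND",
]
results = [(extract_details(n), classify(n)) for n in narrations]
```

The second narration matches no keyword, so it stays "Uncategorized" and would be a candidate for the Paytm lookup or manual correction steps.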

Technical workflow details →

🛠️ Technology Stack

Layer	Technology	Rationale/Highlights
Language	Python 3.8+	Modern, robust, and widely used for data engineering
Data Handling	pandas	Fast, flexible ETL and data manipulation
ORM/DB	SQLAlchemy	Scalable, production-grade database integration
Migrations	Alembic	Reliable schema evolution
CLI	argparse	Simple, user-friendly command-line interface
Packaging	setuptools/PEP517	Standards-based, easy to distribute
Testing	(Planned) pytest	For robust, automated testing
Reporting	pandas, CSV	Easy export and downstream analysis
Visualization	(Planned)	Dashboard hooks, visual analytics

🧠 Design Decisions & Engineering Highlights

Modular Pipeline: Each ETL stage is a separate, testable component.
Extensibility: Add new banks, categories, or enrichment steps with minimal code changes.
Database Schema: Normalized, extensible, and migration-ready (Alembic).
Error Handling: Validates input, handles missing/invalid data, and logs issues.
Performance: Vectorized pandas operations, batch DB writes, and progress storage.
User-Centric: Interactive CLI, manual correction, and easy customization.
Real-World Ready: Handles messy, real-world bank/UPI data and evolving requirements.
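
As an illustration of the batch database writes mentioned under Performance, the sketch below inserts rows in fixed-size batches, one transaction per batch. It uses the standard-library `sqlite3` module rather than SQLAlchemy so that it runs without dependencies. The table layout, the `save_batch` helper, and the batch size are assumptions, not the project's schema.

```python
import sqlite3

def save_batch(conn, rows, batch_size=500):
    """Insert rows in batches, committing once per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(txn_date TEXT, payee TEXT, amount REAL, category TEXT)"
    )
    written = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(
                "INSERT INTO transactions VALUES (?, ?, ?, ?)", batch
            )
        written += len(batch)
    return written

conn = sqlite3.connect(":memory:")
rows = [
    ("2024-01-%02d" % (i % 28 + 1), "PAYEE %d" % i, 10.0 * i, "Food")
    for i in range(1200)
]
written = save_batch(conn, rows, batch_size=500)
count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

Committing per batch keeps each transaction small, and a failure in one batch does not discard the batches already written.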

🚦 Getting Started

Installation

# Recommended: use a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Or install in editable mode
pip install -e .

CLI Usage

expense-classifier --path <BANK_STATEMENT_FILE> --bank-code <BANK_CODE> --account <ACCOUNT_NAME> [options]

See FEATURES.md for all CLI options and examples.
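
For orientation, the three flags shown above could be declared with `argparse` roughly as follows. This is a sketch only: whether each flag is required, and the help texts, are assumptions. FEATURES.md remains the reference for the real options.

```python
import argparse

def build_parser():
    """Build a parser for the three flags shown in the usage line."""
    parser = argparse.ArgumentParser(prog="expense-classifier")
    # Marking every flag as required is an assumption made for this sketch.
    parser.add_argument("--path", required=True, help="Bank statement file (CSV/XLSX)")
    parser.add_argument("--bank-code", required=True, help="Bank identifier, e.g. SBI")
    parser.add_argument("--account", required=True, help="Account name label")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--path", "my_statement.csv", "--bank-code", "SBI", "--account", "Savings Account"]
)
```

Note that `argparse` exposes `--bank-code` as `args.bank_code`, converting the hyphen to an underscore.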

Programmatic Usage

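from expense_classifier.pipeline import Pipeline

# Configure the pipeline for one statement file; Paytm enrichment is optional.
pipeline = Pipeline(
    bank_code="SBI",
    file_path="my_statement.csv",
    account_name="Savings Account",
    paytm_lookup=True,
    paytm_file_path="paytm.xlsx",
)

# Run the stages in order.
pipeline.ingest()           # load and standardize the raw statement
pipeline.transform()        # extract payment mode, payee, and UPI ID
pipeline.join_paytm()       # enrich uncategorized records with Paytm data
pipeline.categorize()       # assign categories using keyword rules
pipeline.file_correction()  # export uncategorized records for manual review

# Save the final, categorized data to CSV and/or the database.
final_df, final_table, final_path = pipeline.publish_data()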
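
To give a feel for what a Paytm lookup step such as `join_paytm()` might do, the sketch below attaches a Paytm payee to uncategorized bank rows. It is not the library's implementation: matching on date and amount, the field names, and the standalone `join_paytm` function are all assumptions made for illustration.

```python
def join_paytm(bank_rows, paytm_rows):
    """Attach a Paytm payee to uncategorized bank rows matched on (date, amount)."""
    lookup = {}
    for p in paytm_rows:
        # Keep the first Paytm record seen for each (date, amount) pair.
        lookup.setdefault((p["date"], p["amount"]), p)

    enriched = []
    for row in bank_rows:
        match = lookup.get((row["date"], row["amount"]))
        if not row.get("category") and match:
            row = {**row, "paytm_payee": match["payee"]}
        enriched.append(row)
    return enriched

bank_rows = [
    {"date": "2024-01-05", "amount": "250.00", "category": ""},
    {"date": "2024-01-06", "amount": "99.00", "category": "Food"},
]
paytm_rows = [
    {"date": "2024-01-05", "amount": "250.00", "payee": "Sharma Medicals"},
]
enriched = join_paytm(bank_rows, paytm_rows)
```

Rows that already have a category are passed through unchanged, so the lookup only adds information where the classifier had none.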

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines, code style, and how to get started.

📄 License

This project is licensed under the MIT License. See LICENSE for details.

📚 More Documentation

ARCHITECTURE.md: System architecture and design deep dive
FEATURES.md: Full feature list and usage examples
DATA_ENGINEERING.md: ETL/data pipeline technical details
CONTRIBUTING.md: Contribution guidelines
VISUALIZATION.md: Reporting and dashboarding options

Petrinax/expense-classifier
