Venmito: Data Engineering Project

Overview

Venmito is a full-stack data engineering project designed to clean, process, and analyze data from various formats into a structured SQLite database. The project includes a Flask backend for data processing and API handling, and a Vite/React frontend for user interaction and visualization. There is comprehensive documentation in the Documentation folder for both the front-end and back-end.

The project follows a systematic approach:

Data Processing and Cleaning: Datasets from JSON, YAML, CSV, and XML formats are ingested and cleaned using Python and Pandas. The processed data is saved as structured CSV files.
Database Creation: The cleaned data is loaded into an SQLite database (venmito.db) via SQLAlchemy, creating a relational structure with defined tables and relationships.
Frontend Display with Filtering: Full data tables are displayed on the React frontend with filtering options by various criteria such as names, devices, and promotions. The frontend incorporates dynamic visualizations like pie charts, bar charts, and line graphs for better data interpretation.
API Endpoints for Additional Insights: The backend provides RESTful APIs to query data, generate statistics, and derive actionable insights using the OpenAI API for advanced analysis.

The goal is to provide insights about clients, transactions, and promotions by consolidating and querying data from diverse sources such as JSON, YAML, CSV, and XML.

Features

Data Ingestion and Cleaning:
- Processes raw datasets in multiple formats (JSON, YAML, CSV, XML) using Python and Pandas.
- Cleans and normalizes data, saving processed outputs to CSV files.
- Please note: The ingestion files rely on relative paths from the root directory because they are invoked by flask_app.py. By default, the processed files and the final database is given. However, if the processed csv files in backend/data/processed (but not the processed folder itself!) and the venmito.db file in backend/storage are removed, they will be automatically regenerated when the backend server (flask_app.py) is run. To test the ingestion files individually, their relative paths must be adjusted accordingly within their respective scripts.
Database Integration:
- Consolidates processed data into an SQLite database (venmito.db) using SQLAlchemy.
- Creates Python objects for database tables for seamless queries.
API Endpoints:
- Provides RESTful API endpoints for querying and analyzing data.
- Integrates a Large Language Model (LLM) via the OpenAI API to generate actionable business insights from processed data.
- Examples include filtering clients and retrieving statistics.
Frontend Features:
- Built with React and Vite, styled with Tailwind CSS.
- Pages for filtering and visualizing data (e.g., using charts for client distributions).
Technology Stack:
- Backend:
  - Flask: Lightweight framework for building RESTful APIs.
  - SQLAlchemy: ORM for interacting with the SQLite database.
  - SQLite: Lightweight, file-based relational database.
  - Pandas: For data cleaning, processing, and transformation.
  - OpenAI API: Used to generate additional insights.
- Frontend:
  - React: Component-based library for building user interfaces.
  - Vite: Fast build tool for frontend development.
  - Tailwind CSS: Utility-first CSS framework for styling.
  - Chart.js: For visualizing data with charts.
  - React-Router-Dom: For client-side routing.

Setup Instructions

Prerequisites

Python 3.8+
Node.js (v18+) and npm/yarn

Backend Setup

Clone the repository:
```
git clone <repository-url>
```
Navigate to the backend directory:
```
cd backend
```
Create a Python virtual environment:
- On Mac/Linux:
```
python3 -m venv venv
```
- On Windows:
```
python -m venv venv
```
Activate the virtual environment:
- On Mac/Linux:
```
source venv/bin/activate
```
- On Windows:
```
venv\Scripts\activate
```
Install the required dependencies:
```
pip install -r requirements.txt
```

Frontend Setup

Navigate to the frontend directory:
```
cd frontend
```
Install the required dependencies:
```
npm install
```

Environment Configuration

This repository includes a .env file that contains an OpenAI API key required for accessing the LLM insights functionality.

Navigate to the .env file located in the backend directory.
Insert API key with your own OpenAI API key:
```
OPENAI_API_KEY=your_openai_api_key_here
```
Save the file and restart the backend server to apply the changes.

By default, the provided key should suffice for evaluation. However, replacing it with your own ensures you have full control over API usage and avoids potential issues with shared key limitations.

Running the Project

Backend

Ensure your Python virtual environment is activated (refer to the activation steps above).
Navigate to the backend directory:
```
cd backend
```
Start the Flask backend:
```
python flask_app.py
```
The backend will be available at:
http://127.0.0.1:5000

Frontend

Open a new terminal (keep the backend terminal running) and navigate to the frontend directory:
```
cd frontend
```
Start the React frontend:
```
npm run dev
```
The frontend will be available at:
http://localhost:5173

Usage Instructions

Backend

Run the Flask backend:
```
python flask_app.py
```
The backend will be available at http://127.0.0.1:5000.
Access API endpoints:
- Example: Filter people by name and location:
```
GET /api/people/filter?first_name=John&city=New York
```
- Query insights, e.g., device distribution:
```
GET /api/insights/devices
```

Frontend

Run the React frontend (in a second terminal). Navigate to the frontend directory and run the following command:
```
npm run dev
```
The frontend will be available at http://localhost:5173.

PLEASE NOTE

When navigating to a page, it may take a few moments for the data to load. If the page appears empty initially, it should populate after a short delay. This is particularly true for the Insights page, which interacts with OpenAI's API. While data is being fetched, a "loading insights" message will be displayed. Please allow a few moments for this process to complete.

Project Structure

Backend

flask_app.py: Main entry point, initializes data processing and serves APIs.
ingestion/: Contains scripts for loading and cleaning raw datasets.
api/: Defines API endpoints.
models/: SQLAlchemy models for database tables.
storage/: Stores the SQLite database and related files.

Frontend

src/pages/: React components for application pages (e.g., PeoplePage, PromotionsPage).
src/components/: Reusable components (e.g., Navbar, PieChart).
App.jsx: Main application file, defines routes and navigation.

API Documentation

People API

Filter People:
- Endpoint: GET /api/people/filter
- Query Parameters: first_name, last_name, city, country, device (array)
- Example:
```
GET /api/people/filter?first_name=Jane&country=USA&device=Android
```
Get Statistics:
- Endpoint: GET /api/people/stats
- Description: Returns stats on clients, such as device ownership and country distribution.

Design Decisions

SQLite: Chosen for simplicity and ease of setup.
React with Vite: Enables fast and modular frontend development.
Flask: Lightweight and efficient for API and backend logic.

Future Improvements

Improve AI Insights through fine-tuning or an RAG Architecture
Make front end responsive

Developed by Pranav Vishal
Email: pvishal@uwaterloo.ca

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
Documentation		Documentation
backend		backend
frontend		frontend
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Venmito: Data Engineering Project

Overview

Features

Setup Instructions

Prerequisites

Backend Setup

Frontend Setup

Environment Configuration

Running the Project

Backend

Frontend

Usage Instructions

Backend

Frontend

PLEASE NOTE

Project Structure

Backend

Frontend

API Documentation

People API

Design Decisions

Future Improvements

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

pranavishal/Venmito

Folders and files

Latest commit

History

Repository files navigation

Venmito: Data Engineering Project

Overview

Features

Setup Instructions

Prerequisites

Backend Setup

Frontend Setup

Environment Configuration

Running the Project

Backend

Frontend

Usage Instructions

Backend

Frontend

PLEASE NOTE

Project Structure

Backend

Frontend

API Documentation

People API

Design Decisions

Future Improvements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages