ReceiptAnalysis

Receipt analysis of a dataset from Kaggle.

Full report (CS only) is here

Requirements

  • This project has not been optimized for memory usage, so 32 GB of RAM is recommended to analyze 100k records.
  • Analyzing more records requires either optimization or more RAM.
  • Python v3.11.x (used through pyenv)
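
On machines with less RAM, one option is to work on a subset of the records first. The following is only a minimal sketch using the standard library; the helper name read_first_records and its limit parameter are hypothetical and are not part of this project's scripts.

```python
import csv
from itertools import islice


def read_first_records(path, limit=100_000):
    """Return at most `limit` data rows of a CSV file as dictionaries.

    Hypothetical helper, not part of ReceiptAnalysis: it only illustrates
    capping the number of records that are held in memory.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # islice stops reading once `limit` rows have been consumed,
        # so the rest of the file is never loaded.
        return list(islice(reader, limit))
```

For example, read_first_records("data/kz.csv", limit=10_000) would load only the first 10,000 records, assuming the dataset has been placed as described under Usage.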

Usage

  1. Clone the repository
  2. Register on Kaggle
  3. Download the dataset eCommerce purchase history from electronics store
  4. Place the dataset (kz.csv) into the /data directory
  5. Install Poetry
  6. Install dependencies: cd /path/to/ReceiptAnalysis && poetry install
  7. Run preprocessing: poetry run preprocessing
  8. Run clustering: poetry run clustering
  9. Generate association rules: poetry run associative_rules
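
The three run steps above can also be chained from a small Python wrapper. This is a sketch under assumptions: the names PIPELINE, dataset_path and run_pipeline are hypothetical, and /data is assumed to mean the data directory inside the cloned repository. Only the poetry run commands themselves come from the steps above.

```python
import subprocess
from pathlib import Path

# Commands copied from the usage steps, in the order they are listed.
PIPELINE = [
    ["poetry", "run", "preprocessing"],
    ["poetry", "run", "clustering"],
    ["poetry", "run", "associative_rules"],
]


def dataset_path(project_root):
    # Assumption: "/data" refers to <repository root>/data.
    return Path(project_root) / "data" / "kz.csv"


def run_pipeline(project_root):
    """Check that the dataset is in place, then run each step in order."""
    csv_path = dataset_path(project_root)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    for command in PIPELINE:
        # check=True raises CalledProcessError if a step fails,
        # so later steps do not run on incomplete results.
        subprocess.run(command, cwd=project_root, check=True)
```

Calling run_pipeline("/path/to/ReceiptAnalysis") after poetry install would then fail early with a clear message if kz.csv is missing.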

Outputs

  • Saved numpy matrices are in /data
  • Saved matplotlib figures are in /images
  • Saved text outputs of the scripts are in /outputs
  • Saved clusters and cluster rules are in /rules
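
To see what a run produced, the four output directories listed above can be scanned in one pass. A minimal sketch, assuming the directories sit directly under the repository root; list_outputs and OUTPUT_DIRS are hypothetical names, and no particular output file names are assumed.

```python
from pathlib import Path

# Directory names taken from the Outputs list above.
OUTPUT_DIRS = ("data", "images", "outputs", "rules")


def list_outputs(project_root):
    """Map each output directory name to the sorted file names inside it."""
    root = Path(project_root)
    found = {}
    for name in OUTPUT_DIRS:
        directory = root / name
        if directory.is_dir():
            found[name] = sorted(p.name for p in directory.iterdir() if p.is_file())
        else:
            # A missing directory is reported as empty rather than as an error.
            found[name] = []
    return found
```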

Used tutorials