
A modular book recommendation engine with a custom training pipeline and Streamlit UI, deployed on AWS EC2 for scalable production use. Follows best practices for real-world deployment.


RohitKrish46/book-recommender-system


Book Recommender System

Python 3.10+ | uv | License: MIT | Docker | AWS EC2 | Streamlit 1.45.1+

Book Recommender System is a modular and extensible book recommendation engine, featuring a custom training pipeline, robust logging and exception handling, and an interactive Streamlit interface for real-time book recommendations. The project is fully deployed on AWS EC2, providing a scalable, production-ready environment for live usage and demonstrations. Designed with clean architecture and best practices, it supports both experimentation and real-world deployment.


Overview

This system recommends books based on collaborative filtering. It includes:

  • A multi-stage pipeline for training the recommender.

  • A Streamlit UI for training the model and getting recommendations.

  • Modular design for easy extension and experimentation.

All components (data ingestion, transformation, training, and inference) are wrapped with logging and exception handling for production readiness.

Features

  • Book Recommendations using Nearest Neighbors on user ratings.

  • Trainable Engine: Run an end-to-end pipeline, from data ingestion to model training.

  • Modular Design: Add or replace pipeline components independently.

  • Robust Logging and Error Management with centralized log tracking.

  • Interactive UI using Streamlit for recommendation queries and pipeline triggers.

  • Model Persistence with pickle-based artifact storage and reusability.
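To make the nearest-neighbours idea concrete, here is a minimal sketch using scikit-learn's `NearestNeighbors` on a toy book-by-user rating matrix. The titles, ratings, helper names (`fit_model`, `recommend`), and the choice of cosine distance are assumptions for illustration; the repository's actual model settings may differ.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy rating matrix: rows are books, columns are users, 0 means "not rated".
# All values below are made up for illustration.
titles = ["Book A", "Book B", "Book C", "Book D"]
ratings = np.array([
    [5, 4, 0, 0],
    [5, 5, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 5],
], dtype=float)


def fit_model(matrix):
    """Fit a brute-force nearest-neighbours index (cosine distance is an assumption)."""
    model = NearestNeighbors(metric="cosine", algorithm="brute")
    model.fit(matrix)
    return model


def recommend(model, matrix, titles, title, k=2):
    """Return the k books whose rating vectors are closest to the given title."""
    idx = titles.index(title)
    # Request k + 1 neighbours because the query book matches itself.
    _, indices = model.kneighbors(matrix[idx].reshape(1, -1), n_neighbors=k + 1)
    return [titles[i] for i in indices[0] if i != idx][:k]


model = fit_model(ratings)
print(recommend(model, ratings, titles, "Book A", k=1))
```

Books rated similarly by the same users end up close together, so "Book A" and "Book B" (both liked by the first two users) are each other's nearest neighbour here.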

Architecture and Workflow

Book Recommendation Architecture

[Figure: book recommendation architecture diagram]

Model Training Pipeline Workflow

[Figure: model training pipeline workflow diagram]

Pipeline Workflow

The training pipeline consists of the following steps:

  1. Data Ingestion:
    Downloads and ingests the dataset using a factory pattern. Only downloads if the file does not exist.

  2. Data Validation / Preprocessing:
    Validates and preprocesses the ingested data.

  3. Data Transformation:
    Transforms the validated data for model training.

  4. Model Training:
    Trains the recommendation model and saves the trained model artifact.

Each step is wrapped in exception handling and logs errors using the internal logging system.
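The four steps above can be sketched as a small runner that chains step functions, logs each one, and wraps failures in a single exception type. This is only an illustration under stated assumptions: `PipelineError`, `ingest`, `run_pipeline`, and `fake_download` are hypothetical names, the step bodies are placeholders, and the repository's factory-based ingestion is not reproduced here.

```python
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("training_pipeline")


class PipelineError(Exception):
    """Hypothetical stand-in for the project's AppException."""


def ingest(raw_file: Path, download) -> Path:
    # Download only when the file is not already present.
    if not raw_file.exists():
        raw_file.write_bytes(download())
    return raw_file


def run_pipeline(steps, initial):
    """Run (name, function) steps in order, feeding each output to the next."""
    result = initial
    for name, step in steps:
        try:
            logger.info("Starting step: %s", name)
            result = step(result)
        except Exception as exc:
            logger.error("Step %s failed: %s", name, exc)
            raise PipelineError(f"{name} failed") from exc
    return result


calls = []


def fake_download() -> bytes:
    # Placeholder for a real dataset download; records how often it runs.
    calls.append(1)
    return b"user,book,rating\n1,Book A,5\n"


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "ratings.csv"
    steps = [
        ("data_ingestion", lambda path: ingest(path, fake_download)),
        ("data_validation", lambda path: path.read_text().splitlines()),
        ("data_transformation", lambda rows: [r.split(",") for r in rows[1:]]),
        ("model_training", lambda rows: {"n_ratings": len(rows)}),
    ]
    artifact = run_pipeline(steps, raw)
    run_pipeline(steps, raw)  # Second run reuses the file already on disk.

print(artifact, len(calls))
```

Because every step goes through the same `try`/`except`, a failure anywhere surfaces as one exception type with the original error attached as its cause.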

AWS EC2 Deployment Guide

Deploying the Streamlit App on AWS EC2

1. Launch an EC2 Instance

  • Log in to your AWS Console.
  • Launch an Ubuntu-based EC2 instance.
  • Configure port 8501 to be open in the security group (for Streamlit access).

2. Connect to the EC2 Instance from inside the AWS console

3. Set Up Docker

# Update system and install dependencies
sudo apt-get update -y
sudo apt-get upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker

4. Deploy the project

# Clone your project (replace with your repo)
git clone https://github.com/RohitKrish46/book-recommender-system.git
cd book-recommender-system

# Build and run the Docker image (advisable to tag it with your own Docker Hub username: docker build -t {username}/bookapp:latest .)
docker build -t rokrr/bookapp:latest .
docker run -d -p 8501:8501 rokrr/bookapp

5. Access the App

Open your browser and navigate to: http://<EC2_PUBLIC_IP>:8501

Additional commands (Optional)

1. Stop/Remove Containers

docker stop <container_id>         # Stop a running container
docker rm $(docker ps -a -q)       # Remove ALL stopped containers on the host (running ones are refused)

2. Push/Pull Docker Image

docker login
docker push rokrr/bookapp:latest   # Push to registry (use your own {username}/bookapp tag)
docker pull rokrr/bookapp:latest   # Pull latest image

Usage

Initial Setup Using UV (Recommended)

1. Install UV (if not already installed):

# Using pipx
pipx install uv

# Using curl
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
python -m pip install uv

2. Create a virtual environment:

uv venv <virtual-env-name>

3. Activate the environment:

# Linux/macOS
source <virtual-env-name>/bin/activate

# Windows
.\<virtual-env-name>\Scripts\Activate

4. Clone this repo

git clone https://github.com/RohitKrish46/book-recommender-system.git

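Then change into the cloned repository (or move its contents into the directory where you created the uv-managed virtual environment) so the remaining commands run from the project root.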

5. Install Dependencies:

uv pip install -r requirements.txt

Train the Recommendation Engine

python main.py

This triggers the entire training pipeline:

  • Ingests and validates data
  • Builds pivot tables
  • Trains Nearest Neighbors model
  • Saves trained artifacts for inference
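As an illustration of the "builds pivot tables" step, the sketch below turns long-format ratings into a book-by-user matrix with pandas. The column names (`user_id`, `title`, `rating`) and the sample rows are assumptions, not the dataset's actual schema.

```python
import pandas as pd

# Made-up ratings in long format; the column names are assumptions.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title": ["Book A", "Book B", "Book A", "Book C", "Book B"],
    "rating": [5, 3, 4, 2, 1],
})

# Rows are books, columns are users; unrated pairs are filled with 0.
book_pivot = ratings.pivot_table(
    index="title", columns="user_id", values="rating"
).fillna(0)

print(book_pivot.shape)
```

A matrix in this shape is what a nearest-neighbours model can be fitted on, with each row acting as one book's rating vector.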

Run the Streamlit App

streamlit run app.py

Visit http://localhost:8501 to interact with the UI.

Streamlit UI Features:

  • Train Engine: Run training from the UI
  • Get Recommendations: Type a book name to see similar recommendations
  • Cover Images: Displays book covers alongside titles

Application Screenshots

  1. Home page: A page with an introduction to the app
  2. Train Recommender system: Click the button to freshly train the recommender system
  3. Get similar recommendations: Choose a book you like
  4. About this app

Internal Conventions

  • Logging:
    Uses recommender.logger.log for consistent logging across modules.

  • Exception Handling:
    All major operations are wrapped in try/except blocks and raise AppException for unified error management.

  • Configuration:
    Uses an AppConfiguration object to manage paths for models and serialized objects.

  • Artifacts:
    Trained models and serialized data (e.g., pivot tables, ratings) are loaded and saved using pickle.
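The conventions above could look roughly like the following sketch, which pairs a unified exception type with pickle-based save and load helpers. The `AppException` shown here is a simplified stand-in written for this example, not the repository's actual class, and `save_object` / `load_object` are hypothetical helper names.

```python
import pickle
import tempfile
from pathlib import Path


class AppException(Exception):
    """Simplified stand-in: wraps any error with a short context message."""

    def __init__(self, error: Exception, context: str):
        super().__init__(f"{context}: {type(error).__name__}: {error}")
        self.original = error


def save_object(path: Path, obj) -> None:
    """Pickle obj to path, creating parent directories as needed."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)
    except Exception as exc:
        raise AppException(exc, f"saving {path.name}") from exc


def load_object(path: Path):
    """Load a pickled object, raising AppException on any failure."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except Exception as exc:
        raise AppException(exc, f"loading {path.name}") from exc


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "serialized_objects" / "book_names.pkl"
    save_object(target, ["Book A", "Book B"])
    restored = load_object(target)
    try:
        load_object(Path(tmp) / "missing.pkl")
    except AppException as err:
        message = str(err)

print(restored)
print(message)
```

Routing every failure through one exception type means callers such as the Streamlit app only need a single `except` clause. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.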

Folder Structure

book-recommender-system/
├── app.py                           # Streamlit app interface
├── main.py                          # Training pipeline trigger
├── recommender/
│   ├── components/                  # All modular pipeline steps
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   └── model_training.py
│   ├── constants/
│   │   └── __init__.py              # constant configs
│   ├── entity/
│   │   └── config_entity.py         # Dataclass for configs
│   ├── exception/
│   │   └── exception.py             # AppException class
│   ├── logger/
│   │   └── log.py                   # AppLogger class
│   ├── pipelines/
│   │   └── training_pipeline.py     # Orchestrates all components
│   └── utils/
│       └── load_yaml.py             # AppConfiguration manager
├── artifacts/
│   ├── dataset/
│   │   ├── clean_data/              # Preprocessed data
│   │   ├── ingested_data/           # Extracted CSVs
│   │   ├── raw_data/                # Dataset's raw zip file
│   │   └── transformed_data/        # Pivot files
│   ├── serialized_objects/          # Pickle files
│   └── trained_model/               # Stores trained model
├── config/
│   └── config.yaml                  # Main configuration
├── templates/
│   └── book_names.pkl               # Book names for Streamlit
├── Dockerfile                       # Docker Image Config
├── requirements.txt
└── README.md

Technology Stack

Programming Language

Python

Machine Learning

Scikit-learn SciPy

Data Manipulation & Visualization

Pandas NumPy Seaborn Matplotlib

Deployment & Web Framework

Docker AWS EC2 Streamlit

Package Manager

uv

Future Improvements

Below are some enhancements planned for future versions:

  • Hybrid Recommendation Models: Combine collaborative filtering with content-based filtering using book metadata (genres, authors, descriptions, etc.) for improved personalization.

  • Incorporate NLP Models: Integrate transformer-based models like BERT to analyze book descriptions or user reviews for semantic recommendations.

  • CI/CD Integration: Set up automated testing and deployment workflows using GitHub Actions or similar tools.

  • Graph-based Recommendation: Explore knowledge graph embeddings or user-book interaction graphs to enhance recommendation diversity and explainability.
