phyllis-sy-wu/LearnDL

LearnDL: Sentiment Classification API

A deep learning API for sentiment analysis using transformer-based models (BERT). This project provides a complete pipeline for training, deploying, and using sentiment classification models through a REST API.

Features

  • Data Preprocessing: Automated pipeline for text data preprocessing
  • Model Training Pipeline: Train custom sentiment classifiers using transformer embeddings
  • REST API: FastAPI-based endpoints for model inference and training
  • Cloud Storage Integration: Save and load models from cloud storage
  • Docker Support: Containerized deployment with Redis backend
  • Web Interface: React-based frontend for model management

Project Structure

LearnDL/
├── api/                    # FastAPI application
│   ├── main.py            # Main application entry point
│   └── router/            # API route handlers
├── data/                  # Data management
│   ├── read_data.py       # Data loading utilities
│   └── data.csv           # Training data (not in repo)
├── data_preprocess_pipeline/  # Text preprocessing
├── model_training_pipeline/   # Model training components
│   ├── train.py           # Training entry point
│   ├── classify_model.py  # Classifier architecture
│   └── embed_model.py     # Embedding model utilities
├── model_prediction/      # Inference components
├── cloud_storage/         # Cloud storage integration
├── database/              # Redis database client
├── slides-app/            # React frontend
├── requirements.txt       # Python dependencies
├── Dockerfile             # Docker container config
└── docker-compose.yaml    # Multi-container setup

Quick Start

Prerequisites

  • Python 3.11+
  • Docker and Docker Compose (for containerized deployment)
  • Git

Installation

  1. Clone the repository

    git clone <repository-url>
    cd LearnDL
  2. Choose your preferred way to run the application:

    Option A: Docker Compose (Recommended)

    docker-compose up --build

    Option B: Local Development

    1. Create virtual environment

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
    2. Install dependencies

      pip install -r requirements.txt
    3. Set up environment variables

      cp .env.example .env
      # Edit .env with your configuration
    4. Start Redis (if running locally)

      redis-server
    5. Run the API

      python api/main.py

    The API will be available at http://localhost:8000

Configuration

Environment Variables

Create a .env file with the following variables:

# Redis Configuration
REDIS_HOST=localhost

# Cloud Storage (optional)
DO_REGION=<region>
DO_ENDPOINT=https://<region>.digitaloceanspaces.com
DO_ACCESS_KEY=<spaces-access-key>
DO_SECRET_KEY=<spaces-secret-key>
DO_BUCKET_NAME=<bucket-name>
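
As a rough illustration of how these variables might be consumed, the sketch below reads them from the environment and checks that the optional cloud storage block is either fully set or fully absent. The variable names come from the listing above; the function name load_settings, the "localhost" default, and the all-or-nothing rule are assumptions of this sketch, not the project's actual loader.

```python
import os

# Variable names are taken from the .env listing above. The default value and
# the all-or-nothing rule for cloud storage are assumptions of this sketch.
CLOUD_KEYS = ("DO_REGION", "DO_ENDPOINT", "DO_ACCESS_KEY", "DO_SECRET_KEY", "DO_BUCKET_NAME")


def load_settings(env=None):
    """Return a settings dict built from environment-style key/value pairs."""
    env = os.environ if env is None else env
    settings = {"redis_host": env.get("REDIS_HOST", "localhost")}

    # Cloud storage is optional, but a partial configuration is likely a mistake.
    present = [key for key in CLOUD_KEYS if env.get(key)]
    if present and len(present) != len(CLOUD_KEYS):
        missing = sorted(set(CLOUD_KEYS) - set(present))
        raise ValueError(f"Incomplete cloud storage config, missing: {missing}")

    settings["cloud_enabled"] = len(present) == len(CLOUD_KEYS)
    return settings
```

Calling load_settings() with no arguments reads the real process environment; passing a plain dict makes the check easy to exercise in isolation.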

API Usage

Once the API is running, access the interactive Swagger UI for testing and full documentation at: http://localhost:8000/docs

Health Check

curl http://localhost:8000/model_api/health_check

Model Training

This guide provides a step-by-step walkthrough of the training, monitoring, and inference workflow for the LearnDL platform.

1. Initiate Model Training (POST /model_api/train)

To start a training task, send a POST request with your specific model configuration.

Dataset Options
  • Local Dataset: Place your .csv file in the data/ directory of the project (e.g., data/data.csv).
  • Cloud Dataset: To use DigitalOcean Spaces, ensure your environment variables (DO_ACCESS_KEY, DO_SECRET_KEY) are configured.

Public Dataset Options

Dataset   Task                   num_classes   Data Path URL
Spam      SMS Spam Detection     2             https://deep-learning-project.tor1.cdn.digitaloceanspaces.com/public/spam.csv
IMDB      Sentiment Analysis     2             https://deep-learning-project.tor1.cdn.digitaloceanspaces.com/public/IMDB.csv
News      Topic Classification   20            https://deep-learning-project.tor1.cdn.digitaloceanspaces.com/public/News.csv

Training Request (CURL Example)
curl -X 'POST' 'http://localhost:8000/model_api/train' \
-H 'Content-Type: application/json' \
-d '{
  "user_id": "test",
  "training_session_id": "session_001",
  "config": {
    "classifier_config": { 
      "model_name": "default", 
      "classifier_type": "GRU", 
      "hidden_neurons": 512 
    },
    "embed_model_config": { 
      "embed_model": "bert_model", 
      "fine_tune_mode": "freeze_all" 
    },
    "training_config": { 
      "learning_rate": 0.001, 
      "batch_size": 16, 
      "n_epochs": 5 
    },
    "data_config": { 
      "data_path": "data/data.csv", 
      "class_map": { "label_to_id": {"ham": 0, "spam": 1} } 
    }
  }
}'
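
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the JSON body of the curl example; the helper names build_train_request and post_json are hypothetical, and the shape of the response is not documented here, so it is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the Quick Start section


def build_train_request(user_id, session_id, labels, data_path="data/data.csv"):
    """Assemble the JSON body used in the curl example (same field names and values)."""
    return {
        "user_id": user_id,
        "training_session_id": session_id,
        "config": {
            "classifier_config": {
                "model_name": "default",
                "classifier_type": "GRU",
                "hidden_neurons": 512,
            },
            "embed_model_config": {"embed_model": "bert_model", "fine_tune_mode": "freeze_all"},
            "training_config": {"learning_rate": 0.001, "batch_size": 16, "n_epochs": 5},
            "data_config": {
                "data_path": data_path,
                # Labels are numbered in the order given, e.g. ["ham", "spam"] -> ham=0, spam=1.
                "class_map": {"label_to_id": {label: i for i, label in enumerate(labels)}},
            },
        },
    }


def post_json(path, body, base_url=BASE_URL):
    """POST a JSON body to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the API running, post_json("/model_api/train", build_train_request("test", "session_001", ["ham", "spam"])) submits the same job as the curl command.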

2. Monitor Training Status (GET /model_api/get_train_status)

Training is an asynchronous process. Use this endpoint to check the progress of your active session.

How to use

In the Swagger UI, input the user_id and training_session_id that you defined in Step 1.

curl -X 'GET' \
  'http://localhost:8000/model_api/get_train_status?user_id=test&training_session_id=session_001' \
  -H 'accept: application/json'

Training Completion Response

Once the status reaches "completed", the API returns a comprehensive evaluation report including:

  • Metrics: Accuracy, Precision, Recall, and F1-Score.
  • Confusion Matrix: A visual breakdown of classification performance across all labels.
  • Learning Curves: Historical data for Loss and Accuracy (Train vs. Val) across epochs.
  • Attention Visualization: Raw scores showing which tokens influenced the model's decision.
  • Embedding 2D: A t-SNE projection of the data points in latent space.
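
Because training is asynchronous, a client typically polls the status endpoint until the session finishes. The following is a minimal sketch of such a loop. It assumes the response is a JSON object with a top-level "status" field; "completed" and "cancelled" appear in this README, while "failed" is a guess. The function names, polling interval, and poll limit are illustrative choices.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the Quick Start section


def fetch_status(user_id, session_id, base_url=BASE_URL):
    """GET /model_api/get_train_status and return the decoded JSON body."""
    query = urllib.parse.urlencode({"user_id": user_id, "training_session_id": session_id})
    url = f"{base_url}/model_api/get_train_status?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_training(user_id, session_id, fetch=fetch_status, interval=5.0, max_polls=720):
    """Poll until the session reaches a terminal status, then return the last response.

    Assumption: the response carries a top-level "status" string.
    """
    terminal = {"completed", "cancelled", "failed"}  # "failed" is a guess
    for _ in range(max_polls):
        report = fetch(user_id, session_id)
        if report.get("status") in terminal:
            return report
        time.sleep(interval)
    raise TimeoutError(f"Session {session_id} did not finish after {max_polls} polls")
```

The fetch parameter lets the loop be tried out with a stub function before pointing it at a live server.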

3. Interrupt a Training Session (POST /model_api/cancel_train)

If you need to stop a training session before it completes, use the cancel_train endpoint.

curl -X 'POST' \
'http://localhost:8000/model_api/cancel_train?user_id=test&training_session_id=session_001' \
-H 'accept: application/json' \
-d ''

Behavior: Sending this request sets an internal stop_signal for the specific session. The backend finishes the current batch, frees the allocated GPU/CPU memory, and sets the session status to cancelled.

4. Run Real-Time Predictions (POST /model_api/model_output)

After training is complete, the model weights (.pth) are saved to the cloud. You can now perform inference on new text.

Important

Critical Requirement: The config parameters (specifically hidden_neurons, num_classes, and classifier_type) MUST match the settings used during the training phase for the weights to load correctly.
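
One way to avoid a mismatch is to derive the prediction config from the training config instead of typing it twice. The sketch below does this for the fields named above. The helper name is hypothetical, and it assumes that num_classes equals the number of labels in the training class map, which this README does not state explicitly.

```python
def prediction_config_from_training(train_config):
    """Derive an inference config whose architecture fields match the training config."""
    classifier = train_config["classifier_config"]
    label_to_id = train_config["data_config"]["class_map"]["label_to_id"]
    return {
        "classifier_config": {
            "model_name": classifier["model_name"],
            "hidden_neurons": classifier["hidden_neurons"],
            "classifier_type": classifier["classifier_type"],
            # Assumption: num_classes is the number of labels used during training.
            "num_classes": len(label_to_id),
        },
        "embed_model_config": dict(train_config["embed_model_config"]),
        "data_config": {
            # Invert label_to_id; ids become strings because JSON object keys are strings.
            "class_map": {"id_to_label": {str(i): label for label, i in label_to_id.items()}}
        },
    }
```

Applied to the "config" object of the training request from Step 1, this yields the same classifier settings and the id_to_label map used in the prediction request.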

Prediction Request (CURL Example)
curl -X 'POST' \
  'http://localhost:8000/model_api/model_output?user_id=test&training_session_id=session_001' \
  -H 'Content-Type: application/json' \
  -d '{
  "user_input": "Congratulations! You have won a $1,000 Walmart gift card. Click here to claim now.",
  "config": {
    "classifier_config": {
      "model_name": "default",
      "hidden_neurons": 512,
      "num_classes": 2,
      "classifier_type": "GRU"
    },
    "embed_model_config": { "embed_model": "bert_model", "fine_tune_mode": "freeze_all" },
    "data_config": {
      "class_map": {
        "id_to_label": { "0": "ham", "1": "spam" }
      }
    }
  }
}'

Prediction Output

The API returns the predicted label along with a confidence breakdown and an attention map:

{
  "predicted_label": "spam",
  "top_confidences": [
    { "class": "spam", "confidence": 0.98 },
    { "class": "ham", "confidence": 0.02 }
  ],
  "attention_visualization": {
    "text": "...",
    "tokens": ["congratulations", "won", "$", "1,000", "..."],
    "scores": [0.04, 0.02, 0.07, 0.01, "..."]
  }
}
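
A client will usually want only the label, its confidence, and the most influential tokens. The sketch below extracts those from a response shaped like the example above. It assumes tokens and scores are parallel lists of equal length with numeric scores (the "..." entries in the example are elisions, not real values); the function name is hypothetical.

```python
def summarize_prediction(response, top_k=3):
    """Return the predicted label, its confidence, and the top_k highest-attention tokens."""
    confidences = {item["class"]: item["confidence"] for item in response["top_confidences"]}
    attention = response["attention_visualization"]
    # Pair each token with its score and sort by score, highest first.
    ranked = sorted(
        zip(attention["tokens"], attention["scores"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    label = response["predicted_label"]
    return {
        "label": label,
        "confidence": confidences.get(label),
        "top_tokens": [token for token, _ in ranked[:top_k]],
    }
```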

Model Configuration

Training parameters can be customized via the API:

  • embed_model: Pre-trained transformer model (e.g., "bert-base-uncased", "roberta-base")
  • classifier_type: Classifier architecture (e.g., "GRU")
  • hidden_neurons: Hidden layer size of the classifier
  • learning_rate: Training learning rate
  • batch_size: Training batch size
  • n_epochs: Number of training epochs

Documentation

For detailed technical implementation and pipeline pseudocode, please refer to CONFIGURATION.md.

Additional documentation:

Data Format

The system expects training data in CSV format with the following structure:

input,output
"I really liked this product...",positive
"Not worth the money.",negative

  • Place your data.csv file in the data/ directory
  • Input column: text to classify
  • Output column: sentiment label ("positive" or "negative")
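
Before starting a training run, it can help to check that a CSV file has the expected columns and to see which labels it contains. The sketch below does that with the standard csv module. The function names are hypothetical, and numbering labels in sorted order is only one possible convention; whatever mapping is used must agree with the label_to_id map sent in the training request.

```python
import csv
import io


def load_labeled_rows(csv_file):
    """Read (text, label) pairs from a file object with 'input' and 'output' columns."""
    reader = csv.DictReader(csv_file)
    missing = {"input", "output"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    return [(row["input"], row["output"]) for row in reader]


def build_label_to_id(rows):
    """Number the distinct labels in sorted order so the mapping is reproducible."""
    labels = sorted({label for _, label in rows})
    return {label: index for index, label in enumerate(labels)}


# In-memory stand-in for data/data.csv, using the two example rows above.
sample = io.StringIO(
    'input,output\n'
    '"I really liked this product...",positive\n'
    '"Not worth the money.",negative\n'
)
rows = load_labeled_rows(sample)
print(build_label_to_id(rows))
```

For a real file, pass open("data/data.csv", newline="", encoding="utf-8") instead of the in-memory sample.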

Development

Running Tests

Note: For an interactive walkthrough and example usage, see demo_notebook.ipynb in the project root.

Deployment

Production Deployment

  1. Build production image

    docker build -t learndl:latest .
  2. Run with external Redis

    docker run -p 8000:8000 \
      -e REDIS_HOST=your-redis-host \
      learndl:latest

Related Projects

For a complete fullstack version with frontend integration, see: ECE1724H_Advanced_Web
