Exoplanet Prediction Backend Service

A production-ready, Flask-based API service for serving machine learning models that predict exoplanet characteristics. It supports Random Forest, XGBoost, and PyTorch neural network models, which can also be combined into ensembles.

Features

  • Multiple Model Support: Random Forest, XGBoost, and PyTorch Neural Networks
  • Ensemble Predictions: Combine multiple models for improved accuracy
  • Batch Processing: Process multiple files in a single request
  • Data Validation: Comprehensive input validation and preprocessing
  • Feature Alignment: Automatic feature matching between training and inference
  • Configurable: Flexible configuration through JSON files or environment variables
  • Production Ready: Logging, error handling, and monitoring capabilities
  • RESTful API: Clean, well-documented endpoints
  • Type Safety: Comprehensive type hints throughout the codebase

Project Structure

back_end_1.0/
├── __init__.py         # Legacy Flask application
├── app.py              # Main Flask application with all improvements
├── config.py           # Configuration management
├── logger.py           # Logging utilities
├── io_utils.py         # Input/output and data handling
├── preprocess.py       # Data preprocessing and feature engineering
├── models.py           # Model loading and prediction utilities
├── requirements.txt    # Python dependencies
└── tests/              # Unit tests
    ├── test_io_utils.py
    ├── test_preprocess.py
    └── test_models.py

Installation

Requirements

  • Python 3.8 or higher
  • pip package manager
  • (Optional) CUDA-capable GPU for PyTorch acceleration

Setup Steps

  1. Clone the repository:
git clone <repository-url>
cd back_end_1.0
  2. Create a virtual environment:
python -m venv venv

# On Windows
venv\Scripts\activate

# On Linux/Mac
source venv/bin/activate
  3. Install dependencies:
# Install core dependencies
pip install -r back_end_1_0/requirements.txt

# For development, also install the test and lint tools used below
pip install pytest pytest-cov black flake8 mypy isort
  4. Create configuration file (optional):
# Create a config.json file in the project root
cat > config.json << EOF
{
  "api": {
    "host": "0.0.0.0",
    "port": 5000,
    "debug": false,
    "max_file_size": 104857600,
    "enable_cors": true
  },
  "models": {
    "rf": {
      "path": "path/to/rf_model.pkl",
      "aligner_path": "path/to/rf_aligner.pkl",
      "threshold": 0.5,
      "enabled": true
    },
    "xgb": {
      "path": "path/to/xgb_model.pkl",
      "aligner_path": "path/to/xgb_aligner.pkl",
      "threshold": 0.5,
      "enabled": true
    },
    "nn": {
      "path": "path/to/nn_model.pt",
      "aligner_path": "path/to/nn_aligner.pkl",
      "threshold": 0.5,
      "batch_size": 1024,
      "enabled": true
    }
  },
  "logging": {
    "level": "INFO",
    "file_path": "logs/app.log",
    "enable_console": true,
    "enable_file": true
  }
}
EOF

Usage

Starting the Server

Development Mode

# Using the Flask development server
python -m back_end_1_0.app

# Or with environment variables
export FLASK_APP=back_end_1_0.app
export FLASK_ENV=development
flask run

Production Mode

# Using Gunicorn (recommended for production)
gunicorn -w 4 -b 0.0.0.0:5000 "back_end_1_0.app:create_app()"

# With custom configuration
gunicorn -w 4 -b 0.0.0.0:5000 --timeout 300 --max-requests 1000 "back_end_1_0.app:create_app()"

Environment Variables

You can configure the service using environment variables:

# API Configuration
export API_HOST=0.0.0.0
export API_PORT=5000
export API_DEBUG=false

# Model Paths
export MODEL_RF_PATH=/path/to/rf_model.pkl
export MODEL_RF_ALIGNER_PATH=/path/to/rf_aligner.pkl
export MODEL_XGB_PATH=/path/to/xgb_model.pkl
export MODEL_XGB_ALIGNER_PATH=/path/to/xgb_aligner.pkl
export MODEL_NN_PATH=/path/to/nn_model.pt
export MODEL_NN_ALIGNER_PATH=/path/to/nn_aligner.pkl

# Logging
export LOG_LEVEL=INFO
export LOG_FILE=logs/app.log

# Configuration File
export CONFIG_FILE=config.json

API Endpoints

Health Check

GET /health

Returns service health status and available models.
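
A quick smoke test from Python, assuming the server runs on localhost:5000 (adjust host and port to match your configuration):

import requests

# Query the health endpoint; the server responds with JSON
resp = requests.get("http://localhost:5000/health")
resp.raise_for_status()
print(resp.json())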

Service Information

GET /info

Returns service capabilities and configuration.

Data Validation

POST /validate
Content-Type: multipart/form-data

Parameters:
- file: CSV or ZIP file
- is_zip: "true" if file is ZIP (optional)

Validates uploaded data without making predictions.

Random Forest Predictions

POST /predict/rf
Content-Type: multipart/form-data

Parameters:
- file: CSV file with features
- model_path: Path to model file (optional, uses config)
- aligner_path: Path to feature aligner (optional)
- label_col: Name of label column for metrics (optional)
- threshold: Classification threshold (optional, default: 0.5)
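
For illustration, a minimal client call with the requests library; features.csv and the label column name are placeholders for your own data:

import requests

# Upload a CSV and request predictions at a custom threshold
with open("features.csv", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/predict/rf",
        files={"file": f},
        data={"label_col": "label", "threshold": "0.6"},
    )
print(resp.json())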

XGBoost Predictions

POST /predict/xgb
Content-Type: multipart/form-data

Parameters:
- file: CSV or ZIP file
- is_zip: "true" if file is ZIP
- model_path: Path to model file (optional)
- aligner_path: Path to feature aligner (optional)
- label_col: Name of label column (optional)
- threshold: Classification threshold (optional)
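
ZIP uploads work the same way; set is_zip so the server unpacks the archive before prediction (the archive name is a placeholder):

import requests

# Upload a ZIP archive and flag it as such
with open("features.zip", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/predict/xgb",
        files={"file": f},
        data={"is_zip": "true"},
    )
print(resp.json())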

Neural Network Predictions

POST /predict/nn
Content-Type: multipart/form-data

Parameters:
- file: CSV file
- model_path: Path to PyTorch model (optional)
- aligner_path: Path to feature aligner (optional)
- label_col: Name of label column (optional)
- threshold: Classification threshold (optional)
- batch_size: Batch size for inference (optional)

Ensemble Predictions

POST /predict/ensemble
Content-Type: multipart/form-data

Parameters:
- file: CSV file
- models: Comma-separated model names (default: "rf,xgb,nn")
- weights: Comma-separated weights (optional)
- voting: "soft" or "hard" (default: "soft")
- label_col: Name of label column (optional)
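
For example, a soft-voting ensemble that weights the tree models over the neural network (the weights here are illustrative):

import requests

with open("features.csv", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/predict/ensemble",
        files={"file": f},
        data={"models": "rf,xgb,nn", "weights": "0.4,0.4,0.2", "voting": "soft"},
    )
print(resp.json())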

All Models Predictions

POST /predict
Content-Type: multipart/form-data

Parameters:
- file: CSV file
- label_col: Name of label column (optional)

Returns predictions from all configured models.

Batch Processing

POST /batch
Content-Type: multipart/form-data

Parameters:
- files: Multiple CSV files
- model: Model type to use (default: "ensemble")
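
With requests, repeated form fields carry the multiple uploads; the file names below are placeholders:

import requests

# Each tuple repeats the "files" field once per upload
uploads = [
    ("files", ("batch_a.csv", open("batch_a.csv", "rb"))),
    ("files", ("batch_b.csv", open("batch_b.csv", "rb"))),
]
resp = requests.post(
    "http://localhost:5000/batch",
    files=uploads,
    data={"model": "ensemble"},
)
print(resp.json())
for _, (_, fh) in uploads:
    fh.close()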

Response Format

All prediction endpoints return JSON responses with the following structure:

{
  "model": "model_name",
  "predictions": [0, 1, 0, ...],
  "probabilities": [0.23, 0.87, 0.15, ...],
  "threshold": 0.5,
  "n_samples": 100,
  "n_features": 50,
  "metrics": {
    "accuracy": 0.85,
    "precision": 0.82,
    "recall": 0.88,
    "f1": 0.85
  }
}
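
On the client side the fields pair up directly; a short sketch that counts positive predictions from any of the calls above:

import requests

with open("features.csv", "rb") as f:
    result = requests.post("http://localhost:5000/predict/rf", files={"file": f}).json()

# Zip predictions with their probabilities and keep the positives
positives = [
    (i, p)
    for i, (y, p) in enumerate(zip(result["predictions"], result["probabilities"]))
    if y == 1
]
print(f"{len(positives)} of {result['n_samples']} samples predicted positive")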

Testing

Running Unit Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=back_end_1_0 --cov-report=html

# Run specific test file
pytest tests/test_models.py

# Run with verbose output
pytest -v

# Run only fast tests (exclude slow tests)
pytest -m "not slow"

Code Quality

# Format code with Black
black back_end_1_0/

# Check code style with flake8
flake8 back_end_1_0/

# Type checking with mypy
mypy back_end_1_0/

# Sort imports with isort
isort back_end_1_0/

Model Training

To train models compatible with this service:

  1. Feature Aligner: Save the feature names from training
from back_end_1_0.preprocess import FeatureAligner

# During training
aligner = FeatureAligner(feature_names=X_train.columns.tolist())
aligner.save("model_aligner.pkl")
  2. Scikit-learn Models: Use joblib to save
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
joblib.dump(model, "rf_model.pkl")
  3. PyTorch Models: Save the state dict
import torch

# Save model
torch.save(model.state_dict(), "nn_model.pt")

# Model architecture must match ExoplanetModel class
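
On the inference side, loading mirrors the save: instantiate the architecture, then load the state dict. A minimal sketch; the import path and constructor arguments for ExoplanetModel are assumptions, so adapt them to the real class:

import torch

from back_end_1_0.models import ExoplanetModel  # assumed location of the class

model = ExoplanetModel(n_features=50)  # hypothetical constructor signature
model.load_state_dict(torch.load("nn_model.pt", map_location="cpu"))
model.eval()  # disable dropout/batch-norm updates before inference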

Docker Deployment

Create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app

COPY back_end_1_0/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY back_end_1_0/ ./back_end_1_0/
COPY config.json .

EXPOSE 5000

CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "back_end_1_0.app:create_app()"]

Build and run:

docker build -t exoplanet-backend .
docker run -p 5000:5000 -v /path/to/models:/models exoplanet-backend

Performance Optimization

  1. Use batch processing for multiple files
  2. Enable model caching in production
  3. Use GPU acceleration for PyTorch models when available
  4. Implement request rate limiting for public APIs
  5. Use a reverse proxy (nginx) in production
  6. Enable response compression for large predictions
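
Item 6 can be handled at the app level with the flask-compress extension; this is an assumption (the extension is not in this project's requirements), shown as a minimal sketch:

from flask import Flask
from flask_compress import Compress  # pip install flask-compress

app = Flask(__name__)
app.config["COMPRESS_MIMETYPES"] = ["application/json"]  # compress JSON payloads only
Compress(app)  # negotiates gzip/brotli with clients via Accept-Encoding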

Troubleshooting

Common Issues

  1. Model not found error:

    • Check model paths in config.json
    • Ensure model files exist and are readable
    • Verify file permissions
  2. Memory errors with large files:

    • Adjust max_file_size in configuration
    • Use batch processing for large datasets
    • Increase available RAM
  3. Slow predictions:

    • Enable GPU acceleration for PyTorch
    • Reduce batch size for neural networks
    • Use ensemble only when necessary
  4. Feature mismatch errors:

    • Ensure feature aligner is properly configured
    • Check that input data has expected columns
    • Verify preprocessing pipeline matches training

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

  • Create an issue on GitHub
  • Contact the development team
  • Check the documentation

Changelog

Version 1.0.0 (Current)

  • Initial release with multi-model support
  • Comprehensive refactoring and improvements
  • Added ensemble predictions
  • Implemented configuration management
  • Added logging and monitoring
  • Created comprehensive test suite
  • Added batch processing capabilities
  • Improved error handling and validation
