A production-ready Flask-based API service for serving machine learning models to predict exoplanet characteristics. This service supports multiple model types including Random Forest, XGBoost, and Neural Networks with ensemble capabilities.
## Features

- Multiple Model Support: Random Forest, XGBoost, and PyTorch Neural Networks
- Ensemble Predictions: Combine multiple models for improved accuracy
- Batch Processing: Process multiple files in a single request
- Data Validation: Comprehensive input validation and preprocessing
- Feature Alignment: Automatic feature matching between training and inference
- Configurable: Flexible configuration through JSON files or environment variables
- Production Ready: Logging, error handling, and monitoring capabilities
- RESTful API: Clean, well-documented endpoints
- Type Safety: Comprehensive type hints throughout the codebase
## Project Structure

```text
back_end_1_0/
├── __init__.py       # Legacy Flask application
├── app.py            # Main Flask application with all improvements
├── config.py         # Configuration management
├── logger.py         # Logging utilities
├── io_utils.py       # Input/output and data handling
├── preprocess.py     # Data preprocessing and feature engineering
├── models.py         # Model loading and prediction utilities
├── requirements.txt  # Python dependencies
└── tests/            # Unit tests
    ├── test_io_utils.py
    ├── test_preprocess.py
    └── test_models.py
```
## Prerequisites

- Python 3.8 or higher
- pip package manager
- (Optional) CUDA-capable GPU for PyTorch acceleration
## Installation

- Clone the repository:

```bash
git clone <repository-url>
cd back_end_1_0
```

- Create a virtual environment:

```bash
python -m venv venv

# On Windows
venv\Scripts\activate

# On Linux/Mac
source venv/bin/activate
```

- Install dependencies:

```bash
# Install core dependencies
pip install -r back_end_1_0/requirements.txt

# For development (includes testing and linting tools)
pip install -r back_end_1_0/requirements.txt
```

- Create a configuration file (optional):
```bash
# Create a config.json file in the project root
cat > config.json << EOF
{
  "api": {
    "host": "0.0.0.0",
    "port": 5000,
    "debug": false,
    "max_file_size": 104857600,
    "enable_cors": true
  },
  "models": {
    "rf": {
      "path": "path/to/rf_model.pkl",
      "aligner_path": "path/to/rf_aligner.pkl",
      "threshold": 0.5,
      "enabled": true
    },
    "xgb": {
      "path": "path/to/xgb_model.pkl",
      "aligner_path": "path/to/xgb_aligner.pkl",
      "threshold": 0.5,
      "enabled": true
    },
    "nn": {
      "path": "path/to/nn_model.pt",
      "aligner_path": "path/to/nn_aligner.pkl",
      "threshold": 0.5,
      "batch_size": 1024,
      "enabled": true
    }
  },
  "logging": {
    "level": "INFO",
    "file_path": "logs/app.log",
    "enable_console": true,
    "enable_file": true
  }
}
EOF
```

## Running the Service

### Development

Using the Flask development server:
```bash
python -m back_end_1_0.app

# Or with environment variables
export FLASK_APP=back_end_1_0.app
export FLASK_ENV=development
flask run
```

### Production

```bash
# Using Gunicorn (recommended for production); quote the factory call so the
# shell does not interpret the parentheses
gunicorn -w 4 -b 0.0.0.0:5000 "back_end_1_0.app:create_app()"

# With custom configuration
gunicorn -w 4 -b 0.0.0.0:5000 --timeout 300 --max-requests 1000 "back_end_1_0.app:create_app()"
```

## Environment Variables

You can configure the service using environment variables:
```bash
# API Configuration
export API_HOST=0.0.0.0
export API_PORT=5000
export API_DEBUG=false

# Model Paths
export MODEL_RF_PATH=/path/to/rf_model.pkl
export MODEL_RF_ALIGNER_PATH=/path/to/rf_aligner.pkl
export MODEL_XGB_PATH=/path/to/xgb_model.pkl
export MODEL_XGB_ALIGNER_PATH=/path/to/xgb_aligner.pkl
export MODEL_NN_PATH=/path/to/nn_model.pt
export MODEL_NN_ALIGNER_PATH=/path/to/nn_aligner.pkl

# Logging
export LOG_LEVEL=INFO
export LOG_FILE=logs/app.log

# Configuration File
export CONFIG_FILE=config.json
```

## API Endpoints

### GET /health

Returns service health status and available models.
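As a quick smoke test, the health endpoint can be queried with Python's standard library. This is a minimal sketch; the `fetch_health` helper is illustrative, and the `models` key is an assumption about the payload shape rather than a documented field:

```python
import json
from urllib.request import urlopen

def fetch_health(base_url="http://localhost:5000"):
    """GET /health and decode the JSON body (assumes the service is running locally)."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read())

def models_available(health_payload):
    """Pull the list of available models out of a health payload.
    The "models" key is a guess at the response shape, not a documented field."""
    return health_payload.get("models", [])
```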
### GET /info

Returns service capabilities and configuration.
### POST /validate

Content-Type: multipart/form-data

Parameters:
- file: CSV or ZIP file
- is_zip: "true" if file is ZIP (optional)

Validates uploaded data without making predictions.
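For example, a client can derive the `is_zip` flag from the file extension before uploading. This is a sketch; the `validation_form` helper and the use of the `requests` library are illustrative, not part of the service:

```python
def validation_form(path):
    """Build the extra multipart fields for POST /validate:
    set is_zip to "true" only when uploading a ZIP archive."""
    data = {}
    if path.lower().endswith(".zip"):
        data["is_zip"] = "true"
    return data

# Usage with the requests library (assumes the service is running locally):
# import requests
# with open("candidates.zip", "rb") as f:
#     resp = requests.post("http://localhost:5000/validate",
#                          files={"file": f}, data=validation_form("candidates.zip"))
#     resp.raise_for_status()
```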
### POST /predict/rf

Content-Type: multipart/form-data

Parameters:
- file: CSV file with features
- model_path: Path to model file (optional, uses config)
- aligner_path: Path to feature aligner (optional)
- label_col: Name of label column for metrics (optional)
- threshold: Classification threshold (optional, default: 0.5)

### POST /predict/xgb
Content-Type: multipart/form-data
Parameters:
- file: CSV or ZIP file
- is_zip: "true" if file is ZIP
- model_path: Path to model file (optional)
- aligner_path: Path to feature aligner (optional)
- label_col: Name of label column (optional)
- threshold: Classification threshold (optional)

### POST /predict/nn
Content-Type: multipart/form-data
Parameters:
- file: CSV file
- model_path: Path to PyTorch model (optional)
- aligner_path: Path to feature aligner (optional)
- label_col: Name of label column (optional)
- threshold: Classification threshold (optional)
- batch_size: Batch size for inference (optional)

### POST /predict/ensemble
Content-Type: multipart/form-data
Parameters:
- file: CSV file
- models: Comma-separated model names (default: "rf,xgb,nn")
- weights: Comma-separated weights (optional)
- voting: "soft" or "hard" (default: "soft")
- label_col: Name of label column (optional)

### POST /predict
Content-Type: multipart/form-data
Parameters:
- file: CSV file
- label_col: Name of label column (optional)

Returns predictions from all configured models.
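Given the documented response structure (a `predictions` array of 0/1 labels), a client can post a CSV and summarize each model's output. The `summarize` helper is a sketch, and the per-model keying of the combined `/predict` response is an assumption:

```python
def summarize(result):
    """Compute the fraction of positive predictions from a prediction response.
    Relies only on the documented "predictions" field."""
    preds = result["predictions"]
    return sum(preds) / len(preds) if preds else 0.0

# Usage with the requests library (assumes the service is running locally):
# import requests
# with open("candidates.csv", "rb") as f:
#     resp = requests.post("http://localhost:5000/predict", files={"file": f})
# for name, result in resp.json().items():  # assumed per-model keying of the response
#     print(name, summarize(result))
```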
### POST /batch
Content-Type: multipart/form-data
Parameters:
- files: Multiple CSV files
- model: Model type to use (default: "ensemble")

## Response Format

All prediction endpoints return JSON responses with the following structure:

```json
{
  "model": "model_name",
  "predictions": [0, 1, 0, ...],
  "probabilities": [0.23, 0.87, 0.15, ...],
  "threshold": 0.5,
  "n_samples": 100,
  "n_features": 50,
  "metrics": {
    "accuracy": 0.85,
    "precision": 0.82,
    "recall": 0.88,
    "f1": 0.85
  }
}
```

## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=back_end_1_0 --cov-report=html

# Run specific test file
pytest tests/test_models.py

# Run with verbose output
pytest -v

# Run only fast tests (exclude slow tests)
pytest -m "not slow"
```

## Code Quality

```bash
# Format code with Black
black back_end_1_0/

# Check code style with flake8
flake8 back_end_1_0/

# Type checking with mypy
mypy back_end_1_0/

# Sort imports with isort
isort back_end_1_0/
```

## Training Compatible Models

To train models compatible with this service:
- Feature Aligner: Save the feature names from training:

```python
from back_end_1_0.preprocess import FeatureAligner

# During training
aligner = FeatureAligner(feature_names=X_train.columns.tolist())
aligner.save("model_aligner.pkl")
```

- Scikit-learn Models: Use joblib to save:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
joblib.dump(model, "rf_model.pkl")
```

- PyTorch Models: Save the state dict:

```python
import torch

# Save the model weights; the architecture must match the ExoplanetModel class
torch.save(model.state_dict(), "nn_model.pt")
```

## Docker Deployment

Create a Dockerfile:
```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY back_end_1_0/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY back_end_1_0/ ./back_end_1_0/
COPY config.json .

EXPOSE 5000

CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "back_end_1_0.app:create_app()"]
```

Build and run:

```bash
docker build -t exoplanet-backend .
docker run -p 5000:5000 -v /path/to/models:/models exoplanet-backend
```

## Performance Tips

- Use batch processing for multiple files
- Enable model caching in production
- Use GPU acceleration for PyTorch models when available
- Implement request rate limiting for public APIs
- Use a reverse proxy (nginx) in production
- Enable response compression for large predictions
## Troubleshooting

- Model not found error:
  - Check model paths in config.json
  - Ensure model files exist and are readable
  - Verify file permissions
- Memory errors with large files:
  - Adjust `max_file_size` in the configuration
  - Use batch processing for large datasets
  - Increase available RAM
- Slow predictions:
  - Enable GPU acceleration for PyTorch
  - Reduce batch size for neural networks
  - Use the ensemble only when necessary
- Feature mismatch errors:
  - Ensure the feature aligner is properly configured
  - Check that input data has the expected columns
  - Verify the preprocessing pipeline matches training
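When chasing a feature mismatch, it can help to diff the incoming columns against the feature names the aligner was saved with. A minimal sketch; `expected_features` stands in for whatever list your aligner stores:

```python
def diff_features(expected_features, incoming_columns):
    """Report which training-time features are missing from the input
    and which input columns the model has never seen."""
    expected = set(expected_features)
    incoming = set(incoming_columns)
    return {
        "missing": sorted(expected - incoming),
        "unexpected": sorted(incoming - expected),
    }

# Example (hypothetical column names):
# diff_features(["period", "radius", "depth"], ["period", "radius", "snr"])
# -> {"missing": ["depth"], "unexpected": ["snr"]}
```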
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Support

For issues and questions:
- Create an issue on GitHub
- Contact the development team
- Check the documentation
## Changelog

- Initial release with multi-model support
- Comprehensive refactoring and improvements
- Added ensemble predictions
- Implemented configuration management
- Added logging and monitoring
- Created comprehensive test suite
- Added batch processing capabilities
- Improved error handling and validation