✈️ Flight Price ML Production

End-to-end Machine Learning system for flight price prediction with REST API, web interface, CI/CD pipeline, and AWS deployment

🎯 Overview

Production-ready Machine Learning system that predicts flight prices in India using historical data. The project demonstrates MLOps best practices with:

  • ML Model: XGBoost regressor achieving R² ~0.988
  • REST API: FastAPI with automatic Swagger documentation
  • Web Interface: Interactive Streamlit dashboard
  • CI/CD: Automated testing and deployment with GitHub Actions
  • Cloud Infrastructure: AWS Elastic Beanstalk with auto-scaling
  • Testing: Comprehensive test suite with pytest

Dataset: 300,153 historical flight records with 11 predictor variables

Performance Metrics:

  • RMSE: ~2,450 INR
  • MAE: ~1,260 INR
  • R²: ~0.988

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                         Users                                │
└────────────┬─────────────────────────┬───────────────────────┘
             │                         │
    ┌────────▼─────────┐      ┌────────▼──────────┐
    │  Streamlit UI    │      │   External Apps   │
    │  (Frontend)      │      │   (API Clients)   │
    └────────┬─────────┘      └────────┬──────────┘
             │                         │
             └─────────┬───────────────┘
                       │
              ┌────────▼────────────────────┐
              │    FastAPI REST API         │
              │  ┌──────────────────────┐   │
              │  │  Request Validation  │   │
              │  │     (Pydantic)       │   │
              │  └──────────┬───────────┘   │
              │             │               │
              │  ┌──────────▼───────────┐   │
              │  │  Preprocessing       │   │
              │  │  Pipeline            │   │
              │  └──────────┬───────────┘   │
              │             │               │
              │  ┌──────────▼───────────┐   │
              │  │  XGBoost Model       │   │
              │  │  (R² = 0.988)        │   │
              │  └──────────────────────┘   │
              └─────────────────────────────┘
                       │
         ┌─────────────┴──────────────┐
         │                            │
    ┌────▼──────────┐        ┌────────▼─────────┐
    │ AWS Elastic   │        │  GitHub Actions  │
    │ Beanstalk     │        │  (CI/CD)         │
    │               │        │                  │
    │ • Auto-scaling│        │ • Auto-test      │
    │ • Load Balancer│       │ • Auto-deploy    │
    │ • CloudWatch  │        │                  │
    └───────────────┘        └──────────────────┘

Data Flow:

  1. User inputs flight parameters via Streamlit UI or API
  2. FastAPI validates input with Pydantic schemas
  3. Preprocessing pipeline transforms features
  4. XGBoost model generates prediction
  5. Response returned with predicted price and metadata

✨ Key Features

🤖 Machine Learning

  • Optimized XGBoost model with RandomizedSearchCV hyperparameter tuning
  • Advanced feature engineering: cyclic encoding for time, target encoding for high-cardinality features
  • Reproducible pipeline with scikit-learn ColumnTransformer
  • High accuracy: RMSE ~2,450 INR, MAE ~1,260 INR, R² ~0.988
  • Feature importance analysis for model interpretability

🌐 REST API (FastAPI)

  • Individual prediction endpoint (POST /predict)
  • Batch prediction endpoint (POST /predict/batch)
  • Health monitoring (GET /health)
  • Feature importance analysis (GET /feature-importance)
  • Interactive documentation with Swagger UI (/docs)
  • Data validation with Pydantic
  • CORS enabled for cross-origin requests
  • Structured logging for debugging

💻 Web Interface (Streamlit)

  • Intuitive UI with real-time predictions
  • Single prediction form with dropdown selectors
  • Price comparison tool (same flight, different classes/dates)
  • CSV batch upload for multiple predictions
  • Feature importance visualization
  • API health status indicator
  • Download results as CSV

🚀 DevOps & Production

  • CI/CD pipeline with GitHub Actions
  • Automated testing with pytest (7 test cases)
  • Auto-deployment to AWS Elastic Beanstalk
  • Docker support for containerization
  • CloudWatch integration for logging and monitoring
  • Auto-scaling configuration
  • Application Load Balancer for high availability

🛠️ Tech Stack

Category         | Technologies
-----------------|--------------------------------------
ML/Data Science  | XGBoost, scikit-learn, Pandas, NumPy
Backend          | FastAPI, Uvicorn, Pydantic
Frontend         | Streamlit
Testing          | pytest, pytest-cov
Cloud/DevOps     | AWS, S3, CloudWatch
CI/CD            | GitHub Actions
Containerization | Docker

📁 Project Structure

flight-price-ml-production/
├── .ebextensions/              # AWS Elastic Beanstalk configuration
│   ├── 01_python.config        # Python environment setup
│   └── 02_logging.config       # CloudWatch logging setup
├── .github/workflows/          # GitHub Actions CI/CD
│   └── deploy.yml              # Automated deployment workflow
├── src/                        # Source code
│   ├── api/                    # FastAPI application
│   │   └── main.py             # API endpoints and configuration
│   ├── data/                   # Data processing
│   │   └── make_dataset.py     # Data loading and cleaning
│   ├── features/               # Feature engineering
│   │   └── build_features.py   # Feature transformation pipeline
│   └── models/                 # ML models
│       ├── train.py            # Model training script
│       └── predict.py          # Prediction service
├── tests/                      # Test suite
│   └── test_api.py             # API endpoint tests
├── models/                     # Trained model artifacts
│   ├── xgboost_model.pkl       # Trained XGBoost model (~9MB)
│   └── preprocessor.pkl        # Fitted preprocessing pipeline (~4KB)
├── data/                       # Data directory
│   ├── raw/                    # Raw data files
│   └── processed/              # Processed data files
├── notebooks/                  # Jupyter notebooks (EDA, experiments)
├── application.py              # Entry point for Elastic Beanstalk
├── streamlit_app.py            # Streamlit web interface
├── requirements.txt            # Python dependencies
├── Dockerfile                  # Docker configuration (optional)
├── Procfile                    # Process file for EB
└── README.md                   # This file

💻 Local Installation

Prerequisites

  • Python 3.11 or higher
  • pip or conda package manager
  • Git
  • (Optional) Docker for containerization
  • (Optional) AWS CLI for cloud deployment

Quick Start

1. Clone the repository

git clone https://github.com/Acquarts/flight-price-ml-production.git
cd flight-price-ml-production

2. Set up Python environment

Option A: Using venv (Linux/Mac)

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Option B: Using venv (Windows)

python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt

Option C: Using conda (Recommended for Windows)

conda env create -f environment.yml
conda activate flight-price

3. Download dataset

Place the airlines_flights_data.csv file in:

data/raw/airlines_flights_data.csv

4. Train the model

python -m src.models.train

Output:

  • models/xgboost_model.pkl - Trained XGBoost model (~9MB)
  • models/preprocessor.pkl - Fitted preprocessing pipeline (~4KB)

Expected console output:

INFO - === INITIATING TRAINING ===
INFO - Loading data from data/raw/airlines_flights_data.csv
INFO - Data loaded: 300153 rows, 12 columns
INFO - Data cleaned: (300153, 11)
INFO - Time features encoded
INFO - Train: (240122, 12), Test: (60031, 12)
INFO - Preprocessor created with 30 features
INFO - Data transformed: (240122, 30)
INFO - Training XGBoost with params: {'n_estimators': 600, ...}
INFO - Model trained successfully
INFO - Model metrics:
  RMSE: 2458.58
  MAE: 1258.63
  R²: 0.9883
INFO - Model saved at models/xgboost_model.pkl
INFO - Preprocessor saved at models/preprocessor.pkl
INFO - === TRAINING COMPLETED ===
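
The train and test row counts in the log follow from the 80/20 split. As a quick sanity check, the sketch below reproduces them, assuming the split is done with scikit-learn's `train_test_split` and `test_size=0.2` (the training script's actual call is not shown here). With a fractional test size, scikit-learn rounds the test set up and gives the remainder to the training set.

```python
import math


def split_sizes(n_rows: int, test_fraction: float = 0.2) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test size.

    The test set size is rounded up; the training set gets the remainder.
    """
    n_test = math.ceil(n_rows * test_fraction)
    return n_rows - n_test, n_test


n_train, n_test = split_sizes(300_153)
print(n_train, n_test)  # 240122 60031
```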

🚀 Usage

REST API

Start the API server

uvicorn application:application --reload --host 0.0.0.0 --port 8000

Server will start at: http://localhost:8000

Interactive API Documentation

Open in your browser:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

API Endpoints

Method | Endpoint            | Description
-------|---------------------|---------------------------------------
GET    | /                   | Root endpoint with API info
GET    | /health             | Health check and model status
POST   | /predict            | Single flight price prediction
POST   | /predict/batch      | Batch predictions for multiple flights
GET    | /feature-importance | Get model's feature importance
GET    | /docs               | Interactive Swagger documentation

Example: Single Prediction

cURL:

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "airline": "SpiceJet",
    "flight": "SG-8157",
    "source_city": "Delhi",
    "departure_time": "Evening",
    "stops": "zero",
    "arrival_time": "Night",
    "destination_city": "Mumbai",
    "class": "Economy",
    "duration": 2.17,
    "days_left": 1
  }'

Python:

import requests

url = "http://localhost:8000/predict"

payload = {
    "airline": "SpiceJet",
    "flight": "SG-8157",
    "source_city": "Delhi",
    "departure_time": "Evening",
    "stops": "zero",
    "arrival_time": "Night",
    "destination_city": "Mumbai",
    "class": "Economy",
    "duration": 2.17,
    "days_left": 1
}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()  # raise on 422/500 instead of parsing an error body
result = response.json()

print(f"Predicted Price: ₹{result['predicted_price']:.2f}")
# Output: Predicted Price: ₹8659.98

JavaScript (fetch):

const url = "http://localhost:8000/predict";

const payload = {
  airline: "SpiceJet",
  flight: "SG-8157",
  source_city: "Delhi",
  departure_time: "Evening",
  stops: "zero",
  arrival_time: "Night",
  destination_city: "Mumbai",
  class: "Economy",
  duration: 2.17,
  days_left: 1
};

fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload)
})
  .then(res => res.json())
  .then(data => console.log(`Predicted Price: ₹${data.predicted_price}`));

Response:

{
  "predicted_price": 8659.98,
  "currency": "INR",
  "timestamp": "2026-01-20T10:30:00.123456"
}

Example: Batch Prediction

curl -X POST "http://localhost:8000/predict/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "flights": [
      {
        "airline": "SpiceJet",
        "flight": "SG-8157",
        "source_city": "Delhi",
        "departure_time": "Evening",
        "stops": "zero",
        "arrival_time": "Night",
        "destination_city": "Mumbai",
        "class": "Economy",
        "duration": 2.17,
        "days_left": 1
      },
      {
        "airline": "Vistara",
        "flight": "UK-995",
        "source_city": "Delhi",
        "departure_time": "Morning",
        "stops": "zero",
        "arrival_time": "Afternoon",
        "destination_city": "Mumbai",
        "class": "Business",
        "duration": 2.25,
        "days_left": 1
      }
    ]
  }'

Response:

{
  "predictions": [
    {
      "predicted_price": 8659.98,
      "currency": "INR",
      "timestamp": "2026-01-20T10:30:00.123456"
    },
    {
      "predicted_price": 49954.50,
      "currency": "INR",
      "timestamp": "2026-01-20T10:30:00.123456"
    }
  ],
  "total": 2
}
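
Once a batch response arrives, a client will usually want a quick summary of it. The sketch below parses a response shaped like the one above and computes the min/max/average price. The `summarize_batch` helper is a hypothetical name for illustration, not part of the project.

```python
import json
from statistics import mean


def summarize_batch(response_body: str) -> dict:
    """Extract predicted prices from a /predict/batch response and summarize them."""
    data = json.loads(response_body)
    prices = [item["predicted_price"] for item in data["predictions"]]
    # The response reports how many predictions it contains; check it matches.
    if len(prices) != data["total"]:
        raise ValueError("'total' does not match the number of predictions")
    return {
        "min": min(prices),
        "max": max(prices),
        "average": round(mean(prices), 2),
    }


sample_response = """
{
  "predictions": [
    {"predicted_price": 8659.98, "currency": "INR", "timestamp": "2026-01-20T10:30:00.123456"},
    {"predicted_price": 49954.50, "currency": "INR", "timestamp": "2026-01-20T10:30:00.123456"}
  ],
  "total": 2
}
"""

summary = summarize_batch(sample_response)
print(summary["min"], summary["max"])  # 8659.98 49954.5
```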

Valid Values for API Fields

Airlines: SpiceJet, AirAsia, Vistara, GO_FIRST, Indigo, Air_India

Cities: Delhi, Mumbai, Bangalore, Kolkata, Hyderabad, Chennai

Time Slots: Early_Morning, Morning, Afternoon, Evening, Night, Late_Night

Stops: zero, one, two_or_more

Class: Economy, Business

Duration: Float (hours), range: 0.5 - 50.0

Days Left: Integer, range: 1 - 49
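
Checking a payload against these values before sending it avoids a round trip that ends in a 422. The following is a minimal client-side sketch built only from the values listed above. The API itself validates with Pydantic, so `validate_flight` is a hypothetical helper and not the server's implementation. The free-form `flight` field is not checked.

```python
CITIES = ("Delhi", "Mumbai", "Bangalore", "Kolkata", "Hyderabad", "Chennai")
TIME_SLOTS = ("Early_Morning", "Morning", "Afternoon", "Evening", "Night", "Late_Night")

ALLOWED = {
    "airline": ("SpiceJet", "AirAsia", "Vistara", "GO_FIRST", "Indigo", "Air_India"),
    "source_city": CITIES,
    "destination_city": CITIES,
    "departure_time": TIME_SLOTS,
    "arrival_time": TIME_SLOTS,
    "stops": ("zero", "one", "two_or_more"),
    "class": ("Economy", "Business"),
}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_flight(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    for field, allowed in ALLOWED.items():
        if payload.get(field) not in allowed:
            errors.append(f"{field}: expected one of {', '.join(allowed)}")
    duration = payload.get("duration")
    if not (_is_number(duration) and 0.5 <= duration <= 50.0):
        errors.append("duration: expected hours between 0.5 and 50.0")
    days_left = payload.get("days_left")
    if not (_is_number(days_left) and isinstance(days_left, int) and 1 <= days_left <= 49):
        errors.append("days_left: expected an integer between 1 and 49")
    return errors


flight = {
    "airline": "SpiceJet", "flight": "SG-8157",
    "source_city": "Delhi", "departure_time": "Evening", "stops": "zero",
    "arrival_time": "Night", "destination_city": "Mumbai",
    "class": "Economy", "duration": 2.17, "days_left": 1,
}
print(validate_flight(flight))  # []
```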


Streamlit Web Interface

The project includes a complete web interface built with Streamlit for non-technical users.

Running Streamlit Locally

Step 1: Start the API (in one terminal)

uvicorn application:application --reload --host 0.0.0.0 --port 8000

Step 2: Start Streamlit (in another terminal)

streamlit run streamlit_app.py

The web interface will automatically open at: http://localhost:8501

Streamlit Features

📍 Tab 1: Individual Prediction

Interactive form to predict single flight prices:

  • Flight Information

    • Select airline from dropdown
    • Choose origin and destination cities
    • Select cabin class (Economy/Business)
  • Schedule & Details

    • Departure and arrival time slots
    • Number of stops
    • Flight duration (hours)
    • Days until departure (slider: 1-49)
  • Prediction Result

    • Large display of predicted price in INR
    • Route, class, and airline summary
    • Factors influencing the price

Example workflow:

  1. Select "SpiceJet" as airline
  2. Choose "Delhi" → "Mumbai" route
  3. Set "Economy" class
  4. Select "Evening" departure, "Night" arrival
  5. Set "zero" stops, duration 2.17 hours
  6. Slide "Days left" to 1
  7. Click "🚀 Predict Price"
  8. View result: ₹8,659.98

📊 Tab 2: Batch Prediction

Two options for comparing multiple flights:

Option 1: Upload CSV File

  • Upload a CSV with required columns
  • Automatic batch prediction for all rows
  • View results in sortable table
  • Statistics: min/max/average prices
  • Download results as CSV

CSV Format:

airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left
SpiceJet,SG-8157,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1
Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Business,2.25,1
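
A CSV in this format maps directly onto the list of flights that the batch endpoint expects. As an illustration (not the Streamlit app's actual code), the sketch below reads the rows with the standard `csv` module, checks for the required columns, and converts `duration` and `days_left` to numbers. `csv_to_flights` is a hypothetical helper name.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "airline", "flight", "source_city", "departure_time", "stops",
    "arrival_time", "destination_city", "class", "duration", "days_left",
]


def csv_to_flights(csv_text: str) -> list[dict]:
    """Parse CSV text into flight dicts ready for the 'flights' list of a batch request."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    flights = []
    for row in reader:
        flight = {column: row[column] for column in REQUIRED_COLUMNS}
        # CSV values are strings; the API expects numeric duration and days_left.
        flight["duration"] = float(flight["duration"])
        flight["days_left"] = int(flight["days_left"])
        flights.append(flight)
    return flights


sample_csv = """airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left
SpiceJet,SG-8157,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1
Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Business,2.25,1
"""

flights = csv_to_flights(sample_csv)
print(len(flights), flights[1]["class"], flights[1]["duration"])  # 2 Business 2.25
```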

Option 2: Quick Comparator

  • Select airline, route, duration, stops
  • Automatically compares:
    • Economy vs Business class
    • Different booking dates (1, 7, 14, 30 days advance)
  • View in pivot table format
  • Insights on price variation by advance booking

Example output:

Class       | 1 day  | 7 days | 14 days | 30 days
------------|--------|--------|---------|--------
Economy     | ₹8,659 | ₹7,234 | ₹6,892  | ₹6,453
Business    | ₹49,954| ₹45,123| ₹42,876 | ₹40,234

📈 Tab 3: Analysis

Feature Importance Visualization

  • Click "🔍 Get Feature Importance"
  • View bar chart of top 15 features
  • See table with exact importance values
  • Interpretation guide

Top Features (typical):

  1. class_Business - ~45% importance
  2. class_Economy - ~38% importance
  3. duration - ~9% importance
  4. days_left - ~1.4% importance
  5. stops - ~1.2% importance

Model Information

  • Training data: 300,153 flights
  • Model: XGBoost Regressor
  • Metrics: RMSE ~2,450 INR, MAE ~1,260 INR, R² ~0.988

🔍 Sidebar Features

API Status Monitor (real-time):

  • ✅ API Active / ❌ API Unavailable
  • ✅ Model Loaded / ❌ Model Not Loaded
  • Auto-refreshes on page load

Information Panel:

  • Model details
  • Performance metrics
  • Quick start tips

Deploying Streamlit to Cloud

While this project focuses on FastAPI deployment to AWS Elastic Beanstalk, the Streamlit interface can be deployed separately:

Option 1: Streamlit Cloud (Free)

  1. Push your code to GitHub
  2. Go to share.streamlit.io
  3. Connect your GitHub repository
  4. Select streamlit_app.py as the main file
  5. Set environment variable:
    API_URL=https://your-elastic-beanstalk-url.amazonaws.com
    
  6. Deploy

Option 2: Heroku

# Create Procfile for Streamlit (single quotes keep $PORT literal so it is expanded at runtime, not by echo)
echo 'web: streamlit run streamlit_app.py --server.port=$PORT --server.address=0.0.0.0' > Procfile.streamlit

# Deploy
heroku create flight-price-streamlit
git push heroku main
heroku open

Option 3: Docker + AWS ECS

# Dockerfile.streamlit
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt streamlit_app.py ./
COPY src/ ./src/
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8501
CMD ["streamlit", "run", "streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]

# Build and push to ECR (shell commands, run separately from the Dockerfile above)
docker build -f Dockerfile.streamlit -t flight-price-streamlit .
docker tag flight-price-streamlit:latest <AWS_ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/flight-price-streamlit:latest
docker push <AWS_ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/flight-price-streamlit:latest

# Deploy to ECS (configure task definition and service)

Note: Update API_URL in streamlit_app.py (line 16) to point to your production API:

# For local development
API_URL = "http://localhost:8000"

# For production
API_URL = "http://your-app.elasticbeanstalk.com"

✅ Testing

Running Tests

# Run all tests
pytest tests/ -v

# Run with coverage report
pytest tests/ -v --cov=src --cov-report=term-missing

# Run specific test file
pytest tests/test_api.py -v

# Run specific test
pytest tests/test_api.py::test_predict_endpoint -v

Test Suite

Test                      | Description
--------------------------|-------------------------------------------
test_root_endpoint        | Validates root endpoint returns API info
test_health_check         | Verifies health endpoint and model status
test_predict_endpoint     | Tests single prediction with valid data
test_predict_invalid_data | Validates data validation (422 error)
test_batch_prediction     | Tests batch prediction endpoint
test_feature_importance   | Verifies feature importance endpoint
test_docs_endpoint        | Checks Swagger documentation availability

Expected Output:

======================== test session starts ========================
tests/test_api.py::test_root_endpoint PASSED                  [ 14%]
tests/test_api.py::test_health_check PASSED                   [ 28%]
tests/test_api.py::test_predict_endpoint PASSED               [ 42%]
tests/test_api.py::test_predict_invalid_data PASSED           [ 57%]
tests/test_api.py::test_batch_prediction PASSED               [ 71%]
tests/test_api.py::test_feature_importance PASSED             [ 85%]
tests/test_api.py::test_docs_endpoint PASSED                  [100%]

========================= 7 passed in 1.73s =========================

☁️ AWS Deployment

Prerequisites

  • AWS account with appropriate IAM permissions
  • AWS CLI installed and configured
  • EB CLI installed: pip install awsebcli

Option 1: Manual Deployment with EB CLI

Step 1: Initialize Elastic Beanstalk

eb init -p python-3.11 flight-price-api --region us-east-1

Step 2: Create Environment

eb create flight-price-prod \
  --instance-type t3.medium \
  --envvars MODEL_PATH=models/xgboost_model.pkl,PREPROCESSOR_PATH=models/preprocessor.pkl,LOG_LEVEL=INFO

This creates:

  • EC2 instances (t3.medium)
  • Application Load Balancer
  • Auto Scaling Group (1-4 instances)
  • CloudWatch logging
  • Security groups

Wait 5-10 minutes for environment creation.

Step 3: Deploy Application

# Deploy current code
eb deploy

# Check status
eb status

# View logs
eb logs

# Open in browser
eb open

Step 4: Get Application URL

eb status | grep CNAME
# Output: CNAME: flight-price-prod.us-east-1.elasticbeanstalk.com

Your API will be available at:

http://flight-price-prod.us-east-1.elasticbeanstalk.com

Test it:

curl http://flight-price-prod.us-east-1.elasticbeanstalk.com/health

Managing the Environment

# Scale instances
eb scale 3

# SSH into instance
eb ssh

# View environment info
eb status --verbose

# Restart application
eb restart

# Terminate environment (to stop costs)
eb terminate flight-price-prod

Option 2: Automated Deployment with CI/CD

See CI/CD Pipeline section below.

Cost Management

Estimated Monthly Costs (us-east-1):

  • t3.medium instance: ~$30/month
  • Application Load Balancer: ~$20/month
  • CloudWatch + S3: ~$5/month
  • Total: ~$55/month

Cost Optimization:

  • Use t3.small for low traffic: ~$15/month
  • Terminate environment when not in use: $0
  • Use AWS Free Tier for first year
  • Enable auto-scaling with min=0 for dev environments

To stop costs:

eb terminate flight-price-prod

🔄 CI/CD Pipeline

GitHub Actions Workflow

The project includes a complete CI/CD pipeline that automatically:

  1. ✅ Runs tests on every push
  2. ✅ Builds deployment package
  3. ✅ Uploads to S3
  4. ✅ Deploys to Elastic Beanstalk (on main branch)

Setup Instructions

1. Configure GitHub Secrets

Go to your repository: Settings → Secrets and variables → Actions → New repository secret

Add these secrets:

Secret Name           | Value                    | Description
----------------------|--------------------------|------------------------
AWS_ACCESS_KEY_ID     | Your AWS access key      | IAM user credentials
AWS_SECRET_ACCESS_KEY | Your AWS secret key      | IAM user credentials
S3_BUCKET             | flight-price-deployments | S3 bucket for artifacts

2. Create S3 Bucket

aws s3 mb s3://flight-price-deployments --region us-east-1

3. Configure Environment Name

Edit .github/workflows/deploy.yml (line 10):

env:
  AWS_REGION: us-east-1
  EB_APPLICATION_NAME: flight-price-api
  EB_ENVIRONMENT_NAME: flight-price-prod  # Match your EB environment name

4. Trigger Deployment

# Make a change
echo "# Update" >> README.md

# Commit and push
git add README.md
git commit -m "Trigger deployment"
git push origin main

5. Monitor Deployment

Go to: GitHub → Actions tab

You'll see the workflow running with these steps:

  1. 🧪 Test (runs pytest)
  2. 📦 Build (creates deployment package)
  3. ☁️ Deploy (uploads to EB)

Typical execution time: 2-3 minutes

Workflow File

The workflow is defined in .github/workflows/deploy.yml:

name: Deploy to AWS Elastic Beanstalk

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v --cov=src
  
  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - uses: aws-actions/configure-aws-credentials@v2
      - run: zip -r deploy.zip .
      - run: aws s3 cp deploy.zip s3://${{ secrets.S3_BUCKET }}/
      - run: aws elasticbeanstalk create-application-version ...
      - run: aws elasticbeanstalk update-environment ...

Deployment Stages

┌─────────────┐
│ git push    │
│  to main    │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Run Tests   │  ← pytest tests/
│             │
└──────┬──────┘
       │ PASS ✅
       ▼
┌─────────────┐
│Build Package│  ← zip deployment files
│             │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Upload to S3│  ← aws s3 cp
│             │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│Create App   │  ← create-application-version
│  Version    │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Deploy to  │  ← update-environment
│     EB      │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│    Live! 🎉 │  ← API accessible
└─────────────┘

🤖 Model Details

Algorithm

XGBoost (eXtreme Gradient Boosting) - Gradient boosted decision trees optimized for speed and performance.

Hyperparameters

Optimized using RandomizedSearchCV with 20 iterations:

{
    'n_estimators': 600,      # Number of boosting rounds
    'learning_rate': 0.2,     # Step size shrinkage
    'max_depth': 8,           # Maximum tree depth
    'subsample': 1.0,         # Fraction of samples for each tree
    'colsample_bytree': 0.8,  # Fraction of features for each tree
    'gamma': 1,               # Minimum loss reduction for split
    'random_state': 42        # Reproducibility
}

Feature Engineering

Total Features: ~30 after transformation

Transformations Applied:

  1. Cyclic Encoding for time features:

    time_sin = sin(2π × seconds / 86400)
    time_cos = cos(2π × seconds / 86400)

    Applied to: departure_time, arrival_time

  2. One-Hot Encoding for low-cardinality categorical:

    • airline (6 categories)
    • stops (3 categories)
    • class (2 categories)
  3. Target Encoding for high-cardinality categorical:

    • source_city (6 categories)
    • destination_city (6 categories)
  4. Numerical Features (passthrough):

    • duration
    • days_left
    • Time encoding features
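
To make the cyclic encoding concrete, here is a minimal sketch of the sin/cos formula above. The mapping from each time slot to a representative clock hour (`SLOT_HOUR`) is an assumption made for illustration; the hours the project actually assigns to each slot are not stated here.

```python
import math

SECONDS_PER_DAY = 86_400

# Hypothetical representative hour for each slot (assumption, not from the project).
SLOT_HOUR = {
    "Early_Morning": 5, "Morning": 9, "Afternoon": 14,
    "Evening": 18, "Night": 21, "Late_Night": 1,
}


def cyclic_encode(seconds: float) -> tuple[float, float]:
    """Map seconds since midnight onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * seconds / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)


def encode_slot(slot: str) -> tuple[float, float]:
    """Encode a named time slot using its representative hour."""
    return cyclic_encode(SLOT_HOUR[slot] * 3600)


def distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Euclidean distance between two encoded times."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


night, late_night, morning = (encode_slot(s) for s in ("Night", "Late_Night", "Morning"))

# 21:00 and 01:00 are 20 apart as raw hours but only 4 hours apart on the clock;
# the encoding reflects the latter.
print(distance(night, late_night) < distance(night, morning))  # True
```

The point of using both components is that times on either side of midnight stay close together, which a single integer hour cannot express.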

Feature Importance

Top 10 features by importance:

Rank | Feature                    | Importance | Impact
-----|----------------------------|------------|-----------------------------------------------
1    | class_Business             | ~45%       | Strong positive (higher prices)
2    | class_Economy              | ~38%       | Moderate positive
3    | duration                   | ~9%        | Positive correlation
4    | days_left                  | ~1.4%      | Negative correlation (book early = lower price)
5    | stops                      | ~1.2%      | More stops = lower price
6    | departure_time_sin         | ~0.8%      | Time of day effect
7    | arrival_time_cos           | ~0.7%      | Time of day effect
8    | source_city (encoded)      | ~0.6%      | Origin city effect
9    | destination_city (encoded) | ~0.5%      | Destination city effect
10   | airline (encoded)          | ~0.4%      | Carrier effect

Performance Metrics

Cross-Validation (5-fold):

  • RMSE: 2,449.52 ± 16.93 INR
  • MAE: 1,259.90 ± 7.60 INR
  • R²: 0.9884 ± 0.0002

Test Set:

  • RMSE: 2,458.58 INR (~11.8% of mean price)
  • MAE: 1,258.63 INR (~6.0% of mean price)
  • R²: 0.9883

Interpretation:

  • Model explains 98.8% of price variance
  • Average prediction error: ₹1,258.63
  • Root mean squared error: ₹2,458.58
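
For readers who want to see how the three metrics relate, the sketch below computes them from scratch with the standard definitions. The prices in the example are made up for illustration and are not taken from the dataset.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict:
    """Compute RMSE, MAE and R² from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Illustrative prices in INR (not real data).
actual = [5000, 6000, 50000, 52000]
predicted = [5500, 5800, 49000, 53000]

metrics = regression_metrics(actual, predicted)
print(metrics["mae"])  # 675.0
```

RMSE is never smaller than MAE because squaring gives large errors more weight, which is why the reported RMSE (~2,450 INR) sits well above the MAE (~1,260 INR).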

Training Process

python -m src.models.train

Pipeline:

  1. Load data (300,153 flights)
  2. Clean data (remove duplicates)
  3. Encode time features (cyclic)
  4. Train/test split (80/20)
  5. Create preprocessing pipeline
  6. Fit preprocessor on training data
  7. Transform features
  8. Train XGBoost model
  9. Evaluate on test set
  10. Save model and preprocessor

Artifacts Generated:

  • models/xgboost_model.pkl - Trained model (~9MB)
  • models/preprocessor.pkl - Fitted pipeline (~4KB)

Model Insights

Key Findings:

  1. Class is the dominant factor: Business class flights cost ~5.7x more than Economy
  2. Early booking saves money: Booking 30 days vs 1 day in advance saves ~15-20%
  3. Direct flights are premium: Non-stop flights cost ~10-15% more
  4. Flight duration matters: Longer flights generally cost more (non-linear relationship)
  5. Airline differences: Vistara most expensive, Air India most economical
  6. Time of day: Evening/Night departures slightly more expensive

📚 API Documentation

Complete Endpoint Reference

GET /

Description: Root endpoint with API information

Response:

{
  "message": "Flight Price Predictor API",
  "version": "1.0.0",
  "docs": "/docs"
}

GET /health

Description: Health check and model status

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "timestamp": "2026-01-20T10:30:00.123456"
}

Status Codes:

  • 200: Service healthy
  • 503: Service unhealthy
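
A client or monitoring script can turn this endpoint into a simple readiness check. The sketch below uses only the standard library. The three labels it returns (`ready`, `degraded`, `unavailable`) are hypothetical names chosen for this example, and `degraded` covers the assumed case of a 200 response where the model is not loaded.

```python
import json
import urllib.request


def interpret_health(status_code: int, body: dict) -> str:
    """Classify a /health response into a coarse readiness label."""
    if status_code != 200:
        return "unavailable"
    if body.get("status") == "healthy" and body.get("model_loaded") is True:
        return "ready"
    return "degraded"


def check_health(base_url: str, timeout: float = 3.0) -> str:
    """Call GET /health and classify the result; network failures count as unavailable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return interpret_health(resp.status, json.load(resp))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (e.g. 503) and timeouts;
        # ValueError covers a body that is not valid JSON.
        return "unavailable"


sample = {"status": "healthy", "model_loaded": True, "timestamp": "2026-01-20T10:30:00.123456"}
print(interpret_health(200, sample))  # ready
```

Against a local server, `check_health("http://localhost:8000")` would perform the actual request.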

POST /predict

Description: Predict price for a single flight

Request Body:

{
  "airline": "SpiceJet",
  "flight": "SG-8157",
  "source_city": "Delhi",
  "departure_time": "Evening",
  "stops": "zero",
  "arrival_time": "Night",
  "destination_city": "Mumbai",
  "class": "Economy",
  "duration": 2.17,
  "days_left": 1
}

Response:

{
  "predicted_price": 8659.98,
  "currency": "INR",
  "timestamp": "2026-01-20T10:30:00.123456"
}

Status Codes:

  • 200: Success
  • 422: Validation error
  • 500: Internal server error

POST /predict/batch

Description: Predict prices for multiple flights

Request Body:

{
  "flights": [
    {
      "airline": "SpiceJet",
      "flight": "SG-8157",
      "source_city": "Delhi",
      "departure_time": "Evening",
      "stops": "zero",
      "arrival_time": "Night",
      "destination_city": "Mumbai",
      "class": "Economy",
      "duration": 2.17,
      "days_left": 1
    },
    {
      "airline": "Vistara",
      "flight": "UK-995",
      "source_city": "Delhi",
      "departure_time": "Morning",
      "stops": "zero",
      "arrival_time": "Afternoon",
      "destination_city": "Mumbai",
      "class": "Business",
      "duration": 2.25,
      "days_left": 1
    }
  ]
}

Response:

{
  "predictions": [
    {
      "predicted_price": 8659.98,
      "currency": "INR",
      "timestamp": "2026-01-20T10:30:00.123456"
    },
    {
      "predicted_price": 49954.50,
      "currency": "INR",
      "timestamp": "2026-01-20T10:30:00.123456"
    }
  ],
  "total": 2
}

GET /feature-importance

Description: Get model's feature importance scores

Query Parameters:

  • top_n (optional): Number of top features to return (default: 15)

Response (example for top_n=10):

{
  "feature_importance": {
    "class_Business": 0.4523,
    "class_Economy": 0.3841,
    "duration": 0.0892,
    "stops_zero": 0.0456,
    "airline_Air_India": 0.0234,
    "destination_city": 0.0198,
    "source_city": 0.0187,
    "stops_one": 0.0154,
    "days_left": 0.0143,
    "airline_Vistara": 0.0112
  },
  "top_n": 10
}
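
The raw scores are easier to read as a ranked list with a running total. The sketch below is a hypothetical client-side helper that ranks the `feature_importance` mapping and accumulates the scores, using the values from the example response.

```python
def top_features(importance: dict[str, float], n: int = 5) -> list[tuple[str, float, float]]:
    """Return (name, score, cumulative score) for the n highest-scoring features."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)[:n]
    rows, running = [], 0.0
    for name, score in ranked:
        running += score
        rows.append((name, score, round(running, 4)))
    return rows


feature_importance = {
    "class_Business": 0.4523, "class_Economy": 0.3841, "duration": 0.0892,
    "stops_zero": 0.0456, "airline_Air_India": 0.0234, "destination_city": 0.0198,
    "source_city": 0.0187, "stops_one": 0.0154, "days_left": 0.0143,
    "airline_Vistara": 0.0112,
}

rows = top_features(feature_importance, n=3)
for name, score, cumulative in rows:
    print(f"{name:<16} {score:.4f} {cumulative:.4f}")
```

In this example the two class indicators alone account for about 0.84 of the total importance.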

Error Responses

422 Validation Error:

{
  "detail": [
    {
      "loc": ["body", "duration"],
      "msg": "ensure this value is greater than 0",
      "type": "value_error.number.not_gt"
    }
  ]
}

500 Internal Server Error:

{
  "detail": "Error in prediction: [error message]"
}
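
The two error shapes differ: a 422 carries a list of per-field entries in `detail`, while a 500 carries a single string. A client can normalize both into readable messages, as in the sketch below. `format_error` is a hypothetical helper, written against the example bodies shown above.

```python
def format_error(error_body: dict) -> list[str]:
    """Turn an error response body into a list of human-readable messages."""
    detail = error_body.get("detail", [])
    if isinstance(detail, str):
        # 500-style body: "detail" is a plain message.
        return [detail]
    messages = []
    for item in detail:
        # Drop the leading "body" segment so only the field path remains.
        path = [str(part) for part in item.get("loc", []) if part != "body"]
        field = ".".join(path) or "(request)"
        messages.append(f"{field}: {item.get('msg', 'invalid value')}")
    return messages


validation_error = {
    "detail": [
        {
            "loc": ["body", "duration"],
            "msg": "ensure this value is greater than 0",
            "type": "value_error.number.not_gt",
        }
    ]
}

print(format_error(validation_error))
# ['duration: ensure this value is greater than 0']
```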

🤝 Contributing

Contributions are welcome! Please follow these guidelines:

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Run tests: pytest tests/ -v
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

Code Style

  • Follow PEP 8 guidelines
  • Use type hints where applicable
  • Add docstrings to all functions
  • Write tests for new features
  • Update documentation

Pull Request Process

  1. Ensure all tests pass
  2. Update README.md if needed
  3. Add description of changes
  4. Request review from maintainers

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Adrián - Acquarts


🙏 Acknowledgments

  • Dataset from Kaggle: Flight Price Prediction Dataset
  • FastAPI documentation and community
  • Streamlit for the amazing UI framework
  • XGBoost developers for the powerful ML library
  • AWS for cloud infrastructure

📞 Support

If you encounter any issues or have questions:

  1. Check the TROUBLESHOOTING.md guide
  2. Open an issue on GitHub
  3. Review existing issues and discussions

🗺️ Roadmap

  • Add authentication (API keys, JWT)
  • Implement rate limiting
  • Add caching layer (Redis)
  • Multi-region deployment
  • Real-time price updates
  • Historical price tracking
  • Price alerts system
  • Mobile app (React Native)
  • GraphQL API
  • Kubernetes deployment option

⭐ If you find this project useful, please consider giving it a star! ⭐

Made with ❤️ by Acquarts
