LadyJ101/British-Airways-Job-Simulation

British Airways Customer Booking Analysis

Overview

This Jupyter Notebook analyzes customer booking data for British Airways to build a predictive model for booking completion. The project includes data loading, exploratory data analysis, feature engineering, and machine learning model implementation using XGBoost and LightGBM.

Dataset

The dataset used is customer_booking.csv, which contains 50,000 rows and 14 columns related to customer bookings.

Columns Description:

num_passengers: Number of passengers

sales_channel: Sales channel (e.g., Internet)

trip_type: Type of trip (e.g., RoundTrip)

purchase_lead: Lead time before purchase (in days)

length_of_stay: Duration of stay (in days)

flight_hour: Hour of the flight

flight_day: Day of the flight

route: Route information

booking_origin: Origin of the booking

wants_extra_baggage: Indicator for extra baggage

wants_preferred_seat: Indicator for preferred seat

wants_in_flight_meals: Indicator for in-flight meals

flight_duration: Duration of the flight (in hours)

booking_complete: Target variable indicating booking completion (1) or not (0)

Steps Performed

1. Data Loading and Initial Inspection

  Mounted Google Drive to access the dataset.

  Detected file encoding (ISO-8859-1) and loaded the data.

  Displayed the first 5 rows, column information, summary statistics, and checked for missing values.
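  The loading and inspection steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper detect_encoding, the candidate encoding list, and the tiny stand-in CSV (including its made-up route value) are assumptions, and the notebook may well use a dedicated detection library instead. Note that ISO-8859-1 can decode any byte sequence, so it only makes sense as the last fallback.

```python
import tempfile
from pathlib import Path

import pandas as pd


def detect_encoding(path, candidates=("utf-8", "ISO-8859-1")):
    """Hypothetical helper: return the first candidate that decodes the file."""
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


# Tiny stand-in file with made-up rows; the notebook reads customer_booking.csv.
sample = "num_passengers,route,booking_complete\n2,AKLDEL,0\n1,CAFÉXX,1\n"
tmp_path = Path(tempfile.mkdtemp()) / "sample_booking.csv"
tmp_path.write_bytes(sample.encode("ISO-8859-1"))

encoding = detect_encoding(tmp_path)
df = pd.read_csv(tmp_path, encoding=encoding)

print(df.head())       # first 5 rows
df.info()              # column names and dtypes
print(df.describe())   # summary statistics for numeric columns
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

  With the real file, replace tmp_path with the path to customer_booking.csv.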

2. Data Preprocessing and Feature Engineering

  Encoded categorical variables (sales_channel, trip_type, flight_day, route, booking_origin) using Label Encoding.

  Created new features:

      total_lead_time: Same as purchase_lead.

      is_last_minute_booking: Flag for bookings made fewer than 3 days in advance.

      total_add_ons: Sum of extra services (baggage, seat, meals).
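      A minimal sketch of how these three features could be derived with pandas. The sample rows are made up, and using purchase_lead as the basis for the 3-day threshold is an assumption drawn from the column descriptions above.

```python
import pandas as pd

# Made-up rows using column names from the dataset description.
df = pd.DataFrame({
    "purchase_lead": [0, 2, 3, 45],
    "wants_extra_baggage": [1, 0, 1, 1],
    "wants_preferred_seat": [0, 0, 1, 1],
    "wants_in_flight_meals": [0, 1, 1, 0],
})

# total_lead_time mirrors purchase_lead.
df["total_lead_time"] = df["purchase_lead"]

# 1 when the booking was made fewer than 3 days in advance, else 0.
df["is_last_minute_booking"] = (df["purchase_lead"] < 3).astype(int)

# Count of requested extras (baggage, seat, meals) per booking.
add_on_cols = ["wants_extra_baggage", "wants_preferred_seat", "wants_in_flight_meals"]
df["total_add_ons"] = df[add_on_cols].sum(axis=1)

print(df)
```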

  Split the data into training and testing sets (80% train, 20% test).
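  The label encoding and 80/20 split from this step might look like the sketch below, using scikit-learn. The sample rows and category values are illustrative, only two of the five categorical columns are shown, and random_state=42 is an assumption added for reproducibility.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up rows; the real data has 50,000 rows and more columns.
df = pd.DataFrame({
    "sales_channel": ["Internet", "Mobile"] * 5,
    "trip_type": ["RoundTrip"] * 8 + ["OneWay"] * 2,
    "purchase_lead": list(range(10)),
    "booking_complete": [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

# Replace each category string with an integer code, one encoder per column.
categorical_cols = ["sales_channel", "trip_type"]
encoders = {}
for col in categorical_cols:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc  # kept so the codes can be mapped back later

X = df.drop(columns="booking_complete")
y = df["booking_complete"]

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))
```

  Label Encoding assigns arbitrary integer codes, which tree-based models such as XGBoost and LightGBM can generally work with.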

3. Model Training and Evaluation

  XGBoost Model

    Installed and imported XGBoost.

    Trained an XGBoost model with parameters tuned for class imbalance (scale_pos_weight).

    Evaluated the model using classification report and feature importance visualization.
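    As a rough illustration of the class-imbalance setting, scale_pos_weight is commonly set to the ratio of negative to positive training examples. The labels and the exact parameter values below are made up; the notebook's actual settings are not stated here. The dictionary is meant to be passed to xgboost.XGBClassifier, which is left as a comment so the sketch runs without XGBoost installed.

```python
from collections import Counter

# Made-up training labels: 17 incomplete bookings (0), 3 completed (1).
y_train = [0] * 17 + [1] * 3

counts = Counter(y_train)
# Common heuristic: number of negatives divided by number of positives.
scale_pos_weight = counts[0] / counts[1]

xgb_params = {
    "objective": "binary:logistic",
    "scale_pos_weight": scale_pos_weight,
}
print(xgb_params)

# With XGBoost installed, the model would then be built along these lines:
#   import xgboost as xgb
#   model = xgb.XGBClassifier(**xgb_params)
#   model.fit(X_train, y_train)
```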

  LightGBM Model

    Installed and imported LightGBM.

    Trained a LightGBM model with is_unbalance parameter to handle class imbalance.

    Evaluated the model using classification report, feature importance, and confusion matrix.
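    The evaluation described above can be sketched with scikit-learn's metrics. The true labels and predictions are made up for illustration and do not reflect the notebook's results; lgb_params only shows where is_unbalance would be set, and the LightGBM calls are left as comments so the sketch runs without LightGBM installed.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Where the imbalance flag would go; other parameters are omitted.
lgb_params = {"objective": "binary", "is_unbalance": True}
# With LightGBM installed:
#   import lightgbm as lgb
#   model = lgb.LGBMClassifier(**lgb_params).fit(X_train, y_train)
#   y_pred = model.predict(X_test)

# Made-up labels and predictions standing in for y_test and model output.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Precision, recall, and F1-score for both classes.
report = classification_report(
    y_true, y_pred, target_names=["No Booking", "Booking"]
)
print(report)
```

    On imbalanced data, the per-class recall in the report is usually more informative than overall accuracy.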

4. Results

  Both models were trained and evaluated, with feature importance highlighting key predictors like booking_origin, sales_channel, and total_add_ons.

  The classification reports show precision, recall, and F1-score for both classes (Booking and No Booking).

Requirements
    
    Python 3.x

    Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost, lightgbm

How to Run

    Upload the customer_booking.csv to your Google Drive.

    Mount Google Drive in the notebook.

    Run the cells sequentially to load data, preprocess, train models, and evaluate results.
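    A minimal sketch of the mount step, assuming the notebook runs in Google Colab and the CSV sits at the top level of My Drive. The Drive path is an assumption, and the fallback to a local file is added here only so the snippet also runs outside Colab.

```python
try:
    # Available only inside Google Colab.
    from google.colab import drive

    drive.mount("/content/drive")
    # Assumed location: top level of My Drive.
    data_path = "/content/drive/MyDrive/customer_booking.csv"
except ImportError:
    # Outside Colab, expect the file in the working directory.
    data_path = "customer_booking.csv"

print(data_path)
```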

Notes

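    The dataset has no missing values. The target classes are imbalanced, which is why the models are trained with scale_pos_weight (XGBoost) and is_unbalance (LightGBM).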

    Feature engineering improved model performance by adding meaningful features.

    Both XGBoost and LightGBM were trained and evaluated with classification reports and feature importance; a confusion matrix was additionally produced for the LightGBM model.

Author

    Abioye Oluwadamilola Joy

License
    
    This project is licensed under the MIT License.
