This Jupyter Notebook analyzes customer booking data for British Airways to build a predictive model for booking completion. The project includes data loading, exploratory data analysis, feature engineering, and machine learning model implementation using XGBoost and LightGBM.
Dataset
The dataset used is customer_booking.csv, which contains 50,000 rows and 14 columns related to customer bookings.
Column Descriptions:
num_passengers: Number of passengers
sales_channel: Sales channel (e.g., Internet)
trip_type: Type of trip (e.g., RoundTrip)
purchase_lead: Lead time before purchase (in days)
length_of_stay: Duration of stay
flight_hour: Hour of the flight
flight_day: Day of the flight
route: Route information
booking_origin: Origin of the booking
wants_extra_baggage: Binary flag (1 = yes, 0 = no) for extra baggage
wants_preferred_seat: Binary flag (1 = yes, 0 = no) for a preferred seat
wants_in_flight_meals: Binary flag (1 = yes, 0 = no) for in-flight meals
flight_duration: Duration of the flight (in hours)
booking_complete: Target variable indicating booking completion (1) or not (0)
Steps Performed
1. Data Loading and Initial Inspection
Mounted Google Drive to access the dataset.
Detected file encoding (ISO-8859-1) and loaded the data.
Displayed the first 5 rows, column information, and summary statistics, and checked for missing values.
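A minimal sketch of this step, assuming the notebook runs in Google Colab. The Drive path `/content/drive/MyDrive/customer_booking.csv` and the helper names `mount_drive` and `load_and_inspect` are illustrative assumptions, not taken from the notebook. The encoding is passed explicitly as ISO-8859-1 rather than detected again at load time.

```python
import pandas as pd

# Hypothetical location of the CSV after mounting Google Drive in Colab (an assumption).
DATA_PATH = "/content/drive/MyDrive/customer_booking.csv"


def mount_drive(mount_point: str = "/content/drive") -> None:
    """Mount Google Drive; this only works inside Google Colab."""
    from google.colab import drive  # imported lazily: unavailable outside Colab
    drive.mount(mount_point)


def load_and_inspect(path: str, encoding: str = "ISO-8859-1") -> pd.DataFrame:
    """Load the bookings CSV with an explicit encoding and print a quick overview."""
    df = pd.read_csv(path, encoding=encoding)
    print(df.head())          # first 5 rows
    df.info()                 # dtypes and non-null counts (prints directly)
    print(df.describe())      # summary statistics for numeric columns
    print(df.isnull().sum())  # number of missing values per column
    return df


# Usage in Colab:
# mount_drive()
# df = load_and_inspect(DATA_PATH)
```

Passing the encoding explicitly avoids a `UnicodeDecodeError`, which `pd.read_csv` raises with its default UTF-8 decoding when the file contains Latin-1 bytes.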
2. Data Preprocessing and Feature Engineering
Encoded categorical variables (sales_channel, trip_type, flight_day, route, booking_origin) using Label Encoding.
Created new features:
total_lead_time: Same as purchase_lead.
is_last_minute_booking: Flag for bookings made less than 3 days in advance.
total_add_ons: Sum of extra services (baggage, seat, meals).
Split the data into training and testing sets (80% train, 20% test).
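The preprocessing steps above can be sketched with pandas and scikit-learn as follows. The helper names, `random_state=42`, and stratifying the split on the target are assumptions made for illustration; the notebook may differ in these details.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

CATEGORICAL_COLS = ["sales_channel", "trip_type", "flight_day", "route", "booking_origin"]
ADD_ON_COLS = ["wants_extra_baggage", "wants_preferred_seat", "wants_in_flight_meals"]


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode the categorical columns and add the three derived features."""
    out = df.copy()
    for col in CATEGORICAL_COLS:
        # LabelEncoder maps each distinct string to an integer, in sorted order.
        out[col] = LabelEncoder().fit_transform(out[col])
    out["total_lead_time"] = out["purchase_lead"]  # identical copy of purchase_lead
    out["is_last_minute_booking"] = (out["purchase_lead"] < 3).astype(int)
    out["total_add_ons"] = out[ADD_ON_COLS].sum(axis=1)  # 0 to 3 extra services
    return out


def split_data(df: pd.DataFrame, target: str = "booking_complete",
               test_size: float = 0.2, seed: int = 42):
    """Return X_train, X_test, y_train, y_test with an 80/20 split."""
    X = df.drop(columns=[target])
    y = df[target]
    # stratify=y (an assumption) keeps the class ratio the same in both splits.
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)


# Usage:
# features = engineer_features(df)
# X_train, X_test, y_train, y_test = split_data(features)
```

Because LabelEncoder sorts the category strings, the integer codes for flight_day follow alphabetical order rather than weekday order; tree-based models such as XGBoost and LightGBM are generally tolerant of this.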
3. Model Training and Evaluation
XGBoost Model
Installed and imported XGBoost.
Trained an XGBoost model with scale_pos_weight set to compensate for class imbalance.
Evaluated the model using a classification report and a feature importance plot.
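A hedged sketch of the XGBoost step. It sets scale_pos_weight to the ratio of negative to positive training examples, a common heuristic suggested in the XGBoost documentation; the value actually used in the notebook, the other hyperparameters, and the helper names are assumptions. xgboost is imported inside `train_xgboost` so the other helpers work without it installed.

```python
import pandas as pd


def compute_scale_pos_weight(y) -> float:
    """Ratio of negative (0) to positive (1) labels, used to upweight the minority class."""
    y = pd.Series(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    return n_neg / n_pos


def train_xgboost(X_train, y_train, seed: int = 42):
    """Fit an XGBClassifier weighted for class imbalance (hyperparameters are illustrative)."""
    from xgboost import XGBClassifier  # lazy import: only needed when training

    model = XGBClassifier(
        n_estimators=300,
        learning_rate=0.05,
        max_depth=6,
        scale_pos_weight=compute_scale_pos_weight(y_train),
        eval_metric="logloss",
        random_state=seed,
    )
    model.fit(X_train, y_train)
    return model


def top_features(model, feature_names, k: int = 10) -> pd.Series:
    """Return the k largest feature importances, sorted in descending order."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)


# Usage:
# from sklearn.metrics import classification_report
# model = train_xgboost(X_train, y_train)
# print(classification_report(y_test, model.predict(X_test)))
# top_features(model, X_train.columns).plot.barh()
```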
LightGBM Model
Installed and imported LightGBM.
Trained a LightGBM model with the is_unbalance parameter to handle class imbalance.
Evaluated the model using a classification report, feature importance, and a confusion matrix.
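A similar sketch for LightGBM, passing is_unbalance=True through the scikit-learn wrapper `LGBMClassifier` and evaluating with a classification report and a confusion matrix. The hyperparameters in `LGBM_PARAMS`, the helper names, and the mapping of label 0 to "No Booking" and 1 to "Booking" are assumptions for illustration.

```python
# Illustrative hyperparameters; is_unbalance=True reweights the classes of a binary target.
LGBM_PARAMS = {
    "n_estimators": 300,
    "learning_rate": 0.05,
    "is_unbalance": True,
}

CLASS_NAMES = ["No Booking", "Booking"]  # label 0, label 1


def train_lightgbm(X_train, y_train, seed: int = 42):
    """Fit an LGBMClassifier using LGBM_PARAMS."""
    from lightgbm import LGBMClassifier  # lazy import: only needed when training

    model = LGBMClassifier(**LGBM_PARAMS, random_state=seed)
    model.fit(X_train, y_train)
    return model


def evaluate(model, X_test, y_test):
    """Print a classification report and return the confusion matrix."""
    from sklearn.metrics import classification_report, confusion_matrix

    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred, labels=[0, 1], target_names=CLASS_NAMES))
    # Rows are true labels and columns are predicted labels, both ordered [0, 1].
    return confusion_matrix(y_test, y_pred, labels=[0, 1])


def plot_confusion_matrix(cm):
    """Draw the confusion matrix as an annotated heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=CLASS_NAMES, yticklabels=CLASS_NAMES)
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()


# Usage:
# model = train_lightgbm(X_train, y_train)
# cm = evaluate(model, X_test, y_test)
# plot_confusion_matrix(cm)
```

The off-diagonal cells show how many completed bookings were missed and how many non-bookings were flagged as bookings, which a single accuracy number hides on an imbalanced target.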
4. Results
Both models were trained and evaluated; feature importance highlighted key predictors such as booking_origin, sales_channel, and total_add_ons.
The classification reports show precision, recall, and F1-score for both classes (Booking and No Booking).
Requirements
Python 3.x
Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost, lightgbm
How to Run
Upload customer_booking.csv to your Google Drive.
Mount Google Drive in the notebook.
Run the cells sequentially to load data, preprocess, train models, and evaluate results.
Notes
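The dataset has no missing values, but it is imbalanced: completed bookings (booking_complete = 1) are the minority class, which is why scale_pos_weight (XGBoost) and is_unbalance (LightGBM) were used.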
Feature engineering improved model performance by adding derived features such as is_last_minute_booking and total_add_ons; total_add_ons ranked among the key predictors.
Both XGBoost and LightGBM were effective; the LightGBM evaluation also included a confusion matrix, which breaks down correct and incorrect predictions for each class.
Author
Abioye Oluwadamilola Joy
License
This project is licensed under the MIT License.