Machine Learning project for retail demand forecasting and inventory optimization using ARIMA and Random Forest models.

DheerajKumar17/demand-forecasting-inventory-optimization

Demand Forecasting & Inventory Optimization using Machine Learning

A data science project that forecasts retail demand and optimizes inventory management using ARIMA and Random Forest models, demonstrating a 15% reduction in holding costs through data-driven decision making.

Project Overview:

This project analyzes retail transaction data to build predictive models for demand forecasting and to determine inventory parameters (safety stock and reorder point). It shows how companies can lower holding costs while preserving high service levels by combining time series analysis with machine learning.

Key Achievements:

  • Built ARIMA and Random Forest forecasting models to estimate retail demand across a catalog of 3,665 SKUs.
  • Attained 71.5% forecast accuracy using Random Forest with engineered time-based features.
  • Developed reorder-point and safety stock strategies that resulted in a 15% reduction in overall holding costs.
  • Analyzed 11 months of transaction data totaling $8.9 million in sales.
  • Processed and cleaned 400,000+ retail transactions.

Technologies used:

Programming Language:

  • Python 3.x

Libraries:

  • Pandas - Data manipulation and analysis
  • NumPy - Numerical computations
  • Matplotlib - Data visualization
  • Seaborn - Statistical visualizations
  • scikit-learn - Machine learning models (Random Forest)
  • Statsmodels - Time series analysis (ARIMA)

Tools:

  • Jupyter Notebook - Interactive development environment
  • Git - Version control
  • Excel - Initial data exploration

Dataset

Source: Online Retail Dataset from UCI Machine Learning Repository

Description:

  • Transaction data obtained from an online merchant in the UK.
  • January 2011–November 2011.
  • 3,665 distinct products (SKUs).
  • Several customer segments across various nations.

Data Cleaning:

  • Order cancellations and returns were eliminated.
  • Removed prices and quantities that were not valid.
  • Addressed the issue of missing customer IDs.
  • 275 days of clean transaction data make up the final dataset.
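
The cleaning rules above could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the standard UCI column names (`InvoiceNo`, `Quantity`, `UnitPrice`, `CustomerID`, `InvoiceDate`), relies on the dataset's convention that cancelled invoices start with "C", and assumes missing customer IDs were handled by dropping those rows, which the list above does not confirm. The function name `clean_transactions` and the sample rows are hypothetical.

```python
import pandas as pd


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning rules above to Online Retail-style data."""
    df = raw.copy()
    # Cancelled invoices carry an invoice number that starts with "C".
    is_cancelled = df["InvoiceNo"].astype(str).str.startswith("C")
    df = df[~is_cancelled]
    # Returns and invalid rows show up as non-positive quantities or prices.
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]
    # Assumption: rows without a customer ID are dropped.
    df = df.dropna(subset=["CustomerID"])
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    return df.reset_index(drop=True)


# Hypothetical sample rows, one per cleaning rule.
sample = pd.DataFrame(
    {
        "InvoiceNo": ["536365", "C536379", "536380", "536381", "536382"],
        "Quantity": [6, -1, 3, 2, 4],
        "UnitPrice": [2.55, 27.50, 0.0, 1.25, 3.39],
        "CustomerID": [17850.0, 14527.0, 13047.0, None, 12583.0],
        "InvoiceDate": ["2011-01-04"] * 5,
    }
)
cleaned = clean_transactions(sample)
print(cleaned[["InvoiceNo", "Quantity", "UnitPrice"]])
```

Only the first and last sample rows survive: the others are a cancellation, a zero price, and a missing customer ID.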

Methodology:

1. Data Exploration and Preprocessing

  • Loaded and analyzed the raw transaction data.
  • Removed order cancellations and returns.
  • Found and fixed data quality problems (duplicates, outliers, and missing values).
  • Developed derived features, such as total sales and time-based attributes.
  • Aggregated transactions to the daily sales level for modeling.
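
As a rough illustration of the last two steps, the sketch below derives a total sales column and aggregates it to one value per day. It assumes the UCI column names `Quantity`, `UnitPrice`, and `InvoiceDate`; the helper name `to_daily_sales` and the `TotalSales` column are hypothetical.

```python
import pandas as pd


def to_daily_sales(transactions: pd.DataFrame) -> pd.Series:
    """Derive total sales per line item and sum it per calendar day."""
    df = transactions.copy()
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["TotalSales"] = df["Quantity"] * df["UnitPrice"]
    # resample("D") also emits days without transactions, with a sum of 0.
    return df.set_index("InvoiceDate")["TotalSales"].resample("D").sum()


# Hypothetical line items on two different days.
sample = pd.DataFrame(
    {
        "InvoiceDate": ["2011-01-04 09:30", "2011-01-04 14:10", "2011-01-06 11:00"],
        "Quantity": [6, 2, 3],
        "UnitPrice": [2.5, 1.0, 4.0],
    }
)
daily = to_daily_sales(sample)
print(daily)
```

Note that `resample` fills days without sales with zero. Whether such days should be kept or dropped before modeling is a design choice this sketch does not settle.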

2. Time Series Forecasting with ARIMA

Model Configuration:

  • ARIMA(5, 1, 0): Autoregressive Integrated Moving Average with p = 5, d = 1, q = 0.
  • Parameters chosen to match the properties of the data.
  • 30 days for testing and 245 days for training.

Approach:

  • Examined sales trends and seasonality patterns.
  • Split the data chronologically into training and testing sets (245 and 30 days, roughly 89/11).
  • Evaluated model performance using RMSE and MAE.
  • Generated forecasts for the next seven days.

3. Machine Learning with Random Forest

Feature Engineering:

  • Day of Month (1-31)
  • Month (1-12)
  • Day of Week (0-6)
  • Days Since Start (trend capture)
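
The four features above can be derived from a date index alone. The sketch below shows one way to do it with pandas; the function and column names are hypothetical.

```python
import pandas as pd


def make_time_features(dates: pd.DatetimeIndex) -> pd.DataFrame:
    """Build the four calendar features listed above from a date index."""
    return pd.DataFrame(
        {
            "day_of_month": dates.day.to_numpy(),         # 1-31
            "month": dates.month.to_numpy(),              # 1-12
            "day_of_week": dates.dayofweek.to_numpy(),    # Monday = 0, Sunday = 6
            "days_since_start": (dates - dates[0]).days.to_numpy(),  # linear trend
        },
        index=dates,
    )


features = make_time_features(pd.date_range("2011-01-04", periods=3, freq="D"))
print(features)
```

The `days_since_start` column gives the model a monotonically increasing value to split on, which is how a tree-based model can approximate a trend within the training range.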

Model Configuration:

  • 50 decision trees
  • Maximum depth: 5
  • Minimum samples per split: 10
  • These hyperparameters were chosen to limit overfitting.
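
A minimal scikit-learn sketch of this configuration follows. The target series is synthetic and exists only to make the example runnable. The 245/30 chronological split is borrowed from the ARIMA section as an assumption, and `random_state` is added here for reproducibility; neither is confirmed for the Random Forest model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

dates = pd.date_range("2011-01-01", periods=275, freq="D")
X = pd.DataFrame(
    {
        "day_of_month": dates.day.to_numpy(),
        "month": dates.month.to_numpy(),
        "day_of_week": dates.dayofweek.to_numpy(),
        "days_since_start": np.arange(275),
    },
    index=dates,
)

# Synthetic target: weekday uplift plus a mild trend and noise (not real data).
rng = np.random.default_rng(42)
y = (
    30_000
    + 4_000 * (X["day_of_week"] < 5)
    + 15 * X["days_since_start"]
    + rng.normal(0, 2_000, 275)
)

# Assumed chronological split, mirroring the ARIMA section.
X_train, X_test = X.iloc[:245], X.iloc[245:]
y_train, y_test = y.iloc[:245], y.iloc[245:]

model = RandomForestRegressor(
    n_estimators=50,        # 50 decision trees
    max_depth=5,            # shallow trees limit overfitting
    min_samples_split=10,   # do not split nodes with fewer than 10 samples
    random_state=42,        # assumption: fixed seed for reproducibility
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

Shallow trees and a minimum split size both reduce variance on a dataset with only a few hundred daily rows, which is the likely reason for these settings.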

Performance:

  • Forecast accuracy: 71.5%
  • Demand patterns and variability were successfully captured.
  • Resilient to changing seasons and outliers.
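
This README does not state how the accuracy figure is defined. One common convention for regression forecasts is 100% minus the mean absolute percentage error (MAPE); the sketch below shows that calculation as an assumption, not as the project's confirmed metric. The function name and sample values are hypothetical.

```python
import numpy as np


def forecast_accuracy(actual, predicted) -> float:
    """Return 100 - MAPE in percent. Assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    return float(100 - mape)


# Hypothetical values: percentage errors of 10%, 10%, and 25% average to 15%.
accuracy = forecast_accuracy([100, 200, 400], [90, 220, 300])
print(f"{accuracy:.1f}%")
```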

4. Inventory Optimization

Safety Stock Calculation:

  • Safety Stock = Z-score × Standard Deviation of Daily Demand × √(Lead Time)

-- Service Level: 95% (Z-score = 1.65)

-- Lead Time: 7 days

-- Accounts for demand variability during replenishment period
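
The formula above translates directly into code. In this sketch the demand standard deviation of 100 units per day is a hypothetical value chosen for illustration; the Z-score and lead time come from the parameters listed above. The standard library's `NormalDist` shows where the rounded 1.65 comes from.

```python
import math
from statistics import NormalDist


def safety_stock(z_score: float, daily_demand_std: float, lead_time_days: float) -> float:
    """Safety stock = z * sigma_daily * sqrt(lead time in days)."""
    return z_score * daily_demand_std * math.sqrt(lead_time_days)


# A 95% service level corresponds to z of about 1.645, rounded to 1.65 above.
z_exact = NormalDist().inv_cdf(0.95)

# Hypothetical demand variability: 100 units per day.
units = safety_stock(z_score=1.65, daily_demand_std=100, lead_time_days=7)
print(f"z = {z_exact:.3f}, safety stock = {units:.1f} units")
```

The square root appears because the variance of demand over the lead time is the sum of the daily variances, assuming daily demand is independent from day to day.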

Reorder Point Calculation:

  • Reorder Point = (Average Daily Demand × Lead Time) + Safety Stock

-- Prior to stockouts, new orders are triggered.

-- Holding costs and inventory availability are balanced.
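
A short sketch of the reorder-point rule follows. The demand and safety stock figures are hypothetical, and `should_reorder` is an illustrative helper rather than part of the project.

```python
def reorder_point(avg_daily_demand: float, lead_time_days: float, safety_stock: float) -> float:
    """Reorder point = expected demand during the lead time + safety stock."""
    return avg_daily_demand * lead_time_days + safety_stock


def should_reorder(on_hand: float, rop: float) -> bool:
    """Trigger a new order once stock on hand falls to the reorder point."""
    return on_hand <= rop


# Hypothetical inputs: 500 units/day, 7-day lead time, 437 units of safety stock.
rop = reorder_point(avg_daily_demand=500, lead_time_days=7, safety_stock=437)
print(rop, should_reorder(on_hand=4_200, rop=rop), should_reorder(on_hand=3_900, rop=rop))
```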

Cost Analysis:

  • Annual holding costs were computed both before and after optimization.
  • 15% decrease in total holding costs was demonstrated.
  • Quantified cost savings by lowering the need for safety stock.
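
The before-and-after comparison can be sketched as below. The holding cost model (units held × unit cost × annual holding rate) is a common textbook form and an assumption here, and every number is illustrative, chosen only to show the arithmetic behind a 15% reduction rather than taken from the project's data.

```python
def annual_holding_cost(avg_units_held: float, unit_cost: float, holding_rate: float) -> float:
    """Annual cost of carrying inventory: units * unit cost * annual holding rate."""
    return avg_units_held * unit_cost * holding_rate


def percent_reduction(before: float, after: float) -> float:
    """Relative saving, in percent of the original cost."""
    return (before - after) / before * 100


# Illustrative inputs only: 1,000 units before, 850 after optimization.
before = annual_holding_cost(avg_units_held=1_000, unit_cost=10.0, holding_rate=0.25)
after = annual_holding_cost(avg_units_held=850, unit_cost=10.0, holding_rate=0.25)
saving = percent_reduction(before, after)
print(f"before={before:.0f}, after={after:.0f}, reduction={saving:.1f}%")
```

With a fixed unit cost and holding rate, the percentage saving equals the percentage drop in average units held.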

Results

Forecasting Performance

  1. Model - Random Forest, Accuracy - 71.5%, Use Case - Tracking seasonal and day-of-week patterns.
  2. Model - ARIMA, Accuracy - Variable, Use Case - Trend-based forecasting for stable products.

Inventory Metrics:

  • Safety Stock: Adjustments are made based on fluctuations in demand.
  • Reorder Point: Determined using a 95% service level.
  • Cost Reduction: Annual holding costs are reduced by 15%.
  • Service Level: 95% maintained (low stockout risk).

Business Impact:

  • Decreased excess inventory as a result of precise demand forecasting.
  • Optimized safety stock levels reduce the chance of stockouts.
  • Reduced capital invested in inventory, which improved cash flow.
  • Data-driven operational and procurement decision-making.

How to Run This Project:

Prerequisites:

Install the required libraries:

  • pip install pandas numpy matplotlib seaborn scikit-learn statsmodels jupyter openpyxl

Steps

1. Clone this repository

2. Download the Online Retail dataset from the UCI Machine Learning Repository and place it in the project folder

3. Open Jupyter Notebook

  • jupyter notebook

4. Open and run "demand_forecasting_inventory_optimization.ipynb"

  • Run all cells sequentially from top to bottom
  • The notebook contains all steps: data cleaning, EDA, modeling, and optimization

Key Insights:

  • Demand Variability: Due to the significant daily fluctuations in retail sales, accurate forecasting is difficult but valuable.
  • Feature Importance: The best indicators of sales trends were the day of the week and the month.
  • Selecting a Model: ARIMA was the best at trend-based forecasting, while Random Forest was good at identifying non-linear patterns.
  • Inventory trade-offs: Costs are decreased while service levels are maintained with optimal safety stock.
  • Scalability: Techniques created can be used for enterprise-scale optimization across more than 50 SKUs at once.

Future Improvements:

  • Take external factors into account (holidays, promotions, weather).
  • Use ensemble techniques that combine Random Forest and ARIMA.
  • Create forecasting models at the product level (SKU-specific).
  • Include a real-time dashboard to track forecasts.
  • Connect to systems for inventory management.
  • Test other algorithms, such as LSTM neural networks and XGBoost.

Skills Demonstrated:

  • Time Series Analysis
  • Machine Learning (Supervised Learning)
  • Feature Engineering
  • Data Cleaning and Preprocessing
  • Statistical Modeling
  • Inventory Management Theory
  • Data Visualization
  • Python Programming
  • Business Analytics

Acknowledgements:

  • The dataset was provided by the UCI Machine Learning Repository; the project was motivated by actual retail inventory issues.
  • Built as a portfolio project to showcase data science skills along with supply chain proficiency.

License:

  • This project can be used for portfolio and educational purposes.
