GitHub - DheerajKumar17/demand-forecasting-inventory-optimization: Machine Learning project for retail demand forecasting and inventory optimization using ARIMA and Random Forest models.

Demand Forecasting & Inventory Optimization using Machine Learning

A data science project that forecasts retail demand with ARIMA and Random Forest models and uses those forecasts to set inventory parameters, resulting in a 15% reduction in holding costs.

Project Overview:

This project analyzes retail transaction data to build predictive models for demand forecasting and to determine suitable inventory parameters. It shows how companies can lower holding costs while preserving high service levels by using time series analysis and machine learning techniques.

Key Achievements:

Built ARIMA and Random Forest forecasting models to estimate retail demand across 3,665 SKUs.
Reached 71.5% forecast accuracy using Random Forest with engineered time-based features.
Developed reorder-point and safety stock strategies that reduced overall holding costs by 15%.
Analyzed 11 months of transaction data totaling $8.9 million in sales.
Processed and cleaned 400,000+ retail transactions.

Technologies used:

Programming Language:

Python 3.x

Libraries:

Pandas - Data manipulation and analysis
NumPy - Numerical computations
Matplotlib - Data visualization
Seaborn - Statistical visualizations
scikit-learn - Machine learning models (Random Forest)
Statsmodels - Time series analysis (ARIMA)

Tools:

Jupyter Notebook - Interactive development environment
Git - Version control
Excel - Initial data exploration

Dataset

Source: Online Retail Dataset from UCI Machine Learning Repository

Description:

Transaction data from a UK-based online retailer.
Period covered: January 2011–November 2011.
3,665 distinct products (SKUs).
Several customer segments across multiple countries.

Data Cleaning:

Removed order cancellations and returns.
Removed rows with invalid prices or quantities.
Handled missing customer IDs.
The final dataset contains 275 days of clean transaction data.
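
The cleaning rules above can be sketched roughly as follows. This is a minimal illustration on a hand-made sample, not the notebook's actual code. The column names follow the UCI Online Retail schema, where cancelled invoices carry a "C" prefix. How missing customer IDs were handled is not stated here, so dropping those rows is an assumption, and `clean_transactions` is a hypothetical helper name.

```python
import pandas as pd

# Tiny hand-made sample using the UCI Online Retail column names.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381", "536382", "536383"],
    "StockCode": ["85123A", "D", "22139", "21730", "22752"],
    "Quantity": [6, -1, 10, -5, 2],
    "UnitPrice": [2.55, 27.50, 0.0, 1.25, 7.65],
    "CustomerID": [17850.0, 14527.0, 15311.0, 16098.0, None],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop cancellations, invalid rows, and rows without a customer ID."""
    # Cancelled invoices carry a "C" prefix in this dataset.
    is_cancelled = df["InvoiceNo"].astype(str).str.startswith("C")
    # Non-positive quantities (returns) and non-positive prices count as invalid.
    is_valid = (df["Quantity"] > 0) & (df["UnitPrice"] > 0)
    # Assumption: rows with a missing CustomerID are dropped, not imputed.
    has_customer = df["CustomerID"].notna()
    return df[~is_cancelled & is_valid & has_customer].reset_index(drop=True)


clean = clean_transactions(raw)
print(clean[["InvoiceNo", "Quantity", "UnitPrice"]])
```

Only the first sample row survives: the others are a cancellation, a zero price, a negative quantity, and a missing customer ID.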

Methodology:

1. Data Exploration and Preprocessing

Loaded and analyzed the raw transaction data.
Removed order cancellations and returns.
Found and fixed data quality problems (duplicates, outliers, and missing values).
Created derived features, such as total sales and time-based attributes.
Aggregated transactions to daily sales totals for modeling.
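
As a rough sketch of the last two steps, total sales can be derived per transaction line and then summed per calendar day. The sample rows are made up, and `TotalSales` and `daily_sales` are illustrative names. Keeping days without sales as zeros (rather than dropping them) is an assumption of this sketch.

```python
import pandas as pd

tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-01-04 08:26", "2011-01-04 15:10", "2011-01-06 09:00",
    ]),
    "Quantity": [6, 2, 10],
    "UnitPrice": [2.50, 4.00, 1.20],
})

# Derived feature: revenue per transaction line.
tx["TotalSales"] = tx["Quantity"] * tx["UnitPrice"]

# Aggregate to one row per calendar day; days without sales sum to 0.
daily_sales = tx.set_index("InvoiceDate")["TotalSales"].resample("D").sum()
print(daily_sales)
```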

2. Time Series Forecasting with ARIMA

Model Configuration:

ARIMA(5, 1, 0), where ARIMA stands for Autoregressive Integrated Moving Average: five autoregressive lags, first-order differencing, and no moving-average terms.
Parameters chosen to match the properties of the data.
245 days for training and 30 days for testing.

Approach:

Examined sales trends and seasonality patterns.
Split the data chronologically into training and testing sets (245 and 30 days).
Evaluated model performance using RMSE and MAE.
Generated forecasts for the next seven days.

3. Machine Learning with Random Forest

Feature Engineering:

Day of Month (1-31)
Month (1-12)
Day of Week (0-6)
Days Since Start (trend capture)
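
The four features can be derived from the date column with pandas datetime accessors. The sketch below uses a made-up five-day frame, and the column names are illustrative.

```python
import pandas as pd

dates = pd.date_range("2011-01-04", periods=5, freq="D")
daily = pd.DataFrame({"date": dates, "sales": [120.0, 95.0, 130.0, 80.0, 60.0]})

features = pd.DataFrame({
    "day_of_month": daily["date"].dt.day,        # 1-31
    "month": daily["date"].dt.month,             # 1-12
    "day_of_week": daily["date"].dt.dayofweek,   # 0 = Monday ... 6 = Sunday
    # Days elapsed since the first observation, used as a simple trend term.
    "days_since_start": (daily["date"] - daily["date"].min()).dt.days,
})
print(features)
```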

Model Configuration:

50 decision trees
Maximum depth: 5
Minimum samples per split: 10
Hyperparameters were tuned to limit overfitting.
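
With scikit-learn, this configuration might look like the sketch below. The target is synthetic, and holding out the last 30 days is an assumption borrowed from the ARIMA setup, since the split used for this model is not stated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic daily data (275 days); NOT the project's data.
rng = np.random.default_rng(42)
d = pd.Series(pd.date_range("2011-01-04", periods=275, freq="D"))
X = pd.DataFrame({
    "day_of_month": d.dt.day,
    "month": d.dt.month,
    "day_of_week": d.dt.dayofweek,
    "days_since_start": (d - d.min()).dt.days,
})
# Hypothetical demand: weekday uplift, mild upward trend, random noise.
y = (1000 + 150 * (X["day_of_week"] < 5) + 2 * X["days_since_start"]
     + rng.normal(0, 50, size=len(X)))

# Assumption: chronological hold-out of the last 30 days.
X_train, X_test = X.iloc[:-30], X.iloc[-30:]
y_train, y_test = y.iloc[:-30], y.iloc[-30:]

model = RandomForestRegressor(
    n_estimators=50,        # 50 decision trees
    max_depth=5,            # shallow trees to limit overfitting
    min_samples_split=10,   # need at least 10 samples to split a node
    random_state=42,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Relative contribution of each feature to the fitted trees.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

Tree ensembles cannot predict beyond the range of target values seen in training, so a trend feature such as `days_since_start` only helps within the training period.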

Performance:

Accuracy of Forecast: 71.5%

Demand patterns and variability were successfully captured.

Resilient to changing seasons and outliers.
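
The accuracy figure's exact definition is not given here. One common convention, assumed in the sketch below, is 100% minus the mean absolute percentage error (MAPE). `forecast_accuracy` is a hypothetical helper, and the numbers are illustrative only.

```python
def forecast_accuracy(actual, predicted):
    """Return 100 * (1 - MAPE); assumes every actual value is non-zero."""
    ape = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    mape = sum(ape) / len(ape)
    return 100.0 * (1.0 - mape)


# Illustrative values: percentage errors are 10%, 10%, and 25%.
actual = [100.0, 200.0, 400.0]
predicted = [90.0, 220.0, 300.0]
print(round(forecast_accuracy(actual, predicted), 1))  # 85.0
```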

4. Inventory Optimization

Safety Stock Calculation:

Safety Stock = Z-score × Standard Deviation of Daily Demand × sqrt(Lead Time)

-- Service Level: 95% (Z-score = 1.65)

-- Lead Time: 7 days

-- Accounts for demand variability during replenishment period
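
As a small worked sketch of the formula, using only the standard library: the demand values are made up, and using the sample standard deviation (rather than the population one) is an assumption.

```python
import math
import statistics


def safety_stock(daily_demand, z=1.65, lead_time_days=7):
    """Safety stock = z * std of daily demand * sqrt(lead time in days)."""
    sigma = statistics.stdev(daily_demand)  # sample standard deviation
    return z * sigma * math.sqrt(lead_time_days)


# Illustrative daily demand with a standard deviation of exactly 10 units.
demand = [10, 20, 30]
ss = safety_stock(demand)
print(f"safety stock = {ss:.2f} units")
```

The square root reflects the usual assumption that daily demand is independent from day to day, so the variance (not the standard deviation) grows linearly with lead time.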

Reorder Point Calculation:

Reorder Point = (Average Daily Demand × Lead Time) + Safety Stock

-- New orders are triggered before a stockout occurs.

-- Holding costs are balanced against inventory availability.
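
The reorder point and the resulting ordering rule can be sketched as below. The input values are illustrative, and `should_reorder` is a hypothetical helper showing how the threshold would be used.

```python
def reorder_point(avg_daily_demand, lead_time_days, safety_stock):
    """Reorder point = expected demand over the lead time + safety stock."""
    return avg_daily_demand * lead_time_days + safety_stock


def should_reorder(on_hand_units, rop):
    """Place a new order once stock on hand falls to the reorder point."""
    return on_hand_units <= rop


# Illustrative inputs: 20 units/day, 7-day lead time, 43.65 units of safety stock.
rop = reorder_point(avg_daily_demand=20.0, lead_time_days=7, safety_stock=43.65)
print(f"reorder point = {rop:.2f} units")
print(should_reorder(150, rop), should_reorder(200, rop))
```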

Cost Analysis:

Computed annual holding costs before and after optimization.
Demonstrated a 15% decrease in total holding costs.
Quantified the cost savings from lower safety stock requirements.
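
The cost model behind the comparison is not spelled out here. A common simplification, assumed in this sketch, is units held × unit cost × annual holding rate. All numbers are illustrative and are not taken from the project's results.

```python
def annual_holding_cost(avg_units_held, unit_cost, holding_rate):
    """Annual carrying cost: units held * unit cost * annual holding rate."""
    return avg_units_held * unit_cost * holding_rate


# Illustrative numbers only (hypothetical stock levels, cost, and rate).
cost_before = annual_holding_cost(avg_units_held=500, unit_cost=4.0, holding_rate=0.25)
cost_after = annual_holding_cost(avg_units_held=400, unit_cost=4.0, holding_rate=0.25)
reduction_pct = 100.0 * (cost_before - cost_after) / cost_before
print(f"before={cost_before:.2f} after={cost_after:.2f} reduction={reduction_pct:.1f}%")
```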

Results

Forecasting Performance

| Model | Accuracy | Use case |
|---|---|---|
| Random Forest | 71.5% | Tracks seasonal and day-of-week patterns |
| ARIMA | Variable | Trend-based forecasting for stable products |

Inventory Metrics:

Safety Stock: Adjusted according to demand variability.
Reorder Point: Determined using a 95% service level.
Cost Reduction: Annual holding costs are reduced by 15%.
Service Level: 95% maintained (low stockout risk).

Business Impact:

Reduced excess inventory through better demand forecasting.
Optimized safety stock levels reduce the chance of stockouts.
Reduced capital tied up in inventory, which improved cash flow.
Enabled data-driven operational and procurement decisions.

How to Run This Project:

Prerequisites:

Install the required libraries:

pip install pandas numpy matplotlib seaborn scikit-learn statsmodels jupyter openpyxl

Steps

1. Clone this repository

git clone https://github.com/DheerajKumar17/demand-forecasting-inventory-optimization.git
cd demand-forecasting-inventory-optimization

2. Download the Online Retail dataset from the UCI Machine Learning Repository and place it in the project folder

3. Open Jupyter Notebook

jupyter notebook

4. Open and run "demand_forecasting_inventory_optimization.ipynb"

Run all cells sequentially from top to bottom
The notebook contains all steps: data cleaning, EDA, modeling, and optimization

Key Insights:

Demand Variability: Due to the significant daily fluctuations in retail sales, accurate forecasting is difficult but valuable.
Feature Importance: Day of the week and month were the strongest predictors of sales.
Selecting a Model: ARIMA was better suited to trend-based forecasting, while Random Forest was better at capturing non-linear patterns.
Inventory Trade-offs: Optimal safety stock lowers costs while maintaining service levels.
Scalability: The techniques developed can be applied to enterprise-scale optimization across more than 50 SKUs at once.

Future Improvements:

Take into account external factors (holidays, promotions, weather).
Use ensemble techniques that combine Random Forest and ARIMA.
Create forecasting models at the product level (SKU-specific).
Include a real-time dashboard to track forecasts.
Integrate with inventory management systems.
Test other algorithms, such as LSTM neural networks and XGBoost.

Skills Demonstrated:

Time Series Analysis
Machine Learning (Supervised Learning)
Feature Engineering
Data Cleaning and Preprocessing
Statistical Modeling
Inventory Management Theory
Data Visualization
Python Programming
Business Analytics

Acknowledgements:

The dataset was provided by the UCI Machine Learning Repository. The project was motivated by real-world retail inventory problems.
Built as a portfolio project to showcase data science skills and supply chain knowledge.

License:

This project can be used for portfolio and educational purposes.

