Gaurav Bhatia — MSc Data Science, AI & Digital Business | GISMA University, Berlin
A production-style demand forecasting engine that compares three machine learning and statistical models — ARIMA, Facebook Prophet, and XGBoost — across 12 real product SKUs using 5 years of weekly sales data.
The project is deployed as an interactive Streamlit dashboard where users can select any SKU, choose a model, and adjust the forecast horizon — seeing live MAE, RMSE, and MAPE metrics instantly.
💡 Inspired by real inventory challenges at Bhatia Traders (Chandigarh, India), where applying demand forecasting reduced excess stock by 10%.
Businesses that cannot accurately forecast demand face two costly problems:
- Overstocking → capital tied up in unsold inventory
- Stockouts → lost sales and unhappy customers
This engine tackles both by forecasting weekly demand per SKU with multiple models and selecting the best performer — enabling smarter purchasing and replenishment decisions.
demand-forecasting-engine/
│
├── P1_Demand_Forecasting_Gaurav.ipynb # Main notebook (run top to bottom)
├── app/
│ └── streamlit_app.py # Interactive dashboard
├── data/
│ └── weekly_sales.csv # Prepared weekly data (auto-generated)
├── models/
│ └── SKU_XX_xgboost.pkl # Saved XGBoost models per SKU
├── reports/
│ └── model_comparison.csv # Full results table
├── requirements.txt
└── README.md
- Raw daily sales data aggregated to weekly granularity
- Filtered to the 12 best-selling SKUs from Store 1
- 5-year date range (2013–2017), ~260 weekly observations per SKU
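
The preparation steps above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the column names (`date`, `store`, `item`, `sales`) and the helper `to_weekly` are assumptions, and the tiny synthetic frame stands in for the Kaggle `train.csv`.

```python
import pandas as pd

def to_weekly(daily: pd.DataFrame, store: int = 1, top_n: int = 12) -> pd.DataFrame:
    """Aggregate daily sales to weekly totals for the best-selling items of one store."""
    df = daily[daily["store"] == store].copy()
    df["date"] = pd.to_datetime(df["date"])
    # Rank items by total sales and keep the top_n best sellers.
    top_items = df.groupby("item")["sales"].sum().nlargest(top_n).index
    df = df[df["item"].isin(top_items)]
    # Sum daily sales into calendar weeks (pandas "W" means weeks ending on Sunday).
    return (
        df.groupby(["item", pd.Grouper(key="date", freq="W")])["sales"]
        .sum()
        .reset_index()
    )

# Synthetic stand-in data: 3 items, 28 days (4 full Monday-to-Sunday weeks), one store.
dates = pd.date_range("2013-01-07", periods=28, freq="D")
rows = [(d, 1, item, 10 * item + i % 7) for i, d in enumerate(dates) for item in (1, 2, 3)]
daily = pd.DataFrame(rows, columns=["date", "store", "item", "sales"])

weekly = to_weekly(daily, store=1, top_n=2)
print(weekly)
```

Applied to the real data with `top_n=12`, the same pattern would yield one row per SKU per week.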
- Lag features: demand from 1, 2, 4, 8, 13, 26, 52 weeks ago
- Rolling statistics: mean & std deviation over 4, 8, 13, 26-week windows
- Calendar features: week of year, month, quarter, year
- Strict time-based train/test split (no data leakage)
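
One way to build these features without leakage is sketched below. The function names (`make_features`, `time_split`) and column names are hypothetical, and the notebook may differ in detail. The key point is that every lag and rolling window is computed from values shifted into the past, so the row for week t never sees the demand of week t.

```python
import numpy as np
import pandas as pd

LAGS = [1, 2, 4, 8, 13, 26, 52]
WINDOWS = [4, 8, 13, 26]

def make_features(weekly: pd.Series) -> pd.DataFrame:
    """Lag, rolling and calendar features for a weekly series with a DatetimeIndex."""
    feats = pd.DataFrame({"y": weekly})
    for lag in LAGS:
        feats[f"lag_{lag}"] = weekly.shift(lag)
    # Shift by one week before rolling so each window ends at t-1.
    past = weekly.shift(1)
    for w in WINDOWS:
        feats[f"roll_mean_{w}"] = past.rolling(w).mean()
        feats[f"roll_std_{w}"] = past.rolling(w).std()
    idx = weekly.index
    feats["weekofyear"] = idx.isocalendar().week.astype(int).to_numpy()
    feats["month"] = idx.month
    feats["quarter"] = idx.quarter
    feats["year"] = idx.year
    # Rows without a full 52-week history are dropped.
    return feats.dropna()

def time_split(feats: pd.DataFrame, test_weeks: int = 12):
    """Hold out the last test_weeks rows; everything earlier is training data."""
    return feats.iloc[:-test_weeks], feats.iloc[-test_weeks:]

# Synthetic series: 260 weeks of steadily increasing demand.
index = pd.date_range("2013-01-06", periods=260, freq="W")
series = pd.Series(np.arange(260, dtype=float), index=index)

feats = make_features(series)
train, test = time_split(feats, test_weeks=12)
print(len(train), len(test))
```

Because the split is purely positional on a time-sorted frame, every training row precedes every test row.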
| Model | Approach | Strengths |
|---|---|---|
| ARIMA | Statistical time series | Interpretable, handles trends |
| Facebook Prophet | Decomposition-based | Handles seasonality & holidays automatically |
| XGBoost + Optuna | Gradient boosting + HPO | Best accuracy, uses engineered features |
- Test set: last 12 weeks held out per SKU
- Metrics: MAE, RMSE, MAPE%
- Hyperparameter tuning: Optuna with TimeSeriesSplit cross-validation (20 trials)
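
For reference, the three metrics can be written in a few lines of NumPy. This is a sketch using the standard definitions; the function names and the example numbers are illustrative and not taken from the project.

```python
import numpy as np

def mae(y_true, y_pred) -> float:
    """Mean absolute error, in the same units as demand."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred) -> float:
    """Root mean squared error; penalises large misses more than MAE."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error in percent; undefined when y_true contains zeros."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Illustrative values only.
actual = [100, 200, 400]
forecast = [110, 180, 400]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```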
Evaluated on the last 12 weeks of data per SKU (held-out test set), averaged across all 12 SKUs:
| Model | Avg MAE | Avg RMSE | Avg MAPE% |
|---|---|---|---|
| ARIMA | 71.44 | 90.11 | 15.49% |
| Prophet | 29.34 | 35.94 | 6.07% ✅ |
| XGBoost | 32.96 | 40.24 | 6.61% |
Key finding: Prophet outperformed XGBoost with a MAPE of 6.07% versus 6.61%, and both sit well below the commonly cited industry benchmark of 10–15%. Prophet's strong performance suggests clear yearly seasonality patterns in this dataset, which it models natively. ARIMA, lacking engineered features, lagged behind at 15.49%, roughly 2.5 times Prophet's error (put differently, Prophet's MAPE is about 61% lower than ARIMA's).
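
The comparison can be checked directly from the averaged MAPE values in the table above. The snippet below is only a worked calculation; the variable names are illustrative.

```python
# Average MAPE (%) per model, copied from the results table.
avg_mape = {"ARIMA": 15.49, "Prophet": 6.07, "XGBoost": 6.61}

# Best model is the one with the lowest average MAPE.
best_model = min(avg_mape, key=avg_mape.get)

# How much larger ARIMA's error is relative to Prophet's.
arima_ratio = avg_mape["ARIMA"] / avg_mape["Prophet"]

# Relative error reduction of Prophet compared with ARIMA.
prophet_reduction = 1 - avg_mape["Prophet"] / avg_mape["ARIMA"]

# Relative gap between XGBoost and Prophet.
xgb_gap = avg_mape["XGBoost"] / avg_mape["Prophet"] - 1

print(best_model)
print(f"ARIMA / Prophet: {arima_ratio:.2f}x")
print(f"Prophet vs ARIMA: {prophet_reduction:.1%} lower")
print(f"XGBoost vs Prophet: {xgb_gap:.1%} higher")
```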
- SKU selector — switch between any of the 12 product SKUs
- Model selector — compare ARIMA, Prophet, XGBoost side by side
- Adjustable test horizon — slider from 4 to 24 weeks
- Live metrics — MAE, RMSE, MAPE update instantly
- Interactive Plotly chart — hover to see exact forecast vs actual values
| Category | Tools |
|---|---|
| Language | Python 3.10+ |
| Forecasting | ARIMA (statsmodels), Facebook Prophet, XGBoost |
| Hyperparameter Tuning | Optuna |
| Feature Engineering | Pandas, NumPy |
| Visualisation | Plotly, Matplotlib |
| Dashboard | Streamlit |
| Notebook | Google Colab / Jupyter |
- Open `P1_Demand_Forecasting_Gaurav.ipynb` in Google Colab
- Run cells top to bottom (Shift+Enter or Runtime → Run All)
- Download `train.csv` from Kaggle and upload when prompted
- Launch the Streamlit dashboard in the final cell; you'll get a public ngrok URL
git clone https://github.com/gauravbhatia-bit/demand-forecasting-engine
cd demand-forecasting-engine
pip install -r requirements.txt
python data/prepare_data.py
streamlit run app/streamlit_app.py
Download train.csv from Kaggle:
👉 https://www.kaggle.com/competitions/demand-forecasting-kernels-only/data
- Prophet outperformed XGBoost (6.07% vs 6.61% MAPE) — counterintuitive but explained by strong yearly seasonality in the dataset, which Prophet models natively without feature engineering
- Optuna saves significant time compared with a manual grid search: it found well-performing hyperparameters in about 20 trials
- Avoiding data leakage is critical in time series: all features are strictly shifted backwards before training, so each row uses only past observations
- MAPE alone can be misleading for low-demand SKUs — always report MAE and RMSE alongside it
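
The last point is easy to demonstrate with made-up numbers. In the sketch below, two hypothetical SKUs are each over-forecast by exactly 2 units per week: the MAE is identical, but the MAPE of the low-demand SKU is far higher.

```python
import numpy as np

def mae(y_true, y_pred) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred) -> float:
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical weekly demand for a high-volume and a low-volume SKU.
high_true = np.array([200.0, 210.0, 190.0])
low_true = np.array([4.0, 5.0, 2.0])

# Both forecasts miss by the same absolute amount: 2 units per week.
high_pred = high_true + 2
low_pred = low_true + 2

mae_high, mae_low = mae(high_true, high_pred), mae(low_true, low_pred)
mape_high, mape_low = mape(high_true, high_pred), mape(low_true, low_pred)

print(f"High-volume SKU: MAE={mae_high:.1f}, MAPE={mape_high:.1f}%")
print(f"Low-volume SKU:  MAE={mae_low:.1f}, MAPE={mape_low:.1f}%")
```

Reading MAE and RMSE next to MAPE shows whether a large percentage error reflects a genuinely poor forecast or simply a small denominator.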
- Bhatia Traders Sales Analysis — EDA & inventory analysis using Python/Pandas
- Inventory Alert System — SQL-based real-time stock monitoring
I'm Gaurav Bhatia, currently pursuing an MSc in Data Science, AI & Digital Business at GISMA University of Applied Sciences in Berlin, Germany.
Before moving into data science, I managed operations at Bhatia Traders — a sauces & condiments business in Chandigarh, India — where I applied Python and SQL to solve real inventory and supply chain challenges. That hands-on experience is what drives my interest in building practical, business-focused data tools.
I'm actively looking for Data Analyst / Data Science internship opportunities in Berlin and Germany.
Feel free to reach out on LinkedIn or via email at gauravbhatia.gb6@gmail.com.
MIT License — free to use, modify, and distribute with attribution.