This is a fork of the original team effort for the OpenCampus.sh course Introduction to Data Science and Machine Learning (2025/2026). All code in this fork was developed independently by Ulf Wendel; no code from other team members was reused.
https://github.com/nixnutz/course_intro_ds_and_ml
This project forecasts sales for a bakery branch from historical sales data covering July 1, 2013, to July 30, 2018, to inform inventory and staffing decisions. We predict sales for six product categories: Bread, Rolls, Croissants, Confectionery, Cakes, and Seasonal Bread. The approach starts with a baseline linear regression model that captures the fundamental trends, then moves to a neural network intended to pick up subtler patterns and improve forecast accuracy. The project covers data preparation, visualization (bar charts with confidence intervals), and model tuning. Models are evaluated on test data from August 1, 2018, to July 30, 2019, using the Mean Absolute Percentage Error (MAPE) for each product category.
Regression
Neural Network Model Version 1.2 (balanced_v1_2.ipynb)
- Sample weights: Bread (Brot) 3.0, Rolls (Brötchen) 1.5, Cake (Kuchen) 1.2, Seasonal Bread (Saisonbrot) 6.0
- Location: 3_Model/balanced_v1_2.ipynb
- Primary metric: MAPE (Mean Absolute Percentage Error)
- Business target: Precision gain (revenue-weighted improvement)
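The per-category sample weights listed above can be turned into one weight per training row. The following is a minimal sketch, not the notebook's actual code: the function name `build_sample_weights`, the example rows, and the fallback weight of 1.0 for categories without a listed weight are assumptions made for illustration.

```python
# Per-category weights as listed above; keys are the German category labels.
CATEGORY_WEIGHTS = {"Brot": 3.0, "Brötchen": 1.5, "Kuchen": 1.2, "Saisonbrot": 6.0}


def build_sample_weights(categories, default=1.0):
    """Return one weight per training row, looked up by category label.

    Assumption: categories without an explicit weight fall back to `default`.
    """
    return [CATEGORY_WEIGHTS.get(category, default) for category in categories]


# Hypothetical category labels of four training rows.
rows = ["Brot", "Croissant", "Saisonbrot", "Brötchen"]
weights = build_sample_weights(rows)
print(weights)  # [3.0, 1.0, 6.0, 1.5]
```

Many training APIs accept such a list as per-sample weights (for example the `sample_weight` argument of Keras `Model.fit`). Whether the notebook passes its weights this way is not stated here.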
| Metric | Value |
|---|---|
| Training MAPE | 17.33% |
| Validation MAPE | 18.54% |
| Validation MAE | 30.51 EUR |
| Seasonal Bread (Saisonbrot) wMAPE (sales months only) | 31.25% |
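For readers unfamiliar with the metrics in this table, here is a small sketch using their common textbook definitions. The project's exact formulas (for example how zero-sales days are handled, or which months count as sales months) are in 3_Model/ and may differ; the sample numbers and the month set below are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; rows with zero actual sales are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def wmape(actual, predicted):
    """Weighted MAPE in percent: total absolute error divided by total actual sales."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / sum(abs(a) for a in actual)


def mae(actual, predicted):
    """Mean Absolute Error in the unit of the sales figures (EUR here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily sales in EUR with the month each value belongs to.
months = [9, 11, 12]
actual = [100.0, 200.0, 50.0]
predicted = [110.0, 180.0, 60.0]

print(f"MAPE:  {mape(actual, predicted):.2f}%")
print(f"wMAPE: {wmape(actual, predicted):.2f}%")
print(f"MAE:   {mae(actual, predicted):.2f} EUR")

# "Sales months only": filter the rows before computing the metric.
SALES_MONTHS = {11, 12}  # hypothetical; the real set is not given here
kept = [(a, p) for m, a, p in zip(months, actual, predicted) if m in SALES_MONTHS]
seasonal_wmape = wmape([a for a, _ in kept], [p for _, p in kept])
```

wMAPE weights each day by its sales volume, so days with very small actual sales do not dominate the score. This is one likely reason to prefer it for a product that sells only part of the year.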
| Model | Precision Gain (€/year vs OLS) | Incremental Gain |
|---|---|---|
| OLS (baseline) | 0 | — |
| NN 1.0 | +100,374 | — |
| NN 1.1 | +106,936 | +6,560 vs 1.0 |
| NN 1.2 | +109,306 | +2,370 vs 1.1 |
| Category | MAPE / wMAPE |
|---|---|
| Bread (1) | 18.88% |
| Rolls (2) | 10.95% |
| Croissant (3) | 19.93% |
| Confectionery (4) | 23.33% |
| Cake (5) | 14.79% |
| Seasonal Bread (6) | 31.25% (wMAPE) |
Note: For detailed breakdowns and analysis, see 3_Model/.
The models have largely exhausted the dataset's information capacity; further generic optimization risks fitting noise. We therefore chose a fictitious business goal, precision gain, and optimized for revenue per product group, prioritizing Seasonal Bread (Saisonbrot), where the highest gains are expected. See 3_Model/ for details.
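The exact definition of precision gain is documented in 3_Model/ and is not reproduced here. As a rough illustration only, one plausible reading is "how many EUR of absolute forecast error a model saves compared with the baseline, per product group". The sketch below implements that reading; the function name, the data layout, and all numbers are hypothetical.

```python
def precision_gain(actual, baseline_pred, model_pred):
    """EUR of absolute forecast error saved by the model relative to the baseline.

    Assumed definition for illustration; positive values mean the model is closer.
    """
    baseline_error = sum(abs(a - b) for a, b in zip(actual, baseline_pred))
    model_error = sum(abs(a - m) for a, m in zip(actual, model_pred))
    return baseline_error - model_error


# Hypothetical daily revenue (EUR) per product group:
# (actual, baseline prediction, model prediction).
groups = {
    "Brot": ([100.0, 200.0, 50.0], [130.0, 170.0, 70.0], [110.0, 180.0, 60.0]),
    "Saisonbrot": ([40.0, 60.0], [10.0, 100.0], [35.0, 70.0]),
}

gain_per_group = {
    name: precision_gain(actual, baseline, model)
    for name, (actual, baseline, model) in groups.items()
}
total_gain = sum(gain_per_group.values())
print(gain_per_group, total_gain)
```

Reporting the gain per group makes it visible where a model change pays off, which matches the idea of prioritizing the group with the highest expected gain.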
GitHub Codespaces is the recommended way to set up this project. It provides a consistent development environment without requiring any local configuration.
See SETUP.md for detailed setup instructions, including:
- GitHub Codespaces setup (recommended)
- Connecting VS Code (or Cursor, a VS Code derivative) to your Codespace
- Local development setup
- Troubleshooting