This folder contains the scripts and notebooks used for building and evaluating the baseline linear model for the bakery sales forecast project. The tasks include importing the dataset, fitting a linear regression model, making predictions, and preparing the results for Kaggle submission. The following notes describe what was done in each file:
The file INSTRUCTIONS.md provides an overview and guidelines for building the baseline model. It describes the key steps to focus on, including data import, model fitting, making predictions, and preparing the results for submission.
- Preparation:
  - Imports necessary libraries such as `pandas` and `statsmodels`.
  - Reads the training dataset (`df_training_neural_network.csv`) for model building.
- Building the Linear Model:
  - Fits a linear regression model using `statsmodels` with `Umsatz` as the dependent variable and various features (e.g., product groups, months, holidays, weather conditions) as independent variables.
  - Outputs the summary of the fitted model, including key metrics such as R-squared, coefficients, and p-values.
- Create Predictions:
  - Loads the test dataset (`df_test_neural_network.csv`).
  - Uses the fitted model to predict `Umsatz` for the test dataset.
  - Saves the predictions in a CSV file (`df_predictions_linearModel.csv`).
- Prepare Kaggle-Upload:
  - Loads the predictions file.
  - Replaces any NaN values in the `Umsatz` column with 0.
  - Keeps only the `id` and `Umsatz` columns for submission.
  - Saves the final dataframe as `df_kaggle_upload_linearModel.csv`.
Overall, this folder covers building and evaluating a baseline linear model and preparing its predictions for submission to Kaggle.
- Product Categories:
  - `Brot`: Sales of bread.
  - `Broetchen`: Sales of rolls.
  - `Croissant`: Sales of croissants.
  - `Konditorei`: Sales of confectionery products.
  - `Kuchen`: Sales of cakes.
- Events and Holidays:
  - `national_holiday`: Indicator for national holidays.
  - `christmas_market`: Indicator for the presence of a Christmas market.
  - `KielerWoche`: Indicator for the Kiel Week event.
- Weather Conditions:
  - `temp_bins_kalt`: Indicator for cold temperatures.
  - `temp_bins_mild`: Indicator for mild temperatures.
  - `temp_bins_warm`: Indicator for warm temperatures.
  - `temp_bins_heiß`: Indicator for hot temperatures.
- Time Variables:
  - `Monat_2`: February.
  - `Monat_3`: March.
  - `Monat_4`: April.
  - `Monat_5`: May.
  - `Monat_6`: June.
  - `Monat_7`: July.
  - `Monat_8`: August.
  - `Monat_9`: September.
  - `Monat_10`: October.
  - `Monat_11`: November.
  - `Monat_12`: December.
  - `Wochentag_Di`: Indicator for Tuesday.
  - `Wochentag_Mi`: Indicator for Wednesday.
  - `Wochentag_Do`: Indicator for Thursday.
  - `Wochentag_Fr`: Indicator for Friday.
  - `Wochentag_Sa`: Indicator for Saturday.
  - `Wochentag_So`: Indicator for Sunday.
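
The time variables list has no `Monat_1` or `Wochentag_Mo`, which is consistent with January and Monday serving as reference categories in a dummy encoding. The sketch below shows one way such columns could be produced with `pandas.get_dummies`; the raw column names `Monat` and `Wochentag` and the use of `drop_first=True` are assumptions, since the actual preprocessing is not described here.

```python
import pandas as pd

# Hypothetical raw columns: month number and German weekday abbreviation.
dates = pd.DataFrame({"Monat": [1, 2, 12], "Wochentag": ["Mo", "Di", "So"]})

# Fix the category order so the first level (January, Monday) is the one
# that drop_first removes, i.e. the reference category.
dates["Monat"] = pd.Categorical(dates["Monat"], categories=list(range(1, 13)))
dates["Wochentag"] = pd.Categorical(
    dates["Wochentag"], categories=["Mo", "Di", "Mi", "Do", "Fr", "Sa", "So"]
)

dummies = pd.get_dummies(
    dates, columns=["Monat", "Wochentag"], drop_first=True, dtype=int
)
print(list(dummies.columns))
```

With this encoding, a row for a Monday in January has zeros in every time column, so the coefficients of the remaining indicators are read as differences relative to that baseline.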