This project predicts product sales across Big Mart outlets using machine learning techniques. The dataset contains information about products and their sales in different outlets, and the goal is to build a model that estimates a product's sales from these features.
- Big Mart Sale_Final.R: The main R script that contains the data loading, preprocessing, and modeling code.
- Train_UWu5bXk.csv: The training dataset containing historical sales data.
- Test_u94Q5KV.csv: The test dataset for which sales predictions need to be made.
- SampleSubmission_TmnO39y.csv: A sample submission file in the required format for submission.
- README.md: This file, providing an overview of the project.
The following R packages are required to run the project:
- `data.table`: For reading and manipulating data.
- `dplyr`: For data manipulation and joining.
- `ggplot2`: For plotting.
- `caret`: For modeling.
- `corrplot`: For making correlation plots.
- `xgboost`: For building the XGBoost model.
- `cowplot`: For combining multiple plots.
To install the required packages, you can use the following commands in R:
```r
install.packages("data.table")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("caret")
install.packages("corrplot")
install.packages("xgboost")
install.packages("cowplot")
```
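If some of these packages are already installed, reinstalling all of them is unnecessary. The following is a minimal sketch of one way to find only the missing ones; `missing_packages` is a hypothetical helper introduced here for illustration and is not part of the project script.

```r
# Packages listed in this README.
required <- c("data.table", "dplyr", "ggplot2", "caret",
              "corrplot", "xgboost", "cowplot")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded,
# i.e. the packages that still need to be installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
print(to_install)
```

Passing a non-empty `to_install` to `install.packages()` would then install only what is absent.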
- Load Packages: The script starts by loading the necessary packages.
- Read Datasets: The training and test datasets are read using the `fread` function from the `data.table` package.
- Explore Data: The script displays the column names and structure of the training and test datasets.
- Preprocess Data: The script adds a new column `Item_Outlet_Sales` to the test dataset and performs other preprocessing steps (not shown in the excerpt).
Here is an example of how to run the script:
```r
# Load packages
library(data.table)
library(dplyr)
library(ggplot2)
library(caret)
library(corrplot)
library(xgboost)
library(cowplot)

# Read datasets
train <- fread("Train_UWu5bXk.csv")
test <- fread("Test_u94Q5KV.csv")
submission <- fread("SampleSubmission_TmnO39y.csv")

# Display column names
names(train)
names(test)

# Display structure of datasets
str(train)
str(test)

# Add Item_Outlet_Sales to test data
test[, Item_Outlet_Sales := NA]
```
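A common reason for adding a placeholder `Item_Outlet_Sales` column to the test data is that both tables then share the same columns and can be stacked, preprocessed once, and split again. The remaining preprocessing is not shown in this README, so the sketch below is only an assumption about how such a step might look. It uses toy data: the `item` and `price` columns, the values, and the log transform are all made up for illustration.

```r
library(data.table)

# Toy stand-ins for the real files; columns and values are hypothetical.
train <- data.table(item = c("A1", "B2", "C3"),
                    price = c(249.8, 48.3, 141.6),
                    Item_Outlet_Sales = c(3735.1, 443.4, 2097.3))
test <- data.table(item = c("D4", "E5"),
                   price = c(107.9, 87.3))

# Placeholder target so both tables have identical columns.
# NA_real_ keeps the column numeric, matching the training target.
test[, Item_Outlet_Sales := NA_real_]

# Stack the tables so each preprocessing step runs once over all rows.
combi <- rbind(train, test)

# Example shared step (hypothetical): log-transform the price column.
combi[, price_log := log(price)]

# Split back by position: training rows come first, test rows follow.
n_train <- nrow(train)
train_prepped <- combi[seq_len(n_train)]
test_prepped <- combi[(n_train + 1):nrow(combi)]
```

Splitting by row position rather than by `is.na(Item_Outlet_Sales)` avoids misplacing any training rows whose target happens to be missing.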
This project is licensed under the MIT License. See the LICENSE file for more details.
- The dataset is provided by Big Mart for the purpose of this competition.
- Thanks to the R community for providing the necessary packages and documentation.
For any questions or issues, please contact Rayyan Ahmed at [email protected] or https://www.linkedin.com/in/rayyan-ahmed9477/