Exercise on Computer Hardware Dataset
Data source: https://archive.ics.uci.edu/ml/datasets/Computer+Hardware
0 Data Cleaning
We apply the following transformations to our data:
We also add standardized versions of the variables used in the model.
1 Model
This exercise follows the procedure in Ein-Dor (1987).
1A Training and Test Datasets
We set up the training and test sets for our hardware data in the ratio 80:20. Since we have 209 observations, this gives 167 training data points and 42 test data points.
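The split described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `hardware` is a placeholder data frame with 209 rows, `training_set` is a hypothetical name (only `test_set` appears in the original code), and the seed is arbitrary.

```r
# Minimal sketch of an 80:20 split. `hardware` is a placeholder data frame,
# not the author's actual object; only the row count (209) comes from the text.
set.seed(42)                         # arbitrary seed, for reproducibility only
n <- 209
hardware <- data.frame(id = seq_len(n))

n_train   <- floor(0.8 * n)          # floor(167.2) = 167
train_idx <- sample(seq_len(n), size = n_train)

training_set <- hardware[train_idx, , drop = FALSE]   # hypothetical name
test_set     <- hardware[-train_idx, , drop = FALSE]

# Row counts of the two subsets
c(train = nrow(training_set), test = nrow(test_set))
```

Sampling row indices without replacement keeps the two subsets disjoint, so no observation is used for both fitting and evaluation.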
Below is the correlation table of the variables we will be using:
1B Linear Regression Model
Our regression model is as follows:
The result of our linear regression model is:
As mentioned earlier, we also computed standardized values of the variables we used. Because cache memory, channel capacity, and average memory are each measured on a different scale, they must be standardized before any sensitivity analysis is performed. Repeating the regression with the standardized values, we get this result:
All three independent variables contribute significantly to explaining the variance in the dependent variable, with memory size being the dominant contributor.
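The effect of standardization on the coefficients can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are simulated, the column names `cache`, `channel`, `memory`, and `y` are hypothetical, and only the predictors are standardized (the text does not say whether the dependent variable was standardized as well).

```r
# Simulated stand-in data; column names and coefficients are hypothetical.
set.seed(123)
n <- 200
d <- data.frame(
  cache   = rexp(n, rate = 1 / 25),      # small numeric scale
  channel = rpois(n, lambda = 10),       # small numeric scale
  memory  = rexp(n, rate = 1 / 8000)     # much larger numeric scale
)
d$y <- 2 + 0.05 * d$cache + 0.3 * d$channel + 0.001 * d$memory + rnorm(n)

# Raw-scale fit: coefficient sizes depend on the units of each predictor.
fit_raw <- lm(y ~ cache + channel + memory, data = d)

# Standardize the predictors to mean 0 and standard deviation 1, then refit.
preds <- c("cache", "channel", "memory")
d_std <- d
d_std[preds] <- as.data.frame(scale(d[preds]))
fit_std <- lm(y ~ cache + channel + memory, data = d_std)

round(coef(fit_raw), 4)
round(coef(fit_std), 4)   # slopes are now per one standard deviation
```

Each standardized slope equals the raw slope multiplied by that predictor's standard deviation, so the standardized coefficients can be compared directly, while the t-statistics of the slopes are unchanged.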
1C Predicting the Target Variable and Evaluating the Accuracy of the Model
Using the model fitted on the training set (1B), we predict the target values for the data points in the test set:
prediction_norm <- predict(linear_norm_train, test_set)

The next line computes the statistical metrics used to evaluate the performance of a linear regression model.

data.frame(R2 = R2(prediction_norm, test_set$SQRERF), RMSE = RMSE(prediction_norm, test_set$SQRERF), MAE = MAE(prediction_norm, test_set$SQRERF))

         R2    RMSE      MAE
1 0.9999812 9.44776 8.444859
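To make the three metrics concrete, here is a base R sketch that computes them by hand. It assumes that `R2`, `RMSE`, and `MAE` above come from the caret package (the source does not show which package is loaded), where `R2` defaults to the squared correlation between predictions and observations. The vectors `obs` and `pred` are simulated placeholders, not the author's data.

```r
# Simulated placeholder values for 42 test points; not the author's results.
set.seed(7)
obs  <- rnorm(42, mean = 10, sd = 3)      # stand-in for observed targets
pred <- obs + rnorm(42, sd = 0.5)         # stand-in for model predictions

r2_corr <- cor(pred, obs)^2               # squared Pearson correlation
rmse    <- sqrt(mean((pred - obs)^2))     # root mean squared error
mae     <- mean(abs(pred - obs))          # mean absolute error

data.frame(R2 = r2_corr, RMSE = rmse, MAE = mae)

# Squared correlation ignores the scale and offset of the predictions:
shifted    <- 2 * pred + 5
r2_shifted <- cor(shifted, obs)^2         # equal to r2_corr
rmse_shift <- sqrt(mean((shifted - obs)^2))  # much larger than rmse
```

Because the correlation-based R2 is unaffected by a linear rescaling of the predictions, it is worth reading it together with RMSE and MAE, which are expressed in the units of the target variable.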
