_posts/2023-10-23-03_scikit_advanced.md (+5 -5: 5 additions & 5 deletions)
@@ -37,7 +37,7 @@ The California Housing dataset contains information about houses in California d
 - Features on different scales
 - Complex relationships between variables

-The dataset iself contains information about the houses, including features like total area, lot shape, neighborhood information, overall quality, year built, etc. And the target feature that we would like to predict is the `SalePrice`.
+The dataset itself contains information about the houses, including features like total area, lot shape, neighborhood information, overall quality, year built, etc. And the target feature that we would like to predict is the `SalePrice`.

 Let's load the data and take a look:

@@ -138,7 +138,7 @@ If we look closer at the feature matrix X, we can see that of those 79 features,
 are of type 'object' (i.e. categorical features), and that some entries are missing. Plus, the target feature
 `SalePrice` has a right skewed value distribution.

-Therefore, if possible, our pipeline should be able to handle all of this picularities. Even better, let's try
+Therefore, if possible, our pipeline should be able to handle all of this peculiarities. Even better, let's try
 to setup a pipeline that helps us to find the optimal way how to preprocess this dataset.

 ## 2. Feature Analysis
@@ -464,7 +464,7 @@ Prediction accuracy on test data: {score_te*100:.2f}%"
 Prediction accuracy on test data: 8.38%

 Great, the score seems reasonably good! But now that we know better which preprocessing routine seems to be the
-best (thanks to `RandomizedSearchCV`), let's go ahead and furhter fine-tune the ridge model.
+best (thanks to `RandomizedSearchCV`), let's go ahead and further fine-tune the ridge model.

 ## 8. Fine tune best preprocessing pipeline

@@ -476,7 +476,7 @@ To further fine tune the best preprocessing pipeline, we can just load the 'best
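For readers of this diff without the full post at hand, the kind of pipeline the changed paragraphs describe (categorical features, missing entries, a right-skewed target, preprocessing chosen via `RandomizedSearchCV`, then a ridge model) could be sketched roughly as below. This is only an illustrative sketch, not the post's actual code: the toy data, the column names (`area`, `quality`, `neighborhood`), and the searched parameter grid are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the housing table (NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "area": rng.normal(1500, 400, n),
    "quality": rng.integers(1, 11, n).astype(float),
    "neighborhood": pd.Series(rng.choice(["A", "B", "C"], n), dtype=object),
})
# Right-skewed, strictly positive target (assumed shape, for illustration only).
y = np.exp(10 + 0.0005 * X["area"] + 0.1 * X["quality"] + rng.normal(0, 0.1, n))
# Inject some missing entries into a numeric and a categorical column.
X.loc[rng.choice(n, 20, replace=False), "area"] = np.nan
X.loc[rng.choice(n, 20, replace=False), "neighborhood"] = np.nan

numeric = ["area", "quality"]
categorical = ["neighborhood"]

# Separate preprocessing branches for numeric and categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# Fit the ridge model on log1p(y) and map predictions back with expm1.
model = TransformedTargetRegressor(
    regressor=Pipeline([("preprocess", preprocess), ("ridge", Ridge())]),
    func=np.log1p,
    inverse_func=np.expm1,
)

# Randomly sample preprocessing and model settings (assumed search space).
param_distributions = {
    "regressor__preprocess__num__impute__strategy": ["mean", "median"],
    "regressor__ridge__alpha": np.logspace(-3, 3, 20),
}
search = RandomizedSearchCV(
    model, param_distributions, n_iter=5, cv=3, random_state=0
)
search.fit(X, y)
print(search.best_params_)
```

Wrapping the whole pipeline in `TransformedTargetRegressor` is one way to handle the skewed target without transforming `y` by hand; the post itself may take a different approach.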