Commit b1a5e14

Merge pull request #595 from EducationalTestingService/update-scikit-learn
Update scikit-learn to 0.22.2
2 parents 70cd583 + 34b00de commit b1a5e14

56 files changed: +414 -389 lines changed

azure-pipelines.yml

+2-2
@@ -29,7 +29,7 @@ jobs:
 displayName: "Update conda"
 
 - script: |
-conda create --name sklldev --yes --quiet -c conda-forge -c defaults python=%PYTHON_VERSION% numpy=1.17 nose --file conda_requirements.txt
+conda create --name sklldev --yes --quiet -c conda-forge -c defaults python=%PYTHON_VERSION% nose --file conda_requirements.txt
 conda init cmd.exe
 CALL activate sklldev
 pip install -e .
@@ -50,4 +50,4 @@ jobs:
 inputs:
 testResultsFiles: 'nosetests.xml'
 testRunTitle: 'SKLL tests'
-condition: succeededOrFailed()
+condition: succeededOrFailed()

conda_requirements.txt

+1-1
@@ -3,7 +3,7 @@ joblib>=0.8
 numpy
 pandas
 ruamel.yaml
-scikit-learn==0.21.3
+scikit-learn==0.22.2.post1
 scipy
 seaborn
 tabulate
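
The hunk above moves the scikit-learn pin from 0.21.3 to 0.22.2.post1. As a rough sketch of how an environment could be checked against such an exact pin, the requirement line can be split on `==` and compared with the installed version string. The helper names `parse_pin` and `matches_pin` are hypothetical and not part of SKLL, and the sketch only understands `==` pins (a line such as `joblib>=0.8` is treated as unpinned).

```python
from typing import Optional, Tuple


def parse_pin(requirement: str) -> Tuple[str, Optional[str]]:
    """Split a requirement line such as 'scikit-learn==0.22.2.post1'.

    Returns (name, version). The version is None when the line carries
    no '==' pin; other specifiers (>=, <, ...) are not interpreted.
    """
    name, sep, version = requirement.strip().partition("==")
    return name.strip(), (version.strip() if sep else None)


def matches_pin(installed_version: str, requirement: str) -> bool:
    """True when the line is unpinned or the installed version equals the pin."""
    _, pinned = parse_pin(requirement)
    return pinned is None or installed_version == pinned


if __name__ == "__main__":
    # In a real environment the installed version could be read with
    # importlib.metadata.version("scikit-learn") (Python 3.8+).
    pin = "scikit-learn==0.22.2.post1"
    print(matches_pin("0.22.2.post1", pin))
    print(matches_pin("0.21.3", pin))
```

The comparison is a plain string match, which suits an exact pin; range specifiers would need a real version parser.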

doc/tutorial.rst

+47-31
@@ -115,8 +115,8 @@ need to type the following into a terminal:
 
 That should produce output like::
 
-2017-12-07 11:40:17,381 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Task: evaluate
-2017-12-07 11:40:17,381 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Training on train, Test on dev, feature set ['family.csv', 'misc.csv', 'socioeconomic.csv', 'vitals.csv'] ...
+2020-03-10 14:25:23,596 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Task: evaluate
+2020-03-10 14:25:23,596 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Training on train, Test on dev, feature set ['family.csv', 'misc.csv', 'socioeconomic.csv', 'vitals.csv'] ...
 Loading /Users/nmadnani/work/skll/examples/titanic/train/family.csv... done
 Loading /Users/nmadnani/work/skll/examples/titanic/train/misc.csv... done
 Loading /Users/nmadnani/work/skll/examples/titanic/train/socioeconomic.csv... done
@@ -125,12 +125,28 @@ That should produce output like::
 Loading /Users/nmadnani/work/skll/examples/titanic/dev/misc.csv... done
 Loading /Users/nmadnani/work/skll/examples/titanic/dev/socioeconomic.csv... done
 Loading /Users/nmadnani/work/skll/examples/titanic/dev/vitals.csv... done
-2017-12-07 11:40:17,515 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Featurizing and training new RandomForestClassifier model
-2017-12-07 11:40:17,515 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - WARNING - Training data will be shuffled to randomize grid search folds. Shuffling may yield different results compared to scikit-learn.
-2017-12-07 11:40:21,650 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Best accuracy grid search score: 0.809
-2017-12-07 11:40:21,651 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Hyperparameters: bootstrap: True, class_weight: None, criterion: gini, max_depth: 10, max_features: auto, max_leaf_nodes: None, min_impurity_decrease: 0.0, min_impurity_split: None, min_samples_leaf: 1, min_samples_split: 2, min_weight_fraction_leaf: 0.0, n_estimators: 500, n_jobs: 1, oob_score: False, random_state: 123456789, verbose: 0, warm_start: False
-2017-12-07 11:40:21,651 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Evaluating predictions
-
+2020-03-10 14:25:23,662 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Featurizing and training new RandomForestClassifier model
+2020-03-10 14:25:23,663 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - WARNING - Training data will be shuffled to randomize grid search folds. Shuffling may yield different results compared to scikit-learn.
+2020-03-10 14:25:28,129 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Best accuracy grid search score: 0.798
+2020-03-10 14:25:28,130 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Hyperparameters: bootstrap: True, ccp_alpha: 0.0, class_weight: None, criterion: gini, max_depth: 5, max_features: auto, max_leaf_nodes: None, max_samples: None, min_impurity_decrease: 0.0, min_impurity_split: None, min_samples_leaf: 1, min_samples_split: 2, min_weight_fraction_leaf: 0.0, n_estimators: 500, n_jobs: None, oob_score: False, random_state: 123456789, verbose: 0, warm_start: False
+2020-03-10 14:25:28,130 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - Evaluating predictions
+2020-03-10 14:25:28,172 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_RandomForestClassifier - INFO - using probabilities for the positive class to compute "roc_auc" for evaluation.
+2020-03-10 14:25:28,178 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - Task: evaluate
+2020-03-10 14:25:28,178 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - Training on train, Test on dev, feature set ['family.csv', 'misc.csv', 'socioeconomic.csv', 'vitals.csv'] ...
+Loading /Users/nmadnani/work/skll/examples/titanic/train/family.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/train/misc.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/train/socioeconomic.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/train/vitals.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/dev/family.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/dev/misc.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/dev/socioeconomic.csv... done
+Loading /Users/nmadnani/work/skll/examples/titanic/dev/vitals.csv... done
+2020-03-10 14:25:28,226 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - Featurizing and training new DecisionTreeClassifier model
+2020-03-10 14:25:28,226 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - WARNING - Training data will be shuffled to randomize grid search folds. Shuffling may yield different results compared to scikit-learn.
+2020-03-10 14:25:28,269 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - Best accuracy grid search score: 0.754
+2020-03-10 14:25:28,269 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - Hyperparameters: ccp_alpha: 0.0, class_weight: None, criterion: gini, max_depth: None, max_features: None, max_leaf_nodes: None, min_impurity_decrease: 0.0, min_impurity_split: None, min_samples_leaf: 1, min_samples_split: 2, min_weight_fraction_leaf: 0.0, presort: deprecated, random_state: 123456789, splitter: best
+2020-03-10 14:25:28,269 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - Evaluating predictions
+2020-03-10 14:25:28,272 - Titanic_Evaluate_Tuned_family.csv+misc.csv+socioeconomic.csv+vitals.csv_DecisionTreeClassifier - INFO - using probabilities for the positive class to compute "roc_auc" for evaluation.
 
 We could squelch the warnings about shuffling by setting
 :ref:`shuffle <shuffle>` to ``True`` in the :ref:`Input` section.
@@ -159,14 +175,14 @@ types of files:
 would like one giant summary file, you can use the :ref:`summarize_results`
 command.
 
-An example of a human-readable results file for our Titanic config file is::
+An example of a human-readable results file for our Titanic experiment is::
 
 Experiment Name: Titanic_Evaluate_Tuned
-SKLL Version: 1.5
+SKLL Version: 2.0
 Training Set: train
-Training Set Size: 712
+Training Set Size: 569
 Test Set: dev
-Test Set Size: 179
+Test Set Size: 143
 Shuffle: False
 Feature Set: ["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]
 Learner: RandomForestClassifier
@@ -176,28 +192,28 @@ An example of a human-readable results file for our Titanic config file is::
 Grid Search Folds: 3
 Grid Objective Function: accuracy
 Additional Evaluation Metrics: ['roc_auc']
-Scikit-learn Version: 0.19.1
-Start Timestamp: 07 Dec 2017 11:42:04.911657
-End Timestamp: 07 Dec 2017 11:42:09.118036
-Total Time: 0:00:04.206379
-
-
-Fold:
-Model Parameters: {"bootstrap": true, "class_weight": null, "criterion": "gini", "max_depth": 10, "max_features": "auto", "max_leaf_nodes": null, "min_impurity_decrease": 0.0, "min_impurity_split": null, "min_samples_leaf": 1, "min_samples_split": 2, "min_weight_fraction_leaf": 0.0, "n_estimators": 500, "n_jobs": 1, "oob_score": false, "random_state": 123456789, "verbose": 0, "warm_start": false}
-Grid Objective Score (Train) = 0.8089887640449438
-+---+-------+------+-----------+--------+-----------+
-| | 0 | 1 | Precision | Recall | F-measure |
-+---+-------+------+-----------+--------+-----------+
-| 0 | [101] | 14 | 0.871 | 0.878 | 0.874 |
-+---+-------+------+-----------+--------+-----------+
-| 1 | 15 | [49] | 0.778 | 0.766 | 0.772 |
-+---+-------+------+-----------+--------+-----------+
+Scikit-learn Version: 0.22.2.post1
+Start Timestamp: 10 Mar 2020 14:25:23.595787
+End Timestamp: 10 Mar 2020 14:25:28.175375
+Total Time: 0:00:04.579588
+
+
+Fold:
+Model Parameters: {"bootstrap": true, "ccp_alpha": 0.0, "class_weight": null, "criterion": "gini", "max_depth": 5, "max_features": "auto", "max_leaf_nodes": null, "max_samples": null, "min_impurity_decrease": 0.0, "min_impurity_split": null, "min_samples_leaf": 1, "min_samples_split": 2, "min_weight_fraction_leaf": 0.0, "n_estimators": 500, "n_jobs": null, "oob_score": false, "random_state": 123456789, "verbose": 0, "warm_start": false}
+Grid Objective Score (Train) = 0.797874315418175
++----+------+------+-------------+----------+-------------+
+| | 0 | 1 | Precision | Recall | F-measure |
++====+======+======+=============+==========+=============+
+| 0 | [79] | 8 | 0.849 | 0.908 | 0.878 |
++----+------+------+-------------+----------+-------------+
+| 1 | 14 | [42] | 0.840 | 0.750 | 0.792 |
++----+------+------+-------------+----------+-------------+
 (row = reference; column = predicted)
-Accuracy = 0.8379888268156425
-Objective Function Score (Test) = 0.8379888268156425
+Accuracy = 0.8461538461538461
+Objective Function Score (Test) = 0.8461538461538461
 
 Additional Evaluation Metrics (Test):
-roc_auc = 0.8219429347826087
+roc_auc = 0.9224137931034483
 
 IRIS Example on Binder
 ----------------------
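
The updated results file reports per-class precision, recall and F-measure next to the confusion matrix, plus an overall accuracy. As a sanity check on those figures, the sketch below recomputes them from the four counts in the new table (rows are reference labels, columns are predictions). It is a plain-Python illustration using the standard definitions, not SKLL's own implementation, and the function names `per_class_metrics` and `accuracy` are hypothetical.

```python
from typing import Dict, List


def per_class_metrics(matrix: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Precision, recall and F-measure for each class.

    matrix[i][j] counts instances whose reference label is i and whose
    predicted label is j (row = reference; column = predicted).
    """
    n = len(matrix)
    metrics: Dict[int, Dict[str, float]] = {}
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        reference_k = sum(matrix[k])  # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / reference_k if reference_k else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return metrics


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of instances on the diagonal of the confusion matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    # Counts taken from the updated table in doc/tutorial.rst.
    titanic_dev = [[79, 8], [14, 42]]
    for label, scores in per_class_metrics(titanic_dev).items():
        print(label, {name: round(value, 3) for name, value in scores.items()})
    print("accuracy:", accuracy(titanic_dev))
```

The four counts sum to 143, which agrees with the new `Test Set Size`, and the accuracy works out to 121/143, in line with the reported 0.8461538461538461.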
