2016 Results
##September 27, 2016
There are now 3 models for predicting: Random Forest (RF), Gradient Boosting (GBM), and Support Vector Classifier (SVC). Using those 3 models, we built confusion matrices from the 2016 data to show what the 2016 season would have looked like had the models been in use. The cut-off points used to obtain the preliminary 2016 confusion matrices were RF = 4.8 and GBM = 7.01.
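As a rough illustration of how a cut-off turns model scores into a confusion matrix, here is a minimal Python sketch. It assumes that a score at or above the cut-off counts as a TRUE prediction, which the text above does not state. The function name `confusion_counts` and the toy scores are hypothetical, not project data.

```python
def confusion_counts(scores, actual, cutoff):
    """Count TN, FP, FN, TP for one model.

    Assumption: a score at or above `cutoff` is a TRUE prediction.
    """
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for score, truth in zip(scores, actual):
        predicted = score >= cutoff
        if predicted and truth:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif truth:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy values for illustration only (not project data).
toy_scores = [3.1, 5.0, 4.8, 6.2, 1.0, 2.0]
toy_actual = [False, True, False, True, False, True]

# 4.8 is the RF cut-off quoted above.
toy_counts = confusion_counts(toy_scores, toy_actual, cutoff=4.8)
print(toy_counts)
```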
Three measures from the matrices will be used to judge a model: the false-positive rate (FPR), the true-positive rate (TPR), and precision. An example matrix with these measures is shown below. The precision appears next to the matrix name, and the FPR and TPR appear after the FP and TP counts:
[Matrix Name / Precision]
| | Predict False | Predict True |
|---|---|---|
| Actual False | TN | FP / FPR |
| Actual True | FN | TP / TPR |
- FPR = FP/(FP+TN)
- TPR = TP/(TP+FN)
- PRECISION = TP/(TP+FP)
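The three formulas can be checked with a few lines of Python. This is a minimal sketch: the function name `rates` is an illustrative choice, and the counts are those of the 2016 Democratic matrix reported further down this page.

```python
def rates(tn, fp, fn, tp):
    """Return (FPR, TPR, precision) from the four confusion-matrix counts."""
    fpr = fp / (fp + tn)        # share of actual FALSE cases predicted TRUE
    tpr = tp / (tp + fn)        # share of actual TRUE cases predicted TRUE
    precision = tp / (tp + fp)  # share of TRUE predictions that were correct
    return fpr, tpr, precision


# Counts from the 2016 Democratic matrix on this page.
fpr, tpr, precision = rates(tn=541, fp=64, fn=32, tp=7)
print(f"FPR = {fpr:.1%}, TPR = {tpr:.1%}, precision = {precision:.1%}")
```

The sketch does not guard against a zero denominator, which would occur for a matrix with no TRUE predictions or no actual TRUE cases.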
[2015 Matrix / 44.8%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 1302 | 16 / 1.21% |
| Actual True | 187 | 13 / 6.5% |
The 2015 matrix is the baseline that future models should improve upon.
###2016 Preliminary Confusion Matrices:
- Consensus Matrix = All 3 predicting TRUE
- Democratic Matrix = Any 2 of 3 models predicting TRUE
- Singular Matrix = Any 1 of 3 models predicting TRUE
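The voting rules above can be sketched in Python as a vote count per sample. This sketch assumes "any 2 of 3" means at least two models predict TRUE. The function name `vote` and the toy predictions are hypothetical.

```python
def vote(rf, gbm, svc, min_votes):
    """Predict TRUE for each sample where at least `min_votes` models predict TRUE."""
    return [
        int(r) + int(g) + int(s) >= min_votes
        for r, g, s in zip(rf, gbm, svc)
    ]


# Toy per-sample predictions for illustration only (not project data).
rf = [True, True, False, False]
gbm = [True, True, True, False]
svc = [True, False, False, False]

consensus = vote(rf, gbm, svc, min_votes=3)   # all 3 models agree on TRUE
democratic = vote(rf, gbm, svc, min_votes=2)  # at least 2 of 3 predict TRUE
print(consensus, democratic)
```

Because the threshold is a parameter, the same function covers any other vote count, such as a single model predicting TRUE.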
[Consensus / 6.7%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 591 | 14 / 2.31% |
| Actual True | 38 | 1 / 2.56% |
[Democratic / 9.9%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 541 | 64 / 10.6% |
| Actual True | 32 | 7 / 17.9% |
[Singular / 7.0%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 286 | 319 / 52.7% |
| Actual True | 15 | 24 / 61.5% |
[SVC Model / 5.2%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 329 | 276 / 45.6% |
| Actual True | 24 | 15 / 38.5% |
[RF Model / 7.8%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 558 | 47 / 7.8% |
| Actual True | 35 | 4 / 10.3% |
[GBM Model / 14.9%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 531 | 74 / 12.2% |
| Actual True | 26 | 13 / 33.3% |
###Combination Matrices:
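In the preliminary matrices, the GBM model appears to perform better than the other 2 models: it has the highest precision of the three (14.9%) and a TPR (33.3%) well above its FPR (12.2%). This can be checked further by looking at the pairwise combinations of the 3 models, where a combination predicts TRUE only when both of its models predict TRUE: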
[RF and GBM / 14.3%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 587 | 18 / 3.0% |
| Actual True | 36 | 3 / 7.7% |
[SVC and GBM / 8.7%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 563 | 42 / 6.9% |
| Actual True | 35 | 4 / 10.3% |
[SVC and RF / 5.9%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 573 | 32 / 5.3% |
| Actual True | 37 | 2 / 5.1% |
The combination matrices demonstrate a couple of things. First, the GBM combines well with the other models, producing a TPR higher than the FPR in both of its combinations. Second, the RF and SVC do not combine well: their TPR (5.1%) is slightly below their FPR (5.3%). With this knowledge, a weighted democratic model that predicts TRUE when either combination containing the GBM predicts TRUE, and that ignores the SVC/RF combination, should be better than the preliminary democratic model.
[Weighted Democratic / 11.5%]
| | Predict False | Predict True |
|---|---|---|
| Actual False | 559 | 46 / 7.6% |
| Actual True | 33 | 6 / 15.4% |
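One reading of the weighted democratic rule is "GBM predicts TRUE and at least one of RF or SVC also predicts TRUE". The Python sketch below is an assumption about the rule, not the project's code, and the names `weighted_democratic` and `union_count` are hypothetical. Under that reading, the FP and TP counts follow from the RF and GBM, SVC and GBM, and Consensus matrices by inclusion-exclusion, and they agree with the table above.

```python
def weighted_democratic(rf, gbm, svc):
    """Assumed rule: TRUE when GBM agrees with at least one other model."""
    return [bool(g and (r or s)) for r, g, s in zip(rf, gbm, svc)]


def union_count(rf_and_gbm, svc_and_gbm, all_three):
    """Samples flagged by either GBM pair, counting the three-way overlap once."""
    return rf_and_gbm + svc_and_gbm - all_three


# Counts taken from the RF and GBM, SVC and GBM, and Consensus matrices.
fp = union_count(rf_and_gbm=18, svc_and_gbm=42, all_three=14)
tp = union_count(rf_and_gbm=3, svc_and_gbm=4, all_three=1)
print(fp, tp)  # 46 6, matching the Weighted Democratic matrix

# Toy per-sample predictions for illustration only (not project data).
print(weighted_democratic(
    rf=[True, True, False],
    gbm=[True, False, True],
    svc=[False, True, False],
))
```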
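Overall, the weighted democratic model performs better than the preliminary democratic model: precision rises from 9.9% to 11.5% and the FPR falls from 10.6% to 7.6%, although the TPR also falls from 17.9% to 15.4%.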
In general, precision is low in every 2016 matrix examined: the best is 14.9% (GBM), well below the 44.8% of the 2015 matrix. Increasing precision is the goal for future models.
