ausdm/notes.txt at master · jeremybarnes/ausdm · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000

jeremy@jeremy-desktop:~/projects/ausdm$ for n in 1 2 3 4 5 7 10 12 15 17 20 25 30 40 50; do echo -n "$n "; ./build/x86_64/bin/ausdm -T 0.20 -t rmse blender.num_models=$n 2>/dev/null | tail -n1; done
1 0.8692
2 0.8659
3 0.8643
4 0.8627
5 0.8602
7 0.8597
10 0.8597
12 0.8592
15 0.8591
17 0.8594
20 0.8596
25 0.8603
30 0.8605
40 0.8606
50 0.8608

jeremy@jeremy-desktop:~/projects/ausdm$ for n in 1 2 3 4 5 7 10 12 15 17 20 25 30 40 50; do echo -n "$n "; ./build/x86_64/bin/ausdm -T 0.20 -t auc blender.num_models=$n 2>/dev/null | tail -n1; done
1 0.8828
2 0.8869
3 0.8867
4 0.8860
5 0.8871
7 0.8867
10 0.8865
12 0.8862
15 0.8858
17 0.8859
20 0.8856
25 0.8854
30 0.8853
40 0.8851
50 0.8851

So, 15 models seems to be good for RMSE and 5 for AUC


-------

jeremy@jeremy-desktop:~/projects/ausdm$ for n in 1 2 3 4 5 7 10 12 15 17 20 25 30 40 50; do echo -n "$n "; ./build/x86_64/bin/ausdm -T 0.20 -t rmse blender.num_models=$n -r 10 2>/dev/null | tail -n1; done
1 0.8761 +/- 0.0151
2 0.8706 +/- 0.0144
3 0.8688 +/- 0.0142
4 0.8681 +/- 0.0145
5 0.8675 +/- 0.0139
7 0.8669 +/- 0.0139
10 0.8664 +/- 0.0140
12 0.8662 +/- 0.0141
15 0.8659 +/- 0.0140
17 0.8659 +/- 0.0140
20 0.8656 +/- 0.0140
25 0.8658 +/- 0.0140
30 0.8661 +/- 0.0140
40 0.8664 +/- 0.0139
50 0.8666 +/- 0.0139


jeremy@jeremy-desktop:~/projects/ausdm$ for n in 1 2 3 4 5 7 10 12 15 17 20 25 30 40 50; do echo -n "$n "; ./build/x86_64/bin/ausdm -T 0.20 -t auc blender.num_models=$n -r 10 2>/dev/null | tail -n1; done
1 0.8750 +/- 0.0090
2 0.8788 +/- 0.0081
3 0.8792 +/- 0.0080
4 0.8786 +/- 0.0079
5 0.8797 +/- 0.0075
7 0.8804 +/- 0.0072
10 0.8800 +/- 0.0074
12 0.8799 +/- 0.0074
15 0.8798 +/- 0.0075
17 0.8796 +/- 0.0075
20 0.8794 +/- 0.0075
25 0.8789 +/- 0.0077
30 0.8783 +/- 0.0079
40 0.8777 +/- 0.0080
50 0.8774 +/- 0.0081

Better results (more trials): 15-20 for RMSE, 7 for AUC


Boosting:

./build/x86_64/bin/ausdm -T 0.20 -t auc blender.num_models=5 -r 10

0.8779 +/- 0.0077

Not much good f as a first try...


With impossible examples excluded:

./build/x86_64/bin/ausdm -T 0.20 -t auc blender.num_models=5 -r 10

0.8776 +/- 0.0080

Still not much good...


With early stopping:

0.8772 +/- 0.0077

Even worse, not encouraging


With selection via margin:

0.8773 +/- 0.0076

Not any better!


Minor tweaks:

0.8779 +/- 0.0080

------------------------

Gated solution, try 1:

./build/x86_64/bin/ausdm -T 0.20 -t auc -r 10

0.8799 +/- 0.0074


With bug fixes:

0.8804 +/- 0.0074


Predicting actual margin (but using logit) (removed)

0.8801 +/- 0.0074


20 inputs

0.8798 +/- 0.0076

Back to 10 inputs, with extra statistical features:

0.8807 +/- 0.0075

Added a few extra along the same lines:

0.8807 +/- 0.0075

Removed decomposition features:

0.8805 +/- 0.0074

But this is in spite of the fact that the confidence functions got much, much worse:

Before:

trial 9
200 models... 15000 rows... n = 12000 m = 200 nvalues = 200
model 123: before 0.873762/0.256815 after 0.366578/0.535418
model 71: before 0.870887/0.24351 after 0.340936/0.541159
model 81: before 0.870203/0.24804 after 0.35414/0.536658
model 7: before 0.870063/0.260856 after 0.380631/0.544828
model 65: before 0.86898/0.25535 after 0.372716/0.547804
model 46: before 0.868584/0.261021 after 0.384278/0.541895
model 140: before 0.871122/0.25586 after 0.379631/0.533441
model 88: before 0.871134/0.252219 after 0.374164/0.532919
model 150: before 0.871441/0.23935 after 0.339959/0.533688
model 167: before 0.87121/0.260366 after 0.383654/0.540394


After:

trial 9
200 models... 15000 rows... n = 12000 m = 200 nvalues = 200
model 88: before 0.871134/0.252219 after 0.539543/0.308744
model 71: before 0.870887/0.24351 after 0.47926/0.326779
model 123: before 0.873762/0.256815 after 0.534246/0.299965
model 140: before 0.871122/0.25586 after 0.544026/0.309047
model 7: before 0.870063/0.260856 after 0.537814/0.337856
model 81: before 0.870203/0.24804 after 0.515196/0.318835
model 65: before 0.86898/0.25535 after 0.52039/0.337979
model 46: before 0.868584/0.261021 after 0.541253/0.335966
model 167: before 0.87121/0.260366 after 0.563316/0.308776
model 150: before 0.871441/0.23935 after 0.516603/0.288806


--------------

Attempt to use a simple linear blending scheme:

0.8802 +/- 0.0086


Same scheme but with some extra features (including those necessary to emulate the previous system):

0.8808 +/- 0.0087


-------------

RMSE with numeric problems fixed up (I think!):

0.8683 +/- 0.0133

With them really fixed up:

0.8689 +/- 0.0139

Very disappointing!

Same for AUC:

0.8807 +/- 0.0081

With 20 weak learners:

0.8785 +/- 0.0073

So still not there yet...

Submission: (gated top 10 (fixed))

Gini = 0.8755559356638
Success - rmse = 882.36039813249

-------------------

AUC with extra features:

0.8798 +/- 0.0085

Beginning to suspect something wrong (systematic biasing); shouldn't be going down like this!

RMSE with same extra features:

0.8706 +/- 0.0144

So just adding good features gets worse for RMSE as well.

----------------------

AUC with bagged boosted trees for blending:

0.8795 +/- 0.0071

And with a GLZ:

0.8794 +/- 0.0083

With decomposition trained on 1/3 of the data:

auc: 0.8738 +/- 0.0087

Ouch!

Fixed up to use the correct decomposition (I think):

0.8763 +/- 0.0088

Still not great...

And back to using full data for the decomposition:

0.8792 +/- 0.0086

Using the full, full, full set of data for the decomposition:

0.8797 +/- 0.0091


---------------------


Using a neural network to perform blending:

0.8814 +/- 0.0079

Now we're talking!

Submitted:


Success
Gini = 0.87422844737031
AUC = 0.93711422368515


-----------------------

Before using boosting around everything else:

scores: [ 0.117628 0.126624 0.123454 0.134264 0.135271 0.131286 0.131046 0.137431 0.110226 0.119526 ]
0.1267 +/- 0.0088

With boosting (same as before, on top of boosted bagged decision trees):

scores: [ 0.119883 0.122842 0.124331 0.129082 0.136901 0.134772 0.134071 0.136432 0.108208 0.122468 ]
0.1269 +/- 0.0091

Success
Gini = 0.87000892970578
AUC = 0.93500446485289

With GLZ:

scores: [ 0.117289 0.113771 0.113929 0.125466 0.127324 0.130649 0.124218 0.124263 0.105995 0.116004 ]
0.1199 +/- 0.0077

for n in 1 2 3 4 5 7 10 12 15 17 20 25 30 40 50; do echo -n "$n "; ./build/x86_64/bin/ausdm --no-decomposition -T 0.20 -t auc auc.type=linear auc.mode=best_n auc.num_models=$n 2>/dev/null | tail -n1; done
1 0.1218 +/-    nan
2 0.1200 +/-    nan
3 0.1215 +/-    nan
4 0.1200 +/-    nan
5 0.1211 +/-    nan
7 0.1201 +/-    nan
10 0.1193 +/-    nan
12 0.1194 +/-    nan
15 0.1195 +/-    nan
17 0.1199 +/-    nan
20 0.1206 +/-    nan
25 0.1205 +/-    nan
30 0.1208 +/-    nan
40 0.1206 +/-    nan
50 0.1212 +/-    nan


jeremy@jeremy-desktop:~/projects/ausdm$ for n in 1 2 3 4 5 7 10 12 15 17 20 25 30 40 50; do echo -n "$n "; ./build/x86_64/bin/ausdm --no-decomposition -T 0.20 -t auc -r 10 auc.type=linear auc.mode=best_n auc.num_models=$n 2>/dev/null | tail -n1; done
1 0.1252 +/- 0.0090
2 0.1222 +/- 0.0082
3 0.1218 +/- 0.0081
4 0.1210 +/- 0.0083
5 0.1213 +/- 0.0082
7 0.1202 +/- 0.0077
10 0.1200 +/- 0.0074
12 0.1201 +/- 0.0074
15 0.1201 +/- 0.0074
17 0.1203 +/- 0.0074
20 0.1207 +/- 0.0075
25 0.1211 +/- 0.0077
30 0.1217 +/- 0.0079
40 0.1223 +/- 0.0079
50 0.1226 +/- 0.0081


--------------------------------

Gated forest with 5000 decision trees:

score: 0.0568  baseline: 0.0597  diff: -0.00286
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0250006 baseline avg error 0.0277302 avg improvement 0.00272964
category POSS: count 1046 avg error 0.0777513 baseline avg error 0.080278 avg improvement 0.00252669
category IMP: count 162 avg error 0.273266 baseline avg error 0.279722 avg improvement 0.00645689
overall: count 3000 avg error 0.0567993 baseline avg error 0.0596595 avg improvement 0.00286015

Gini = 0.8796782168085
AUC = 0.93983910840425

With 20 models:

score: 0.0569  baseline: 0.0597  diff: -0.00272
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0251198 baseline avg error 0.0277302 avg improvement 0.00261046
category POSS: count 1046 avg error 0.0778645 baseline avg error 0.080278 avg improvement 0.00241351
category IMP: count 162 avg error 0.273835 baseline avg error 0.279722 avg improvement 0.00588778
overall: count 3000 avg error 0.0569407 baseline avg error 0.0596595 avg improvement 0.00271876

With 50 models:

score: 0.0571  baseline: 0.0597  diff: -0.00258
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0251981 baseline avg error 0.0277302 avg improvement 0.00253209
category POSS: count 1046 avg error 0.0784204 baseline avg error 0.080278 avg improvement 0.00185765
category IMP: count 162 avg error 0.27204 baseline avg error 0.279722 avg improvement 0.00768226
overall: count 3000 avg error 0.0570844 baseline avg error 0.0596595 avg improvement 0.00257504


With 100 models:

(probably not worth doing...)

----

Running AUC with just a GLZ:


Linear Regression:


Learned GLZ function:
feature                                     label0       label1
bias                                         0.000000    -0.000000
min_model                                    0.000000    -0.000000
max_model                                    0.000000    -0.000000
avg_model_chosen                             0.000000    -0.000000
min_weighted                                -3.067802     3.067802
max_weighted                                -0.580425     0.580425
avg_weighted                                 7.412578    -7.412578
weighted_avg                               -15.461981    15.461981
S_auc_8_output                           39555.097656-39555.097656
S_auc_8_conf                                 0.000000    -0.000000
S_auc_8_weighted                        337726.687500-337726.687500
S_auc_8_dev_from_mean                        0.851580    -0.851580
S_auc_8_diff_from_int                       -0.151940     0.151940
S_auc_65_output                          39554.570312-39554.570312
S_auc_65_conf                            90859.679688-90859.679688
S_auc_65_weighted                           -1.470467     1.470467
S_auc_65_dev_from_mean                       2.422136    -2.422136
S_auc_65_diff_from_int                       0.454647    -0.454647
S_auc_66_output                          39563.464844-39563.464844
S_auc_66_conf                                0.000000    -0.000000
S_auc_66_weighted                            0.000000    -0.000000
S_auc_66_dev_from_mean                     -13.190518    13.190518
S_auc_66_diff_from_int                       1.878807    -1.878807
S_auc_72_output                              0.000000    -0.000000
S_auc_72_conf                                0.000000    -0.000000
S_auc_72_weighted                        39550.214844-39550.214844
S_auc_72_dev_from_mean                       9.720531    -9.720531
S_auc_72_diff_from_int                      -2.116513     2.116513
S_auc_82_output                          39554.601562-39554.601562
S_auc_82_conf                            90853.570312-90853.570312
S_auc_82_weighted                            5.586189    -5.586189
S_auc_82_dev_from_mean                      -1.326253     1.326253
S_auc_82_diff_from_int                       1.782153    -1.782153
S_auc_89_output                          39534.972656-39534.972656
S_auc_89_conf                            90854.921875-90854.921875
S_auc_89_weighted                            3.652061    -3.652061
S_auc_89_dev_from_mean                      36.556847   -36.556847
S_auc_89_diff_from_int                       2.792318    -2.792318
S_auc_124_output                             0.000000    -0.000000
S_auc_124_conf                           96008.171875-96008.171875
S_auc_124_weighted                       39554.011719-39554.011719
S_auc_124_dev_from_mean                      1.052265    -1.052265
S_auc_124_diff_from_int                     -0.587233     0.587233
S_auc_141_output                         39570.210938-39570.210938
S_auc_141_conf                          -5068163.5000005068163.500000
S_auc_141_weighted                      -5232521.0000005232521.000000
S_auc_141_dev_from_mean                    -33.423252    33.423252
S_auc_141_diff_from_int                     -3.691887     3.691887
S_auc_151_output                         39558.589844-39558.589844
S_auc_151_conf                           90850.171875-90850.171875
S_auc_151_weighted                          -7.856544     7.856544
S_auc_151_dev_from_mean                     -5.966421     5.966421
S_auc_151_diff_from_int                     -0.992893     0.992893
S_auc_168_output                        -451288.906250451288.906250
S_auc_168_conf                          -65365.578125 65365.578125
S_auc_168_weighted                      490844.125000-490844.125000
S_auc_168_dev_from_mean                      2.643380    -2.643380
S_auc_168_diff_from_int                      0.317969    -0.317969
chosen_conf_min                              0.000000    -0.000000
chosen_conf_max                          55077.019531-55077.019531
chosen_conf_avg                         -908543.312500908543.312500
highest_chosen_output                   153855.828125-153855.828125
highest_chosen_weighted                 -153855.921875153855.921875
model_avg                                    0.000000    -0.000000
chosen_model_avg                        -395603.437500395603.437500
models_mean                                 59.470715   -59.470715
models_std                                   0.131589    -0.131589
models_min                                   0.000000    -0.000000
models_max                                   1.412297    -1.412297
models_range                                -0.536038     0.536038
models_range_dev_high                       -1.083698     1.083698
models_range_dev_low                         1.388429    -1.388429
diff_mean_10_all                           -58.981258    58.981258
abs_diff_mean_10_all                        -1.434823     1.434823
BIAS                                    186843.671875-186843.671875


score: 0.0615  baseline: 0.0597  diff: 0.00186
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0279182 baseline avg error 0.0277302 avg improvement -0.000188001
category POSS: count 1046 avg error 0.0837001 baseline avg error 0.080278 avg improvement -0.00342211
category IMP: count 162 avg error 0.289917 baseline avg error 0.279722 avg improvement -0.0101945
overall: count 3000 avg error 0.0615155 baseline avg error 0.0596595 avg improvement -0.00185598
w


Ridge Regression:


Learned GLZ function:
feature                                     label0       label1
bias                                         0.000000    -0.000000
min_model                                    0.000000    -0.000000
max_model                                    0.000000    -0.000000
avg_model_chosen                             0.000000    -0.000000
min_weighted                                -0.035519     0.035519
max_weighted                                 0.020892    -0.020892
avg_weighted                                 0.000000    -0.000000
weighted_avg                                -0.341624     0.341624
S_auc_8_output                              -0.328620     0.328620
S_auc_8_conf                                 0.217331    -0.217331
S_auc_8_weighted                            -0.009243     0.009243
S_auc_8_dev_from_mean                       -0.035127     0.035127
S_auc_8_diff_from_int                        0.010684    -0.010684
S_auc_65_output                             -0.317247     0.317247
S_auc_65_conf                                0.212566    -0.212566
S_auc_65_weighted                           -0.005806     0.005806
S_auc_65_dev_from_mean                      -0.029357     0.029357
S_auc_65_diff_from_int                       0.013780    -0.013780
S_auc_66_output                             -0.317711     0.317711
S_auc_66_conf                                0.214542    -0.214542
S_auc_66_weighted                           -0.005482     0.005482
S_auc_66_dev_from_mean                      -0.028891     0.028891
S_auc_66_diff_from_int                       0.016726    -0.016726
S_auc_72_output                             -0.333903     0.333903
S_auc_72_conf                                0.219665    -0.219665
S_auc_72_weighted                           -0.008495     0.008495
S_auc_72_dev_from_mean                      -0.040573     0.040573
S_auc_72_diff_from_int                       0.020935    -0.020935
S_auc_82_output                             -0.334490     0.334490
S_auc_82_conf                                0.218557    -0.218557
S_auc_82_weighted                           -0.009297     0.009297
S_auc_82_dev_from_mean                      -0.038677     0.038677
S_auc_82_diff_from_int                       0.016342    -0.016342
S_auc_89_output                             -0.391729     0.391729
S_auc_89_conf                                0.000000    -0.000000
S_auc_89_weighted                           -0.000000     0.000000
S_auc_89_dev_from_mean                      -0.075317     0.075317
S_auc_89_diff_from_int                      -0.009634     0.009634
S_auc_124_output                            -0.339837     0.339837
S_auc_124_conf                               0.000000    -0.000000
S_auc_124_weighted                           0.000000    -0.000000
S_auc_124_dev_from_mean                     -0.044046     0.044046
S_auc_124_diff_from_int                      0.007553    -0.007553
S_auc_141_output                            -0.391455     0.391455
S_auc_141_conf                               0.240761    -0.240761
S_auc_141_weighted                          -0.028445     0.028445
S_auc_141_dev_from_mean                     -0.075014     0.075014
S_auc_141_diff_from_int                     -0.008678     0.008678
S_auc_151_output                            -0.339494     0.339494
S_auc_151_conf                               0.220228    -0.220228
S_auc_151_weighted                          -0.011876     0.011876
S_auc_151_dev_from_mean                     -0.043474     0.043474
S_auc_151_diff_from_int                      0.016723    -0.016723
S_auc_168_output                            -0.378733     0.378733
S_auc_168_conf                               0.236946    -0.236946
S_auc_168_weighted                          -0.024743     0.024743
S_auc_168_dev_from_mean                     -0.066869     0.066869
S_auc_168_diff_from_int                     -0.012450     0.012450
chosen_conf_min                              0.000000    -0.000000
chosen_conf_max                              0.236931    -0.236931
chosen_conf_avg                              0.178060    -0.178060
highest_chosen_output                       -0.341891     0.341891
highest_chosen_weighted                     -0.030948     0.030948
model_avg                                    0.000000    -0.000000
chosen_model_avg                             0.000000    -0.000000
models_mean                                 -0.275737     0.275737
models_std                                   0.071349    -0.071349
models_min                                   0.000000    -0.000000
models_max                                  -0.227445     0.227445
models_range                                 0.058359    -0.058359
models_range_dev_high                        0.018765    -0.018765
models_range_dev_low                        -0.016837     0.016837
diff_mean_10_all                             0.071585    -0.071585
abs_diff_mean_10_all                        -0.000738     0.000738
BIAS                                        -0.045517     0.045517

processing 3000 predictions...

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
 done.
elapsed: [124.61s cpu, 97296.2517 mticks, 36.40s wall]
score: 0.0584  baseline: 0.0597  diff: -0.00122
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0254104 baseline avg error 0.0277302 avg improvement 0.00231981
category POSS: count 1046 avg error 0.0798008 baseline avg error 0.080278 avg improvement 0.000477186
category IMP: count 162 avg error 0.285864 baseline avg error 0.279722 avg improvement -0.0061414
overall: count 3000 avg error 0.058439 baseline avg error 0.0596595 avg improvement 0.00122044

Much better!


----

Using ridge regression for conf:


Success
Gini = 0.8756866303033
AUC = 0.93784331515165

Not what I was hoping for...


-------

Denoising auto encoders

Training on small training set, with sizes 200, 100, 80, 50.
prob_cleared=0.25

Layer 0:
testing rmse of layer: exact 0.750719 noisy 0.824763
testing rmse of stack: exact 0.750719 noisy 0.825078

Layer 1:
testing rmse of layer: exact 0.647396 noisy 0.663439
testing rmse of stack: exact 1.2244 noisy 1.10089

Layer 2:
testing rmse of layer: exact 0.54445 noisy 0.572186
testing rmse of stack: exact 1.62892 noisy 1.47308

Layer 3:
testing rmse of layer: exact 0.534287 noisy 0.553167
testing rmse of stack: exact 2.01664 noisy 1.87444

Note that the RMSE of the exact represntation becomes higher than that of the
noisy representation when we get into the higher layers.  This is strange; we
would expect the opposite!


Setting prob_cleared to 0.01:

Layer 0:

testing rmse of layer: exact 0.656643 noisy 0.680308
testing rmse of stack: exact 0.656643 noisy 0.680228


Layer 1:

testing rmse of layer: exact 0.570418 noisy 0.583081
testing rmse of stack: exact 0.995111 noisy 0.995972

Layer 2:

testing rmse of layer: exact 0.427001 noisy 0.44472
testing rmse of stack: exact 1.22612 noisy 1.22274


Layer 3:

testing rmse of layer: exact 0.410943 noisy 0.423614
testing rmse of stack: exact 1.44175 noisy 1.43655


-----------------------

Various fixes (AUC), no reconstitutions:

./build/x86_64/bin/ausdm -t auc -T 0.20

score: 0.0571  baseline: 0.0597  diff: -0.00257
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0250538 baseline avg error 0.0277302 avg improvement 0.00267647
category POSS: count 1046 avg error 0.0772795 baseline avg error 0.080278 avg improvement 0.0029985
category IMP: count 162 avg error 0.281168 baseline avg error 0.279722 avg improvement -0.00144525
overall: count 3000 avg error 0.0570933 baseline avg error 0.0596595 avg improvement 0.00256618
0.0571 +/-    nan


With the 4 sizes of reconstitutions (makes it much slower due to inefficiencies in the :

score: 0.0569  baseline: 0.0597  diff: -0.00276
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0249997 baseline avg error 0.0277302 avg improvement 0.00273057
category POSS: count 1046 avg error 0.0774971 baseline avg error 0.080278 avg improvement 0.00278095
category IMP: count 162 avg error 0.276724 baseline avg error 0.279722 avg improvement 0.00299887
overall: count 3000 avg error 0.0568969 baseline avg error 0.0596595 avg improvement 0.00276262


0.0569 +/-    nan


Question: why the bimodal distribution for the confidence values?

model 64: before 0.0666467/0.374873 after 0.220815/0.333447
model 7: before 0.0665869/0.368651 after 0.198176/0.325907
model 71: before 0.0660204/0.379546 after 0.304334/0.229368
model 65: before 0.066959/0.373087 after 0.271334/0.220748
model 88: before 0.0658663/0.371858 after 0.267408/0.238292
model 81: before 0.0654779/0.37622 after 0.269312/0.225575
model 123: before 0.0643936/0.374808 after 0.329538/0.191075
model 140: before 0.0658261/0.370029 after 0.18544/0.335682
model 150: before 0.0655461/0.380835 after 0.202491/0.344179
model 167: before 0.0658265/0.367942 after 0.18448/0.333779


model 64: before 0.0700761/0.387666 after 0.452996/0.12874

Success
Gini = 0.87898462670556
AUC = 0.93949231335278

Not a great result.


But what about all of the 0.3xxxxxx values above?


-------------------

Profiling:
10 iterations
Note that 80% of samples were used and randomization was on.

/usr/bin/time ./build/x86_64/bin/decompose niter=10 prob_cleared=0.1 -t auc -T DNAE layer_sizes=400

testing rmse of stack: exact 1.01475 noisy 1.06024
elapsed: [398.46s cpu, 210256.1773 mticks, 78.67s wall]
322.70user 75.78system 1:18.69elapsed 506%CPU (0avgtext+0avgdata 0maxresident)k


With local updates:

testing rmse of stack: exact 0.911403 noisy 1.18857
elapsed: [451.66s cpu, 187340.3646 mticks, 70.09s wall]
379.47user 72.21system 1:10.11elapsed 644%CPU (0avgtext+0avgdata 0maxresident

Removed useless extra weight updating steps:

testing rmse of stack: exact 0.911403 noisy 1.18857
elapsed: [273.38s cpu, 116512.9294 mticks, 43.59s wall]
266.91user 6.48system 0:43.62elapsed 626%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2483249minor)pagefaults 0swaps

Minibatch_size = 256:

testing rmse of stack: exact 1.01475 noisy 1.06024
elapsed: [293.03s cpu, 127530.4644 mticks, 47.72s wall]
280.41user 12.63system 0:47.74elapsed 613%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4758900minor)pagefaults 0swaps

A bit slower, but probably worth it due to better convergence


Simplified expressions to save computation:

testing rmse of stack: exact 1.01475 noisy 1.06024
elapsed: [278.62s cpu, 125761.3499 mticks, 47.05s wall]
265.65user 12.98system 0:47.08elapsed 591%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4776474minor)pagefaults 0swaps

A bit better, not that much difference.

Moving everything over to float (1):

testing rmse of stack: exact 1.68017 noisy 1.66399
elapsed: [295.06s cpu, 131455.3340 mticks, 49.18s wall]
281.95user 13.13system 0:49.21elapsed 599%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5337325minor)pagefaults 0swaps

Slower, AND less accurate!

With NaN checking off:

testing rmse of stack: exact 1.68017 noisy 1.66399
elapsed: [284.89s cpu, 114519.3964 mticks, 42.85s wall]
269.58user 15.32system 0:42.87elapsed 664%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5054820minor)pagefaults 0swaps

Internal calculations in double precision:

testing rmse of stack: exact 1.13037 noisy 1.20409
elapsed: [288.74s cpu, 120263.9041 mticks, 45.00s wall]
273.25user 15.52system 0:45.02elapsed 641%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5106862minor)pagefaults 0swaps

Worth it!

Weights stored as double precision:

testing rmse of stack: exact 1.13037 noisy 1.20409
elapsed: [290.79s cpu, 124378.5799 mticks, 46.54s wall]
275.77user 15.04system 0:46.57elapsed 624%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4796407minor)pagefaults 0swaps

Very marginally worse; doesn't make much of a difference

With test_every=5 and some junk removed:

testing rmse of stack: exact 1.07445 noisy 1.20471
elapsed: [219.84s cpu, 95791.8669 mticks, 35.84s wall]
204.45user 15.40system 0:35.86elapsed 613%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5173133minor)pagefaults 0swaps


With singular value equalization:


testing rmse of stack: exact 1.03395 noisy 1.40583
elapsed: [219.58s cpu, 97414.3006 mticks, 36.45s wall]
204.32user 15.26system 0:36.47elapsed 602%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5118235minor)pagefaults 0swaps

Training for AUC:

/usr/bin/time ./build/x86_64/bin/decompose niter=500 prob_cleared=0.1 -t auc -T DNAE layer_sizes=250,150,100,50 minibatch_size=256 test_every=10 sample_proportion=0.8 minibatch_size=256 -o auc_decomposed_4.bin 2>&1 | tee log auc_decomposed_4.log

Added in that 50% of examples have zero missing:

/usr/bin/time ./build/x86_64/bin/decompose niter=500 prob_cleared=0.1 -t auc -T DNAE layer_sizes=250,150,100,50 minibatch_size=256 test_every=10 sample_proportion=0.8 minibatch_size=256 -o auc_decomposed_5.bin 2>&1 | tee log auc_decomposed_5.log

--------

New submission


jeremy@jeremy-desktop:~/projects/ausdm$ ./build/x86_64/bin/ausdm -t auc --decomposition auc_decomposed_5.bin -o auc.txt

Success
Gini = 0.8757558999216
AUC = 0.9378779499608


-----

SVD:

model 7: before 0.06852/0.384003 after 0.229559/0.335661
model 64: before 0.0700761/0.387666 after 0.302785/0.215156
model 65: before 0.0689352/0.387611 after 0.30091/0.215864
model 71: before 0.0680365/0.39421 after 0.313383/0.222036
model 81: before 0.0677772/0.389624 after 0.296709/0.226068
model 88: before 0.068651/0.387245 after 0.223344/0.337744
model 123: before 0.0668508/0.389111 after 0.22569/0.338291
model 140: before 0.0688745/0.385887 after 0.226641/0.334532
model 150: before 0.0678839/0.393203 after 0.302427/0.225582
model 167: before 0.0683845/0.383097 after 0.209152/0.337951

score: 0.0569  baseline: 0.0597  diff: -0.00276
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0249997 baseline avg error 0.0277302 avg improvement 0.00273057
category POSS: count 1046 avg error 0.0774971 baseline avg error 0.080278 avg improvement 0.00278095
category IMP: count 162 avg error 0.276724 baseline avg error 0.279722 avg improvement 0.00299887
overall: count 3000 avg error 0.0568969 baseline avg error 0.0596595 avg improvement 0.00276262


0.0569 +/-    nan


With all models as features:

score: 0.0575  baseline: 0.0597  diff: -0.00217
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0255618 baseline avg error 0.0277302 avg improvement 0.00216845
category POSS: count 1046 avg error 0.0780009 baseline avg error 0.080278 avg improvement 0.00227712
category IMP: count 162 avg error 0.278175 baseline avg error 0.279722 avg improvement 0.00154763
overall: count 3000 avg error 0.0574867 baseline avg error 0.0596595 avg improvement 0.00217282

0.0575 +/-    nan


So that's not a good idea.

model 7: before 0.06852/0.384003 after 0.235083/0.332969
model 64: before 0.0700761/0.387666 after 0.316355/0.19513
model 65: before 0.0689352/0.387611 after 0.312144/0.19573
model 71: before 0.0680365/0.39421 after 0.307778/0.245703
model 81: before 0.0677772/0.389624 after 0.252069/0.335179
model 88: before 0.068651/0.387245 after 0.213997/0.336884
model 123: before 0.0668508/0.389111 after 0.24284/0.336533
model 140: before 0.0688745/0.385887 after 0.217393/0.333598
model 150: before 0.0678839/0.393203 after 0.331594/0.194509
model 167: before 0.0683845/0.383097 after 0.211377/0.338049


Training the SVD only on the testing data:

/build/x86_64/bin/ausdm -t auc --decomposition SVD -T 0.20

category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.02465 baseline avg error 0.0277302 avg improvement 0.00308023
category POSS: count 1046 avg error 0.0772276 baseline avg error 0.080278 avg improvement 0.00305043
category IMP: count 162 avg error 0.274658 baseline avg error 0.279722 avg improvement 0.00506434
overall: count 3000 avg error 0.0564825 baseline avg error 0.0596595 avg improvement 0.00317698

0.0565 +/-    nan


Seems to be a good idea; probably does something to stop overfitting

----

Trying to train another decomposition:

jeremy@jeremy-desktop:~/projects/ausdm$ /usr/bin/time ./build/x86_64/bin/decompose niter=500 prob_cleared=0.1 -t auc -T DNAE layer_sizes=250,150,100,50 minibatch_size=256 test_every=10 sample_proportion=0.8 minibatch_size=256 -o auc_decomposed_6.bin 2>&1 | tee log auc_decomposed_6.log


score: 0.0577  baseline: 0.0597  diff: -0.00196
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0257383 baseline avg error 0.0277302 avg improvement 0.00199198
category POSS: count 1046 avg error 0.0784031 baseline avg error 0.080278 avg improvement 0.00187492
category IMP: count 162 avg error 0.277497 baseline avg error 0.279722 avg improvement 0.00222539
overall: count 3000 avg error 0.0576957 baseline avg error 0.0596595 avg improvement 0.00196377


---------


With one single IRLS:

model 7: before 0.06852/0.384003 after 0.42331/0.128117
model 64: before 0.0700761/0.387666 after 0.45695/0.12374
model 65: before 0.0689352/0.387611 after 0.451618/0.128038
model 71: before 0.0680365/0.39421 after 0.461487/0.132416
model 81: before 0.0677772/0.389624 after 0.459428/0.130617
model 88: before 0.068651/0.387245 after 0.449637/0.126693
model 123: before 0.0668508/0.389111 after 0.459812/0.131259
model 140: before 0.0688745/0.385887 after 0.450065/0.125641
model 150: before 0.0678839/0.393203 after 0.462587/0.133776
model 167: before 0.0683845/0.383097 after 0.444729/0.125012

With 10 IRLS:

model 7: before 0.06852/0.384003 after 0.461492/0.125508
model 64: before 0.0700761/0.387666 after 0.476979/0.125067
model 65: before 0.0689352/0.387611 after 0.473917/0.129824
model 71: before 0.0680365/0.39421 after 0.478801/0.133164
model 81: before 0.0677772/0.389624 after 0.480303/0.132256
model 88: before 0.068651/0.387245 after 0.46412/0.127911
model 123: before 0.0668508/0.389111 after 0.473596/0.131626
model 140: before 0.0688745/0.385887 after 0.46605/0.126157
model 150: before 0.0678839/0.393203 after 0.481739/0.134856
model 167: before 0.0683845/0.383097 after 0.460483/0.126257


--------


Attempt of trained dnae:

./build/x86_64/bin/ausdm -t auc auc.type=deep_net model_base=auc_decomposed_6.bin niter=500 learning_rate=0.4 -o auc-dnae.txt

Success
Gini = 0.8797460452102
AUC = 0.9398730226051


 ./build/x86_64/bin/ausdm -t auc auc.type=deep_net model_base=auc_decomposed_6.bin niter=2000 learning_rate=0.4 -o auc-dnae2.txt 2>&1 | tee auc-dnae2.lo


Success
Gini = 0.8770325990085
AUC = 0.93851629950425

Hmmm....
/x86_64/bin/ausdm -t auc auc.type=deep_net model_base=auc_decomposed_8.bin niter=500 learning_rate=0.4 -o auc-dnae3.txt 2>&1 | tee auc-dnae3.log


Success
Gini = 0.87949115052811
AUC = 0.93974557526406

Still not where we would like to be...


./build/x86_64/bin/ausdm -t auc auc.type=deep_net model_base=auc_decomposed_6.bin niter=500 learning_rate=1 -T 0.20 2>&1 | tee deep_net_with_features_2.log

category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0249236 baseline avg error 0.0277302 avg improvement 0.00280663
category POSS: count 1046 avg error 0.0768187 baseline avg error 0.080278 avg improvement 0.00345936
category IMP: count 162 avg error 0.266806 baseline avg error 0.279722 avg improvement 0.0129163
overall: count 3000 avg error 0.0560793 baseline avg error 0.0596595 avg improvement 0.00358014

0.0561 +/-    nan

Submission:

Success
Gini = 0.88096246860063
AUC = 0.94048123430032

---------

Trying to merge two models together (to see what the difference is)

./build/x86_64/bin/ausdm -t auc --decomposition SVD -o auc-gated-svd1.txt 2>&1 | tee auc-gated-svd1.log


 ./build/x86_64/bin/ausdm -t auc --decomposition SVD -T 0.20 2>&1 | tee auc-gated-svd-test.log

score: 0.0565  baseline: 0.0597  diff: -0.00317
category UNKNOWN: count 0 avg error 0 baseline avg error 0 avg improvement 0
category AUTO: count 1792 avg error 0.0245438 baseline avg error 0.0277302 avg improvement 0.00318641
category POSS: count 1046 avg error 0.0777773 baseline avg error 0.080278 avg improvement 0.00250071
category IMP: count 162 avg error 0.272456 baseline avg error 0.279722 avg improvement 0.0072664
overall: count 3000 avg error 0.0564918 baseline avg error 0.0596595 avg improvement 0.00316765
worst entries:


Merging the SVD gated model with the deep net model:

First I normalize them both to have a unit standard deviation and zero mean, as their scales are not comparable.


jeremy@jeremy-desktop:~/projects/ausdm$ cat auc-gated-svd1.txt | awk '{ total += $1; records += 1; } END { mean = total / records; print mean; }'