Adapt examples; remaining: example 4 onwards
not working: example 15 (qunfold)

Solve the warnings issue; right now there is a warning ignore in method/__init__.py.
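A possible direction, sketched here only as an illustration (the warning category and message shown are assumptions, not the one currently silenced in method/__init__.py), is to scope the suppression to the offending call instead of filtering module-wide:

```python
import warnings

# Rather than a module-level warnings.filterwarnings("ignore"), suppress only
# the specific noisy warning, and only around the call that emits it.
def quiet_call(fn, *args, **kwargs):
    with warnings.catch_warnings():
        # hypothetical category/message; would be the warning currently ignored
        warnings.filterwarnings("ignore", category=RuntimeWarning, message=".*divide by zero.*")
        return fn(*args, **kwargs)
```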
|
Add 'platt' to calib options in EMQ?
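If 'platt' were added, it would presumably mean sigmoid calibration of the underlying classifier's scores; a minimal sketch of the idea (how this would be wired into EMQ's calibration options is an assumption, not existing behaviour):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Platt scaling = fitting a sigmoid on the classifier's scores; wrapping the
# classifier like this yields calibrated posteriors via predict_proba, which
# is what EMQ consumes during the EM iterations.
platt_classifier = CalibratedClassifierCV(LogisticRegression(), method='sigmoid', cv=5)
```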
|
Allow n_prevpoints in APP to be specified by a user-defined grid?
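For illustration only (the prevalence_grid parameter does not exist; this just shows what a user-defined grid could look like, compared with the equidistant grid currently derived from the number of prevalence points):

```python
import numpy as np
from quapy.protocol import APP

# a grid that is denser near the extremes, instead of the equidistant one
custom_grid = np.array([0.01, 0.02, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
# hypothetical: APP(test_data, sample_size=100, prevalence_grid=custom_grid)
```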
|
Update READMEs, wiki, & examples for the new fit-predict interface

Add the fix suggested by Alexander?
"For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
based on the number of actual positives and the required sample size; e.g., for sample_size=100 and actual
positives [10, 100, 500] -> [0.1, 1.0, 1.0], meaning that class 0 can be sampled at most at 0.1 prevalence, while
the others can be sampled up to 1. prevalence. Then, when a prevalence value is requested, e.g., [0.33, 0.33, 0.33],
we may either clip each value and normalize (as you suggest for the extreme case, e.g., [0.1, 0.33, 0.33]/sum) or
scale each value by per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum."
- This affects LabelledCollection
- This functionality should be accessible via sampling protocols and evaluation functions (see the sketch below)
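A minimal numpy sketch of the two options discussed above (the function names are illustrative, not QuaPy API):

```python
import numpy as np

def plausible_caps(class_counts, sample_size):
    # maximum prevalence each class can reach, given its number of available positives
    return np.minimum(np.asarray(class_counts) / sample_size, 1.0)

def clip_and_normalize(requested, caps):
    clipped = np.minimum(requested, caps)
    return clipped / clipped.sum()

def scale_by_caps(requested, caps):
    scaled = requested * caps
    return scaled / scaled.sum()

caps = plausible_caps([10, 100, 500], sample_size=100)   # -> [0.1, 1.0, 1.0]
requested = np.array([0.33, 0.33, 0.33])
print(clip_and_normalize(requested, caps))   # [0.1, 0.33, 0.33] normalized
print(scale_by_caps(requested, caps))        # [0.33*0.1, 0.33*1, 0.33*1] normalized
```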
|
Solve the pre-trained classifier issues. An example is the coptic-codes script I did, which needed a mock_lr to
have access to classes_; think also of the case in which the precomputed outputs have already been generated,
as in the unifying problems code.
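One possible shape for the fix (purely illustrative, not an existing QuaPy class): a thin stand-in classifier that exposes classes_ and serves precomputed posteriors, so no mock_lr is needed:

```python
import numpy as np

class PrecomputedClassifier:
    """Illustrative stand-in for a pre-trained classifier whose outputs were
    computed offline; it exposes only what aggregative methods need."""
    def __init__(self, classes, posteriors):
        self.classes_ = np.asarray(classes)
        self._posteriors = np.asarray(posteriors)  # one row per instance

    def fit(self, X, y=None):
        return self  # nothing to train

    def predict_proba(self, X):
        # X is assumed to be an array of row indices into the precomputed matrix
        return self._posteriors[X]

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```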
To remove LabelledCollection from the methods:

- The mess comes from the confusing semantics of fit in aggregative methods, which takes 3 parameters:
  - data: a LabelledCollection, which can be:
    - the training set, if the classifier has to be trained
    - None, if the classifier does not have to be trained
    - the validation set (which conflicts with val_split), if the classifier does not have to be trained
  - fit_classifier: says whether the classifier has to be trained or not, and this changes the semantics of the other parameters
  - val_split: which can be:
    - a number: the number of folds for k-fold cross-validation, which implies fit_classifier=True and data=the whole training set
    - a fraction in [0,1]: the portion used for validation; implies fit_classifier=True and data=train+val
    - a LabelledCollection: the specific validation set; implies neither fit_classifier=True nor False
- The way to remove the methods' dependency on LabelledCollection should be as follows (see the sketch below):
  - The constructor states whether the classifier received as a parameter has to be trained or is already trained;
    that is, there is a fit_classifier=True or False.
  - fit_classifier=True:
    - data in fit is the whole training set, including the validation part
    - val_split:
      - an int: the number of folds for k-fold cross-validation
      - a proportion in [0,1]
  - fit_classifier=False:
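A sketch of the interface this redesign points to (the class and signature below are illustrative assumptions following the discussion above, not QuaPy's released API):

```python
from sklearn.linear_model import LogisticRegression

class AggregativeQuantifierSketch:
    """Illustrative skeleton of the proposed semantics."""
    def __init__(self, classifier, fit_classifier=True, val_split=5):
        # decided once, in the constructor:
        #   fit_classifier=True  -> fit() receives the whole training set and trains the classifier
        #   fit_classifier=False -> the classifier is assumed to be already trained
        self.classifier = classifier
        self.fit_classifier = fit_classifier
        # val_split: an int means k folds of cross-validation; a float in (0,1)
        # means the fraction held out for validation
        self.val_split = val_split

    def fit(self, X, y):
        if self.fit_classifier:
            # (handling of val_split, k-fold cross-validation or a held-out split, is omitted in this sketch)
            self.classifier.fit(X, y)
        # ...then learn the aggregation function from the classifier's validation outputs...
        return self

quantifier = AggregativeQuantifierSketch(LogisticRegression(), fit_classifier=True, val_split=0.3)
```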
- [TODO] document confidence in manuals
- [TODO] Test the return_type="index" in protocols and finish the "distributing_samples.py" example (see the sketch below)
- [TODO] Add EDy (an implementation is available at quantificationlib)
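Hypothetical usage of return_type="index" (the option is still being tested, so treat the names and behaviour below as assumptions):

```python
from quapy.protocol import APP

# With return_type="index", each generated sample would come back as indexes
# into the test collection rather than as a materialized sample, which is what
# the distributing_samples.py example needs:
# prot = APP(test_collection, sample_size=100, n_prevalences=11, return_type="index")
# for indexes in prot():
#     sample = test_collection.sampling_from_index(indexes)
```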
|