Commit fd185bd

merged from devel
2 parents d556910 + 8a39222


61 files changed: +3096 −1244 lines

.github/workflows/ci.yml

Lines changed: 38 additions & 2 deletions
@@ -6,6 +6,8 @@ on:
   branches:
     - master
     - devel
+  tags:
+    - "[0-9]+.[0-9]+.[0-9]+"
 
 jobs:

@@ -28,7 +30,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip setuptools wheel
-          python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4"
+          python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@main"
           python -m pip install -e .[bayes,tests]
       - name: Test with unittest
         run: python -m unittest
@@ -47,7 +49,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip setuptools wheel "jax[cpu]"
-          python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@v0.1.4"
+          python -m pip install "qunfold @ git+https://github.com/mirkobunse/qunfold@main"
           python -m pip install -e .[neural,docs]
       - name: Build documentation
         run: sphinx-build -M html docs/source docs/build
@@ -66,3 +68,37 @@ jobs:
           branch: gh-pages
           directory: __gh-pages/
           github_token: ${{ secrets.GITHUB_TOKEN }}
+
+  release:
+    name: Build & Publish Release
+    runs-on: ubuntu-latest
+    if: startsWith(github.ref, 'refs/tags/')
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install build dependencies
+        run: |
+          python -m pip install --upgrade pip build twine
+      - name: Build package
+        run: python -m build
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          user: __token__
+          password: ${{ secrets.PYPI_API_TOKEN }}
+      - name: Create GitHub Release
+        id: create_release
+        uses: actions/create-release@v1
+        with:
+          tag_name: ${{ github.ref_name }}
+          release_name: Release ${{ github.ref_name }}
+          body: |
+            Changes in this release:
+            - see commit history for details
+          draft: false
+          prerelease: false
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
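The new release job only fires on tags matching the numeric pattern in the `on: tags:` filter. A minimal local sketch (a hypothetical helper, not part of the repository; grep's regex is used here as an approximation of GitHub's filter syntax) of checking a tag name before pushing it:

```shell
# Check that a tag looks like X.Y.Z before pushing it, since only such tags
# trigger the Build & Publish Release job (hypothetical local check).
tag="0.2.0"
if echo "$tag" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "valid: $tag"      # safe to run: git push origin "$tag"
else
  echo "invalid: $tag"
fi
```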

CHANGE_LOG.txt

Lines changed: 35 additions & 0 deletions
@@ -1,3 +1,38 @@
+Change Log 0.2.0
+----------------
+
+CLEAN TODO-FILE
+
+- Base code refactor:
+    - Removed the coupling between LabelledCollection and the quantification methods; the fit interface changes:
+          def fit(data: LabelledCollection): -> def fit(X, y):
+    - Added the function "predict" (the function "quantify" is still present as an alias, for the nostalgic).
+    - The behavior of aggregative methods in terms of fit_classifier, and how val_split is treated, is now
+      indicated exclusively at construction time; it is no longer possible to indicate it at fit time.
+      This is because, in v<=0.1.9, one could create a method (e.g., ACC) and then indicate:
+          my_acc.fit(tr_data, fit_classifier=False, val_split=val_data)
+      in which case the first argument is unused, and this was ambiguous with
+          my_acc.fit(the_data, fit_classifier=False)
+      in which case the_data is to be used for validation purposes. However, val_split could also be set as a
+      fraction, indicating that only part of the_data was to be used for validation, and the rest wasted... it was certainly confusing.
+    - This change imposes a versioning constraint on qunfold, which now must be >= 0.1.6.
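The new fit(X, y) / predict(X) interface can be sketched with a toy "Classify & Count" quantifier. This is an illustrative sketch only: ToyCC and MajorityClassifier are hypothetical names, not QuaPy's actual implementation.

```python
# Toy sketch of the decoupled fit(X, y) / predict(X) interface (illustrative
# only; ToyCC and MajorityClassifier are hypothetical, not part of QuaPy).
from collections import Counter

class MajorityClassifier:
    """Trivial stand-in for any classifier exposing fit/predict."""
    def fit(self, X, y):
        self.majority = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, X):
        return [self.majority] * len(X)

class ToyCC:
    """Classify & Count taking plain (X, y), not a LabelledCollection."""
    def __init__(self, classifier):
        self.classifier = classifier
    def fit(self, X, y):                 # new interface: def fit(X, y)
        self.classifier.fit(X, y)
        self.classes_ = sorted(set(y))
        return self
    def predict(self, X):                # renamed from "quantify"
        counts = Counter(self.classifier.predict(X))
        return [counts.get(c, 0) / len(X) for c in self.classes_]
    quantify = predict                   # alias kept for the nostalgic

model = ToyCC(MajorityClassifier()).fit([[0], [1], [1]], [0, 1, 1])
print(model.predict([[0], [1]]))  # majority class is 1 -> [0.0, 1.0]
```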
+- EMQ has been modified so that the representation function "classify" now only provides posterior
+  probabilities and, if required, these are recalibrated (e.g., by "bcts") during the aggregation function.
+    - A new parameter "on_calib_error" is passed to the constructor, indicating the policy to follow
+      in case the abstention calibration functions fail (which happens sometimes). Options include:
+        - 'raise': raises a RuntimeException (default)
+        - 'backup': reruns, silently avoiding calibration
+    - The parameter "recalib" has been renamed "calib".
+- Added aggregative bootstrap for deriving confidence regions (confidence intervals, ellipses in the simplex, or
+  ellipses in the CLR space). This method is efficient because it leverages the two phases of aggregative quantifiers:
+  resampling is applied only to the aggregation phase, thus avoiding training many quantifiers or
+  classifying the instances of a sample multiple times. See:
+    - quapy/method/confidence.py (new)
+    - the new example no. 16.confidence_regions.py
+- BayesianCC moved to confidence.py, where methods having to do with confidence intervals belong.
+- Improved documentation of the qp.plot module.
+
+
 Change Log 0.1.9
 ----------------
 
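Why the aggregative bootstrap is cheap can be illustrated with a minimal sketch (hypothetical helper, not QuaPy's code): the classifier runs once over the sample, and only the inexpensive aggregation step is repeated over resampled classifier outputs.

```python
# Minimal sketch of aggregative bootstrap (hypothetical helper, not QuaPy's
# implementation): resample the precomputed predictions, not the classifier.
import random

def bootstrap_prevalences(predictions, classes, n_trials=100, seed=0):
    """Re-aggregate bootstrap resamples of the (fixed) classifier outputs."""
    rng = random.Random(seed)
    n = len(predictions)
    estimates = []
    for _ in range(n_trials):
        resample = [predictions[rng.randrange(n)] for _ in range(n)]
        estimates.append([resample.count(c) / n for c in classes])
    return estimates

# The spread of these 100 prevalence estimates yields a confidence region
# without retraining quantifiers or re-classifying the sample.
ests = bootstrap_prevalences([0, 1, 1, 0, 1], classes=[0, 1])
```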

README.md

Lines changed: 19 additions & 19 deletions
@@ -13,8 +13,8 @@ for facilitating the analysis and interpretation of the experimental results.
 
 ### Last updates:
 
-* Version 0.1.9 is released! Major changes can be consulted [here](CHANGE_LOG.txt).
-* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/index.html)
+* Version 0.2.0 is released! Major changes can be consulted [here](CHANGE_LOG.txt).
+* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)
 
 ### Installation
 
@@ -46,15 +46,15 @@ of the test set.
 ```python
 import quapy as qp
 
-dataset = qp.datasets.fetch_UCIBinaryDataset("yeast")
-training, test = dataset.train_test
+training, test = qp.datasets.fetch_UCIBinaryDataset("yeast").train_test
 
 # create an "Adjusted Classify & Count" quantifier
 model = qp.method.aggregative.ACC()
-model.fit(training)
+Xtr, ytr = training.Xy
+model.fit(Xtr, ytr)
 
-estim_prevalence = model.quantify(test.X)
-true_prevalence = test.prevalence()
+estim_prevalence = model.predict(test.X)
+true_prevalence = test.prevalence()
 
 error = qp.error.mae(true_prevalence, estim_prevalence)
 print(f'Mean Absolute Error (MAE)={error:.3f}')

@@ -67,8 +67,7 @@ class prevalence of the training set. For this reason, any quantification model
 should be tested across many samples, even ones characterized by class prevalence
 values different or very different from those found in the training set.
 QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
-See the [documentation](https://hlt-isti.github.io/QuaPy/manuals/protocols.html)
-and the [examples directory](https://github.com/HLT-ISTI/QuaPy/tree/master/examples) for detailed examples.
+See the [documentation](https://hlt-isti.github.io/QuaPy/build/html/) for detailed examples.
 
 ## Features
 
@@ -80,8 +79,8 @@ quantification methods based on structured output learning, HDy, QuaNet, quantif
 * 32 UCI Machine Learning datasets.
 * 11 Twitter quantification-by-sentiment datasets.
 * 3 product reviews quantification-by-sentiment datasets.
-* 4 tasks from LeQua 2022 competition
-* 4 tasks from LeQua 2024 competition (_new in v0.1.9!_)
+* 4 tasks from LeQua 2022 competition and 4 tasks from LeQua 2024 competition
+* IFCB for Plankton quantification
 * Native support for binary and single-label multiclass quantification scenarios.
 * Model selection functionality that minimizes quantification-oriented loss functions.
 * Visualization tools for analysing the experimental results.

@@ -102,22 +101,23 @@ In case you want to contribute improvements to quapy, please generate pull reque
 
 ## Documentation
 
-The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/).
+The [developer API documentation](https://hlt-isti.github.io/QuaPy/build/html/modules.html) is available [here](https://hlt-isti.github.io/QuaPy/build/html/index.html).
 
-Check out our [Manuals](https://hlt-isti.github.io/QuaPy/manuals.html), in which many examples
+Check out the [Manuals](https://hlt-isti.github.io/QuaPy/manuals.html), in which many code examples
 are provided:
 
 * [Datasets](https://hlt-isti.github.io/QuaPy/manuals/datasets.html)
 * [Evaluation](https://hlt-isti.github.io/QuaPy/manuals/evaluation.html)
-* [Explicit loss minimization](https://hlt-isti.github.io/QuaPy/manuals/explicit-loss-minimization.html)
+* [Protocols](https://hlt-isti.github.io/QuaPy/manuals/protocols.html)
 * [Methods](https://hlt-isti.github.io/QuaPy/manuals/methods.html)
-* [Model Selection](https://hlt-isti.github.io/QuaPy/manuals/datasets.html)
+* [SVMperf](https://hlt-isti.github.io/QuaPy/manuals/explicit-loss-minimization.html)
+* [Model Selection](https://hlt-isti.github.io/QuaPy/manuals/model-selection.html)
 * [Plotting](https://hlt-isti.github.io/QuaPy/manuals/plotting.html)
-* [Protocols](https://hlt-isti.github.io/QuaPy/manuals/protocols.html)
 
 ## Acknowledgments:
 
-This work has been funded by the QuaDaSh project (P2022TB5JF) "Finanziato dall’Unione europea- Next Generation EU, Missione 4 Componente 2 CUP B53D23026250001".
-
-<img src="docs/source/EUfooter.png" alt="EUcommission" width="1000"/>
 <img src="docs/source/SoBigData.png" alt="SoBigData++" width="250"/>
+
+This work has been supported by the QuaDaSh project
+_"Finanziato dall’Unione europea---Next Generation EU,
+Missione 4 Componente 2 CUP B53D23026250001"_.
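The error computed at the end of the updated README snippet can be reproduced with a self-contained sketch: a plain-Python re-implementation of the mean absolute error between prevalence vectors (`qp.error.mae` is the actual API; the helper below only mirrors its arithmetic).

```python
# Self-contained sketch of mean absolute error between prevalence vectors,
# mirroring what qp.error.mae computes in the README example.
def mae(true_prev, estim_prev):
    return sum(abs(t - e) for t, e in zip(true_prev, estim_prev)) / len(true_prev)

error = mae([0.7, 0.3], [0.6, 0.4])
print(f'Mean Absolute Error (MAE)={error:.3f}')  # -> MAE=0.100
```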

TODO.txt

Lines changed: 54 additions & 0 deletions
@@ -1,6 +1,60 @@
+Adapt examples; remaining: example 4 onwards
+    not working: 15 (qunfold)
+
+Solve the warnings issue; right now there is a warning ignore in method/__init__.py
+
+Add 'platt' to the calib options in EMQ?
+
+Allow n_prevpoints in APP to be specified by a user-defined grid?
+
+Update READMEs, wiki, & examples for the new fit-predict interface
+
+Add the fix suggested by Alexander:
+
+    For a more general application, I would maybe first establish a per-class threshold value of plausible prevalence
+    based on the number of actual positives and the required sample size; e.g., for sample_size=100 and actual
+    positives [10, 100, 500] -> [0.1, 1.0, 1.0], meaning that class 0 can be sampled at most at 0.1 prevalence, while
+    the others can be sampled up to 1.0 prevalence. Then, when a prevalence value is requested, e.g., [0.33, 0.33, 0.33],
+    we may either clip each value and normalize (as you suggest for the extreme case, e.g., [0.1, 0.33, 0.33]/sum) or
+    scale each value by the per-class thresholds, i.e., [0.33*0.1, 0.33*1, 0.33*1]/sum.
+    - This affects LabelledCollection
+    - This functionality should be accessible via sampling protocols and evaluation functions
+
+Solve the pre-trained classifier issues. An example is the coptic-codes script I did, which needed a mock_lr
+to work in order to have access to classes_; think also of the case in which the precomputed outputs are
+already generated, as in the unifying-problems code.
+
+To remove LabelledCollection from the methods:
+
+- The mess comes from the confusing semantics of fit in aggregative methods, which receives 3 parameters:
+    - data: LabelledCollection, which can be:
+        - the training set, if the classifier must be trained
+        - None, if the classifier does not need to be trained
+        - the validation set (which conflicts with val_split), if the classifier does not need to be trained
+    - fit_classifier: says whether the classifier must be trained or not, and this changes the semantics of the other parameters
+    - val_split: which can be:
+        - a number: the number of k-fold cross-validation folds, which implies fit_classifier=True and data=the whole training set
+        - a fraction in [0,1]: the portion used for validation; implies fit_classifier=True and data=train+val
+        - a LabelledCollection: the specific validation set; implies neither fit_classifier=True nor False
+- The way to remove the methods' dependency on LabelledCollection should be:
+    - The constructor indicates whether the classifier received as a parameter must be trained or is already trained;
+      that is, there is a fit_classifier=True or False.
+    - fit_classifier=True:
+        - data in fit is the whole training set, including validation
+        - val_split:
+            - int: number of folds in k-fold cross-validation
+            - proportion in [0,1]
+    - fit_classifier=False:
+
+
+- [TODO] document confidence in the manuals
+- [TODO] test return_type="index" in protocols and finish the "distributing_samples.py" example
+- [TODO] add EDy (an implementation is available at quantificationlib)
 - [TODO] add ensemble methods SC-MQ, MC-SQ, MC-MQ
 - [TODO] add HistNetQ
 - [TODO] add CDE-iteration and Bayes-CDE methods
 - [TODO] add Friedman's method and DeBias
 - [TODO] check ignore warning stuff
     check https://docs.python.org/3/library/warnings.html#temporarily-suppressing-warnings
+- [TODO] nmd and md are not selectable from qp.evaluation.evaluate as a string
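The clip-and-normalize policy Alexander suggests in the TODO above can be sketched as follows (a hypothetical helper with assumed names; nothing like it is in QuaPy yet):

```python
# Sketch of the clip-and-normalize policy from the TODO: cap each requested
# class prevalence by what the available positives support at the given
# sample size, then renormalize so the vector sums to 1. Hypothetical helper.
def clip_and_normalize(requested, actual_positives, sample_size):
    # per-class maximum plausible prevalence, e.g. positives [10, 100, 500]
    # with sample_size=100 -> caps [0.1, 1.0, 1.0]
    caps = [min(n / sample_size, 1.0) for n in actual_positives]
    clipped = [min(p, c) for p, c in zip(requested, caps)]
    total = sum(clipped)
    return [p / total for p in clipped]

# Requesting [0.33, 0.33, 0.33] yields a renormalized vector whose first
# component is capped by the scarce class-0 positives.
prev = clip_and_normalize([0.33, 0.33, 0.33], [10, 100, 500], 100)
```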
