|
| 1 | +from quapy.method.confidence import AggregativeBootstrap |
| 2 | +from quapy.method.aggregative import PACC |
| 3 | +import quapy.functional as F |
| 4 | +import quapy as qp |
| 5 | + |
| 6 | +""" |
| 7 | +Just like any other type of estimator, quantifier predictions are affected by error. It is therefore useful to provide, |
| 8 | +along with the point estimate (the class prevalence values), a measure of uncertainty. This typically comes in the |
| 9 | +form of confidence regions around the point estimate. |
| 10 | + |
| 11 | +QuaPy implements a method for deriving confidence regions around point estimates of class prevalence based on bootstrap. |
| 12 | + |
| 13 | +The bootstrap method comes down to resampling the population several times, thus generating a series of point estimates. |
| 14 | +QuaPy provides a variant of bootstrap for aggregative quantifiers that applies resampling only to the pre-classified |
| 15 | +instances. |
| 16 | + |
| 17 | +Let us see an example: |
| 18 | +""" |
| 19 | + |
| 20 | +# load some data |
| 21 | +data = qp.datasets.fetch_UCIMulticlassDataset('molecular') |
| 22 | +train, test = data.train_test |
| 23 | + |
| 24 | +# by simply wrapping an aggregative quantifier within the AggregativeBootstrap class, we can obtain confidence |
| 25 | +# intervals around the point estimate, in this case at a 95% confidence level |
| 26 | +pacc = AggregativeBootstrap(PACC(), confidence_level=0.95) |
| 27 | + |
| 28 | +with qp.util.temp_seed(0): |
| 29 | +    # we train the quantifier the usual way |
| 30 | +    pacc.fit(train) |
| 31 | + |
| 32 | +    # let us simulate some shift in the test data |
| 33 | +    random_prevalence = F.uniform_prevalence_sampling(n_classes=test.n_classes) |
| 34 | +    shifted_test = test.sampling(200, *random_prevalence) |
| 35 | +    true_prev = shifted_test.prevalence() |
| 36 | + |
| 37 | +    # by calling "quantify_conf", we obtain the point estimate and the confidence intervals around it |
| 38 | +    pred_prev, conf_intervals = pacc.quantify_conf(shifted_test.X) |
| 39 | + |
| 40 | +    # conf_intervals is an instance of ConfidenceRegionABC, which provides some useful utilities like: |
| 41 | +    # - coverage: a function which computes the fraction of true values that belong to the confidence region |
| 42 | +    # - simplex_portion: estimates the proportion of the simplex covered by the confidence region (amplitude) |
| 43 | +    # ideally, we are interested in obtaining confidence regions with a high level of coverage and a small amplitude |
| 44 | + |
| 45 | +    # the point estimate is computed as the mean of all bootstrap predictions; let us see the prediction error |
| 46 | +    error = qp.error.ae(true_prev, pred_prev) |
| 47 | + |
| 48 | +    # some useful outputs |
| 49 | +    print(f'train prevalence: {F.strprev(train.prevalence())}') |
| 50 | +    print(f'test prevalence: {F.strprev(true_prev)}') |
| 51 | +    print(f'point-estimate: {F.strprev(pred_prev)}') |
| 52 | +    print(f'absolute error: {error:.3f}') |
| 53 | +    print(f'Is the true value in the confidence region?: {conf_intervals.coverage(true_prev)==1}') |
| 54 | +    print(f'Proportion of simplex covered at {pacc.confidence_level*100:.1f}%: {conf_intervals.simplex_portion()*100:.2f}%') |
| 55 | + |
| 56 | +""" |
| 57 | +Final remarks: |
| 58 | +There are various ways of performing bootstrap: |
| 59 | +- the population-based approach (default): performs resampling of the test instances |
| 60 | + e.g., use AggregativeBootstrap(PACC(), n_train_samples=1, n_test_samples=100, confidence_level=0.95) |
| 61 | +- the model-based approach: performs resampling of the training instances, thus training several quantifiers |
| 62 | + e.g., use AggregativeBootstrap(PACC(), n_train_samples=100, n_test_samples=1, confidence_level=0.95) |
| 63 | + this implementation avoids retraining the classifier, and performs resampling only to train different aggregation functions |
| 64 | +- the combined approach: a combination of the above |
| 65 | + e.g., use AggregativeBootstrap(PACC(), n_train_samples=100, n_test_samples=100, confidence_level=0.95) |
| 66 | + this example will generate 100 x 100 predictions |
| 67 | + |
| 68 | +QuaPy implements different ways of constructing confidence regions: |
| 69 | +- confidence intervals: the simplest way, and one that typically works well in practice |
| 70 | + use: AggregativeBootstrap(PACC(), confidence_level=0.95, method='intervals') |
| 71 | +- confidence ellipse in the simplex: creates an ellipse, which lies on the probability simplex, around the point estimate |
| 72 | + use: AggregativeBootstrap(PACC(), confidence_level=0.95, method='ellipse') |
| 73 | +- confidence ellipse in the Centered-Log Ratio (CLR) space: creates an ellipse in the CLR space (this should be |
| 74 | + convenient for taking into account the inner structure of the probability simplex) |
| 75 | + use: AggregativeBootstrap(PACC(), confidence_level=0.95, method='ellipse-clr') |
| 76 | +""" |
| 77 | + |
| 78 | + |
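
The population-based bootstrap and the interval-based region described in the script's docstrings can be illustrated without QuaPy. The following is a minimal standard-library sketch, not QuaPy's implementation: it assumes a plain classify-and-count estimate (rather than PACC) over already-predicted labels, uses simple percentile intervals, and estimates the covered share of the simplex by Monte Carlo. The helper names `bootstrap_prevalence_intervals`, `covers`, and `simplex_portion` are hypothetical and chosen only for this illustration.

```python
import random


def bootstrap_prevalence_intervals(predicted_labels, n_classes, n_resamples=500,
                                   confidence_level=0.95, seed=0):
    """Population-based bootstrap over pre-classified instances (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(predicted_labels)
    estimates = []
    for _ in range(n_resamples):
        # resample the predicted labels with replacement; the classifier is never re-run
        resample = rng.choices(predicted_labels, k=n)
        counts = [0] * n_classes
        for label in resample:
            counts[label] += 1
        estimates.append([c / n for c in counts])

    # point estimate: mean of all bootstrap predictions
    point = [sum(e[c] for e in estimates) / n_resamples for c in range(n_classes)]

    # per-class percentile interval at the requested confidence level
    alpha = 1.0 - confidence_level
    low_idx = int((alpha / 2) * (n_resamples - 1))
    high_idx = int((1 - alpha / 2) * (n_resamples - 1))
    intervals = []
    for c in range(n_classes):
        values = sorted(e[c] for e in estimates)
        intervals.append((values[low_idx], values[high_idx]))
    return point, intervals


def covers(intervals, true_prev):
    """True if every class prevalence falls inside its interval."""
    return all(low <= p <= high for (low, high), p in zip(intervals, true_prev))


def simplex_portion(intervals, n_trials=10000, seed=0):
    """Monte Carlo estimate of the fraction of the simplex inside the intervals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # normalised Exp(1) draws are uniformly distributed on the simplex
        draws = [rng.expovariate(1.0) for _ in intervals]
        total = sum(draws)
        hits += covers(intervals, [d / total for d in draws])
    return hits / n_trials


predicted = [0] * 120 + [1] * 50 + [2] * 30  # toy pre-classified sample (assumed data)
point, intervals = bootstrap_prevalence_intervals(predicted, n_classes=3)
print(point, intervals, covers(intervals, [0.6, 0.25, 0.15]), simplex_portion(intervals))
```

Because only the predicted labels are resampled, each bootstrap replicate costs a single counting pass. This is the same reason the aggregative variant is cheap compared with re-running a classifier on every resample.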