Commit b5a69ba

author
Matteo Serafino
committed
Update REAME documentation and add the k-fold image
1 parent bf5616f commit b5a69ba

2 files changed (+64, -5 lines)

README.md

Lines changed: 64 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -2,15 +2,74 @@
22
Python package for plug and play cross validation techniques.
33
If you like the idea or find this repo useful in your work, please leave a ⭐ to support this personal project.
44

5-
The documentation will grow with all the information about all the cross-validation techniques.
6-
75
* Cross Validation methods:
8-
* K-fold;
9-
* Leave One Out (LOO);
10-
* Leave One Subject Out (LOSO).
6+
* [K-fold](#k-fold);
7+
* [Leave One Out (LOO)](#leave-one-out-loo);
8+
* [Leave One Subject Out (LOSO)](#leave-one-subject-out-loso).
9+
10+
At the moment the package is not available using `pip install <PACKAGE-NAME>`.
11+
12+
To install from the source code, click **[here](#installation)**.
13+
14+
Each method returns the confusion matrix and some performance metrics for each iteration and for the overall result.
15+
The performance metrics are:
16+
* Balanced Accuracy;
17+
* F1 Score;
18+
* Matthews Correlation Coefficient.
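
As a rough illustration of the three metrics listed above, the sketch below computes them from the counts of a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the example counts are hypothetical; the package's own implementation is not shown here and may differ (for instance, in how it handles multi-class problems). scikit-learn provides equivalents in `sklearn.metrics` (`balanced_accuracy_score`, `f1_score`, `matthews_corrcoef`).

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute the three metrics from binary confusion-matrix counts.

    Hypothetical helper for illustration only, not the package's API.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Balanced accuracy: mean of the per-class recalls.
    balanced_accuracy = (recall + specificity) / 2

    # F1 score: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"balanced_accuracy": balanced_accuracy, "f1": f1, "mcc": mcc}


# Made-up counts, purely for demonstration.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```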
1119

1220
## K-fold
21+
K-fold partitions the dataset into k subsets; at each iteration, one subset serves as the test set and the remaining k-1 subsets form the training set.
22+
Choose k according to the amount of available data: increasing k enlarges the training set and shrinks the test set in each iteration.
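
To make the effect of k concrete, here is a small sketch that computes the training and test set sizes of each fold. The helper `fold_sizes` is hypothetical and assumes that samples are spread as evenly as possible across folds; the package may split differently.

```python
def fold_sizes(n_samples, k):
    """Return a (train_size, test_size) pair for each of the k folds.

    Hypothetical helper: assumes folds are as equal in size as possible.
    """
    base, extra = divmod(n_samples, k)
    # The first `extra` folds receive one additional sample.
    test_sizes = [base + 1 if i < extra else base for i in range(k)]
    return [(n_samples - t, t) for t in test_sizes]


# With 100 samples, a larger k means more training data per iteration.
for k in (2, 5, 10):
    train_size, test_size = fold_sizes(100, k)[0]
    print(f"k={k}: train={train_size}, test={test_size}")
```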
23+
Typically, k is set between 5 and 10, which is a good trade-off between robust validation and computation time.
24+
After a k-fold cross-validation, every sample in the dataset has been tested exactly once, so it is possible to build a confusion matrix and compute performance metrics that assess how well your model generalizes.
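
The partitioning described above can be sketched in plain Python as follows. The generator `kfold_indices` is a hypothetical illustration, not the package's `kfold` function: it only produces index splits and shows that every sample lands in a test set exactly once. scikit-learn's `sklearn.model_selection.KFold` offers the same kind of splitting.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal subsets
    for test_fold in range(k):
        test_idx = folds[test_fold]
        train_idx = [
            idx
            for fold_number, fold in enumerate(folds)
            if fold_number != test_fold
            for idx in fold
        ]
        yield train_idx, test_idx


tested = []
for train_idx, test_idx in kfold_indices(n_samples=10, k=5):
    # Training and test sets never overlap within an iteration.
    assert not set(train_idx) & set(test_idx)
    tested.extend(test_idx)

# After all k iterations, every sample has been tested exactly once.
assert sorted(tested) == list(range(10))
```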
25+
26+
![k-fold-cv-image](images/k-fold-cross-validation.png)
27+
***K-fold cross-validation concept illustration.** Each row represents one iteration of the cross-validation: the subsets used as the training set are shown in blue, and the subset used as the test set for the i-th iteration is shown in orange.
28+
At the end, every subset has been used as the test set once, and the predictions can be compared with the true outputs of the instances.*
29+
30+
### Example
31+
```python
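from sklearn.ensemble import RandomForestClassifier  # assumed: scikit-learn

from cross_validation.cross_validation import kfold

# X (features) and y (labels) are assumed to be already loaded.
clf = RandomForestClassifier()
cm, perf = kfold(clf, X, y, verbose=True)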
```
1337

1438
## Leave One Out (LOO)
39+
Leave-one-out (LOO) is a special case of k-fold in which k equals the number of data points in the dataset.
40+
This method is best suited to datasets with few samples: it leaves as much data as possible for training the model, and after each training phase the model is evaluated on a single held-out point.
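
A minimal sketch of the splitting scheme, assuming nothing about the package's internals: the hypothetical generator `leave_one_out_indices` yields one split per sample, which is exactly k-fold with k equal to the number of samples. scikit-learn exposes the same idea as `sklearn.model_selection.LeaveOneOut`.

```python
def leave_one_out_indices(n_samples):
    """Yield (train_idx, test_idx) pairs, holding out one sample at a time.

    Hypothetical helper for illustration only.
    """
    for held_out in range(n_samples):
        train_idx = [i for i in range(n_samples) if i != held_out]
        yield train_idx, [held_out]


splits = list(leave_one_out_indices(4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)

# One iteration per sample, each with a single test point.
assert len(splits) == 4
```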
41+
42+
### Example
43+
```python
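from sklearn.ensemble import RandomForestClassifier  # assumed: scikit-learn

from cross_validation.cross_validation import leave_one_out

# X (features) and y (labels) are assumed to be already loaded.
clf = RandomForestClassifier()
cm, perf = leave_one_out(clf, X, y, verbose=True)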
```
1549

1650
## Leave One Subject Out (LOSO)
51+
This method can be seen as a variant of leave-one-out cross-validation. Instead of holding out a single example, it holds out all the examples that belong to one specific subject as the test set. The other subjects’ instances are used to train the learning algorithm.
52+
The main advantage of LOSO is that it removes subject bias, because all the instances of the held-out subject are in the test set and none are in the training set.
53+
This cross-validation technique is widely used in the biomedical field, where the main task is to predict a disease or a condition of a patient using data from other patients.
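
The subject-wise split can be sketched as below. The generator `leave_one_subject_out_indices` and the toy `subject_ids` list are hypothetical and only illustrate the grouping logic; they are not the package's `leave_one_subject_out` function. scikit-learn's `sklearn.model_selection.LeaveOneGroupOut` implements the same scheme.

```python
def leave_one_subject_out_indices(subject_ids):
    """Yield (subject, train_idx, test_idx), holding out one subject at a time.

    Hypothetical helper for illustration only.
    """
    for subject in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == subject]
        train_idx = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train_idx, test_idx


# Toy data: one subject identifier per sample.
subject_ids = ["a", "a", "b", "c", "c", "c"]

for subject, train_idx, test_idx in leave_one_subject_out_indices(subject_ids):
    # No sample of the held-out subject appears in the training set.
    assert all(subject_ids[i] != subject for i in train_idx)
    print(subject, "train:", train_idx, "test:", test_idx)
```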
54+
55+
### Example
56+
```python
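from sklearn.ensemble import RandomForestClassifier  # assumed: scikit-learn

from cross_validation.cross_validation import leave_one_subject_out

# X, y and subject_ids (one subject label per sample) are assumed to be already loaded.
clf = RandomForestClassifier()
cm, perf = leave_one_subject_out(clf, X, y, subject_ids, verbose=True)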
```
62+
63+
## Installation
64+
To install from the source code, run one of the following commands in your terminal:
65+
```
66+
pip install git+<repository-link>
67+
```
68+
or
69+
```
70+
python -m pip install git+<repository-link>
71+
```
72+
or
73+
```
74+
python3 -m pip install git+<repository-link>
75+
```

images/k-fold-cross-validation.png

10.7 KB