Description
Work to be done
- Fix the 7th field in the liver dataset and the data split. According to https://archive.ics.uci.edu/dataset/60/liver+disorders:
"the 7th field (selector) of the liver dataset has been widely misinterpreted in the past as a dependent variable representing the presence or absence of a liver disorder. This is incorrect since the 7th field was created by BUPA researchers as a train/test selector. The dataset does not contain any variable representing the presence or absence of a liver disorder. Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification."
Up to now we have used the 7th field as the target, which is incorrect as stated above, so we have to fix it. The split probably has to be modified too.
- Check comparison (if any)
- Investigate feature scaling:
  - How are the features scaled? (Probably also useful for the other datasets, if they are not handled differently.)
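
A rough sketch of what the fix could look like, to make the tasks above concrete. It is not our current loader: the rows are made-up values, the column names follow the UCI page, and the function names, the `threshold=3.0` cut-off for dichotomising `drinks`, the 70/30 split and the choice of standardisation are all assumptions that still need to be agreed on. The idea is to drop the selector from the inputs, derive the label from the 6th field, and fit the scaling statistics on the training part only.

```python
import random
import statistics

# Column order as listed on the UCI page; the 7th column is the train/test selector.
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]

# Made-up rows for illustration only (not taken from the real dataset).
ROWS = [
    [88, 70, 30, 25, 40, 0.0, 1],
    [90, 65, 22, 20, 18, 0.5, 2],
    [92, 80, 41, 33, 60, 2.0, 1],
    [86, 58, 19, 17, 25, 4.0, 2],
    [95, 74, 55, 38, 90, 6.0, 1],
    [89, 61, 27, 22, 35, 10.0, 2],
]


def make_xy(rows, threshold=3.0):
    """Use the first five fields as features and dichotomised `drinks` as label.

    `threshold` is an assumed cut-off; the selector (7th field) is ignored.
    """
    features = [[float(v) for v in row[:5]] for row in rows]
    labels = [1 if float(row[5]) > threshold else 0 for row in rows]
    return features, labels


def split(features, labels, test_fraction=0.3, seed=0):
    """Shuffle indices with a fixed seed and split into train and test parts."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [features[i] for i in train_idx], [labels[i] for i in train_idx],
        [features[i] for i in test_idx], [labels[i] for i in test_idx],
    )


def fit_scaler(train_features):
    """Compute per-column mean and standard deviation on the training part only."""
    columns = list(zip(*train_features))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def transform(features, means, stds):
    """Standardise features with statistics obtained from fit_scaler."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in features]


X, y = make_xy(ROWS)
X_train, y_train, X_test, y_test = split(X, y)
means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)
```

Fitting the scaler on the training part and reusing its statistics for the test part keeps test information out of preprocessing. If we end up using a different scaling (e.g. min-max), only `fit_scaler` and `transform` would need to change.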