Dataset LIVER #5

@GianlucaApriceno

Work to be done

"the 7th field (selector) of the liver dataset has been widely misinterpreted in the past as a dependent variable representing the presence or absence of a liver disorder. This is incorrect since the 7th field was created by BUPA researchers as a train/test selector. The dataset does not contain any variable representing the presence or absence of a liver disorder. Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification."

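So far we have used the 7th field (selector) as the classification target, which, as the note above explains, is incorrect and must be fixed: the target should be the 6th field (drinks) after dichotomising. The train/test split will probably need to change as well.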
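A minimal sketch of what the fix could look like, assuming the usual seven-column layout of the dataset (five blood-test features, then drinks, then selector). The column names, the helper names `load_rows` and `build_split`, the cut-off `drinks_threshold=3.0`, and the choice of selector value 1 as the training partition are all assumptions made for illustration; the dichotomisation threshold in particular should be checked against the cited papers. The sample rows are made up and are not taken from the real file.

```python
import csv
import io

# Assumed column order: five features, then drinks (6th), then selector (7th).
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]


def load_rows(text):
    """Parse comma-separated text into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, (float(v) for v in row))) for row in reader if row]


def build_split(rows, drinks_threshold=3.0, train_selector=1):
    """Return (train, test) lists of (features, label) pairs.

    The label is the dichotomised 6th field (drinks), not the selector.
    The selector is used only to decide which partition a row goes to.
    Both default values are assumptions, not values confirmed by the issue.
    """
    train, test = [], []
    for row in rows:
        features = [row[name] for name in COLUMNS[:5]]
        label = int(row["drinks"] > drinks_threshold)
        partition = train if int(row["selector"]) == train_selector else test
        partition.append((features, label))
    return train, test


# Made-up rows, for illustration only.
SAMPLE = """\
85,92,45,27,31,0.0,1
64,59,32,23,35,5.0,1
91,72,155,68,82,6.0,2
90,60,25,19,5,2.0,2
"""

train, test = build_split(load_rows(SAMPLE))
```

With this layout the selector never reaches the model as a label or as a feature; it only routes rows to the training or test partition.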

  • Check comparison (if any)

  • Investigate feature scaling:

    • How are the features scaled? (probably useful also for the other datasets, if not done differently)
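
As a reference point for the scaling question, here is a sketch of one common option, z-score standardisation fitted on the training partition only. It is not a description of what the project currently does; the function names `fit_standardizer` and `standardize` are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(train_features):
    """Compute per-column mean and population std from training rows only."""
    columns = list(zip(*train_features))
    means = [mean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(features, means, stds):
    """Apply (value - mean) / std column-wise using previously fitted statistics."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in features
    ]
```

Fitting the statistics on the training rows and reusing them for the test rows keeps test-set information out of the preprocessing step.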

Labels: enhancement (New feature or request)