Implement outlier detection in RCF #348

@baslia

Problem

Currently, Random Cut Forest (RCF) is not used for outlier detection. As a result, data points that deviate strongly from the overall pattern go unflagged, which can lead to inaccurate conclusions and missed insights.

Proposed solution

Introduce outlier detection capabilities to RCF so that users can identify and analyze data points that fall outside the expected range for their category or Nutri-Score label. This could be achieved by implementing the following features:

  • Ability to specify categorical and numerical features for outlier detection. This allows users to identify outliers within specific categories, such as finding outlier nutrient values for different food types or nutri-score levels.
  • Option to apply different outlier detection methods for each category. This provides flexibility to tailor the analysis to the specific characteristics of each group, ensuring accurate outlier identification.
  • Visualization options to represent outliers within each category. This could include scatter plots, boxplots, or other appropriate visualizations that clearly show the distribution of data and highlight outliers in each category.
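
To make the first two points concrete, here is a minimal sketch of per-category outlier detection with a selectable method per category. It is not an RCF implementation: it swaps in two simple stand-in detectors (an IQR fence and a z-score threshold) using only the Python standard library. The function names, the `methods` mapping, and the field names `nutriscore` and `sugars_100g` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def detect_outliers_by_category(rows, category_key, value_key,
                                methods=None, default=iqr_outliers):
    """Return one boolean flag per row, computed within each category.

    `methods` optionally maps a category value to the detector used for
    that category; categories not listed fall back to `default`.
    """
    methods = methods or {}
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[category_key]].append(index)

    flags = [False] * len(rows)
    for category, indices in groups.items():
        if len(indices) < 2:
            continue  # too few points to estimate a spread
        detector = methods.get(category, default)
        values = [rows[i][value_key] for i in indices]
        for i, flag in zip(indices, detector(values)):
            flags[i] = flag
    return flags
```

A call such as `detect_outliers_by_category(rows, "nutriscore", "sugars_100g", methods={"E": zscore_outliers})` would apply the z-score detector to category `E` and the IQR fence everywhere else. Because the fences are computed per group, a value that is ordinary for one Nutri-Score label can still be flagged under another.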

Additional context

Outlier detection plays a crucial role in data analysis, enabling researchers to identify data points that might be erroneous, fraudulent, or indicative of unique patterns. By incorporating outlier detection capabilities, RCF would become a more comprehensive and versatile tool for analyzing nutritional data, providing deeper insights into dietary patterns and potential health implications.

Mockups

  • A dropdown menu or checkbox option to select a categorical feature alongside the numerical feature for outlier detection.
  • A visualization panel displaying scatter plots or boxplots for each category, highlighting outlier data points.
