Improve robustness of numeric data distribution visualization #147

@adhit-r

Summary

The numeric data distribution helper (plot_numeric_distributions) introduced in PR #145 works well for basic use cases, but there are several opportunities to make it more robust and user-friendly for ML analysis workflows.

Tasks

  • Add explicit validation for the columns argument:
    • Ensure all requested columns exist in df.columns.
    • Optionally reject non-numeric columns outright, or warn and skip them, when they are requested.
    • Raise clear ValueError messages for invalid columns.
  • Improve subplot layout behavior:
    • Avoid creating unused extra axes for small numbers of columns (e.g. 1 column resulting in a 1x2 grid).
    • Keep the layout readable for larger numbers of numeric features.
  • Document headless usage considerations:
    • Clarify in the docstring how to use the function safely in CI or non-GUI environments (e.g. using a non-interactive matplotlib backend).
  • Consider re-exporting plot_numeric_distributions from the apps/ml/visualizations/__init__.py module if this is intended as the public API surface.
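
The column validation task could be sketched roughly as follows. This is only an illustration, not the code from PR #145: the helper name `validate_numeric_columns` and its `strict` flag are hypothetical, and it assumes `df` is a pandas DataFrame with unique column names.

```python
import warnings

import pandas as pd
from pandas.api.types import is_numeric_dtype


def validate_numeric_columns(df, columns=None, strict=False):
    """Hypothetical helper: return the list of columns that are safe to plot.

    - columns=None selects every numeric column in df.
    - Unknown column names raise ValueError.
    - Non-numeric columns raise ValueError when strict=True,
      otherwise they are skipped with a warning.
    """
    if columns is None:
        return list(df.select_dtypes(include="number").columns)

    # Accept a single column name as well as a list of names.
    if isinstance(columns, str):
        columns = [columns]
    columns = list(columns)

    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise ValueError(
            f"Columns not found in DataFrame: {missing}. "
            f"Available columns: {list(df.columns)}"
        )

    non_numeric = [c for c in columns if not is_numeric_dtype(df[c])]
    if non_numeric:
        message = f"Non-numeric columns requested: {non_numeric}"
        if strict:
            raise ValueError(message)
        warnings.warn(message + "; skipping them.", UserWarning, stacklevel=2)

    return [c for c in columns if c not in non_numeric]
```

Whether non-numeric columns should raise or only warn by default is a design choice left open by this issue; the `strict` flag above is just one way to expose both behaviours.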

Acceptance Criteria

  • Calling plot_numeric_distributions with invalid or non-existent column names produces clear, actionable error messages.
  • Subplot layouts are sensible for 1, 2, and many numeric columns.
  • The docstring documents headless/CI usage expectations.
  • Existing tests continue to pass, and additional tests are added where helpful (e.g. column validation).
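
One possible way to meet the layout criterion is to derive the grid from the number of columns and hide any leftover axes. The sketch below is an assumption about how this might look, not the existing implementation: `grid_shape`, `make_axes`, and the `max_cols` parameter are hypothetical names.

```python
import math


def grid_shape(n_plots, max_cols=3):
    """Hypothetical helper: return (n_rows, n_cols) for n_plots subplots.

    Never uses more columns than there are plots, so a single
    column yields a 1x1 grid rather than a 1x2 grid.
    """
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    n_cols = min(n_plots, max_cols)
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


def make_axes(n_plots, max_cols=3):
    """Hypothetical helper: create a figure and return only the axes in use."""
    # Imported lazily so the caller can pick a backend first.
    import matplotlib.pyplot as plt

    n_rows, n_cols = grid_shape(n_plots, max_cols)
    fig, axes = plt.subplots(
        n_rows, n_cols, squeeze=False, figsize=(4 * n_cols, 3 * n_rows)
    )
    flat = axes.ravel()
    # Hide trailing axes when the grid has more cells than plots.
    for ax in flat[n_plots:]:
        ax.set_visible(False)
    return fig, list(flat[:n_plots])
```

For headless or CI runs, a non-interactive backend can be selected before `matplotlib.pyplot` is imported, for example by calling `matplotlib.use("Agg")` or by setting the `MPLBACKEND=Agg` environment variable, and figures can then be written with `fig.savefig(...)` instead of being shown.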

Labels

ai/ml (AI and machine learning features), bug (Something isn't working), good first issue (Good for newcomers)
