Summary
The numeric data distribution helper (plot_numeric_distributions) introduced in PR #145 works well for basic use cases, but there are several opportunities to make it more robust and user-friendly for ML analysis workflows.
Tasks
- Add explicit validation for the columns argument:
  - Ensure all requested columns exist in df.columns.
  - Optionally enforce or warn when non-numeric columns are requested.
  - Raise clear ValueError messages for invalid columns.
- Improve subplot layout behavior:
  - Avoid creating unused extra axes for small numbers of columns (e.g. 1 column resulting in a 1x2 grid).
  - Keep the layout readable for larger numbers of numeric features.
- Document headless usage considerations:
  - Clarify in the docstring how to use the function safely in CI or non-GUI environments (e.g. using a non-interactive matplotlib backend).
- Consider re-exporting plot_numeric_distributions from the apps/ml/visualizations/__init__.py module if this is intended as the public API surface.
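As a starting point for the column validation task, here is a minimal sketch. The helper name validate_columns and its parameters are hypothetical and not part of the existing code; it assumes the caller passes df.columns as available and, if numeric enforcement is wanted, df.select_dtypes(include="number").columns as numeric. It raises on non-numeric columns; emitting a warning via warnings.warn instead would be an equally valid choice.

```python
from typing import Iterable, List, Optional, Sequence


def validate_columns(
    requested: Sequence[str],
    available: Iterable[str],
    numeric: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the requested columns as a list, or raise ValueError.

    Hypothetical helper (not in the current code base). In the real
    function, `available` would be df.columns and `numeric` would be
    df.select_dtypes(include="number").columns.
    """
    available_set = set(available)

    # Report every missing column at once so the caller can fix them together.
    missing = [c for c in requested if c not in available_set]
    if missing:
        raise ValueError(
            f"Columns not found in DataFrame: {missing}. "
            f"Available columns: {sorted(available_set)}"
        )

    # Numeric enforcement is optional: skipped when `numeric` is None.
    if numeric is not None:
        numeric_set = set(numeric)
        non_numeric = [c for c in requested if c not in numeric_set]
        if non_numeric:
            raise ValueError(f"Columns are not numeric: {non_numeric}")

    return list(requested)
```

Collecting all offending names before raising keeps the message actionable when several columns are wrong at once.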
Acceptance Criteria
- Calling plot_numeric_distributions with invalid or non-existent column names produces clear, actionable error messages.
- Subplot layouts are sensible for 1, 2, and many numeric columns.
- The docstring documents headless/CI usage expectations.
- Existing tests continue to pass, and additional tests are added where helpful (e.g. column validation).
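One possible way to meet the subplot layout criterion is to derive the grid from the number of columns instead of using a fixed column count. The sketch below is an assumption about how this could be done, not the current implementation; grid_shape and max_cols are hypothetical names.

```python
import math
from typing import Tuple


def grid_shape(n_plots: int, max_cols: int = 3) -> Tuple[int, int]:
    """Return (n_rows, n_cols) for n_plots subplots.

    Hypothetical helper: never uses more columns than there are plots,
    so 1 plot gives a 1x1 grid and 2 plots give a 1x2 grid.
    """
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    if max_cols < 1:
        raise ValueError("max_cols must be at least 1")
    n_cols = min(n_plots, max_cols)
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols
```

When the number of plots does not fill the last row (for example 7 plots in a 3x3 grid), the leftover axes can be hidden with ax.set_visible(False) so no empty frames are drawn.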