Add feature analysis CLI and learn-weights enhancements #58
iankchristie wants to merge 2 commits into
Pull request overview
Adds a feature-correlation analysis workflow to the existing learn-weights pipeline/CLI so users can inspect correlated layers (clusters/plots) and optionally include/exclude specific attributes before training.
Changes:
- Introduces `reVeal/feature_analysis.py` for correlation computation, hierarchical clustering, plotting, and exclusion suggestions.
- Extends `LearnWeightsConfig` + `learn-weights` CLI with include/exclude attribute selection and analysis/analyze-only options.
- Wires analysis into `run_learn_weights` (with progress callbacks) and adds new/updated tests.
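The include/exclude attribute selection described above could behave roughly like the sketch below. This is not the PR's code: `select_attributes`, its parameters, and the layer names are hypothetical, and treating `include` and `exclude` as mutually exclusive is an assumption about what the validators enforce.

```python
def select_attributes(available, include=None, exclude=None):
    """Hypothetical helper: pick the attributes that go into training.

    Assumes `include` and `exclude` are mutually exclusive and that every
    requested name must exist in `available`.
    """
    if include is not None and exclude is not None:
        raise ValueError("Pass either `include` or `exclude`, not both.")

    requested = include if include is not None else (exclude or [])
    unknown = sorted(set(requested) - set(available))
    if unknown:
        raise ValueError(f"Unknown attributes: {unknown}")

    if include is not None:
        wanted = set(include)
        # Preserve the order of `available`, not the order of `include`.
        return [name for name in available if name in wanted]

    dropped = set(exclude or [])
    return [name for name in available if name not in dropped]


# Illustrative layer names only; they are not taken from the PR.
layers = ["slope", "elevation", "road_distance"]
kept_after_exclude = select_attributes(layers, exclude=["elevation"])
kept_after_include = select_attributes(layers, include=["road_distance", "slope"])
```

Rejecting unknown names up front keeps a typo in the config from silently training on the full attribute set.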
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `reVeal/feature_analysis.py` | New correlation + clustering + plotting utilities and output saving helpers. |
| `reVeal/learn_weights.py` | Adds progress callback support and optionally runs feature analysis before training (or analysis-only). |
| `reVeal/config/learn_weights.py` | Adds new config fields and validators for attribute selection + analyze-only behavior. |
| `reVeal/cli/learn_weights.py` | Adds CLI params, progress printing, and analysis artifact saving to `analysis/`. |
| `tests/test_feature_analysis.py` | New unit tests for correlation, clustering, plotting, and output saving. |
| `tests/test_learn_weights.py` | Adds tests for include/exclude validation and analyze/analyze-only behavior. |
Codecov Report

❌ Patch coverage is

Coverage diff (main vs. #58):

| | main | #58 | +/- |
|---|---|---|---|
| Coverage | 81.62% | 81.84% | +0.22% |
| Files | 20 | 23 | +3 |
| Lines | 1518 | 1746 | +228 |
| Branches | 200 | 219 | +19 |
| Hits | 1239 | 1429 | +190 |
| Misses | 236 | 269 | +33 |
| Partials | 43 | 48 | +5 |
ppinchuk left a comment:
Question about progress printing. Everything else looks good.
The `learn_weights` CLI tool derives weights for all data layers given a grid and label set. This is a problem because some data layers might be highly correlated, or the user might want to manually exclude data layers based on prior knowledge. This PR adds a CLI tool that lets the user iteratively analyze and exclude fields before they go into the `learn_weights` procedure.
Changes:
- Add `analyze-features` CLI command for computing feature correlation analysis (Spearman/Pearson), hierarchical clustering (Ward's linkage), and generating exclusion suggestions on normalized grids
- Add `reVeal/feature_analysis.py` module with correlation matrix, cluster computation, dendrogram/heatmap plotting, and redundancy detection
- Add configurable attribute selection (explicit list or exclude-based filtering) for `learn_weights` CLI
- Output `analysis/` subdirectory with `correlation_matrix.csv`, `feature_clusters.json`, `dendrogram.png`, `correlation_heatmap.png`, and `suggested_exclusions.json`
- Add matplotlib dependency to `pyproject.toml` and `environment.yml`
- Add CLI docs for learn-weights and analyze-features
- Add tests for feature_analysis (17 tests) and analyze_features (12 tests)
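For reviewers unfamiliar with the approach, the correlation, Ward clustering, and exclusion-suggestion steps listed above could look roughly like the sketch below. It is not the code in `reVeal/feature_analysis.py`: the function name `suggest_exclusions`, the `1 - |rho|` distance, the `threshold` value, and the "keep the first member of each cluster" rule are all assumptions made for illustration. Ward's linkage formally assumes Euclidean distances, so applying it to a correlation-based distance is a heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def suggest_exclusions(values, names, threshold=0.5):
    """Hypothetical sketch: cluster features and flag redundant ones.

    values: array of shape (n_samples, n_features), with 3+ features so
            that spearmanr returns a full correlation matrix.
    names:  feature names, one per column of `values`.
    """
    rho, _ = spearmanr(values)                    # Spearman correlation matrix
    dist = np.clip(1.0 - np.abs(rho), 0.0, None)  # assumed distance: 1 - |rho|
    dist = (dist + dist.T) / 2.0                  # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)

    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="ward")
    labels = fcluster(tree, t=threshold, criterion="distance")

    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(int(label), []).append(name)

    # Assumed rule: keep the first feature of each cluster, suggest the rest.
    excluded = [n for members in clusters.values() for n in members[1:]]
    return clusters, excluded


# Synthetic data: `b` is almost a copy of `a`, while `c` is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.01 * rng.normal(size=500)
c = rng.normal(size=500)
clusters, excluded = suggest_exclusions(np.column_stack([a, b, c]), ["a", "b", "c"])
```

On this synthetic input `a` and `b` fall into one cluster and `b` is the suggested exclusion. A real grid would likely need a threshold tuned against the dendrogram.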
Example Correlation Map:

Example Dendrogram:
