
Add feature analysis CLI and learn-weights enhancements #58

Open
iankchristie wants to merge 2 commits into main from ichristi/feature-engineering

Conversation

@iankchristie
Collaborator

@iankchristie iankchristie commented May 28, 2026

The learn_weights CLI tool derives weights for all data layers given a grid and label set. This is a problem because some data layers may be highly correlated, or the user may want to exclude certain layers manually based on prior knowledge. This CLI tool lets the user iteratively analyze the features and exclude fields before they go into the learn_weights procedure.

Changes:
- Add analyze-features CLI command for computing feature correlation
analysis (Spearman/Pearson), hierarchical clustering (Ward's linkage),
and generating exclusion suggestions on normalized grids
- Add reVeal/feature_analysis.py module with correlation matrix, cluster
computation, dendrogram/heatmap plotting, and redundancy detection
- Add configurable attribute selection (explicit list or exclude-based
filtering) for the learn_weights CLI.
- Output analysis/ subdirectory with correlation_matrix.csv,
feature_clusters.json, dendrogram.png, correlation_heatmap.png,
and suggested_exclusions.json
- Add matplotlib dependency to pyproject.toml and environment.yml
- Add CLI docs for learn-weights and analyze-features
- Add tests for feature_analysis (17 tests) and analyze_features (12 tests)
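The configurable attribute selection in the list above could behave roughly like the sketch below, assuming the include list and the exclude list are mutually exclusive and that unknown attribute names should fail fast. `select_attributes` and its parameters are hypothetical illustrations, not the validators in reVeal/config/learn_weights.py.

```python
from typing import Iterable, List, Optional, Sequence


def select_attributes(
    available: Sequence[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick the grid attributes to pass to learn_weights.

    `include` is an explicit allow-list, `exclude` a deny-list. Supplying both
    is treated as a configuration error (an assumption for this sketch).
    """
    if include is not None and exclude is not None:
        raise ValueError("Specify either include or exclude, not both")

    requested = list(include) if include is not None else list(exclude or [])

    # Fail fast on typos instead of silently ignoring unknown names.
    unknown = sorted(set(requested) - set(available))
    if unknown:
        raise ValueError(f"Unknown attributes: {unknown}")

    # Preserve the grid's column order regardless of how names were listed.
    if include is not None:
        keep = set(requested)
        selected = [name for name in available if name in keep]
    else:
        drop = set(requested)
        selected = [name for name in available if name not in drop]

    if not selected:
        raise ValueError("Attribute selection left nothing to train on")
    return selected
```

For example, `select_attributes(["slope", "dist_to_road", "dist_to_rail"], exclude=["dist_to_rail"])` returns `["slope", "dist_to_road"]` (the layer names are made up). Keeping the grid's column order, rather than the order the user typed, keeps any downstream weight vector aligned with the layers.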

Example Correlation Map: [image: correlation_heatmap]

Example Dendrogram: [image: dendrogram]
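The dendrogram comes from hierarchical clustering with Ward's linkage. The minimal sketch below assumes distances of `1 - |rho|` between layers and a hypothetical cut threshold: it repeatedly merges the closest pair of clusters, updates cluster distances with the Lance-Williams recurrence for Ward's method, and stops once the next merge would exceed the threshold. Ward's method is formally defined for Euclidean distances, so applying it to `1 - |rho|` is a common heuristic. The "keep the first layer of each cluster" exclusion rule is an assumption, not necessarily what the PR's redundancy detection does. With SciPy, the usual route is `scipy.cluster.hierarchy.linkage(squareform(dist), method="ward")` followed by `fcluster(Z, t, criterion="distance")`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ward_clusters(
    names: Sequence[str], dist: Sequence[Sequence[float]], threshold: float
) -> List[List[str]]:
    """Agglomerative clustering with Ward's linkage, cut at `threshold`.

    `dist` is a symmetric matrix of pairwise feature distances (e.g. 1 - |rho|).
    Returns the flat clusters, each listing members in original grid order.
    """
    members: Dict[int, List[int]] = {i: [i] for i in range(len(names))}
    # Distances between live clusters, keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in members for j in members if i < j
    }
    next_id = len(names)

    while len(members) > 1:
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        if d_ab > threshold:
            break  # every remaining merge would exceed the cut height
        n_a, n_b = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        del d[(a, b)]
        for k, k_members in members.items():
            n_k = len(k_members)
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            # Lance-Williams update for Ward's method (unsquared distances).
            total = n_a + n_b + n_k
            sq = ((n_a + n_k) * d_ak**2 + (n_b + n_k) * d_bk**2 - n_k * d_ab**2) / total
            d[(k, next_id)] = math.sqrt(max(sq, 0.0))
        members[next_id] = merged
        next_id += 1

    return [[names[i] for i in sorted(m)] for m in members.values()]


def suggest_exclusions(clusters: List[List[str]]) -> List[str]:
    """Hypothetical rule: keep the first layer of each cluster, flag the rest."""
    return [name for cluster in clusters for name in cluster[1:]]
```

With two tightly correlated pairs (a, b) and (c, d) and a threshold of 0.5, this returns `[["a", "b"], ["c", "d"]]` and suggests excluding `["b", "d"]`; a larger threshold collapses everything into one cluster.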

@iankchristie iankchristie requested a review from ppinchuk as a code owner May 28, 2026 21:06
Copilot AI review requested due to automatic review settings May 28, 2026 21:06
Contributor

Copilot AI left a comment


Pull request overview

Adds a feature-correlation analysis workflow to the existing learn-weights pipeline/CLI so users can inspect correlated layers (clusters/plots) and optionally include/exclude specific attributes before training.

Changes:

  • Introduces reVeal/feature_analysis.py for correlation computation, hierarchical clustering, plotting, and exclusion suggestions.
  • Extends LearnWeightsConfig + learn-weights CLI with include/exclude attribute selection and analysis/analyze-only options.
  • Wires analysis into run_learn_weights (with progress callbacks) and adds new/updated tests.
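Wiring analysis into run_learn_weights with progress callbacks and an analysis-only mode could follow a control flow like the sketch below. It assumes a callback that receives a stage label plus step counters; the function name `run_stages`, its flags, the stage labels, and the callback signature are all hypothetical, not the PR's actual API.

```python
from typing import Callable, Dict, Optional

# Hypothetical callback signature: (stage label, 1-based step, total steps).
ProgressCallback = Callable[[str, int, int], None]


def run_stages(
    analyze: Callable[[], object],
    train: Callable[[], object],
    *,
    run_analysis: bool = True,
    analyze_only: bool = False,
    progress: Optional[ProgressCallback] = None,
) -> Dict[str, object]:
    """Run feature analysis and/or training, reporting progress per stage."""
    stages = []
    if run_analysis or analyze_only:
        stages.append(("feature analysis", analyze))
    if not analyze_only:
        stages.append(("training", train))

    results: Dict[str, object] = {}
    for step, (label, func) in enumerate(stages, start=1):
        if progress is not None:
            progress(label, step, len(stages))
        results[label] = func()
    return results


def print_progress(label: str, step: int, total: int) -> None:
    """Example CLI-side callback that prints lines like "[1/2] feature analysis"."""
    print(f"[{step}/{total}] {label}")
```

Keeping `print()` out of `run_stages` lets library callers decide how (or whether) to report progress, while a CLI can pass `progress=print_progress`.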

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| reVeal/feature_analysis.py | New correlation + clustering + plotting utilities and output saving helpers. |
| reVeal/learn_weights.py | Adds progress callback support and optionally runs feature analysis before training (or analysis-only). |
| reVeal/config/learn_weights.py | Adds new config fields and validators for attribute selection + analyze-only behavior. |
| reVeal/cli/learn_weights.py | Adds CLI params, progress printing, and analysis artifact saving to analysis/. |
| tests/test_feature_analysis.py | New unit tests for correlation, clustering, plotting, and output saving. |
| tests/test_learn_weights.py | Adds tests for include/exclude validation and analyze/analyze-only behavior. |
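Saving analysis artifacts to an analysis/ subdirectory could look like the stdlib-only sketch below, which covers correlation_matrix.csv, feature_clusters.json, and suggested_exclusions.json. The JSON layouts are illustrative assumptions, not the PR's actual schema; `save_analysis_outputs` is hypothetical; and the matplotlib plots (dendrogram.png, correlation_heatmap.png) are omitted.

```python
import csv
import json
from pathlib import Path
from typing import List, Sequence


def save_analysis_outputs(
    out_dir,
    names: Sequence[str],
    matrix: Sequence[Sequence[float]],
    clusters: List[List[str]],
    exclusions: Sequence[str],
) -> Path:
    """Hypothetical helper: write tabular/JSON artifacts under out_dir/analysis/."""
    analysis_dir = Path(out_dir) / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)

    # Square CSV with layer names as both the header row and the index column.
    with open(analysis_dir / "correlation_matrix.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([""] + list(names))
        for name, row in zip(names, matrix):
            writer.writerow([name] + [f"{value:.6f}" for value in row])

    # Illustrative JSON layouts; the PR's actual schema may differ.
    clusters_json = {f"cluster_{i}": c for i, c in enumerate(clusters, start=1)}
    (analysis_dir / "feature_clusters.json").write_text(json.dumps(clusters_json, indent=2))
    (analysis_dir / "suggested_exclusions.json").write_text(
        json.dumps({"exclude": list(exclusions)}, indent=2)
    )
    return analysis_dir
```

Writing plain CSV and JSON keeps the artifacts easy to inspect by hand between iterations, which suits the analyze-then-exclude loop the PR describes.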


Comment thread reVeal/feature_analysis.py
Comment thread reVeal/feature_analysis.py
Comment thread reVeal/feature_analysis.py
Comment thread reVeal/learn_weights.py Outdated
Comment thread reVeal/cli/learn_weights.py Outdated
@iankchristie iankchristie changed the title Add feature analysis and include/exclude attribute selection DNR: Add feature analysis and include/exclude attribute selection May 28, 2026
@iankchristie iankchristie force-pushed the ichristi/feature-engineering branch from adcfde0 to 6afb54f Compare May 28, 2026 21:31
@codecov-commenter

codecov-commenter commented May 28, 2026

Codecov Report

❌ Patch coverage is 80.97166% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.84%. Comparing base (f868d47) to head (d4352e1).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| reVeal/learn_weights.py | 32.14% | 19 Missing ⚠️ |
| reVeal/cli/analyze_features.py | 73.13% | 17 Missing and 1 partial ⚠️ |
| reVeal/feature_analysis.py | 94.57% | 4 Missing and 3 partials ⚠️ |
| reVeal/config/learn_weights.py | 71.42% | 1 Missing and 1 partial ⚠️ |
| reVeal/cli/learn_weights.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
+ Coverage   81.62%   81.84%   +0.22%     
==========================================
  Files          20       23       +3     
  Lines        1518     1746     +228     
  Branches      200      219      +19     
==========================================
+ Hits         1239     1429     +190     
- Misses        236      269      +33     
- Partials       43       48       +5     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 81.84% <80.97%> (+0.22%) ⬆️ |

Flags with carried forward coverage won't be shown.

@iankchristie iankchristie force-pushed the ichristi/feature-engineering branch from d0e5419 to aca9f1b Compare May 28, 2026 22:48
@iankchristie iankchristie changed the title DNR: Add feature analysis and include/exclude attribute selection Add feature analysis CLI and learn-weights enhancements May 28, 2026
Collaborator

@ppinchuk ppinchuk left a comment


Question about progress printing. Everything else looks good.

Comment thread reVeal/cli/learn_weights.py Outdated
@iankchristie iankchristie force-pushed the ichristi/feature-engineering branch from 9224358 to a26876d Compare May 29, 2026 17:31
@iankchristie iankchristie force-pushed the ichristi/feature-engineering branch from a26876d to d4352e1 Compare May 29, 2026 17:41

Labels

None yet

Projects

None yet


4 participants