Add distribution diagnostics for BalanceDF #265
neuralsorcerer wants to merge 5 commits into facebookresearch:main
neuralsorcerer commented Jan 15, 2026
- Added weighted EMD/CVMD/KS computation helpers and comparison functions in the weighted stats module.
- Exposed EMD/CVMD/KS BalanceDF helper methods and public comparison APIs for linked samples and direct targets.
- Added appropriate tests for EMD/CVMD/KS covering identical distributions, weighted effects, expected discrete/numeric values, validation errors, and NA-indicator skipping.
Pull request overview
This PR adds three new distribution comparison metrics (Earth Mover's Distance, Cramér-von Mises distance, and Kolmogorov-Smirnov statistic) to the balance package for comparing adjusted samples to target populations. The implementation follows the established patterns for ASMD and KLD.
Changes:
- Added helper functions and three new comparison metrics (EMD, CVMD, KS) in weighted_comparisons_stats module
- Exposed EMD, CVMD, and KS methods on the BalanceDF class for both linked samples and direct target comparisons
- Added comprehensive tests covering identical distributions, weighted effects, expected values, validation errors, and NA-indicator handling
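For readers unfamiliar with the three metrics, here is a minimal standard-library sketch of how they can be derived from two weighted empirical CDFs. It is not the code from this PR: the names `weighted_ecdf` and `distribution_distances` are hypothetical, and the CVMD convention used here (squared CDF difference integrated against an equal mixture of the two weighted distributions) is an assumption that may differ from what `weighted_comparisons_stats.py` implements.

```python
from bisect import bisect_right
from itertools import accumulate


def weighted_ecdf(values, weights):
    """Return a step function x -> weighted share of observations <= x."""
    pairs = sorted(zip(values, weights))
    xs = [v for v, _ in pairs]
    total = sum(w for _, w in pairs)
    cum = [c / total for c in accumulate(w for _, w in pairs)]

    def cdf(x):
        i = bisect_right(xs, x)  # number of observations <= x
        return cum[i - 1] if i else 0.0

    return cdf


def distribution_distances(sample, sample_w, target, target_w):
    """Hypothetical helper: EMD, CVMD and KS between two weighted samples."""
    f = weighted_ecdf(sample, sample_w)
    g = weighted_ecdf(target, target_w)
    grid = sorted(set(sample) | set(target))  # every point where a CDF can jump
    fs = [f(x) for x in grid]
    gs = [g(x) for x in grid]
    diffs = [a - b for a, b in zip(fs, gs)]

    # KS: largest absolute gap between the two CDFs.
    ks = max(abs(d) for d in diffs)

    # EMD (1-D Wasserstein): area between the CDFs, which are constant
    # on each interval between consecutive grid points.
    emd = sum(abs(d) * (hi - lo) for d, lo, hi in zip(diffs, grid, grid[1:]))

    # CVMD (assumed convention): squared gap weighted by the probability
    # mass that the 50/50 mixture of both distributions puts on each point.
    mix = [0.5 * (a + b) for a, b in zip(fs, gs)]
    mass = [m - prev for m, prev in zip(mix, [0.0] + mix[:-1])]
    cvmd = sum(d * d * m for d, m in zip(diffs, mass))

    return {"emd": emd, "cvmd": cvmd, "ks": ks}
```

With this sketch, identical inputs give zero for all three metrics, and changing only the weights (for example `[0, 1]` with weights `[1, 1]` against `[0, 1]` with weights `[1, 3]`) moves all three away from zero, which mirrors the "identical distributions" and "weighted effects" cases the tests are described as covering.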
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| balance/stats_and_plots/weighted_comparisons_stats.py | Implemented helper functions for weighted distributions and three new comparison metrics (emd, cvmd, ks) with validation |
| balance/balancedf_class.py | Added static wrapper methods and public APIs for emd, cvmd, ks on BalanceDF; removed TODO comments for these methods |
| tests/test_stats_and_plots.py | Added tests for emd, cvmd, ks covering identical distributions, weighted effects, expected values, validation, and NA skipping |
| CHANGELOG.md | Added changelog entry for the new distribution diagnostic features |
Comments suppressed due to low confidence (1)
balance/stats_and_plots/weighted_comparisons_stats.py:32
- This TODO comment is now obsolete since wasserstein_distance has been imported on line 27. The comment should be removed.
# TODO: add?
# from scipy.stats import wasserstein_distance
Very cool!
nit: Also, in the docstrings, please add links to Wikipedia for each method added.
@talgalili Updated :)
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
balance/stats_and_plots/weighted_comparisons_stats.py:36
- This TODO comment is obsolete since wasserstein_distance is already imported at line 31. The commented import and TODO should be removed to avoid confusion.
# TODO: add?
# from scipy.stats import wasserstein_distance
@neuralsorcerer could you please fix this nit?
Updated :)
@talgalili has imported this pull request. If you are a Meta employee, you can view this in D90854392.
@talgalili I think we’re in a good place to bump the library to 0.15.0. There have been a lot of significant improvements made. WDYT?
Fair suggestion. WDYT?
@talgalili merged this pull request in c17914a.
I’ll work on adding CLI docstrings next, but setting up a code-coverage badge will be a bit more challenging for me, as I believe we need to use an external service like coveralls or codecov (both are free to use for open-source projects), which would require extra permissions and stuff.