Skip to content

Add distribution diagnostics for BalanceDF#265

Closed
neuralsorcerer wants to merge 5 commits intofacebookresearch:mainfrom
neuralsorcerer:dist
Closed

Add distribution diagnostics for BalanceDF#265
neuralsorcerer wants to merge 5 commits intofacebookresearch:mainfrom
neuralsorcerer:dist

Conversation

@neuralsorcerer
Copy link
Collaborator

  • Added weighted EMD/CVMD/KS computation helpers and comparison functions in the weighted stats module.
  • Exposed EMD/CVMD/KS BalanceDF helper methods and public comparison APIs for linked samples and direct targets.
  • Added appropiate tests for EMD/CVMD/KS covering identical distributions, weighted effects, expected discrete/numeric values, validation errors, and NA-indicator skipping.

Copilot AI review requested due to automatic review settings January 15, 2026 14:03
@meta-cla meta-cla bot added the cla signed label Jan 15, 2026
@neuralsorcerer neuralsorcerer added this to the balance 0.15.0 milestone Jan 15, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds three new distribution comparison metrics (Earth Mover's Distance, Cramér-von Mises distance, and Kolmogorov-Smirnov statistic) to the balance package for comparing adjusted samples to target populations. The implementation follows the established patterns for ASMD and KLD.

Changes:

  • Added helper functions and three new comparison metrics (EMD, CVMD, KS) in weighted_comparisons_stats module
  • Exposed EMD, CVMD, and KS methods on the BalanceDF class for both linked samples and direct target comparisons
  • Added comprehensive tests covering identical distributions, weighted effects, expected values, validation errors, and NA-indicator handling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
balance/stats_and_plots/weighted_comparisons_stats.py Implemented helper functions for weighted distributions and three new comparison metrics (emd, cvmd, ks) with validation
balance/balancedf_class.py Added static wrapper methods and public APIs for emd, cvmd, ks on BalanceDF; removed TODO comments for these methods
tests/test_stats_and_plots.py Added tests for emd, cvmd, ks covering identical distributions, weighted effects, expected values, validation, and NA skipping
CHANGELOG.md Added changelog entry for the new distribution diagnostic features
Comments suppressed due to low confidence (1)

balance/stats_and_plots/weighted_comparisons_stats.py:32

  • This TODO comment is now obsolete since wasserstein_distance has been imported on line 27. The comment should be removed.
# TODO: add?
# from scipy.stats import wasserstein_distance

@talgalili
Copy link
Contributor

Very cool!
Please address comment of copilot, and make sure you add examples in the docstring and in the intro tutorial.
Ping me once ready for review.

@talgalili
Copy link
Contributor

not: Also, in the docstring, please add links to wikipedia for each method added

@neuralsorcerer
Copy link
Collaborator Author

Very cool! Please address comment of copilot, and make sure you add examples in the docstring and in the intro tutorial. Ping me once ready for review.

@talgalili Updated :)

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Copy link
Contributor

@talgalili talgalili left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great - made some nits. Please review.

(also, please see feedback from copilot)

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

balance/stats_and_plots/weighted_comparisons_stats.py:36

  • This TODO comment is obsolete since wasserstein_distance is already imported at line 31. The commented import and TODO should be removed to avoid confusion.
# TODO: add?
# from scipy.stats import wasserstein_distance

@talgalili
Copy link
Contributor

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)
balance/stats_and_plots/weighted_comparisons_stats.py:36

  • This TODO comment is obsolete since wasserstein_distance is already imported at line 31. The commented import and TODO should be removed to avoid confusion.
# TODO: add?
# from scipy.stats import wasserstein_distance

@neuralsorcerer could you please fix this nit?

@neuralsorcerer
Copy link
Collaborator Author

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)
balance/stats_and_plots/weighted_comparisons_stats.py:36

  • This TODO comment is obsolete since wasserstein_distance is already imported at line 31. The commented import and TODO should be removed to avoid confusion.
# TODO: add?
# from scipy.stats import wasserstein_distance

@neuralsorcerer could you please fix this nit?

Updated :)

Copy link
Contributor

@talgalili talgalili left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thank you very much!
I will send this to internal review.

@meta-codesync
Copy link

meta-codesync bot commented Jan 16, 2026

@talgalili has imported this pull request. If you are a Meta employee, you can view this in D90854392.

@neuralsorcerer
Copy link
Collaborator Author

@talgalili I think we’re in a good place to bump the library to 0.15.0. There have been a lot of significant improvements made.

WDYT?

@talgalili
Copy link
Contributor

@talgalili I think we’re in a good place to bump the library to 0.15.0. There have been a lot of significant improvements made.

WDYT?

Fair suggestion.
I moved a bunch of issues to 0.16.0.
What I'd love to include in 0.15 is the docstring for cli, and a badge on code coverage (feel free to give me pointers, and I can also do it myself).
Once both are done, I'm happy to wrap up version 0.15.

WDYT?

@meta-codesync meta-codesync bot closed this in c17914a Jan 16, 2026
@meta-codesync
Copy link

meta-codesync bot commented Jan 16, 2026

@talgalili merged this pull request in c17914a.

@neuralsorcerer
Copy link
Collaborator Author

Fair suggestion. I moved a bunch of issues to 0.16.0. What I'd love to include in 0.15 is the docstring for cli, and a badge on code coverage (feel free to give me pointers, and I can also do it myself). Once both are done, I'm happy to wrap up version 0.15.

WDYT?

I’ll work on adding CLI docstrings next, but setting up a code-coverage badge will be a bit more challenging for me as I believe we need to use an external service like coveralls or codecov (both are free to use for open-source projects) which would require extra permissions and stuff.

@neuralsorcerer neuralsorcerer deleted the dist branch January 16, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants