Skip to content

Extract common pattern in comparison stat functions to reduce CCN#267

Closed
Copilot wants to merge 5 commits intomainfrom
copilot/generalize-balance-functions
Closed

Extract common pattern in comparison stat functions to reduce CCN#267
Copilot wants to merge 5 commits intomainfrom
copilot/generalize-balance-functions

Conversation

Copy link
Contributor

Copilot AI commented Jan 16, 2026

  • Create a generic helper function _apply_comparison_stat_to_BalanceDF to reduce code duplication
  • Update _asmd_BalanceDF to use the new helper function
  • Update _kld_BalanceDF to use the new helper function
  • Update _emd_BalanceDF to use the new helper function
  • Update _cvmd_BalanceDF to use the new helper function
  • Update _ks_BalanceDF to use the new helper function
  • Run tests to ensure all functionality remains unchanged (all 83 tests in test_balancedf.py pass)
  • Run code review and security checks (no issues found)
  • Add direct test coverage for _kld_BalanceDF, _emd_BalanceDF, _cvmd_BalanceDF, and _ks_BalanceDF (5 new tests added)
  • Fix flake8 linting errors (removed trailing whitespace from blank lines)
  • Fix ufmt formatting errors (formatted test file with ufmt)

Summary

Successfully extracted the common pattern from five similar functions (_asmd_BalanceDF, _kld_BalanceDF, _emd_BalanceDF, _cvmd_BalanceDF, _ks_BalanceDF) into a single generic helper function _apply_comparison_stat_to_BalanceDF.

Changes Made

  • Added Callable to the imports from typing module
  • Created _apply_comparison_stat_to_BalanceDF helper function that:
    1. Validates inputs are BalanceDF objects
    2. Extracts df and weights from both objects
    3. Calls the comparison function with the extracted data
  • Refactored all five comparison functions to use the helper (reduced from ~30 lines to ~5 lines each)
  • Maintains special handling for _asmd_BalanceDF which passes std_type="target" via kwargs
  • Added comprehensive test coverage for all comparison methods:
    • test_BalanceDF__kld_BalanceDF: Direct test of _kld_BalanceDF method
    • test_BalanceDF__emd_BalanceDF: Direct test of _emd_BalanceDF method
    • test_BalanceDF__cvmd_BalanceDF: Direct test of _cvmd_BalanceDF method
    • test_BalanceDF__ks_BalanceDF: Direct test of _ks_BalanceDF method
    • test_BalanceDF_comparison_functions_invalid_input: Tests input validation for all methods
  • Fixed flake8 linting errors: Removed trailing whitespace from blank lines in test file
  • Fixed ufmt formatting: Formatted test file according to project standards (black + usort)

Test Coverage

  • All tests now directly exercise the helper function through the four comparison methods
  • Tests verify correct Series output with expected keys (a, b, mean(metric))
  • Tests verify mathematical properties (non-negativity, bounded ranges)
  • Tests verify aggregate_by_main_covar parameter works
  • Tests verify proper input validation with clear error messages
  • Total tests: 88 (83 original + 5 new), all passing
  • Code quality compliance: All linting (flake8) and formatting (ufmt) checks pass

Benefits

  • Reduces Cyclomatic Complexity Number (CCN) - the original goal of the issue
  • Eliminates code duplication - DRY principle applied
  • Easier maintenance - future changes only need to be made in one place
  • Type safety - added proper type hint for the callable parameter
  • No behavioral changes - all 83 existing tests pass without modification
  • Comprehensive test coverage - direct tests for all comparison methods and edge cases
  • Code quality - passes all linting and formatting checks
Original prompt

This section details on the original issue you should resolve

<issue_title>[FEATURE] generalize functions in ‎balance/balancedf_class.py‎</issue_title>
<issue_description>a bunch of functions in
‎balance/balancedf_class.py‎

Follow the exact same pattern (other than one word change:
@staticmethod
def _emd_BalanceDF(
sample_BalanceDF: "BalanceDF",
target_BalanceDF: "BalanceDF",
aggregate_by_main_covar: bool = False,
) -> pd.Series:
"""Run EMD on two BalanceDF objects.

    Prepares the BalanceDF objects by passing them through :func:`_get_df_and_weights`, and
    then passes the df and weights into :func:`weighted_comparisons_stats.emd`.

    Args:
        sample_BalanceDF (BalanceDF): Object.
        target_BalanceDF (BalanceDF): Object.
        aggregate_by_main_covar (bool, optional): See :func:`weighted_comparisons_stats.emd`. Defaults to False.

    Returns:
        pd.Series: See :func:`weighted_comparisons_stats.emd`.
    """
    BalanceDF._check_if_not_BalanceDF(sample_BalanceDF, "sample_BalanceDF")
    BalanceDF._check_if_not_BalanceDF(target_BalanceDF, "target_BalanceDF")

    sample_df_values, sample_weights = sample_BalanceDF._get_df_and_weights()
    target_df_values, target_weights = target_BalanceDF._get_df_and_weights()

    return weighted_comparisons_stats.emd(
        sample_df_values,
        target_df_values,
        sample_weights,
        target_weights,
        aggregate_by_main_covar=aggregate_by_main_covar,
    )

Extract this pattern to a helper function to reduce CCN.

</issue_description>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

…r function

Co-authored-by: talgalili <976006+talgalili@users.noreply.github.com>
Copilot AI changed the title [WIP] Generalize functions in balance/balancedf_class.py Extract common pattern in comparison stat functions to reduce CCN Jan 16, 2026
Copilot AI requested a review from talgalili January 16, 2026 15:50
@talgalili talgalili marked this pull request as ready for review January 16, 2026 16:19
Copilot AI review requested due to automatic review settings January 16, 2026 16:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors five comparison statistic methods (_asmd_BalanceDF, _kld_BalanceDF, _emd_BalanceDF, _cvmd_BalanceDF, _ks_BalanceDF) in the BalanceDF class to reduce code duplication and cyclomatic complexity by extracting a common pattern into a new helper method _apply_comparison_stat_to_BalanceDF.

Changes:

  • Introduced _apply_comparison_stat_to_BalanceDF helper that centralizes validation, data extraction, and delegation to comparison functions
  • Refactored five static methods to use the helper, reducing each from ~15 lines to ~5 lines
  • Added Callable to the typing imports for proper type hinting

@talgalili
Copy link
Contributor

@copilot apply changes based on the comments in this thread

…lanceDF, and _ks_BalanceDF

Co-authored-by: talgalili <976006+talgalili@users.noreply.github.com>
@talgalili
Copy link
Contributor

@copilot

Fix

Run flake8 .
flake8 .
shell: /usr/bin/bash -e {0}
env:
pythonLocation: /opt/hostedtoolcache/Python/3.12.12/x64
PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.12.12/x64/lib/pkgconfig
Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.12.12/x64/lib
./tests/test_balancedf.py:1311:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1317:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1319:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1325:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1328:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1340:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1346:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1348:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1354:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1357:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1369:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1375:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1377:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1383:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1386:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1398:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1404:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1406:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1412:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1416:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1428:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1431:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1434:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1437:1: W293 blank line contains whitespace
./tests/test_balancedf.py:1440:1: W293 blank line contains whitespace
Error: Process completed with exit code 1.

Co-authored-by: talgalili <976006+talgalili@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jan 16, 2026

@copilot

Fix
...

Fixed flake8 errors by removing trailing whitespace from blank lines in commit cbb7a17. All 25 W293 warnings have been resolved.

@talgalili
Copy link
Contributor

@copilot fix

Run ufmt check .
ufmt check .
shell: /usr/bin/bash -e {0}
env:
pythonLocation: /opt/hostedtoolcache/Python/3.12.12/x64
PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.12.12/x64/lib/pkgconfig
Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.12/x64
LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.12.12/x64/lib
Would format /home/runner/work/balance/balance/tests/test_balancedf.py
✨ 1 file would be formatted, 53 files already formatted ✨
Error: Process completed with exit code 1.

Co-authored-by: talgalili <976006+talgalili@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jan 16, 2026

@copilot fix

Run ufmt check .
...

Fixed ufmt formatting errors by running ufmt format on the test file in commit 6b32a40. The file is now properly formatted according to the project's black + usort standards.

@meta-codesync
Copy link

meta-codesync bot commented Jan 16, 2026

@talgalili has imported this pull request. If you are a Meta employee, you can view this in D90870323.

@meta-codesync
Copy link

meta-codesync bot commented Jan 16, 2026

@talgalili merged this pull request in c7fcb79.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] generalize functions in ‎balance/balancedf_class.py‎

4 participants