Refactor dataset loaders into dedicated module by neuralsorcerer · Pull Request #357 · facebookresearch/balance

neuralsorcerer · 2026-03-05T00:42:08Z

No description provided.

Copilot

Pull request overview

Refactors the simulated dataset loader functions into a dedicated balance.datasets.loading_data module while preserving the existing balance.datasets public API via re-exports.

Changes:

Added balance/datasets/loading_data.py containing load_sim_data, load_cbps_data, and load_data.
Updated balance/datasets/__init__.py to re-export the loader functions from the new module.
Added a changelog entry documenting the refactor.
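The re-export pattern described above can be illustrated with a small self-contained sketch. It builds a throwaway package named demo_pkg (a hypothetical stand-in for balance, not the actual repository layout or file contents) and checks that the old and new import paths resolve to the same function object.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package; demo_pkg is a hypothetical stand-in for balance.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg" / "datasets"
pkg.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")

# The dedicated module holds the implementation (placeholder body here).
(pkg / "loading_data.py").write_text(textwrap.dedent('''
    def load_data(source):
        """Placeholder; the real loader returns DataFrames."""
        return ("target", "sample")
'''))

# The package __init__ re-exports it so existing import paths keep working.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    from .loading_data import load_data

    __all__ = ["load_data"]
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()
datasets = importlib.import_module("demo_pkg.datasets")
loading_data = importlib.import_module("demo_pkg.datasets.loading_data")

# Both import paths resolve to the same function object.
same_object = datasets.load_data is loading_data.load_data
```

Because the package-level name is only an alias for the function in the new module, callers that import from the package should not need to change.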

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
balance/datasets/loading_data.py	New dedicated module housing dataset loader implementations and their docstrings.
balance/datasets/__init__.py	Re-exports loaders to preserve `balance.datasets.*` public API.
CHANGELOG.md	Documents the refactor under “Code Quality & Refactoring”.

Comments suppressed due to low confidence (4)

balance/datasets/loading_data.py:33

The docstring says only sample_df has a happiness column, but the implementation also sets target_df["happiness"]. Please update the description to match the actual returned schema (or adjust the code if the intent is for only one dataframe to include the outcome).

    Version 01 returns two dataframes containing the columns gender ("Male", "Female" and nan),
    age_group ("18-24", "25-34", "35-44", "45+"), income (some numbers from a normal distribution), and id.
    The sample_df also has a column called happiness with a value from 0 to 100 that depends on the covariates.

balance/datasets/loading_data.py:48

The Returns: section documents Tuple[pd.DataFrame, pd.DataFrame], but the function can return (None, None) for unsupported versions (and the type annotation already reflects that). Please update the docstring return type/description to include the None case (or raise an explicit error instead of returning None).

    Returns:
        Tuple[pd.DataFrame, pd.DataFrame]: Two DataFrames containing simulated data for the target and sample of interest.
    """

balance/datasets/loading_data.py:153

The load_data docstring says it returns two DataFrames, but the function returns (None, None) when source is unrecognized. Please align the documented return type/behavior with the implementation (e.g., document the None case or raise a ValueError for invalid source).

    Returns:
        Tuple[pd.DataFrame, pd.DataFrame]: The first dataframe contains simulated data of the "target" and the second dataframe contains simulated data of the "sample".
    """

balance/datasets/loading_data.py:123

The load_cbps_data docstring says "You can view the structure ... by looking at the example below", but no example is included. Please either add the example (the previous version had one) or remove this line to avoid misleading documentation.

    """Load simulated data for CBPS comparison with R.

    The code in balance that implements CBPS attempts to mimic the code from the R package CBPS (https://cran.r-project.org/web/packages/CBPS/).

    In the help page of the CBPS function in R (i.e.: `?CBPS::CBPS`) there is a simulated dataset that is used to showcase the CBPS function.
    The output of that simulated dataset is saved in balance in order to allow for comparison of `balance` (Python) with `CBPS` (R).

    You can view the structure of the simulated dataset by looking at the example below.
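A check for this kind of mismatch could be automated. The helper below is a minimal sketch, assuming that examples in docstrings are written as doctest prompts (">>>"); the function name and the heuristic are hypothetical and are not part of balance.

```python
import re


def promises_missing_example(doc: str) -> bool:
    """Return True if a docstring refers to an 'example below' but
    contains no doctest prompt ('>>>')."""
    mentions_example = re.search(r"example below", doc, re.IGNORECASE) is not None
    has_doctest = ">>>" in doc
    return mentions_example and not has_doctest
```

Run over the docstring quoted above, the helper would flag it, since the text refers to an example but includes none.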

talgalili

LGTM

meta-codesync · 2026-03-05T08:58:25Z

@talgalili has imported this pull request. If you are a Meta employee, you can view this in D95347392.

meta-codesync · 2026-03-05T13:02:09Z

@talgalili merged this pull request in c7b2e9f.

Refactor dataset loaders into dedicated module

cb779fa

Copilot AI review requested due to automatic review settings March 5, 2026 00:42

meta-cla bot added the cla signed label Mar 5, 2026

Copilot started reviewing on behalf of neuralsorcerer March 5, 2026 00:42

Copilot AI reviewed Mar 5, 2026

Fix lints

5ed10d0

neuralsorcerer added this to the balance 0.17.0 milestone Mar 5, 2026

neuralsorcerer requested a review from talgalili March 5, 2026 00:54

talgalili approved these changes Mar 5, 2026

meta-codesync bot closed this in c7b2e9f Mar 5, 2026

facebook-github-bot added the Merged label Mar 5, 2026

neuralsorcerer deleted the refct branch March 5, 2026 14:23

neuralsorcerer wants to merge 2 commits into facebookresearch:main from neuralsorcerer:refct

4 participants