0.17.0 (TBD)

Breaking Changes

CLI: unmentioned columns now go to ignore_columns instead of outcome_columns
- Previously, when --outcome_columns was not explicitly set, all columns that were not the id, weight, or a covariate were automatically classified as outcome columns. Now those columns are placed into ignore_columns instead.
- Columns that are explicitly mentioned — the id column, weight column, covariate columns, and outcome columns — are not ignored.

Documentation

Improved keep_columns documentation
- Updated docstrings for has_keep_columns(), keep_columns(), and the --keep_columns argument to clarify that keep columns control which columns appear in the final output CSV. Keep columns that are not id, weight, covariate, or outcome columns will be placed into ignore_columns during processing but are still retained and available in the output.

Bug Fixes

Weight diagnostics now consistently accept DataFrame inputs
- design_effect, nonparametric_skew, prop_above_and_below, and weighted_median_breakdown_point now explicitly normalize DataFrame inputs to their first column before computation, matching validation behavior and returning scalar/Series outputs consistently.

Tests

Expanded warning coverage for Sample.from_frame() ID inference
- Added assertions that validate all three expected warnings are emitted when inferring an id column and default weights, including ID guessing, ID string casting, and automatic weight creation.
Added focused unit coverage for IPW helpers
- Added tests for link_transform(), and calc_dev() to validate behavior for extreme probabilities, and finite 10-fold deviance summaries.

0.16.0 (2026-02-09)

New Features

Outcome weight impact diagnostics
- Added paired outcome-weight impact tests (y*w0 vs y*w1) with confidence intervals.
- Exposed in BalanceDFOutcomes, Sample.diagnostics(), and the CLI via --weights_impact_on_outcome_method.
Pandas 3 support
- Updated compatibility and tests for pandas 3.x
Categorical distribution metrics without one-hot encoding
- KLD/EMD/CVMD/KS on BalanceDF.covars() now operate on raw categorical variables (with NA indicators) instead of one-hot encoded columns.
Misc
- Raw-covariate adjustment for custom models
  - Sample.adjust() now supports fitting models on raw covariates (without a model matrix) for IPW via use_model_matrix=False. String, object, and boolean columns are converted to pandas Categorical dtype, allowing sklearn estimators with native categorical support (e.g., HistGradientBoostingClassifier with categorical_features="from_dtype") to handle them correctly. Requires scikit-learn >= 1.4 when categorical columns are present.
- Validate weights include positive values
  - Added a guard in weight diagnostics to error when all weights are zero.
- Support configurable ID column candidates
  - Sample.from_frame() and guess_id_column() now accept candidate ID column names when auto-detecting the ID column.
- Formula support for BalanceDF model matrices
  - BalanceDF.model_matrix() now accepts a formula argument to build custom model matrices without precomputing them manually.

Bug Fixes

Removed deprecated setup build
- Replaced deprecated setup.py with pyproject.toml build in CI to avoid build failure.
Hardened ID column candidate validation
- guess_id_column() now ignores duplicate candidate names and validates that candidates are non-empty strings.
Hardened pandas 3 compatibility paths
- Updated string/NA handling and discrete checks for pandas 3 dtypes, and refreshed tests to accept string-backed dtypes.

Packaging & Tests

Pandas 3.x compatibility
- Expanded the pandas dependency range to allow pandas 3.x releases.
Direct util imports in tests
- Refactored util test modules to import helpers directly from their modules instead of via balance_util.

Breaking Changes

Require positive weights for weight diagnostics that normalize or aggregate
- design_effect, nonparametric_skew, prop_above_and_below, and weighted_median_breakdown_point now raise a ValueError when all weights are zero.
- Migration: ensure your weights include at least one positive value before calling these diagnostics, or catch the ValueError if all-zero weights are possible in your workflow.

0.15.0 (2026-01-20)

New Features

Added EMD/CVMD/KS distribution diagnostics
- BalanceDF now exposes Earth Mover's Distance (EMD), Cramér-von Mises distance (CVMD), and Kolmogorov-Smirnov (KS) statistics for comparing adjusted samples to targets.
- These diagnostics support weighted or unweighted comparisons, apply discrete/continuous formulations, and respect aggregate_by_main_covar for one-hot categorical aggregation.
Exposed outcome columns selection in the CLI
- Added --outcome_columns to choose which columns are treated as outcomes instead of defaulting to all non-id/weight/covariate columns. Remaining columns are moved to ignored_columns.
Improved missing data handling in poststratify()
- poststratify() now accepts na_action to either drop rows with missing values or treat missing values as their own category during weighting.
- Breaking change: the default behavior now fills missing values in poststratification variables with "__NaN__" and treats this as a distinct category during weighting. Previously, missing values were not handled explicitly, and their treatment depended on pandas groupby and merge defaults. To approximate the legacy behavior where missing values do not form their own category, pass na_action="drop" explicitly.
Added formula support for descriptive_stats model matrices
- descriptive_stats() now accepts a formula argument that is always applied to the data (including numeric-only frames), letting callers control which terms and dummy variables are included in summary statistics.

Documentation

Documented the balance CLI
- Added full API docstrings for balance.cli and a new CLI tutorial notebook.
Created Balance CLI tutorial
- Added CLI command echoing, a load_data() example, and richer diagnostics exploration with metric/variable listings and a browsable diagnostics table. https://import-balance.org/docs/tutorials/balance_cli_tutorial/
Synchronized docstring examples with test cases
- Updated user-facing docstrings so the documented examples mirror tested inputs and outputs.

Code Quality & Refactoring

Added warning when the sample size of 'target' is much larger than 'sample' sample size
- Sample.adjust() now warns when the target exceeds 100k rows and is at least 10x larger than the sample, highlighting that uncertainty is dominated by the sample (akin to a one-sample comparison).
Split util helpers into focused modules
- Broke balance.util into balance.utils submodules for easier navigation.

Bug Fixes

Updated Sample.__str__() to format weight diagnostics like Sample.summary()
- Weight diagnostics (design effect, effective sample size proportion, effective sample size) are now displayed on separate lines instead of comma-separated on one line.
- Replaced "eff." abbreviations with full "effective" word for better readability.
- Improves consistency with Sample.summary() output format.
Numerically stable CBPS probabilities
- The CBPS helper now uses a stable logistic transform to avoid exponential overflow warnings during probability computation in constraint checks.
Silenced pandas observed default warning
- Explicitly sets observed=False in weighted categorical KLD calculations to retain current behavior and avoid future pandas default changes.
Fixed plot_qq_categorical to respect the weighted parameter for target data
- Previously, the target weights were always applied regardless of the weighted=False setting, causing inconsistent behavior between sample and target proportions in categorical QQ plots.
Restored CBPS tutorial plots
- Re-enabled scatter plots in the CBPS comparison tutorial notebook while avoiding GitHub Pages rendering errors and pandas colormap warnings. https://import-balance.org/docs/tutorials/comparing_cbps_in_r_vs_python_using_sim_data/
Clearer validation errors in adjustment helpers
- trim_weights() now accepts list/tuple inputs and reports invalid types explicitly.
- apply_transformations() raises clearer errors for invalid inputs and empty transformations.
Fixed model_matrix to drop NA rows when requested
- model_matrix(add_na=False) now actually drops rows containing NA values while preserving categorical levels, matching the documented behavior.
- Previously, add_na=False only logged a warning without dropping rows; code relying on the old behavior may now see fewer rows and should either handle missingness explicitly or use add_na=True.

Tests

Aligned formatting toolchain between Meta internal and GitHub CI
- Added ["fbcode/core_stats/balance"] override to Meta's internal tools/lint/pyfmt/config.toml to use formatter = "black" and sorter = "usort".
- This ensures both internal (pyfmt/arc lint) and external (GitHub Actions) environments use the same Black 25.1.0 formatter, eliminating formatting drift.
- Updated CI workflow, pre-commit config, and requirements-fmt.txt to use black==25.1.0.
Added Pyre type checking to GitHub Actions via .pyre_configuration.external and a new pyre job in the workflow. Tests are excluded due to external typeshed stub differences; library code is fully type-checked.
Added test coverage workflow and badge to README via .github/workflows/coverage.yml. The workflow collects coverage using pytest-cov, generates HTML and XML reports, uploads them as artifacts, and displays coverage metrics. A coverage badge is now shown in README.md alongside other workflow badges.
Improved test coverage for edge cases and error handling paths
- Added targeted tests for previously uncovered code paths across the library, addressing edge cases including empty inputs, verbose logging, error handling for invalid parameters, and boundary conditions in weighting methods (IPW, CBPS, rake).
- Tests exercise defensive code paths that handle empty DataFrames, NaN convergence values, invalid model types, and non-convergence warnings.
Split test_util.py into focused test modules
- Split the large test_util.py file (2325 lines) into 5 modular test files that mirror the balance/utils/ structure:
  - test_util_data_transformation.py - Tests for data transformation utilities
  - test_util_input_validation.py - Tests for input validation utilities
  - test_util_model_matrix.py - Tests for model matrix utilities
  - test_util_pandas_utils.py - Tests for pandas utilities (including high cardinality warnings)
  - test_util_logging_utils.py - Tests for logging utilities
- This improves test organization and makes it easier to locate tests for specific utilities.

Contributors

@neuralsorcerer, @talgalili

0.14.0 (2025-12-14)

New Features

Enhanced adjusted sample summary output
- Sample.__str__() now displays adjustment details (method, trimming parameters, design effect, effective sample size) when printing adjusted samples (#194, #57).
Richer Sample.summary() diagnostics
- Adjusted sample summary now groups covariate diagnostics, reports design effect alongside ESSP/ESS, and surfaces weighted outcome means when available.
Warning of high-cardinality categorical features in .adjust()
- Categorical features where ≥80% of values are unique are flagged before weight fitting to help identify problematic columns like user IDs (#195, #65).
Ignored column handling for Sample inputs
- Sample.from_frame accepts ignore_columns for columns that should remain on the dataframe but be excluded from covariates and outcome statistics. Ignored columns appear in Sample.df and can be retrieved via Sample.ignored_columns().

Code Quality & Refactoring

Consolidated diagnostics helpers
- Added _concat_metric_val_var() helper and balance.util._coerce_scalar for robust diagnostics row construction and scalar-to-float conversion.
- Breaking change: Sample.diagnostics() for IPW now always emits iteration/intercept summaries plus hyperparameter settings.

Bug Fixes

Early validation of null weight inputs
- Sample.from_frame now raises ValueError when weights contain None, NaN, or pd.NA values with count and preview of affected rows.
Percentile weight trimming across platforms
- trim_weights() now computes thresholds via percentile quantiles with explicit clipping bounds for consistent behavior across Python/NumPy versions.
- Breaking change: percentile-based clipping may shift by roughly one observation at typical limits.
IPW diagnostics improvements
- Fixed multi_class reporting, normalized scalar hyperparameters to floats, removed deprecated penalty argument warnings, and deduplicated metric entries for stable counts across sklearn versions.

Tests

Added Windows and macOS CI testing support
- Expanded GitHub Actions to run on ubuntu-latest, macos-latest, and windows-latest for Python 3.9-3.14.
- Added tempfile_path() context manager for cross-platform temp file handling and configured matplotlib Agg backend via conftest.py.

Contributors

@neuralsorcerer, @talgalili, @wesleytlee

0.13.0 (2025-12-02)

New Features

Propensity modeling beyond static logistic regression
- .adjust(method='ipw') now accepts any sklearn classifier via the model argument, enabling the use of models like random forests and gradient boosting while preserving all existing trimming and diagnostic features. Dense-only estimators and models without linear coefficients are fully supported. Propensity probabilities are stabilized to avoid numerical issues.
- Allow customization of logistic regression by passing a configured :class:~sklearn.linear_model.LogisticRegression instance through the model argument. Also, the CLI now accepts --ipw_logistic_regression_kwargs JSON to build that estimator directly for command-line workflows.
Covariate diagnostics
- Added KL divergence calculations for covariate comparisons (numeric and one-hot categorical), exposed via sample.covars().kld() alongside linked-sample aggregation support.
Weighting Methods
- rake() and poststratify() now honour weight_trimming_mean_ratio and weight_trimming_percentile, trimming and renormalising weights through the enhanced trim_weights(..., target_sum_weights=...) API so the documented parameters work as expected (#147).

Documentation

Added comprehensive post-stratification tutorial notebook (balance_quickstart_poststratify.ipynb) (#141, #142, #143).
Expanded poststratify docstring with clear examples and improved statistical methods documentation (#141).
Added project badges to README for build status, Python version support, and release tracking (#145).
Added example of using custom logistic regression and custom sklearn classifier usage in (balance_quickstart.ipynb).
Shorten the welcome message (for when importing the package).

Code Quality & Refactoring

Raking algorithm refactor
- Removed ipfn dependency and replaced with a vectorized NumPy implementation (_run_ipf_numpy) for iterative proportional fitting, resulting in significant performance improvements and eliminating external dependency (#135).
IPW method refactoring
- Reduced Cyclomatic Complexity Number (CCN) by extracting repeated code patterns into reusable helper functions: _compute_deviance(), _compute_proportion_deviance(), _convert_to_dense_array().
- Removed manual ASMD improvement calculation and now uses existing compute_asmd_improvement() from weighted_comparisons_stats.py
Type safety improvements
- Migrated 32 Python files from # pyre-unsafe to # pyre-strict mode, covering core modules, statistics, weighting methods, datasets, and test files
- Modernized type hints to PEP 604 syntax (X | Y instead of Union[X, Y]) across 11 files for improved readability and Python 3.10+ alignment
- Type alias definitions in typing.py retain Union syntax for Python 3.9 compatibility
- Enhanced plotting function type safety with TypedDict definitions and proper type narrowing
- Replaced assert-based type narrowing with _assert_type() helper for better error messages and pyre-strict compliance
Renamed BalanceDF to BalanceDF****
- BalanceCovarsDF to BalanceDFCovars
- BalanceOutcomesDF to BalanceDFOutcomes
- BalanceWeightsDF to BalanceDFWeights

Bug Fixes

Utility Functions
- Fixed quantize() to preserve column ordering and use proper TypeError exceptions (#133)
Statistical Functions
- Fixed division by zero in asmd_improvement() when asmd_mean_before is zero, now returns 0.0 for 0% improvement
CLI & Infrastructure
- Replaced deprecated argparse FileType with pathlib.Path (#134)
Weight Trimming
- Fixed trim_weights() to consistently return pd.Series with dtype=np.float64 and preserve original index across both trimming methods
- Fixed percentile-based winsorization edge case: _validate_limit() now automatically adjusts limits to prevent floating-point precision issues (#144)
- Enhanced documentation for trim_weights() and _validate_limit() with clearer examples and explanations

Tests

Enhanced test coverage for weight trimming with test_trim_weights_return_type_consistency and 11 comprehensive tests for _validate_limit() covering edge cases, error conditions, and boundary conditions

Contributors

@neuralsorcerer, @talgalili, @wesleytlee

0.12.1 (2025-11-03)

New Features

Added a welcome message when importing the package.

Documentation

Added 'CHANGELOG' to the docs website. https://import-balance.org/docs/docs/CHANGELOG/

Bug Fixes

Fixed plotly figures in all the tutorials. https://import-balance.org/docs/tutorials/

0.12.0 (2025-10-14)

New Features

Support for Python 3.13 + 3.14
- Update setup.py and CI/CD integration to include Python 3.13 and 3.14.
- Remove upper version constraints from numpy, pandas, scipy, and scikit-learn dependencies for Python 3.12+.

Contributors

@talgalili, @wesleytlee

0.11.0 (2025-09-24)

New Features

Python 3.12 support - Complete support for Python 3.12 alongside existing Python 3.9, 3.10, and 3.11 support (with CI/CD integration).
- Implemented Python version-specific dependency constraints - Added conditional version ranges for numpy, pandas, scipy, and scikit-learn that vary based on Python version (e.g., numpy>=1.21.0,<2.0 for Python <3.12, numpy>=1.24.0,<2.1 for Python >=3.12)
- Pandas compatibility improvements - Replaced value_counts(dropna=False) with groupby().size() in frequency table creation to avoid FutureWarning
- Fixed various pandas deprecation warnings and improved DataFrame handling
Improved raking algorithm - Completely refactored rake weighting from DataFrame-based to array-based ipfn algorithm using multi-dimensional arrays and itertools for better performance and compatibility with latest Python versions. Variables are now automatically alphabetized to ensure consistent results regardless of input order.
poststratify method enhancement - New strict_matching parameter (default True) handles cases where sample cells are not present in target data. When False, issues warning and assigns weight 0 to uncovered samples

Bug Fixes

Type annotations - Enhanced Pyre type hints throughout the codebase, particularly in utility functions
Sample class improvements - Fixed weight type assignment (ensuring float64 type), improved DataFrame manipulation with .infer_objects(copy=False) for pandas compatibility, and enhanced weight setting logic
Website dependencies - Updated various website dependencies including Docusaurus and related packages

Tests

Comprehensive test refactoring, including:

Enhanced test validation - Added detailed explanations of test methodologies and expected behaviors in docstrings
Improved test coverage - Tests now include edge cases like NaN handling, different data types, and error conditions
Improved test organization (more granular) across all test modules (test_stats_and_plots.py, test_balancedf.py, test_ipw.py, test_rake.py, test_cli.py, test_weighted_comparisons_plots.py, test_cbps.py, test_testutil.py, test_adjustment.py, test_util.py, test_sample.py)
Updated GitHub workflows to include Python 3.12 in build and test matrix
Fix 261 "pandas deprecation" warnings!
Added type annotations - Converted test_balancedf.py to pyre-strict with.

Documentation

GitHub issue template for support questions - Added structured template to help users ask questions about using the balance package

Contributors

@talgalili, @wesleytlee, @dependabot

0.10.0 (2025-01-06)

New Features

Dependency on glmnet has been removed, and the ipw method now uses sklearn.
The transition to sklearn should enable support for newer python versions (3.11) as well as the Windows OS!
ipw method uses logistic regression with L2-penalties instead of L1-penalties for computational reasons. The transition from glmnet to sklearn and use of L2-penalties will lead to slightly different generated weights compared to previous versions of Balance.
Unfortunately, the sklearn-based ipw method is generally slower than the previous version by 2-5x. Consider using the new arguments lambda_min, lambda_max, and num_lambdas for a more efficient search over the ipw penalization space.

Misc

Update license from GPL v2 to MIT license.
Updated Python and package compatibility. Balance is now compatible with Python 3.11, but no longer compatible with Python 3.8 due to typing errors. Balance is currently incompatible with Python 3.12 due to the removal of distutils.

Contributors

@wesleytlee, @talgalili, @SarigT

0.9.1 (2023-07-30)

Bug Fixes

Fix E721 flake8 issue (see: https://github.com/facebookresearch/balance/actions/runs/5704381365/job/15457952704)
Remove support for python 3.11 from release.yml

Documentation

Added links to presentation given at ISA 2023.
Fixed misc typos.

0.9.0 (2023-05-22)

News

Remove support for python 3.11 due to new test failures. This will be the case until glmnet will be replaced by sklearn. hopefully before end of year.

New Features

All plotly functions: add kwargs to pass arguments to update_layout in all plotly figures. This is useful to control width and height of the plot. For example, when wanting to save a high resolution of the image.
Add a summary methods to BalanceWeightsDF (i.e.: Sample.weights().summary()) to easily get access to summary statistics of the survey weights. Also, it means that Sample.diagnostics() now uses this new summary method in its internal implementation.
BalanceWeightsDF.plot method now relies on the default BalanceDF.plot method. This means that instead of a static seaborn kde plot we'll get an interactive plotly version.

Bug Fixes

datasets
- Remove a no-op in load_data and accommodate deprecation of pandas syntax by using a list rather than a set when selecting df columns (thanks @ahakso for the PR).
- Make the outcome variable (happiness) be properly displayed in the tutorials (so we can see the benefit of the weighting process). This included fixing the simulation code in the target.
Fix Sample.outcomes().summary() so it will output the ci columns without truncating them.

Documentation

Fix text based on updated from version 0.7.0 and 0.8.0.
- https://import-balance.org/docs/docs/general_framework/adjusting_sample_to_population/
Fix tutorials to include the outcome in the target.

Contributors

@talgalili, @SarigT, @ahakso

0.8.0 (2023-04-26)

New Features

Add rake method to .adjust (currently in beta, given that it doesn't handles marginal target as input).
Add a new function prepare_marginal_dist_for_raking - to take in a dict of marginal proportions and turn them into a pandas DataFrame. This can serve as an input target population for raking.

Misc

The ipw function now gets max_de=None as default (instead of 1.5). This version is faster, and the user can still choose a threshold as desired.
Adding hex stickers graphics files

Documentation

New section on raking.
New notebook (in the tutorial section):
- quickstart_rake - like the quickstart tutorial, but shows how to use the rake (raking) algorithm and compares the results to IPW (logistic regression with LASSO).

Contributors

@talgalili, @SarigT

0.7.0 (2023-04-10)

New Features

Add plotly_plot_density function: Plots interactive density plots of the given variables using kernel density estimation.
Modified plotly_plot_dist and plot_dist to also support 'kde' plots. Also, these are now the default options. This automatically percolates to BalanceDF.plot() methods.
Sample.from_frame can now guess that a column called "weights" is a weight column (instead of only guessing so if the column is called "weight").

Bug Fixes

Fix rm_mutual_nas: it now remembers the index of pandas.Series that were used as input. This fixed erroneous plots produced by seaborn functions which uses rm_mutual_nas.
Fix plot_hist_kde to work when dist_type = "ecdf"
Fix plot_hist_kde and plot_bar when having an input only with "self" and "target", by fixing _return_sample_palette.

Misc

All plotting functions moved internally to expect weight column to be called weight, instead of weights.
All adjust (ipw, cbps, poststratify, null) functions now export a dict with a key called weight instead of weights.

Contributors

@talgalili, @SarigT

0.6.0 (2023-04-05)

New Features

Variance of the weighted mean
- Add the var_of_weighted_mean function (from balance.stats_and_plots.weighted_stats import var_of_weighted_mean): Computes the variance of the weighted average (pi estimator for ratio-mean) of a list of values and their corresponding weights.
  - Added the var_of_mean option to stat in the descriptive_stats function (based on var_of_weighted_mean)
  - Added the .var_of_mean() method to BalanceDF.
- Add the ci_of_weighted_mean function (from balance.stats_and_plots.weighted_stats import ci_of_weighted_mean): Computes the confidence intervals of the weighted mean using the (just added) variance of the weighted mean.
  - Added the ci_of_mean option to stat in the descriptive_stats function (based on ci_of_weighted_mean). Also added kwargs support.
  - Added the .ci_of_mean() method to BalanceDF.
  - Added the .mean_with_ci() method to BalanceDF.
  - Updated .summary() methods to include the output of ci_of_mean.
All bar plots now have an added ylim argument to control the limits of the y axis. For example use: plot_dist(dfs1, names=["self", "unadjusted", "target"], ylim = (0,1)) Or this: s3_null.covars().plot(ylim = (0,1))
Improve 'choose_variables' function to control the order of the returned variables
- The return type is now a list (and not a Tuple)
- The order of the returned list is based on the variables argument. If it is not supplied, it is based on the order of the column names in the DataFrames. The df_for_var_order arg controls which df to use.
Misc
- The _prepare_input_model_matrix and downstream functions (e.g.: model_matrix, sample.outcomes().mean(), etc) can now handle DataFrame with special characters in the column names, by replacing special characters with '_' (or '_i', if we end up with columns with duplicate names). It also handles cases in which the column names have duplicates (using the new _make_df_column_names_unique function).
- Improve choose_variables to control the order of the returned variables
  - The return type is now a list (and not a Tuple)
  - The order of the returned list is based on the variables argument. If it is not supplied, it is based on column names in the DataFrames. The df_for_var_order arg controls which df to use.

Contributors

@talgalili, @SarigT

0.5.0 (2023-03-06)

New Features

The datasets.load_data function now also supports the input "sim_data_cbps", which loads the simulated data used in the CBPS R vs Python tutorial. It is also used in unit-testing to compare the CBPS weights produced from Python (i.e.: balance) with R (i.e.: the CBPS package). The testing shows how the correlation of the weights from the two implementations (both Pearson and Spearman) produce a correlation of >0.98.
cli improvements:
- Add an option to set formula (as string) in the cli.

Documentation

New notebook (in the tutorial section):
- Comparing results of fitting CBPS between R's CBPS package and Python's balance package (using simulated data). link

Contributors

@stevemandala, @talgalili, @SarigT

0.4.0 (2023-02-08)

New Features

Added two new flags to the cli:
- --standardize_types: This gives cli users the ability to set the standardize_types parameter in Sample.from_frame to True or False. To learn more about this parameter, see: https://import-balance.org/api_reference/html/balance.sample_class.html#balance.sample_class.Sample.from_frame
- --return_df_with_original_dtypes: the Sample object now stores the dtypes of the original df that was read using Sample.from_frame. This can be used to restore the original dtypes of the file output from the cli. This is relevant in cases in which we want to convert back the dtypes of columns from how they are stored in Sample, to their original types (e.g.: if something was Int32 it would be turned in float32 in balance.Sample, and using the new flag will return that column, when using the cli, to be back in the Int32 type). This feature may not be robust to various edge cases. So use with caution.
In the logging:
- Added warnings about dtypes changes. E.g.: if using Sample.from_frame with a column that has Int32, it will be turned into float32 in the internal storage of sample. Now there will be a warning message indicating of this change.
- Increase the default length of logger printing (from 500 to 2000)

Bug Fixes

Fix pandas warning: SettingWithCopyWarning in from_frame (and other places in sample_class.py)
sample.from_frame has a new argument use_deepcopy to decide if changes made to the df inside the sample object would also change the original df that was provided to the sample object. The default is now set to True since it's more likely that we'd like to keep the changes inside the sample object to the df contained in it, and not have them spill into the original df.

Contributors

@SarigT, @talgalili

0.3.1 (2023-02-01)

Bug Fixes

Sample.from_frame now also converts int16 and in8 to float16 and float16. Thus helping to avoid TypeError: Cannot interpret 'Int16Dtype()' as a data type style errors.

Documentation

Added ISSUE_TEMPLATE

Contributors

@talgalili, @stevemandala, @SarigT

0.3.0 (2023-01-30)

New Features

Added compatibility for Python 3.11 (by supporting SciPy 1.9.2) (props to @tomwagstaff-opml for flagging this issue).
Added the session-info package as a dependency.

Bug Fixes

Fixed pip install from source on Windows machines (props to @tomwagstaff-opml for the bug report).

Documentation

Added session_info.show() outputs to the end of the three tutorials (at: https://import-balance.org/docs/tutorials/)
Misc updates to the README.

Contributors

@stevemandala, @SarigT, @talgalili

0.2.0 (2023-01-19)

New Features

cli improvements:
- Add an option to set weight_trimming_mean_ratio = None for no trimming.
- Add an option to set transformations to be None (i.e. no transformations).
Add an option to adapt the title in:
- stats_and_plots.weighted_comparison_plots.plot_bar
- stats_and_plots.weighted_comparison_plots.plot_hist_kde

Bug Fixes

Fix (and simplify) balanceDF.plot to organize the order of groups (now unadjusted/self is left, adjusted/self center, and target is on the right)
Fix plotly functions to use the red color for self when only compared to target (since in that case it is likely unadjusted): balance.stats_and_plots.weighted_comparisons_plots.plotly_plot_qq and balance.stats_and_plots.weighted_comparisons_plots.plotly_plot_bar
Fix seaborn_plot_dist: output None by default (instead of axis object). Added a return_Axes argument to control this behavior.
Fix some test_cbps tests that were failing due to non-exact matches (we made the test less sensitive)

Documentation

New blog section, with the post: Bringing "balance" to your data
New tutorial:
- quickstart_cbps - like the quickstart tutorial, but shows how to use the CBPS algorithm and compares the results to IPW (logistic regression with LASSO).
- balance_transformations_and_formulas - This tutorial showcases ways in which transformations, formulas and penalty can be included in your pre-processing of the covariates before adjusting for them.
API docs:
- New: highlighting on codeblocks
- a bunch of text fixes.
Update README.md
- logo
- with contributors
- typo fixes (props to @zbraiterman and @luca-martial).
Added section about "Releasing a new version" to CONTRIBUTING.md
- Available under "Docs/Contributing" section of website

Misc

Added automated Github Action package builds & deployment to PyPi on release.
- See release.yml

Contributors

@stevemandala, @SarigT, @talgalili

0.1.0 (2022-11-20)

Summary

balance released to the world!

Contributors

@SarigT, @talgalili, @stevemandala

FilesExpand file tree

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

0.17.0 (TBD)

Breaking Changes

Documentation

Bug Fixes

Tests

0.16.0 (2026-02-09)

New Features

Bug Fixes

Packaging & Tests

Breaking Changes

0.15.0 (2026-01-20)

New Features

Documentation

Code Quality & Refactoring

Bug Fixes

Tests

Contributors

0.14.0 (2025-12-14)

New Features

Code Quality & Refactoring

Bug Fixes

Tests

Contributors

0.13.0 (2025-12-02)

New Features

Documentation

Code Quality & Refactoring

Bug Fixes

Tests

Contributors

0.12.1 (2025-11-03)

New Features

Documentation

Bug Fixes

0.12.0 (2025-10-14)

New Features

Contributors

0.11.0 (2025-09-24)

New Features

Bug Fixes

Tests

Documentation

Contributors

0.10.0 (2025-01-06)

New Features

Misc

Contributors

0.9.1 (2023-07-30)

Bug Fixes

Documentation

0.9.0 (2023-05-22)

News

New Features

Bug Fixes

Documentation

Contributors

0.8.0 (2023-04-26)

New Features

Misc

Documentation

Contributors

0.7.0 (2023-04-10)

New Features

Bug Fixes

Misc

Contributors

0.6.0 (2023-04-05)

New Features

Contributors

0.5.0 (2023-03-06)

New Features

Documentation

Contributors

0.4.0 (2023-02-08)