BinAgg: Differentially Private Linear Regression

Last updated on March 1, 2026.

A Python package for differentially private linear regression and synthetic data generation using the Binning-Aggregation framework under Gaussian differential privacy (GDP).

This package implements the algorithms from the paper and may be expanded with additional functionality in the near future. To obtain the latest version, use the command under "Upgrade to Latest Version" in the Installation section.

Citation

This package is based on the paper:

Lin, S., Slavković, A., & Bhoomireddy, D. R. (2026). Differentially private linear regression and synthetic data generation with statistical guarantees. In Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS). Proceedings of Machine Learning Research.

If you use this package, please cite:

@inproceedings{lin2026differentially,
  title     = {Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees},
  author    = {Lin, Shurong and Slavkovi{\'c}, Aleksandra and Bhoomireddy, Deekshith Reddy},
  booktitle = {Proceedings of the 29th International Conference on Artificial Intelligence and Statistics},
  series    = {Proceedings of Machine Learning Research},
  year      = {2026},
  publisher = {PMLR}
}

Features

Based on the method from the paper:

Binning-Aggregation: Differentially private data binning followed by aggregation and privatization
DP Linear Regression: Bias-corrected weighted least squares with valid confidence intervals
DP Synthetic Data Generation: Generate privacy-preserving synthetic datasets
GDP Privacy Accounting: Tight composition using Gaussian Differential Privacy

Installation

From GitHub (Recommended)

pip install git+https://github.com/shuronglin/binagg.git

Upgrade to Latest Version

pip uninstall binagg -y && pip install git+https://github.com/shuronglin/binagg.git

From Source (For Development)

# Clone the repository
git clone https://github.com/shuronglin/binagg.git
cd binagg

# Install in development mode
pip install -e .

# Or install with development dependencies
pip install -e ".[dev]"

From PyPI (Coming Soon)

pip install binagg

Requirements

Python >= 3.9
NumPy >= 1.20
SciPy >= 1.7

Quick Start

For detailed tutorials, see the examples/ folder, which includes both real data and simulated data examples.

DP Linear Regression

import numpy as np
from binagg import dp_linear_regression

# Generate sample data
np.random.seed(42)
n, d = 500, 3
X = np.random.uniform(0, 10, (n, d))
true_beta = np.array([1.5, -2.0, 0.5])
y = X @ true_beta + np.random.normal(0, 1, n)

# Define public domain bounds (required for DP, must be specified by analyst)
# These should be known a priori or privately computed from the sensitive data
x_bounds = [(0, 10), (0, 10), (0, 10)]  # Known domain for each feature
y_bounds = (-30, 30)  # Known range for target variable

# Run DP regression with μ=1.0 privacy budget
result = dp_linear_regression(
    X, y, x_bounds, y_bounds,
    mu=1.0,           # Privacy budget (μ-GDP)
    alpha=0.05,       # 95% confidence intervals
    random_state=42
)

# Results
print("Coefficients:", result.coefficients)
print("Standard Errors:", result.standard_errors)
print("95% CI:", result.confidence_intervals)
print(f"Number of bins: {result.n_bins}")

DP Synthetic Data Generation

from binagg import generate_synthetic_data

# Generate synthetic data
syn_result = generate_synthetic_data(
    X, y, x_bounds, y_bounds,
    mu=1.0,
    random_state=42
)

print(f"Generated {syn_result.n_samples} synthetic samples")
print(f"Synthetic X shape: {syn_result.X_synthetic.shape}")
print(f"Synthetic y shape: {syn_result.y_synthetic.shape}")

# Use synthetic data for downstream analysis
X_syn = syn_result.X_synthetic
y_syn = syn_result.y_synthetic
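
As one possible downstream analysis, the synthetic arrays can be passed to any standard estimator; by the post-processing property of differential privacy, this incurs no additional privacy cost. The sketch below fits ordinary least squares with NumPy. It uses stand-in arrays so that it runs on its own; in practice X_syn and y_syn would come from syn_result as above. It gives point estimates only and is not the bias-corrected estimator from dp_linear_regression.

```python
import numpy as np

# Stand-in arrays (assumption for illustration only); in practice use
# syn_result.X_synthetic and syn_result.y_synthetic.
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(500, 3))
y_syn = X_syn @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 1, size=500)

# Ordinary least squares without an intercept, matching the
# Quick Start model y = X @ beta + noise.
beta_hat, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)
print("OLS coefficients on synthetic data:", beta_hat)
```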

Privacy Budget Conversion

from binagg import (
    delta_from_gdp,
    eps_from_mu_delta,
    mu_from_eps_delta,
    compose_gdp
)

# μ-GDP to (ε, δ)-DP: given μ and ε, compute δ
delta = delta_from_gdp(mu=1.0, eps=2.0)
print(f"(μ=1.0, ε=2.0) → δ={delta:.6f}")

# μ-GDP to (ε, δ)-DP: given μ and δ, compute ε
eps = eps_from_mu_delta(mu=1.0, delta=1e-5)
print(f"(μ=1.0, δ=1e-5) → ε={eps:.2f}")

# (ε, δ)-DP to μ-GDP
mu = mu_from_eps_delta(eps=1.0, delta=1e-5)
print(f"(ε=1.0, δ=1e-5) → μ={mu:.2f}")

# Compose multiple mechanisms
total_mu = compose_gdp(0.5, 0.5, 0.5, 0.5)  # Four mechanisms
print(f"Composed privacy: μ={total_mu:.2f}")

API Reference

Main Functions

`dp_linear_regression(X, y, x_bounds, y_bounds, mu, ...)`

Performs differentially private linear regression with bias correction.

Parameters:

X: Feature matrix of shape (n, d)
y: Label vector of shape (n,)
x_bounds: Per-feature bounds as [(L_1, U_1), ..., (L_d, U_d)] - must be specified by the analyst, not computed directly (non-privately) from the sensitive data
y_bounds: Bounds on y as (y_min, y_max)
mu: Total privacy budget in μ-GDP
theta: PrivTree splitting threshold (default: 0)
alpha: Significance level for confidence intervals (default: 0.05 for 95% CI)
budget_ratios: Privacy budget ratios for (binning, count, sum_x, sum_y) (default: (1, 3, 3, 3))
min_count: Minimum noisy count to keep a bin (default: 2)
clip: Whether to clip input data to bounds (default: True)
return_synthetic: If True, also return synthetic data using the same privacy budget (default: False)
clip_synthetic_output: Whether to clip synthetic output to bounds, only used when return_synthetic=True (default: False)
preserve_sample_size: If True, rescale noisy counts so total equals original sample size n (default: True)
random_state: Random seed for reproducibility

Returns:

If return_synthetic=False: DPRegressionResult with coefficients, standard_errors, confidence_intervals, n_bins
If return_synthetic=True: Tuple of (DPRegressionResult, SyntheticDataResult) - both share the same privacy budget
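
To illustrate what clipping to bounds means for the clip parameter, here is a minimal NumPy sketch. It is an assumption about the general idea (bounded inputs give bounded sensitivity), not the package's internal code, and the data below are hypothetical.

```python
import numpy as np

# Hypothetical data with some values outside the declared bounds.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 11, size=(100, 3))
y = rng.normal(0, 20, size=100)

x_bounds = [(0, 10), (0, 10), (0, 10)]
y_bounds = (-30, 30)

# Per-feature lower and upper limits as arrays of shape (d,).
lower = np.array([lo for lo, _ in x_bounds], dtype=float)
upper = np.array([hi for _, hi in x_bounds], dtype=float)

# np.clip broadcasts the per-feature limits across all rows.
X_clipped = np.clip(X, lower, upper)
y_clipped = np.clip(y, y_bounds[0], y_bounds[1])
```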

`generate_synthetic_data(X, y, x_bounds, y_bounds, mu, ...)`

Generates differentially private synthetic data that preserves the joint (X, y) distribution.

Parameters:

X: Feature matrix of shape (n, d)
y: Label vector of shape (n,)
x_bounds: Per-feature bounds as [(L_1, U_1), ..., (L_d, U_d)]
y_bounds: Bounds on y as (y_min, y_max)
mu: Total privacy budget in μ-GDP
theta: PrivTree splitting threshold (default: 0)
budget_ratios: Privacy budget ratios for (binning, count, sum_x, sum_y) (default: (1, 3, 3, 3))
min_count: Minimum noisy count to generate samples from a bin (default: 2)
clip: Whether to clip input data to bounds (default: True)
clip_output: Whether to clip synthetic output data to bounds (default: False)
preserve_sample_size: If True, rescale noisy counts so total synthetic samples equals original n (default: True)
random_state: Random seed for reproducibility

Returns: SyntheticDataResult with:

X_synthetic: Synthetic features
y_synthetic: Synthetic targets
n_samples: Number of samples generated
n_bins_used: Number of bins used for generation

`privtree_binning(X, y, x_bounds, mu_bin, ...)`

Performs private binning using the PrivTree algorithm.

`privatize_aggregates(bin_result, y_bound, mu_agg, ...)`

Adds calibrated noise to bin aggregates.
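
A standard way to calibrate noise under μ-GDP is the Gaussian mechanism: adding N(0, (Δ/μ)²) noise to a statistic with L2 sensitivity Δ satisfies μ-GDP. The sketch below applies it to per-bin counts. The function name, the counts, and the choice Δ = 1 (add/remove-one-record adjacency over disjoint bins) are assumptions for illustration; this is not the internals of privatize_aggregates.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, mu, rng):
    """Add N(0, (sensitivity / mu)^2) noise to each entry of value.

    If sensitivity bounds the L2 sensitivity of value, the output is mu-GDP.
    """
    sigma = sensitivity / mu
    return value + rng.normal(0.0, sigma, size=np.shape(value))

rng = np.random.default_rng(0)
bin_counts = np.array([120.0, 95.0, 140.0, 145.0])  # hypothetical per-bin counts

# Adding or removing one record changes exactly one count by 1,
# so the L2 sensitivity of the count vector is 1 under that adjacency.
noisy_counts = gaussian_mechanism(bin_counts, sensitivity=1.0, mu=0.5, rng=rng)
print(noisy_counts)
```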

Privacy Functions

delta_from_gdp(mu, eps): μ-GDP → (ε, δ)-DP, compute δ given μ and ε
eps_from_mu_delta(mu, delta): μ-GDP → (ε, δ)-DP, compute ε given μ and δ
mu_from_eps_delta(eps, delta): (ε, δ)-DP → μ-GDP
compose_gdp(*mus): Compose multiple μ-GDP mechanisms
allocate_budget(total_mu, ratios): Split budget by ratios

Understanding Privacy Parameters

μ-GDP (Gaussian Differential Privacy)

This package uses μ-GDP for privacy accounting. Smaller values of μ correspond to stronger privacy guarantees.

μ ≤ 0.5: Strong privacy protection (higher noise, lower accuracy)
0.5 < μ ≤ 1.5: Moderate privacy protection
μ > 1.5: Weaker privacy protection (lower noise, higher accuracy)

Converting to (ε, δ)-DP

from binagg import delta_from_gdp

# For μ=1.0, what's δ at ε=1?
delta = delta_from_gdp(mu=1.0, eps=1.0)
# δ ≈ 0.13

# For μ=1.0, what's δ at ε=2?
delta = delta_from_gdp(mu=1.0, eps=2.0)
# δ ≈ 0.02
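
For readers who want to see the conversion itself, the sketch below implements the standard μ-GDP to (ε, δ)-DP relation, δ(ε) = Φ(−ε/μ + μ/2) − e^ε Φ(−ε/μ − μ/2), using only the standard library, and inverts it for ε by bisection. The function names are hypothetical and this is not necessarily how the package's own functions are implemented.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def delta_sketch(mu, eps):
    """delta such that a mu-GDP mechanism is (eps, delta)-DP."""
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))

def eps_sketch(mu, delta, hi=50.0, tol=1e-10):
    """Smallest eps for a given delta; delta_sketch is decreasing in eps."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_sketch(mu, mid) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"delta at mu=1, eps=1: {delta_sketch(1.0, 1.0):.4f}")
print(f"delta at mu=1, eps=2: {delta_sketch(1.0, 2.0):.4f}")
print(f"eps at mu=1, delta=1e-5: {eps_sketch(1.0, 1e-5):.3f}")
```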

Budget Allocation

The default budget split (1, 3, 3, 3) allocates:

10% to binning (PrivTree)
30% to noisy counts
30% to noisy sum(X)
30% to noisy sum(y)
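
Under GDP, mechanisms with budgets μ_1, ..., μ_k compose to √(μ_1² + ... + μ_k²). The sketch below shows one convention for splitting a total budget by ratios so that the parts compose back to the total: each part receives its share of μ². Whether allocate_budget applies the ratios to μ or to μ² is not stated here, so treat this convention and the function names as assumptions.

```python
import math

def split_budget_sketch(total_mu, ratios):
    """Give each part a share of total_mu**2 proportional to its ratio."""
    total = sum(ratios)
    return [total_mu * math.sqrt(r / total) for r in ratios]

def compose_sketch(*mus):
    """GDP composition: square root of the sum of squared budgets."""
    return math.sqrt(sum(m * m for m in mus))

parts = split_budget_sketch(1.0, (1, 3, 3, 3))
print("Per-step budgets:", [round(p, 4) for p in parts])
print("Recomposed total:", round(compose_sketch(*parts), 4))
```

With this convention, the binning step receives 10% of μ² rather than 10% of μ, and the four steps together spend exactly the total budget.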

Examples

See the examples/ directory for complete tutorials:

basic_regression.py: Simple DP regression example
synthetic_data.py: Generating and using synthetic data
privacy_accounting.py: Understanding privacy budgets
real_data_example.py: Working with real datasets

Testing

# Run all tests
pytest tests/ -v

# Run specific test module
pytest tests/test_regression.py -v

# Run with coverage
pytest tests/ --cov=binagg

Contributors

Shurong Lin - Original algorithm implementation and paper author; package development and testing

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please open an issue or pull request on GitHub.
