lsynth

MAP-alignment fidelity and dataset distance for synthetic tabular data

This package implements the one-sided MAP-alignment fidelity statistic introduced by Chattopadhyay et al. and described in the manuscript “How Good Is Your Synthetic Data?”.

The package cleanly separates synthetic data generation from evaluation:

generate_syndata(...) produces synthetic tabular data using multiple generators.
compute_upsilon(df, ...) evaluates MAP-alignment fidelity on any external DataFrame.

The library is fully compatible with externally generated synthetic data and is primarily aimed at evaluation, although several reference generators are included.

Core Idea

For a synthetic record to be realistic, each coordinate should agree with the conditional MAP prediction inferred from real data.

For a data record x and coordinate i:

υ(x, i) = φ_i(x_i | x_{-i}) / max_y φ_i(y | x_{-i})

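Each ratio lies in [0, 1] and equals 1 exactly when x_i is the MAP value. Averaging υ(x, i) over all records x in a dataset D and all coordinates i gives the dataset-level score:

Υ(D) ∈ [0, 1]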

High Υ ⇒ the synthetic data preserves the conditional structure of the real data
Low Υ ⇒ structural distortion (even if marginals or covariances match)
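As a quick illustration of the ratio above, here is a minimal sketch that evaluates υ for a single coordinate from a hand-written conditional distribution. The helper upsilon and the probabilities in phi are made up for this example and are not part of lsynth; in the package the conditionals come from a trained model.

```python
from fractions import Fraction

def upsilon(cond_probs, observed):
    """Observed value's conditional probability divided by the MAP probability."""
    best = max(cond_probs.values())
    return Fraction(cond_probs.get(observed, 0)) / Fraction(best)

# Hypothetical conditional distribution φ_i(· | x_{-i}) for one coordinate
# (illustrative numbers only, not taken from any real model).
phi = {
    "agree": Fraction(6, 10),
    "neutral": Fraction(3, 10),
    "disagree": Fraction(1, 10),
}

u_map = upsilon(phi, "agree")      # observed value is the MAP value
u_mid = upsilon(phi, "neutral")    # plausible but not the mode
u_low = upsilon(phi, "disagree")   # unlikely given the rest of the record

# Υ is the plain average of the per-coordinate ratios.
scores = [u_map, u_mid, u_low]
big_upsilon = sum(scores) / len(scores)
print(u_map, u_mid, u_low, big_upsilon)
```

A record whose coordinates all sit at their conditional modes scores 1; the score drops as coordinates move to values the real data makes unlikely given the rest of the record.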

Installation

pip install lsynth

Synthetic Data Generation

Synthetic data generation is handled by generate_syndata, which always returns a pandas DataFrame whose columns exactly match the target feature space.

from lsynth import generate_syndata

Supported Generators

"LSM": uses a trained QuasiNet model as a generative model via qsample.
"BASELINE": independent-column null model (Gaussian for numeric columns, categorical sampling for discrete columns).
"CTGAN": uses SDV’s CTGANSynthesizer.
Custom generators: any user-defined function returning a DataFrame with the correct columns.
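To make the custom-generator contract concrete, the sketch below shows a user-defined function that samples every column independently (Gaussian for numeric columns, empirical frequencies otherwise) and returns a DataFrame with the original column order. The function name independent_column_generator and its parameters are assumptions for illustration; this is not the package's BASELINE implementation, and how such a function is handed to generate_syndata is not specified here.

```python
import numpy as np
import pandas as pd

def independent_column_generator(orig_df, num, seed=0):
    """Hypothetical custom generator: sample each column on its own."""
    rng = np.random.default_rng(seed)
    columns = {}
    for name in orig_df.columns:
        col = orig_df[name].dropna()
        if pd.api.types.is_numeric_dtype(col):
            # Gaussian with the column's empirical mean and standard deviation
            columns[name] = rng.normal(col.mean(), col.std(ddof=0), size=num)
        else:
            # Draw categories according to their empirical frequencies
            freqs = col.value_counts(normalize=True)
            columns[name] = rng.choice(
                freqs.index.to_numpy(), size=num, p=freqs.to_numpy()
            )
    # Preserve the column order of the original data
    return pd.DataFrame(columns, columns=list(orig_df.columns))

# Toy data, invented for the example
df_toy = pd.DataFrame(
    {"age": [23.0, 35.0, 41.0, 58.0], "vote": ["yes", "no", "yes", "yes"]}
)
df_fake = independent_column_generator(df_toy, num=50)
print(df_fake.head())
```

Because each column is drawn separately, a generator like this keeps the marginals but discards all dependence between columns, which is the kind of distortion Υ is meant to expose.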

Example: LSM Generation

df_lsm = generate_syndata(
    num=1000,
    model_path="gss_2018.joblib",
    gen_algorithm="LSM",
    n_workers=8,
)

Example: Baseline Generator

df_baseline = generate_syndata(
    num=1000,
    gen_algorithm="BASELINE",
    orig_df=df_real,
    feature_names=df_real.columns.tolist(),
)

Example: CTGAN Generator

df_ctgan = generate_syndata(
    num=1000,
    gen_algorithm="CTGAN",
    orig_df=df_real,
    feature_names=df_real.columns.tolist(),
)

MAP-Alignment Fidelity Evaluation

Evaluation is handled exclusively by compute_upsilon, which operates on an external DataFrame.

from lsynth import compute_upsilon

ups, _ = compute_upsilon(
    df=df_lsm,
    model_path="gss_2018.joblib",
    n_workers=8,
)

print("Mean Υ:", ups.mean())

Key properties:

No data generation occurs inside compute_upsilon
Columns must match model.feature_names exactly and in order
Works identically for real or synthetic data
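Since the column requirement is strict, it can help to check and reorder an external DataFrame before scoring it. The helper below is a minimal sketch, not part of lsynth: align_columns is a hypothetical name, and feature_names is assumed to be the ordered list of feature names the model expects, obtained however your model exposes it.

```python
import pandas as pd

def align_columns(df, feature_names):
    """Return df with exactly the expected columns, in the expected order."""
    missing = [c for c in feature_names if c not in df.columns]
    extra = [c for c in df.columns if c not in feature_names]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return df[list(feature_names)]

# Toy frame whose columns are present but in the wrong order
df_shuffled = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
df_aligned = align_columns(df_shuffled, ["a", "b"])
print(list(df_aligned.columns))
```

Failing loudly on missing or extra columns is a deliberate choice in this sketch: silently dropping or padding columns would change what is being evaluated.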

Example Notebook

See example2.ipynb for a complete, reproducible workflow:

Load real data
Generate synthetic datasets using multiple generators
Compute and compare MAP-alignment fidelity
Interpret structural degradation across generators

Why MAP-Alignment?

Because covariance matching is insufficient.

The manuscript shows explicit counterexamples where:

Real and synthetic data share identical means, variances, and covariance matrices
Yet differ strongly in higher-order and conditional structure
MAP-alignment detects the discrepancy immediately
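The following toy construction illustrates the point. It is not one of the manuscript's counterexamples, and it computes the conditionals exactly from a small discrete "real" distribution instead of using a trained model. The two distributions share means, variances, and covariance, yet the averaged ratio separates them.

```python
from collections import defaultdict
from fractions import Fraction

Q = Fraction(1, 4)
# "Real" data: mass only on the four axis points, so x and y are dependent.
real = {(-1, 0): Q, (1, 0): Q, (0, -1): Q, (0, 1): Q}
# "Synthetic" data: same marginals, but x and y drawn independently.
marginal = {-1: Q, 0: Fraction(1, 2), 1: Q}
synthetic = {(x, y): px * py
             for x, px in marginal.items() for y, py in marginal.items()}

def moments(dist):
    """Means, variances, and covariance of a weighted set of (x, y) records."""
    ex = sum(p * x for (x, y), p in dist.items())
    ey = sum(p * y for (x, y), p in dist.items())
    vx = sum(p * (x - ex) ** 2 for (x, y), p in dist.items())
    vy = sum(p * (y - ey) ** 2 for (x, y), p in dist.items())
    cxy = sum(p * (x - ex) * (y - ey) for (x, y), p in dist.items())
    return ex, ey, vx, vy, cxy

def conditionals(dist, i):
    """φ_i(value | other coordinate), computed exactly from dist."""
    joint = defaultdict(lambda: defaultdict(Fraction))
    for rec, p in dist.items():
        joint[rec[1 - i]][rec[i]] += p
    return {ctx: {v: p / sum(vals.values()) for v, p in vals.items()}
            for ctx, vals in joint.items()}

def big_upsilon(dist, reference):
    """Average of φ_i(x_i | x_-i) / max_y φ_i(y | x_-i) over records and coordinates."""
    phis = [conditionals(reference, 0), conditionals(reference, 1)]
    total = Fraction(0)
    for rec, p in dist.items():
        for i in (0, 1):
            # Assumes every context in dist also occurs in reference (true here).
            cond = phis[i][rec[1 - i]]
            total += p * cond.get(rec[i], Fraction(0)) / max(cond.values()) / 2
    return total

print(moments(real) == moments(synthetic))   # second-order statistics agree
print(big_upsilon(real, real))               # real data scored against itself
print(big_upsilon(synthetic, real))          # synthetic data scored against real
```

Half of the synthetic probability mass falls on records such as (0, 0) or (1, 1) that never occur in the real distribution, so the score drops from 1 to 1/2 even though every first- and second-order statistic matches.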

This method:

Detects nonlinear and higher-order dependencies
Avoids embedding or representation artifacts
Comes with finite-sample uncertainty control

Citation

Chattopadhyay I, et al.
“How Good Is Your Synthetic Data?”
