12 changes: 7 additions & 5 deletions README.md
@@ -14,28 +14,30 @@ Inspired by [Awesome Synthetic Data](https://github.com/gretelai/awesome-synthet

# Open source tools

+ [Copulas](https://github.com/sdv-dev/Copulas): a Python library for modeling multivariate distributions and sampling from them using copula functions.
+ [CTGAN](https://github.com/sdv-dev/CTGAN): SDV’s collection of deep learning-based synthetic data generators for single table data.
+ [DataGene](https://github.com/firmai/datagene): a tool to train, test, and validate datasets, and to detect and compare similarity between real and synthetic datasets.
+ [DoppelGANger](https://github.com/fjxmlzn/DoppelGANger): a synthetic data generation framework based on generative adversarial networks (GANs).
+ [DP_WGAN-UCLANESL](https://github.com/nesl/nist_differential_privacy_synthetic_data_challenge): a solution that trains a Wasserstein generative adversarial network (WGAN) on the real private dataset.
+ [DPSyn](https://github.com/usnistgov/PrivacyEngCollabSpace/tree/master/tools/de-identification/Differential-Privacy-Synthetic-Data-Challenge-Algorithms/DPSyn): an algorithm for synthesizing microdata while satisfying differential privacy.
+ [Faker](https://github.com/joke2k/faker): a Python package that generates fake data (Note: this tool does not generate synthetic data but offers dummy data).
+ [Generative adversarial nets for synthetic time series data](https://github.com/stefan-jansen/synthetic-data-for-finance): a repository that shows how to create synthetic time-series data using generative adversarial networks (GANs).
+ [Gretel.ai](https://gretel.ai/): commercial synthetic data vendor that offers open source functionality.
+ [mirrorGen](https://github.com/DataResponsibly/MirrorDataGenerator): a Python tool that generates synthetic data based on user-specified causal relations among features in the data.
+ [Plait.py](https://github.com/plaitpy/plaitpy): a program for generating fake data from composable YAML templates.
+ [Pydbgen](https://github.com/tirthajyoti/pydbgen): a Python package that generates a random database table based on the user's choice of data types.
+ [SmartNoise synthesizer](https://smartnoise.org/): a differentially private open source synthesizer for tabular data.
+ [Synner](https://github.com/huda-lab/synner): an open source tool to generate real-looking synthetic data by visually specifying the properties of the dataset.
+ [Synth](https://www.getsynth.com/): an open source data-as-code tool that provides a simple CLI workflow for generating consistent data in a scalable way.
+ [Synthea](https://synthetichealth.github.io/synthea/): an open source synthetic patient generator that models the medical history of synthetic patients.
+ [Synthetic data vault (SDV)](https://sdv.dev/): one of the first open source synthetic data solutions, SDV provides tools for generating synthetic data for tabular, relational, and time series data.
+ [TGAN](https://github.com/sdv-dev/TGAN): generative adversarial training for generating synthetic tabular data.
+ [Tofu](https://github.com/spiros/tofu): a Python library for generating synthetic UK Biobank data.
+ [Twinify](https://github.com/DPBayes/twinify): a software package for privacy-preserving generation of a synthetic twin to a given sensitive data set.
+ [YData](https://github.com/ydataai/ydata-synthetic): synthetic structured data generator by YData, a commercial vendor.

# Open source with commercial license

+ [Gretel.ai](https://gretel.ai/): commercial synthetic data vendor that offers open source functionality.
+ [Synthetic data vault (SDV)](https://sdv.dev/): one of the first open source synthetic data solutions, SDV provides tools for generating synthetic data for tabular, relational, and time series data.
+ [Copulas](https://github.com/sdv-dev/Copulas): a Python library for modeling multivariate distributions and sampling from them using copula functions.
+ [CTGAN](https://github.com/sdv-dev/CTGAN): SDV’s collection of deep learning-based synthetic data generators for single table data.
+ [TGAN](https://github.com/sdv-dev/TGAN): generative adversarial training for generating synthetic tabular data.

# Commercial solutions
