🏥👨‍💻 Synthetic Patient Generator 👨‍💻🏥

📌 Project Overview

Real patient datasets are locked behind privacy laws, IRB approval, and data sharing agreements. This project uses a Generative Adversarial Network (GAN) to create realistic synthetic patient data for pharmaceutical and healthcare research, supporting meaningful analysis while keeping patient information private and secure. The model is trained on anonymized patient records and learns the patterns and statistics of the real data, so it can generate new, lifelike datasets that reflect the same trends without revealing anyone's personal details.

📋 Implementation

This project can be a stepping stone toward tools that bridge the gap between real-world healthcare data and safe, privacy-preserving research. The synthetic patient generator can be used in several scenarios:

Medical Research - Researchers can test hypotheses, explore disease models, and simulate treatment outcomes without needing immediate access to sensitive patient data.
Algorithm Development - Data scientists can build, train, and stress-test machine learning models on synthetic cohorts before applying them to limited or regulated clinical datasets.
Policy & Planning Simulations - Public health analysts can model population-level interventions and study their potential impacts before rollout.
Healthcare Software - Developers of clinical decision support tools or EHR systems can populate their applications with synthetic patients to test features, workflows, and analytics without privacy concerns.

🔬 Workflow

Data Preparation: Load the anonymized patient dataset in .npy format.
Model Training: Train a Wasserstein GAN, chosen because its loss makes training more stable.
Synthetic Generation: Use the trained model to create realistic synthetic patient samples.
Validation: Check that privacy is protected and the synthetic data matches real-world patterns.
Deployment: Release the validated synthetic dataset for research purposes.
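The data preparation and training steps above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the repository's training code: the helper names, the toy file name, the column-wise standardization, and the weight-clipping value of 0.01 are all hypothetical choices made for the example.

```python
import os
import tempfile

import numpy as np


def load_and_standardize(path):
    """Load a .npy feature matrix and scale each column to zero mean, unit variance."""
    data = np.load(path).astype(np.float64)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (data - mean) / std


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: minimizing it widens the real-vs-synthetic score gap."""
    return float(np.mean(fake_scores) - np.mean(real_scores))


def generator_loss(fake_scores):
    """Wasserstein generator loss: raise the critic's scores on synthetic samples."""
    return float(-np.mean(fake_scores))


def clip_weights(weights, clip_value=0.01):
    """Weight clipping, one common way to constrain the critic (assumed here)."""
    return [np.clip(w, -clip_value, clip_value) for w in weights]


# Round-trip a tiny made-up matrix through a temporary .npy file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_patients.npy")  # hypothetical file name
    np.save(path, np.array([[1.0, 10.0], [3.0, 10.0]]))
    scaled = load_and_standardize(path)

# Toy critic scores (made-up values, for illustration only).
real_scores = np.array([0.9, 1.1, 1.0])
fake_scores = np.array([-0.2, 0.0, 0.2])
c_loss = critic_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
clipped = clip_weights([np.array([0.5, -0.5, 0.001])])
```

In a real training loop these losses would be computed on the outputs of TensorFlow models and minimized with an optimizer; the NumPy version only shows what each term measures.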

📈 Data Interpretation

Generates 500+ synthetic patient records from 297 real samples
Privacy risk under 0.5% (minimal re-identification threat)
Uses Wasserstein GAN for stable, reliable training
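The README does not say how privacy risk is computed. One plausible reading, shown here purely as an assumption, is the share of synthetic records that fall within a small distance of some real record. The function name `privacy_risk`, the Euclidean metric, and the threshold are hypothetical.

```python
import numpy as np


def privacy_risk(real, synthetic, threshold):
    """Fraction of synthetic rows whose nearest real row is closer than `threshold`.

    Assumed definition for illustration; the project may measure risk differently.
    """
    # Pairwise Euclidean distances, shape (n_synthetic, n_real).
    diffs = synthetic[:, None, :] - real[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = dists.min(axis=1)
    return float(np.mean(nearest < threshold))


# Made-up toy data: only the first synthetic row sits almost on top of a real row.
real = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
synthetic = np.array([[0.0, 0.05], [5.0, 5.0], [1.5, 1.5], [3.0, 0.0]])
risk = privacy_risk(real, synthetic, threshold=0.1)
```

Under this definition the result depends strongly on feature scaling and on the chosen threshold, so both would need to be reported alongside the percentage.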

⚙️ Technical Methods & Models

Language: Python
Algorithms: GANs, Variational Autoencoders
Libraries: TensorFlow, NumPy, scikit-learn, Matplotlib

👨‍💻 Installation

git clone https://github.com/JobinJohn24/Synthetic-Patient-Generator.git

cd Synthetic-Patient-Generator

pip install -r requirements.txt

👨‍🔬 Results

After training on 297 real patient records:

Metric	Value
Synthetic Samples Generated	500
Privacy Risk	0.34%
Real Cluster Tightness (average)	2.91
Synthetic Cluster Spread	5.44
PCA Variance Captured (2D)	36%
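Two of the metrics in the table can be illustrated with short NumPy helpers. The definitions are assumptions: "PCA variance captured" is taken to mean the share of total variance explained by the first two principal components, and "tightness" or "spread" is taken to mean the average distance from each sample to its nearest cluster centroid. The function names and toy data are hypothetical.

```python
import numpy as np


def pca_variance_captured(data, n_components=2):
    """Share of total variance explained by the leading principal components."""
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[:n_components].sum() / variances.sum())


def mean_distance_to_centroids(data, centroids):
    """Average Euclidean distance from each row to its nearest centroid."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


# Made-up 3D points that all lie in a plane: two components capture everything.
planar = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
ratio = pca_variance_captured(planar)

# Made-up 2D points, each exactly one unit from its nearest centroid.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
centroids = np.array([[1.0, 0.0], [11.0, 0.0]])
tightness = mean_distance_to_centroids(points, centroids)
```

Calling `mean_distance_to_centroids` once on the real records and once on the synthetic records, with the same centroids, would give two directly comparable numbers.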

📊 Visualizations

K-means Clustering - Shows how the synthetic samples are distributed across clusters.
PCA Projection & Feature Distribution - Shows how much the synthetic and real datasets overlap, both in the 2D projection and feature by feature.
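The cluster comparison can be sketched as follows: fit k-means on the real records, then count how many synthetic records land in each cluster. This is an assumed reading of the analysis, not the notebook's code. scikit-learn's `KMeans` would be the usual choice; a small NumPy version of Lloyd's algorithm is used here only to keep the sketch dependency-light, and all names and data are hypothetical.

```python
import numpy as np


def assign_clusters(data, centroids):
    """Index of the nearest centroid for each row."""
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def fit_kmeans(data, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids drawn from the data."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_clusters(data, centroids)
        updated = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break  # assignments have stabilized
        centroids = updated
    return centroids


# Made-up data: two well-separated groups of "real" records.
real = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
synthetic = np.array([[0.2, 0.1], [0.5, 0.5], [0.9, 0.2],
                      [10.5, 10.5], [9.8, 10.1]])

centroids = fit_kmeans(real, k=2)
synthetic_labels = assign_clusters(synthetic, centroids)
counts = np.bincount(synthetic_labels, minlength=2)  # synthetic samples per cluster
```

Comparing `counts` with the same tally for the real records shows whether the generator over- or under-represents any cluster.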
