This repository contains code for training Denoising Diffusion Probabilistic Models (DDPMs) on CIFAR-10 and evaluating different attacks on these models, as part of our upcoming work on copyright protection for diffusion models.
The codebase supports:
- Training standard diffusion models
- Training models on different dataset splits
- Running different attacks (MIA, DRA) and storing results
- Analyzing attack results
To set up the environment:

```shell
conda env create -f environment.yml
```
Our experiments focus on CIFAR-10. For the CP-$k$ algorithm, we support training proposal models on different dataset splits via the `--set_index` flag. DP models are trained on the entire dataset.
No explicit steps are needed to download or prepare the CIFAR-10 dataset; the code downloads it automatically on first run.
To train a general conditional diffusion model on the entirety of CIFAR-10, you can run the following command.
```shell
python main.py --train --set_index=3
```

To train on the dataset subsets used for the safe models, you can instead pass `--set_index=1` or `--set_index=2`. To train with DP, set `--privacy=True`. Other flags can be set via the command line or through the main.py file, depending on your preference.
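To make the split semantics concrete, here is a hypothetical sketch of how a `--set_index` flag could select training indices. The function name and the exact split rule (index 3 for the full set, indices 1 and 2 for disjoint halves of a fixed shuffle) are illustrative assumptions, not the repo's actual implementation:

```python
import numpy as np

def select_split(n_examples, set_index, seed=0):
    """Hypothetical sketch of a --set_index-style dataset split.

    Assumed convention: set_index=3 returns the full dataset, while
    set_index=1 and set_index=2 return disjoint halves of one fixed
    shuffle, so the two "safe" models share no training examples.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_examples)
    if set_index == 3:
        return perm                     # full CIFAR-10 training set
    half = n_examples // 2
    return perm[:half] if set_index == 1 else perm[half:]
```

Using a fixed seed for the shuffle keeps the splits reproducible across runs, which matters when attack results are later compared across models.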
To evaluate a trained model's FID, you can use the following commands.
```shell
python main.py --eval --logdir=<enter_logdir_here>
```

You can modify main.py's arguments, either via the command line or by editing the file, to evaluate using the CP-$k$ mechanism if desired.
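For reference, FID is the Fréchet distance between two Gaussians fitted to feature statistics: $\|\mu_1-\mu_2\|^2 + \mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})$. Below is a minimal NumPy/SciPy sketch of that formula, not the repo's implementation (which operates on Inception features):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, cov1, mu2, cov2):
    # ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})
    diff = mu1 - mu2
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        # Numerical error can introduce tiny imaginary components
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```

Identical statistics give a distance of zero; shifting one mean by a unit in each of $d$ dimensions adds $d$ to the score.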
At a high level, the generator.py file runs attacks and the analysis.py file analyzes the results. We use this structure because performing attacks can be computationally expensive, while separate runs are not required for the CP-$k$ mechanism: we store the log probabilities from each attack run and apply the CP-$k$ threshold when necessary during analysis, which gives greater flexibility to the analysis process.
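The decoupling described above can be sketched as follows; the function name and the exact decision rule are illustrative assumptions, not the repo's API:

```python
import numpy as np

def attack_decisions(stored_log_probs, stored_threshold):
    # At analysis time we reuse the per-sample log probabilities saved
    # by the attack run; comparing them against a (possibly re-derived)
    # threshold is cheap, so the expensive generation step never has to
    # be repeated when the thresholding rule changes.
    return stored_log_probs > stored_threshold
```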
To run attacks, we use the generator.py file. This file contains the main function that runs attacks and saves results.
```shell
python generator.py --config=<path_to_config_file> --seed=<seed>
```

To analyze results, we use the analysis.py file. This file contains the main function that analyzes results and saves plots. You can find example analysis configs for different attacks in the config/analysis_config/ directory. When not using the CP-$k$ mechanism, the code will automatically use the log probabilities and threshold stored during the attack run.
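The exact schema is defined by the files in config/analysis_config/; purely as a hypothetical illustration (every field name below is assumed, not taken from the codebase), an analysis config could look like:

```yaml
# Hypothetical example -- field names are illustrative only
attack: membership_inference
run_dir: logs/generator_logs/cp_run_1/
use_cpk: false
output_dir: plots/
```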
When modifying the configuration files to match your generator runs, it is important that you select the outermost run directory (e.g. logs/generator_logs/cp_run_1/ as opposed to logs/generator_logs/cp_run_1/membership_inference/results/). The codebase automatically handles finding relevant files that contain the data from each run.
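The directory handling might resemble the following sketch; the helper name and file pattern are assumptions for illustration, not the repo's actual code:

```python
from pathlib import Path

def collect_run_files(run_dir, pattern="*.npz"):
    # Given the outermost run directory (e.g. logs/generator_logs/cp_run_1/),
    # recursively gather every matching result file beneath it, so configs
    # never need to point at nested subdirectories themselves.
    return sorted(Path(run_dir).rglob(pattern))
```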
```shell
python analysis.py --config=<path_to_config_file> --seed=<seed>
```

You will need to update both the analysis and generator configs to reflect your own model paths.