Copyright Research Project

This repository contains code for training Denoising Diffusion Probabilistic Models (DDPMs) on CIFAR-10 and evaluating different attacks on these models, as part of our upcoming work on copyright protection for diffusion models.

Overview

The codebase supports:

  1. Training standard diffusion models
  2. Training models on different dataset splits
  3. Running different attacks (MIA, DRA) and storing results
  4. Analyzing attack results

Requirements

conda env create -f environment.yml

Dataset Preparation

Our experiments focus on CIFAR-10. For the CP-$k$ algorithm, we support training a proposal model $p$ on the entire dataset and training smaller models $q_1$ and $q_2$ on halves of the dataset. You can select which subset to train on using the --set_index flag. Differentially private (DP) models are trained on the entire dataset.
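To make the subset selection concrete, here is a hedged sketch of how the halves for $q_1$ and $q_2$ could be derived from index 50,000-image CIFAR-10 training indices. The function name, seed, and split rule below are illustrative assumptions, not the repository's actual implementation:

```python
import numpy as np

def cifar10_subset_indices(set_index, n=50000, seed=0):
    """Illustrative split: set_index 1 and 2 give disjoint halves,
    set_index 3 gives the entire training set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    if set_index == 1:
        return perm[: n // 2]
    if set_index == 2:
        return perm[n // 2 :]
    return np.arange(n)  # set_index == 3: full dataset

half1 = cifar10_subset_indices(1)
half2 = cifar10_subset_indices(2)
```

Because the two halves come from one permutation, they are disjoint and together cover the whole training set.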

No manual download or preparation of CIFAR-10 is necessary; the code downloads the dataset automatically on first run.

Training Diffusion Models

Basic Training

To train a general conditional diffusion model on the entirety of CIFAR-10, run the following command.

python main.py --train --set_index=3

Training on Specific Dataset Splits

To train on the dataset subsets used for the safe models, use --set_index=1 or --set_index=2. To train with differential privacy, set --privacy=True. Other flags can be set via the command line or through the main.py file, depending on your preference.

Model Evaluation

To evaluate a trained model's FID, use the following command.

python main.py --eval --logdir=<enter_logdir_here>

To evaluate using the CP-$k$ mechanism, modify the arguments either via the command line or through the main.py file.
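For reference, FID is the Fréchet distance between Gaussian fits of feature statistics from real and generated images. A minimal sketch of that distance, assuming means and covariances have already been computed (the function name here is illustrative, not the repository's code):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2).real  # drop tiny imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical statistics give a distance of zero; lower FID means the generated distribution's statistics are closer to the real data's.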

Running and Evaluating Attacks

At a high level, the generator.py file runs attacks and the analysis.py file analyzes the results. We use this structure because attacks are computationally expensive, while the CP-$k$ mechanism requires no separate attack runs: we store the log probabilities during the attack and apply CP-$k$ thresholding when necessary during analysis, which gives the analysis process greater flexibility.
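Because log probabilities are stored at attack time, CP-$k$ style filtering can be applied afterwards during analysis. Below is a hedged sketch of what such post-hoc thresholding might look like; the function name, array layout, and acceptance rule are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def cp_k_accept(logp_p, logp_q1, logp_q2, k):
    """Illustrative CP-k style filter: accept a sample only when the
    log-probability gap between the proposal model p and the less likely
    of the two safe models q1, q2 stays at or below the threshold k."""
    gap = logp_p - np.minimum(logp_q1, logp_q2)
    return gap <= k

# Stored log probabilities for three hypothetical samples.
logp_p = np.array([-10.0, -8.0, -12.0])
logp_q1 = np.array([-10.5, -15.0, -12.2])
logp_q2 = np.array([-10.2, -14.0, -12.1])
mask = cp_k_accept(logp_p, logp_q1, logp_q2, k=1.0)
```

The middle sample is rejected because the proposal model assigns it far more probability than either safe model, which is the behavior a copyright filter targets.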

Using generator.py

The generator.py file contains the main function that runs attacks and saves their results.

python generator.py --config=<path_to_config_file> --seed=<seed>

Using analysis.py

The analysis.py file contains the main function that analyzes attack results and saves plots. Example analysis configs for the different attacks can be found in the config/analysis_config/ directory. When not using the CP-$k$ mechanism, the code automatically uses the log probabilities and threshold stored during the attack run.

When modifying the configuration files to match your generator runs, be sure to point them at the outermost run directory (e.g., logs/generator_logs/cp_run_1/ rather than logs/generator_logs/cp_run_1/membership_inference/results/). The codebase automatically finds the relevant files containing the data from each run.
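For illustration, a hypothetical analysis config entry pointing at an outermost run directory might look like the fragment below. The YAML format and key name are assumptions; match them to the examples in config/analysis_config/.

```yaml
# Point at the outermost run directory, not a nested results folder.
run_dir: logs/generator_logs/cp_run_1/
```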

python analysis.py --config=<path_to_config_file> --seed=<seed>

You will need to update both analysis and generator configs to reflect your own model paths.
