Configs

Here, we explain each field in the config files. Note that only the ramia field is new compared to the base config file meant for membership inference attacks.

  • ramia: Configurations of range membership inference attacks

    • range_function: the type of range function to be used in running ramia
    • sample_size: the number of samples drawn from each range to compute that range's membership score
    • radius: the size of the range
    • transformations: a list of transformations to be applied to the range center when using the geometric range function
    • mask_model: the name of the LLM used to replace masked tokens in the sequence when using the word_replace range function
    • mask_tokenizer: the name of the tokenizer of the mask_model when using the word_replace range function
    • num_masks: the number of words to be replaced when using the word_replace range function
  • run: Configurations related to this specific run

    • random_seed: Integer specifying the random seed. Each run of experiments will use the same random seed.
    • log_dir: Path to where all the information will be saved, including models and computed signals. If the directory contains models, these models will be loaded instead of trained. Hence, to run experiments with new models, we need to change the log_dir.
    • time_log: Indicate whether to log the time for each step. If True, a time log will be saved.
    • num_experiments: Number of target models we attack. If it is more than 1, an aggregate report will be generated at the end.
  • audit: Configurations related to auditing

    • privacy_game: Indicate the type of privacy game/notion. We currently support the privacy_loss_model game. We will add more games in the future.
    • algorithm: The membership inference attack used for auditing. We currently support RMIA, introduced by Zarifzadeh et al. 2024 (https://openreview.net/pdf?id=sT7UJh5CTc), and the LOSS attack.
    • num_ref_models: Number of reference models used to audit each target model
    • device: The device we want to use for inferring signals and auditing models
    • report_log: The folder name where we save the log and auditing report
    • batch_size: Batch size for evaluating models and inferring signals.
    • data_size: The size of the dataset in auditing. If not specified, the entire dataset is used. Must be an even number. The sampled auditing dataset will contain equal numbers of IN and OUT data samples according to the membership information from the first target model.
  • train: Configuration related to training

    • model_name: The model type. We support CNN, wrn28-1, wrn28-2, wrn28-10, vgg16, mlp, gpt2 and speedyresnet. More model types can be added in /models/.
    • tokenizer: The tokenizer type. It can be any tokenizer or local checkpoint supported by the transformers library. For non-text datasets, this field can be dropped.
    • device: The device we want to use for training models. Note that for transformers models, Hugging Face's Trainer class uses all available GPUs by default.
    • batch_size: Batch size for training models.
    • learning_rate: Learning rate for training models.
    • weight_decay: Weight decay for training models.
    • epochs: Number of epochs for training models.
    • optimizer: Optimizer for training models. We support SGD, Adam, AdamW. More optimizers can be added in get_optimizer in trainers/default_trainer.py.
    • peft: Configuration related to peft. It can be dropped if not needed.
  • data: Configuration related to datasets

    • dataset: The name of the dataset. We support cifar10, cifar100, purchase100, texas100 and agnews by default.
    • data_dir: The directory where the dataset is stored. If the dataset is not found in the directory, it will be downloaded.
    • tokenize: Indicate whether to tokenize the dataset. If True, the dataset will be tokenized using the tokenizer specified in the next field. It can be dropped if not needed.
    • tokenizer: The tokenizer type. It can be any tokenizer or local checkpoint supported by the transformers library. For non-text datasets, this field can be dropped.
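
To make the field layout above concrete, here is a minimal sketch of a config written as a Python dictionary, together with a small sanity check. The field names come from the list above; every value is an illustrative placeholder (including the mask model names, paths, and the exact spelling of options such as the algorithm or optimizer), and `check_config` is a hypothetical helper, not part of the repository. The check only encodes constraints stated above: `data_size` must be even, some `ramia` fields apply to a specific range function, and `data.tokenize` relies on `data.tokenizer`. Treating the range-function-specific fields as required is an assumption.

```python
# Illustrative config; all values are placeholders, not recommended settings.
example_config = {
    "ramia": {
        "range_function": "word_replace",
        "sample_size": 10,
        "radius": 1,
        "mask_model": "example-mask-model",          # placeholder name
        "mask_tokenizer": "example-mask-tokenizer",  # placeholder name
        "num_masks": 2,
    },
    "run": {
        "random_seed": 0,
        "log_dir": "logs/demo",  # change this to train new models
        "time_log": True,
        "num_experiments": 1,
    },
    "audit": {
        "privacy_game": "privacy_loss_model",
        "algorithm": "RMIA",  # exact spelling in real configs is assumed
        "num_ref_models": 1,
        "device": "cuda:0",
        "report_log": "report",
        "batch_size": 32,
        "data_size": 500,  # must be even
    },
    "train": {
        "model_name": "gpt2",
        "tokenizer": "gpt2",
        "device": "cuda:0",
        "batch_size": 8,
        "learning_rate": 5e-5,
        "weight_decay": 0.01,
        "epochs": 1,
        "optimizer": "AdamW",  # exact spelling in real configs is assumed
    },
    "data": {
        "dataset": "agnews",
        "data_dir": "data",
        "tokenize": True,
        "tokenizer": "gpt2",
    },
}

REQUIRED_SECTIONS = ("ramia", "run", "audit", "train", "data")

# Fields that the list above ties to a specific range function.
RANGE_FUNCTION_FIELDS = {
    "geometric": ("transformations",),
    "word_replace": ("mask_model", "mask_tokenizer", "num_masks"),
}


def check_config(cfg):
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in cfg]
    if problems:
        return problems

    # Fields tied to the selected range function (assumed to be required).
    ramia = cfg["ramia"]
    range_function = ramia.get("range_function")
    for field in RANGE_FUNCTION_FIELDS.get(range_function, ()):
        if field not in ramia:
            problems.append(f"ramia.{field} is needed for {range_function}")

    # data_size is optional, but must be an even number when given.
    data_size = cfg["audit"].get("data_size")
    if data_size is not None and (not isinstance(data_size, int) or data_size % 2 != 0):
        problems.append("audit.data_size must be an even integer")

    # Tokenizing the dataset uses the tokenizer named in the data section.
    data = cfg["data"]
    if data.get("tokenize") and "tokenizer" not in data:
        problems.append("data.tokenizer is needed when data.tokenize is True")

    return problems
```

Calling `check_config(example_config)` returns an empty list for the dictionary above. Changing `data_size` to an odd number, or removing `num_masks` while keeping `range_function` set to `word_replace`, adds a message to the returned list.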