Support Dirty-Label Backdoor Attack #137

@deprit

Description

Add support in Armory Library for an undefended Dirty-Label Backdoor (DLBD) attack applied to image classification.

In a DLBD attack, training images are chosen from the source class, a trigger is applied to them, and their labels are flipped to the target class. The model is then trained on this poisoned data. The adversary's goal is for test images from the source class to be classified as the target class when the trigger is applied at test time.
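For concreteness, here is a minimal sketch of the poisoning step, assuming images stored as an (N, H, W, C) uint8 NumPy array and a small patch trigger stamped into the bottom-right corner. The `poison_dataset` name and its parameters are illustrative, not an existing Armory Library API.

```python
import numpy as np

def poison_dataset(images, labels, source_class, target_class,
                   trigger, poison_fraction=0.1, rng=None):
    """Dirty-label poisoning: stamp a trigger patch onto a fraction of
    source-class images and flip their labels to the target class.

    images: (N, H, W, C) uint8 array; labels: (N,) int array;
    trigger: (h, w, C) uint8 patch placed in the bottom-right corner.
    """
    if rng is None:
        rng = np.random.default_rng()
    images, labels = images.copy(), labels.copy()

    # Pick a random subset of the source class to poison.
    source_idx = np.flatnonzero(labels == source_class)
    n_poison = int(len(source_idx) * poison_fraction)
    poison_idx = rng.choice(source_idx, size=n_poison, replace=False)

    th, tw, _ = trigger.shape
    images[poison_idx, -th:, -tw:, :] = trigger  # apply trigger
    labels[poison_idx] = target_class            # flip label ("dirty label")
    return images, labels, poison_idx
```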

Four primary metrics are computed after the model is trained on the poisoned data (a sketch of their computation follows the list).

  • Accuracy on benign test data, all classes
  • Accuracy on benign test data, source class
  • Accuracy on poisoned test data, all classes
  • Attack success rate (fraction of triggered source-class test images classified as the target class)
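
A sketch of how these metrics could be computed, assuming a `model_predict` callable that maps a batch of images to predicted labels; poisoned test images keep their original (pre-flip) labels for scoring. All names here are hypothetical:

```python
import numpy as np

def poisoning_metrics(model_predict, x_benign, y_benign,
                      x_poisoned, y_poisoned_orig,
                      source_class, target_class):
    """Compute the four primary DLBD metrics.

    x_poisoned: triggered test images; y_poisoned_orig: their original
    (unflipped) labels, used to score poisoned accuracy.
    """
    benign_pred = model_predict(x_benign)
    poison_pred = model_predict(x_poisoned)

    src = y_benign == source_class
    poisoned_src = y_poisoned_orig == source_class
    return {
        "benign_accuracy_all": np.mean(benign_pred == y_benign),
        "benign_accuracy_source": np.mean(benign_pred[src] == y_benign[src]),
        "poisoned_accuracy_all": np.mean(poison_pred == y_poisoned_orig),
        # Fraction of triggered source-class images classified as target.
        "attack_success_rate": np.mean(
            poison_pred[poisoned_src] == target_class),
    }
```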

To evaluate a DLBD attack, Armory Library must

  • Create poison datasets by inserting triggers into selected classes and modifying labels;
  • Generate primary poisoning metrics to evaluate a poisoned model;
  • Run an example script evaluating a DLBD attack using the CIFAR10 dataset and a ResNet-18 classifier (a minimal end-to-end sketch follows this list).
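
A minimal end-to-end sketch of such a script, written against PyTorch and torchvision rather than Armory's actual API, and reusing the hypothetical `poison_dataset` and `poisoning_metrics` helpers sketched above:

```python
import numpy as np
import torch
import torch.nn.functional as F
import torchvision
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
SOURCE, TARGET = 0, 1                               # airplane -> automobile
trigger = np.full((3, 3, 3), 255, dtype=np.uint8)   # white 3x3 corner patch

def to_tensor(x):  # (N, H, W, C) uint8 -> (N, C, H, W) float in [0, 1]
    return torch.from_numpy(x).permute(0, 3, 1, 2).float().div(255.0)

train = torchvision.datasets.CIFAR10("data", train=True, download=True)
test = torchvision.datasets.CIFAR10("data", train=False, download=True)

# Poison 10% of the source class in the training set, then train.
x_tr, y_tr, _ = poison_dataset(train.data, np.array(train.targets),
                               SOURCE, TARGET, trigger)
model = resnet18(num_classes=10).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xs, ys = to_tensor(x_tr), torch.from_numpy(y_tr)
for epoch in range(10):
    perm = torch.randperm(len(xs))
    for i in range(0, len(xs), 128):
        idx = perm[i:i + 128]
        loss = F.cross_entropy(model(xs[idx].to(device)), ys[idx].to(device))
        opt.zero_grad(); loss.backward(); opt.step()

# Trigger every test image but keep the original labels for scoring.
x_te, y_te = test.data, np.array(test.targets)
x_te_poison = x_te.copy()
x_te_poison[:, -3:, -3:, :] = trigger

@torch.no_grad()
def predict(x):
    model.eval()
    preds = [model(to_tensor(x[i:i + 256]).to(device)).argmax(1).cpu()
             for i in range(0, len(x), 256)]
    return torch.cat(preds).numpy()

print(poisoning_metrics(predict, x_te, y_te, x_te_poison, y_te,
                        SOURCE, TARGET))
```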


Labels: enhancement (New feature or request)
