ASR drop on MMSafeBench after replacing the anchor image #4

@jkcjkc

Hi, thanks for the great work.

I am currently experimenting with the code and have a question about attack transferability.

Experiment Details: I replaced the initial image used to generate the adversarial images. Aside from this change, all other configurations remain consistent with the original repository.

Results:

AdvBench Subset: The Attack Success Rate (ASR) is 98%.

MMSafeBench: When the attack is transferred to MMSafeBench, the ASR drops significantly to 51%.

Evaluator: Both benchmarks were evaluated using HarmBench-Llama-2-13b-cls.
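
To make the metric explicit: ASR here is taken to mean the fraction of prompts whose response the judge labels as harmful. The snippet below is a minimal sketch of that calculation, not the repository's evaluation code. It assumes the judge verdicts have already been collected as "yes"/"no" strings, and the function name `attack_success_rate` and the sample counts are hypothetical, chosen only to reproduce the two percentages above.

```python
from typing import Iterable


def attack_success_rate(verdicts: Iterable[str]) -> float:
    """Return the fraction of judge verdicts equal to "yes" (case-insensitive).

    Assumption: each verdict is a "yes"/"no" string already extracted
    from the judge's output; anything other than "yes" counts as a failure.
    """
    labels = [v.strip().lower() for v in verdicts]
    if not labels:
        raise ValueError("cannot compute ASR over an empty verdict list")
    return sum(label == "yes" for label in labels) / len(labels)


# Hypothetical verdict lists; the real sample sizes are not stated in this issue.
advbench_verdicts = ["yes"] * 49 + ["no"] * 1
mmsafebench_verdicts = ["yes"] * 51 + ["no"] * 49

print(f"AdvBench subset ASR: {attack_success_rate(advbench_verdicts):.0%}")
print(f"MMSafeBench ASR:     {attack_success_rate(mmsafebench_verdicts):.0%}")
```

Computing both numbers with the same function and the same verdict-parsing rule helps rule out a bookkeeping difference between the two benchmarks before attributing the gap to the image or the judge.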

Question: Could this performance gap be attributed to the change in the image itself? Or is it possible that the evaluator (HarmBench-Llama-2-13b-cls) differs from the setting/judge used in the original paper?

Any insights would be appreciated. Thanks!
