ASR drop on MMSafeBench after replacing the anchor image #4

Hi, thanks for the great work.

I am currently experimenting with the code and have a question regarding the attack transferability.

Experiment Details: I replaced the initial (anchor) image used to generate the adversarial images. All other configurations are unchanged from the original repository.

Results:

AdvBench Subset: The Attack Success Rate (ASR) is 98%.

MMSafeBench: When the attack is transferred to MMSafeBench, the ASR drops to 51%.

Evaluator: Both benchmarks were evaluated using HarmBench-Llama-2-13b-cls.
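
For reference, ASR here is taken to mean the fraction of prompts whose response the judge flags as harmful. Below is a minimal sketch of that computation, assuming the judge returns a "yes"/"no" string per response. The function `compute_asr` and the sample verdicts are hypothetical and are not taken from the repository:

```python
from typing import Iterable


def compute_asr(verdicts: Iterable[str]) -> float:
    """Return the fraction of judge verdicts that count as a successful attack.

    Assumption: a verdict of "yes" (any casing, surrounding whitespace ignored)
    means the judge considered the response harmful.
    """
    labels = [v.strip().lower() for v in verdicts]
    if not labels:
        raise ValueError("no verdicts to score")
    successes = sum(label == "yes" for label in labels)
    return successes / len(labels)


# Made-up verdicts for illustration only, not real results.
sample_verdicts = ["Yes", "yes", "No", "Yes"]
print(f"ASR: {compute_asr(sample_verdicts):.0%}")
```

If the original paper counted successes differently (for example, with a different judge or a keyword match), the same responses could yield a different ASR.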

Question: Could this gap be caused by the new image itself? Or does the evaluator I used (HarmBench-Llama-2-13b-cls) differ from the judge used in the original paper?
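
One way to separate the two explanations would be to score both anchor images on the same benchmark with the same judge and compare the resulting ASR per condition. Below is a rough sketch of that bookkeeping. The function `asr_by_condition`, the condition names, and the records are placeholders for illustration, not results:

```python
from collections import defaultdict


def asr_by_condition(records):
    """Group verdicts by (anchor, benchmark) and return the ASR for each group.

    Each record is a tuple (anchor_name, benchmark_name, is_success), where
    is_success is True when the judge marked the response as harmful.
    """
    counts = defaultdict(lambda: [0, 0])  # [successes, total] per condition
    for anchor, benchmark, is_success in records:
        counts[(anchor, benchmark)][0] += int(is_success)
        counts[(anchor, benchmark)][1] += 1
    return {key: successes / total for key, (successes, total) in counts.items()}


# Placeholder records for illustration only, not real results.
records = [
    ("original_anchor", "MMSafeBench", True),
    ("original_anchor", "MMSafeBench", False),
    ("replaced_anchor", "MMSafeBench", True),
    ("replaced_anchor", "MMSafeBench", False),
]
for condition, asr in asr_by_condition(records).items():
    print(condition, f"{asr:.0%}")
```

If both anchors give a similar ASR on MMSafeBench under the same judge, the image change is less likely to be the cause, and the judge or evaluation setting becomes the more plausible explanation.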

Any insights would be appreciated. Thanks!
