Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness

🚀 TPAMI 2026 Lu Yu · Haiyang Zhang · Changsheng Xu 📄 TPAMI 2026 Paper

Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models

🎯 NeurIPS 2024 Lu Yu · Haiyang Zhang · Changsheng Xu 📄 NeurIPS 2024 Paper

🔍 Overview

Pretrained vision-language models such as CLIP demonstrate remarkable zero-shot generalization ability. However, they remain highly vulnerable to adversarial perturbations.

We identify a critical phenomenon:

Adversarial perturbations systematically shift text-guided attention, rather than merely corrupting pixel space.

Based on this insight, we propose:

TGA-ZSR (NeurIPS 2024): Text-Guided Attention for Zero-Shot Robustness
Comp-TGA (TPAMI 2026): Complementary Text-Guided Attention

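Across 16 datasets, our methods improve zero-shot robust accuracy over the strongest prior baseline in the benchmark table (PMG-AFT, 32.51%) by:

+9.58 percentage points with TGA-ZSR
+11.95 percentage points with Comp-TGA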

🧠 Motivation

Attention Shift Under Adversarial Perturbation

Adversarial examples induce significant deviation in text-guided attention.
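
As a rough illustration of what "deviation in text-guided attention" can mean, the sketch below scores each image patch by its similarity to a text embedding and measures how far the resulting map moves under perturbation. Everything here is an assumption made for illustration: the function names, the cosine-similarity scoring, the min-max scaling, and the toy feature vectors are not taken from the repository, whose actual attention computation lives in the source code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def text_guided_attention(patch_feats, text_feat):
    """One score per patch: similarity to the text embedding, min-max scaled to [0, 1]."""
    scores = [cosine(p, text_feat) for p in patch_feats]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def attention_shift(attn_clean, attn_adv):
    """l2 distance between two attention maps of the same length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(attn_clean, attn_adv)))

# Toy features (hypothetical): three patches, 2-d embeddings.
text = [1.0, 0.0]
clean_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adv_patches = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # first two patches swap roles

attn_clean = text_guided_attention(clean_patches, text)
attn_adv = text_guided_attention(adv_patches, text)
shift = attention_shift(attn_clean, attn_adv)
```

In this toy case the attention peak moves from the first patch to the second, so the shift is large even though the third patch is untouched.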

Spurious Attention in Clean Samples

Even without adversarial perturbations, text-guided attention may focus on irrelevant regions.

🚀 Method

TGA-ZSR Framework

TGA-ZSR consists of two components:

Local Attention Refinement Module (LARM): aligns adversarial attention with clean attention from the original model.

Global Attention Constraint Module (GACM): preserves clean performance while enhancing robustness.

This design enforces attention consistency without sacrificing zero-shot generalization.
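
One way to picture how the two modules enter training is as two attention-distance terms added to a classification loss, weighted by the --Alpha and --Beta options listed under Reproducibility. The sketch below is a minimal reading under stated assumptions: the additive form, the use of a plain l2 distance, the choice of which attention maps GACM compares, and all function and argument names are illustrative guesses, not the exact Eq. 9 and Eq. 12 of the paper.

```python
import math

def l2_distance(a, b):
    """l2 distance between two flattened attention maps."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tga_zsr_loss(ce_loss, attn_adv_target, attn_clean_target,
                 attn_clean_original, alpha=0.08, beta=0.05):
    """Assumed total loss: cross-entropy + alpha * L_LARM + beta * L_GACM.

    attn_adv_target:     attention of the fine-tuned model on the adversarial image
    attn_clean_target:   attention of the fine-tuned model on the clean image
    attn_clean_original: attention of the frozen original model on the clean image
    """
    # LARM (assumed form): pull adversarial attention toward the original clean attention.
    l_larm = l2_distance(attn_adv_target, attn_clean_original)
    # GACM (assumed form): keep the fine-tuned clean attention close to the original one.
    l_gacm = l2_distance(attn_clean_target, attn_clean_original)
    total = ce_loss + alpha * l_larm + beta * l_gacm
    return total, l_larm, l_gacm

# Toy values (hypothetical): adversarial attention has collapsed, clean attention is intact.
total, l_larm, l_gacm = tga_zsr_loss(
    ce_loss=1.0,
    attn_adv_target=[0.0, 0.0],
    attn_clean_target=[1.0, 0.0],
    attn_clean_original=[1.0, 0.0],
)
```

The default weights mirror the TGA-ZSR.py defaults documented below (0.08 and 0.05); Comp-TGA.py uses 0.10 and 0.07.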

Complementary Text-Guided Attention (Comp-TGA)

We observe that standard text-guided attention occasionally captures spurious foreground cues.

Comp-TGA introduces a complementary fusion mechanism:

Class-prompt guided foreground attention
Reversed non-class prompt driven attention

By integrating these two complementary signals, the model captures a more accurate foreground representation and improves robustness stability.
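
The fusion idea can be sketched as follows: a patch counts as foreground if the class prompt attends to it and the non-class prompt does not. This is only a sketch under assumptions. The weighted-average fusion rule, the min-max scaling, the `weight` parameter, and the function names are hypothetical, and the README does not specify how the non-class prompt is built or how the two maps are actually combined.

```python
def min_max(values):
    """Scale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def complementary_attention(class_attn, non_class_attn, weight=0.5):
    """Fuse class-prompt attention with reversed non-class-prompt attention.

    The weighted average used here is an assumed fusion rule for illustration.
    """
    foreground = min_max(class_attn)
    # Reversal: regions the non-class prompt attends to are treated as non-foreground.
    reversed_non_class = [1.0 - v for v in min_max(non_class_attn)]
    return [weight * f + (1.0 - weight) * r
            for f, r in zip(foreground, reversed_non_class)]

# Toy maps (hypothetical) over three patches:
# patch 0 is true foreground, patch 1 is a spurious cue, patch 2 is background.
class_attn = [0.9, 0.8, 0.1]
non_class_attn = [0.1, 0.9, 0.8]
fused = complementary_attention(class_attn, non_class_attn)
```

In this toy case the spurious patch scores high under the class prompt alone, but is down-weighted after fusion because the non-class prompt also attends to it.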

📊 Zero-Shot Adversarial Robustness Benchmark

Method	Venue	Robust (%)	Clean (%)	Average (%)
CLIP	ICML 2021	4.90	64.42	34.66
FT-Clean	Initial Entry	7.05	54.37	30.71
FT-Adv.	Initial Entry	28.83	43.36	36.09
TeCoA	ICLR 2023	28.06	45.81	36.93
PMG-AFT	CVPR 2024	32.51	46.60	39.55
FARE	ICML 2024	18.25	59.85	39.05
Vision-based	Initial Entry	29.47	45.02	37.24
TGA-ZSR (Ours)	NeurIPS 2024	42.09	56.44	49.27
Comp-TGA (Ours)	TPAMI 2026	44.46	55.44	49.95
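
The headline gains quoted in the Overview can be checked against this table. The short script below copies the Robust and Clean columns, recomputes the Average column as their mean, and takes the robust-accuracy gap between each of our methods and the best baseline. The variable names are illustrative; the numbers are the ones in the table.

```python
# (robust, clean) accuracy in percent, copied from the table above
results = {
    "CLIP": (4.90, 64.42),
    "FT-Clean": (7.05, 54.37),
    "FT-Adv.": (28.83, 43.36),
    "TeCoA": (28.06, 45.81),
    "PMG-AFT": (32.51, 46.60),
    "FARE": (18.25, 59.85),
    "Vision-based": (29.47, 45.02),
    "TGA-ZSR": (42.09, 56.44),
    "Comp-TGA": (44.46, 55.44),
}
ours = ["TGA-ZSR", "Comp-TGA"]
baselines = [name for name in results if name not in ours]

# Average column: mean of robust and clean accuracy.
averages = {name: (robust + clean) / 2 for name, (robust, clean) in results.items()}

# Strongest baseline by robust accuracy, and the gain of each of our methods over it.
best_baseline = max(baselines, key=lambda name: results[name][0])
best_robust = results[best_baseline][0]
gains = {name: round(results[name][0] - best_robust, 2) for name in ours}

print(best_baseline, gains)
```

The gaps to PMG-AFT come out at 9.58 and 11.95 points, matching the figures stated in the Overview.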

Robustness–Clean Trade-off

Each point represents a method.
Point size reflects trade-off quality between clean and robust accuracy.

🔧 Reproducibility

Checkpoints

⚙️ Environment Setup

pip install virtualenv
virtualenv TGA-ZSR
source TGA-ZSR/bin/activate
pip install -r requirements.txt

Experiment:

Run the code with the command below (for TeCoA and PMG-AFT, see the source code):

bash ./main.sh

Common options:

--Method: Name used to distinguish checkpoints obtained with different methods.
--train_eps: Perturbation magnitude used to generate training adversarial examples. (default = 1)
--train_numsteps: Number of iterations used to generate training adversarial examples. (default = 2)
--train_stepsize: Step size of each iteration when generating training adversarial examples. (default = 1)
--test_eps: Perturbation magnitude used to generate test adversarial examples. (default = 1)
--test_numsteps: Number of iterations used to generate test adversarial examples. (default = 100)
--test_stepsize: Step size of each iteration when generating test adversarial examples. (default = 1)
--arch: CLIP backbone version. (default = 'vit_b32')
--dataset: Dataset used for training. (default = 'tinyImageNet')
--seed: Random seed. (default = 0)
--resume: Path to a checkpoint to resume from. (default = None)
--last_num_ft: Fine-tuning layer setting. (default = 0)
--VPbaseline: Whether adversarial training is conducted.
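
To make the interaction between the eps, numsteps, and stepsize options concrete, here is a minimal sketch of an iterative L-infinity attack in the style of PGD. It is an illustration under assumptions, not the code in attacks.py: the README does not name the attack or the unit of eps, so the PGD form, the 0 to 255 pixel scale, the missing random start, and the toy linear loss are all hypothetical choices.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def pgd_linf(x, grad_fn, eps, num_steps, step_size, lo=0.0, hi=255.0):
    """Iterative sign-gradient ascent, projected onto the eps-ball around x."""
    x_adv = list(x)
    for _ in range(num_steps):
        grad = grad_fn(x_adv)
        # Move each pixel by step_size in the direction that increases the loss.
        x_adv = [xa + step_size * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back so that no pixel differs from the original by more than eps.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # Keep pixels inside the valid range (assumed 0..255 here).
        x_adv = [min(max(xa, lo), hi) for xa in x_adv]
    return x_adv

# Toy loss (hypothetical): loss(x) = w . x, whose gradient is the constant vector w.
w = [0.5, -2.0, 0.0]
x = [10.0, 20.0, 255.0]

# Mirrors the documented training defaults: eps = 1, numsteps = 2, stepsize = 1.
x_adv = pgd_linf(x, grad_fn=lambda _: w, eps=1, num_steps=2, step_size=1)
```

With stepsize equal to eps, the first step already reaches the boundary of the eps-ball, and later steps are undone by the projection. More steps only matter when the gradient direction changes between iterations, which is the usual case for a real model.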

Script-specific options:

TGA-ZSR.py:

--Distance_metric: Distance measure used in the loss function. (default = 'l2')
--atten_methods: Which attention variant to use. (default = 'text')
--Alpha: Weight of L_LARM in Eq. 9. (default = 0.08)
--Beta: Weight of L_GACM in Eq. 12. (default = 0.05)

Comp-TGA.py:

--Distance_metric: Distance measure used in the loss function. (default = 'l2')
--atten_methods: Which attention variant to use. (default = 'text')
--Alpha: Weight of L_LARM in Eq. 9. (default = 0.10)
--Beta: Weight of L_GACM in Eq. 12. (default = 0.07)
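
For orientation, the documented flags and defaults can be mirrored in a small argparse sketch. This is not the parser from TGA-ZSR.py or Comp-TGA.py: the `build_parser` helper and the argument types are assumptions, only a subset of the common options is shown, and only the flag names and default values come from the lists above.

```python
import argparse

# Documented (Alpha, Beta) defaults for each script.
LOSS_WEIGHT_DEFAULTS = {
    "TGA-ZSR.py": (0.08, 0.05),
    "Comp-TGA.py": (0.10, 0.07),
}

def build_parser(script="TGA-ZSR.py"):
    """Hypothetical parser mirroring the documented flags; types are assumed."""
    alpha, beta = LOSS_WEIGHT_DEFAULTS[script]
    parser = argparse.ArgumentParser(prog=script)
    # A subset of the common options.
    parser.add_argument("--train_eps", type=float, default=1.0)
    parser.add_argument("--train_numsteps", type=int, default=2)
    parser.add_argument("--train_stepsize", type=float, default=1.0)
    parser.add_argument("--test_eps", type=float, default=1.0)
    parser.add_argument("--test_numsteps", type=int, default=100)
    parser.add_argument("--test_stepsize", type=float, default=1.0)
    parser.add_argument("--arch", type=str, default="vit_b32")
    parser.add_argument("--dataset", type=str, default="tinyImageNet")
    parser.add_argument("--seed", type=int, default=0)
    # Script-specific options.
    parser.add_argument("--Distance_metric", type=str, default="l2")
    parser.add_argument("--atten_methods", type=str, default="text")
    parser.add_argument("--Alpha", type=float, default=alpha)
    parser.add_argument("--Beta", type=float, default=beta)
    return parser

# Example: Comp-TGA defaults with a shorter test-time attack.
args = build_parser("Comp-TGA.py").parse_args(["--test_numsteps", "10"])
```

Passing an explicit argument list to `parse_args` keeps the example independent of the command line; in practice the flags would be set in main.sh.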

Citation

If you find this repository useful, please consider citing our paper:

@inproceedings{TGA-ZSR,
     title={Text-guided attention is all you need for zero-shot robustness in vision-language models},
     author={Yu, Lu and Zhang, Haiyang and Xu, Changsheng},
     booktitle={Advances in Neural Information Processing Systems},
     volume={37},
     pages={96424--96448},
     year={2024}
}

@article{Comp-TGA,
     title={Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness},
     author={Yu, Lu and Zhang, Haiyang and Xu, Changsheng},
     journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
     year={2026},
     publisher={IEEE}
}

Acknowledgement

We thank the authors of TeCoA and CLIPCAM for open-sourcing their code.
