- [2026/03] TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models
- [2026/02] Weight-Space Detection of Backdoors in LoRA Adapters
- [2026/01] LoRA as Oracle
- [2026/01] SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models
- [2026/01] Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs
- [2025/12] Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models
- [2025/12] MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks
- [2025/12] Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
- [2025/12] Bias Injection Attacks on RAG Databases and Sanitization Defenses
- [2025/11] AutoBackdoor: Automating Backdoor Attacks via LLM Agents
- [2025/10] Secure Retrieval-Augmented Generation against Poisoning Attacks
- [2025/10] PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models
- [2025/10] One Token Embedding Is Enough to Deadlock Your Large Reasoning Model
- [2025/10] Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems
- [2025/10] Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
- [2025/10] ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
- [2025/10] Goal-oriented Backdoor Attack against Vision-Language-Action Models via Physical Objects
- [2025/10] Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
- [2025/10] P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
- [2025/10] Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
- [2025/09] Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor Attacks
- [2025/09] Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
- [2025/09] Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation
- [2025/09] ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation
- [2025/08] Detecting Stealthy Data Poisoning Attacks in AI Code Generators
- [2025/08] UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
- [2025/08] The Art of Hide and Seek: Making Pickle-Based Model Supply Chain Poisoning Stealthy Again
- [2025/08] An Investigation on Group Query Hallucination Attacks
- [2025/08] SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs
- [2025/08] IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding
- [2025/08] BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models
- [2025/08] Provably Secure Retrieval-Augmented Generation
- [2025/08] ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
- [2025/08] Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models
- [2025/08] DUP: Detection-guided Unlearning for Backdoor Purification in Language Models
- [2025/08] A Survey on Data Security in Large Language Models
- [2025/07] RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation
- [2025/07] Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
- [2025/07] When and Where do Data Poisons Attack Textual Inversion?
- [2025/07] Thought Purity: Defense Paradigm For Chain-of-Thought Attack
- [2025/07] ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks
- [2025/06] On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling
- [2025/06] Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
- [2025/06] Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs
- [2025/06] Robust Anti-Backdoor Instruction Tuning in LVLMs
- [2025/06] Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
- [2025/06] Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation
- [2025/06] A Systematic Review of Poisoning Attacks Against Large Language Models
- [2025/06] Your Agent Can Defend Itself against Backdoor Attacks
- [2025/06] Which Factors Make Code LLMs More Vulnerable to Backdoor Attacks? A Systematic Study
- [2025/06] VLMs Can Aggregate Scattered Training Patches
- [2025/06] Through the Stealth Lens: Rethinking Attacks and Defenses in RAG
- [2025/05] Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
- [2025/05] Benchmarking Poisoning Attacks against Retrieval-Augmented Generation
- [2025/05] CPA-RAG: Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models
- [2025/05] Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models
- [2025/05] Finetuning-Activated Backdoors in LLMs
- [2025/05] Backdoor Cleaning without External Guidance in MLLM Fine-tuning
- [2025/05] Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?
- [2025/05] One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems
- [2025/05] System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection
- [2025/05] POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models
- [2025/05] BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models
- [2025/04] Traceback of Poisoning Attacks to Retrieval-Augmented Generation
- [2025/04] Backdoor Defense in Diffusion Models via Spatial Attention Unlearning
- [2025/04] BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
- [2025/04] Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection
- [2025/04] REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models
- [2025/04] BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models
- [2025/04] Exploring Backdoor Attack and Defense for LLM-empowered Recommendations
- [2025/04] ControlNET: A Firewall for RAG-based LLM System
- [2025/04] PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
- [2025/04] ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
- [2025/04] Practical Poisoning Attacks against Retrieval-Augmented Generation
- [2025/03] BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
- [2025/03] XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
- [2025/03] Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
- [2025/03] Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
- [2025/03] NaviDet: Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation
- [2025/03] Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models
- [2025/02] The RAG Paradox: A Black-Box Attack Exploiting Unintentional Vulnerabilities in Retrieval-Augmented Generation Systems
- [2025/02] ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
- [2025/02] BackdoorDM: A Comprehensive Benchmark for Backdoor Learning in Diffusion Model
- [2025/02] THEMIS: Regulating Textual Inversion for Personalized Concept Censorship
- [2025/02] A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluations
- [2025/02] Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation
- [2025/01] DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs
- [2024/12] CL-attack: Textual Backdoor Attacks via Cross-Lingual Triggers
- [2024/12] Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers
- [2024/12] UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
- [2024/12] From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection
- [2024/12] Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
- [2024/11] Knowledge Database or Poison Base? Detecting RAG Poisoning Attack through LLM Activations
- [2024/11] LoBAM: LoRA-Based Backdoor Attack on Model Merging
- [2024/11] PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
- [2024/11] Combinational Backdoor Attack against Customized Text-to-Image Models
- [2024/11] When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
- [2024/10] Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
- [2024/10] Persistent Pre-Training Poisoning of LLMs
- [2024/10] Denial-of-Service Poisoning Attacks against Large Language Models
- [2024/10] PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
- [2024/10] ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs
- [2024/10] Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning
- [2024/09] Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
- [2024/09] CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
- [2024/09] Weak-To-Strong Backdoor Attacks for LLMs with Contrastive Knowledge Distillation
- [2024/09] Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
- [2024/09] Understanding Implosion in Text-to-Image Generative Models
- [2024/09] TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
- [2024/09] The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
- [2024/09] Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor
- [2024/08] Transferring Backdoors between Large Language Models by Knowledge Distillation
- [2024/08] Compromising Embodied Agents with Contextual Backdoor Attacks
- [2024/08] Scaling Laws for Data Poisoning in LLMs
- [2024/07] Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models
- [2024/07] EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second
- [2024/07] AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
- [2024/07] Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
- [2024/06] "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
- [2024/06] BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
- [2024/06] Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors
- [2024/06] Is poisoning a real threat to LLM alignment? Maybe more so than you think
- [2024/06] CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
- [2024/06] Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models
- [2024/06] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection
- [2024/06] A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures
- [2024/06] Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
- [2024/06] BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
- [2024/06] Invisible Backdoor Attacks on Diffusion Models
- [2024/06] BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
- [2024/06] Are you still on track!? Catching LLM Task Drift with Activations
- [2024/05] Phantom: General Trigger Attacks on Retrieval Augmented Language Generation
- [2024/05] Exploring Backdoor Attacks against Large Language Model-based Decision Making
- [2024/05] TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
- [2024/05] Certifiably Robust RAG against Retrieval Corruption
- [2024/05] TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
- [2024/05] Backdoor Removal for Generative Large Language Models
- [2024/04] Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
- [2024/04] Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications
- [2024/04] Talk Too Much: Poisoning Large Language Models under Token Limit
- [2024/04] Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations
- [2024/04] Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models
- [2024/04] Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
- [2024/04] Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning
- [2024/04] What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety
- [2024/04] UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models
- [2024/03] Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
- [2024/03] Diffusion Denoising as a Certified Defense against Clean-label Poisoning
- [2024/03] LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario
- [2024/02] Syntactic Ghost: An Imperceptible General-purpose Backdoor Attacks on Pre-trained Language Models
- [2024/02] On Trojan Signatures in Large Language Models of Code
- [2024/02] WIPI: A New Web Threat for LLM-Driven Web Agents
- [2024/02] VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
- [2024/02] Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
- [2024/02] Learning to Poison Large Language Models During Instruction Tuning
- [2024/02] Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
- [2024/02] Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
- [2024/02] Rapid Adoption, Hidden Risks: The Dual Impact of Large Language Model Customization
- [2024/02] Secret Collusion Among Generative AI Agents
- [2024/02] Test-Time Backdoor Attacks on Multimodal Large Language Models
- [2024/02] PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
- [2024/02] Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
- [2024/01] Universal Vulnerabilities in Large Language Models: In-context Learning Backdoor Attacks
- [2024/01] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
- [2023/12] Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers' Coding Practices with Insecure Suggestions from Poisoned AI Models
- [2023/12] Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
- [2023/12] The Philosopher's Stone: Trojaning Plugins of Large Language Models
- [2023/11] Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
- [2023/10] Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data
- [2023/10] Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
- [2023/10] PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models
- [2023/10] Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
- [2023/10] Composite Backdoor Attacks Against Large Language Models
- [2023/09] BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
- [2023/09] BadEdit: Backdooring Large Language Models by Model Editing
- [2023/09] Universal Jailbreak Backdoors from Poisoned Human Feedback
- [2023/08] LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors
- [2023/08] The Poison of Alignment
- [2023/07] Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
- [2023/06] On the Exploitability of Instruction Tuning
- [2023/05] Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
- [2023/05] Poisoning Language Models During Instruction Tuning
- [2022/11] Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis