- You can’t solve AI security problems with more AI
- Adversaries Can Misuse Combinations of Safe Models
- On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning
- Data Poisoning Won't Save You From Facial Recognition
- On the Impossible Safety of Large AI Models
- Beyond Labeling Oracles: What does it mean to steal ML models?
- Text Embeddings Reveal (Almost) As Much As Text
- Planting Undetectable Backdoors in Machine Learning Models
- Motivating the Rules of the Game for Adversarial Example Research
- On Evaluating Adversarial Robustness
- LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
- Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
- When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
- Adversarial Examples Are Not Bugs, They Are Features
- On Adaptive Attacks to Adversarial Example Defenses
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
- Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?
- Proof-of-Learning is Currently More Broken Than You Think
- On the (In)feasibility of ML Backdoor Detection as an Hypothesis Testing Problem
- Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
- UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI