- Adversarial Training Methods for Semi-Supervised Text Classification
- Synthetic and Natural Noise Both Break Neural Machine Translation
- Adversarial Reprogramming of Neural Networks
- DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks
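To make the last entry above concrete, here is a minimal PyTorch sketch of the multiclass DeepFool iteration: linearize the classifier around the current point, step just across the nearest decision boundary, and repeat until the predicted label flips. The `model` classifier, the `(1, C, H, W)` input shape, and the hyperparameter defaults are illustrative assumptions, not part of the paper.

```python
import torch

def deepfool(model, x, num_classes=10, overshoot=0.02, max_iter=50):
    """Minimal DeepFool sketch. x: input tensor of shape (1, C, H, W)."""
    model.eval()
    x = x.clone().detach()
    orig_label = model(x).argmax(dim=1).item()
    pert_x = x.clone().detach().requires_grad_(True)
    r_total = torch.zeros_like(x)

    for _ in range(max_iter):
        logits = model(pert_x)[0]
        if logits.argmax().item() != orig_label:
            break  # the label has flipped; we are done
        # Gradient of the original-class logit w.r.t. the input.
        g_orig = torch.autograd.grad(logits[orig_label], pert_x,
                                     retain_graph=True)[0]
        best_ratio, best_w = None, None
        for k in range(num_classes):
            if k == orig_label:
                continue
            g_k = torch.autograd.grad(logits[k], pert_x,
                                      retain_graph=True)[0]
            w_k = g_k - g_orig                      # boundary normal (linearized)
            f_k = (logits[k] - logits[orig_label]).item()
            ratio = abs(f_k) / (w_k.norm().item() + 1e-8)  # distance to boundary k
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_w = ratio, w_k
        # Step of length |f_l| / ||w_l|| along the closest boundary's normal.
        r_i = (best_ratio + 1e-4) * best_w / (best_w.norm() + 1e-8)
        r_total = r_total + r_i
        pert_x = (x + (1 + overshoot) * r_total).detach().requires_grad_(True)

    return x + (1 + overshoot) * r_total
```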
- Best Thematic Paper: What's in a Name? Reducing Bias in Bios without Access to Protected Attributes
- Best Explainable NLP Paper: CNM: An Interpretable Complex-valued Network for Matching
- Best Long Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Best Short Paper: Probing the Need for Visual Context in Multimodal Machine Translation
- Best Resource Paper: CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
- Deep Contextualized Word Representations
- Learning to Map Context-Dependent Sentences to Executable Formal Queries
- Neural Text Generation in Stories using Entity Representations as Context
- Recurrent Neural Networks as Weighted Language Recognizers
- NNLM: A Neural Probabilistic Language Model
- Word2Vec: Distributed Representations of Words and Phrases and their Compositionality
- GloVe: Global Vectors for Word Representation
- ELMo: Deep Contextualized Word Representations
- GPT: Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
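The list above traces the shift from static word vectors (NNLM, Word2Vec, GloVe) to contextual representations (ELMo, GPT, BERT). A minimal sketch of that distinction, assuming the Hugging Face `transformers` package and the public `bert-base-uncased` checkpoint: where Word2Vec or GloVe assigns "bank" one fixed vector, BERT yields a different vector for each occurrence.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["I sat by the river bank.", "I deposited cash at the bank."]
with torch.no_grad():
    for sent in sentences:
        inputs = tokenizer(sent, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
        # Locate the wordpiece "bank" and print part of its contextual vector.
        idx = inputs["input_ids"][0].tolist().index(
            tokenizer.convert_tokens_to_ids("bank"))
        print(sent, hidden[idx][:4])
```

The two printed vectors differ, which is exactly what the static-embedding papers in this list cannot provide.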