tinh2044/AWESOME-NLP-PAPERS
πŸš€ Awesome NLP Papers

Awesome | License: MIT | Made With Love


πŸ“š Awesome NLP is a curated collection of high-quality resources, papers, libraries, tools, and datasets for Natural Language Processing (NLP). Whether you're a beginner exploring the basics or an expert diving into cutting-edge research, this repository has something for everyone.


πŸ“– Contents


1. Introduction

Natural Language Processing (NLP) is a fast-evolving field at the intersection of πŸ—£οΈ linguistics, πŸ€– artificial intelligence, and 🧠 deep learning. It powers various applications, from πŸ’¬ chatbots and 🌍 machine translation to ✍️ automated text generation and πŸ” information retrieval.

This repository organizes NLP research into key areas, making it easier for students, researchers, and practitioners to find relevant πŸ“„ papers, πŸ› οΈ tools, and πŸ“š datasets. Below is an overview of the main sections:

  • 🧠 Fundamentals of Deep Learning: Covers the core concepts of deep learning, including neural networks, activation functions, backpropagation, and optimization techniques.
  • ⏳ Sequence Modeling: Focuses on sequential data processing, including Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Transformer-based architectures.
  • πŸ“ Word Representations: Explores word embedding techniques, including static embeddings (Word2Vec, GloVe) and contextualized embeddings (BERT, ELMo).
  • πŸ“ Evaluation: Discusses how to measure NLP model performance, including accuracy, BLEU, ROUGE, and fairness metrics.
  • 🎯 Tasks: A collection of research papers on key NLP applications such as πŸ“ text generation, 🏷️ classification, πŸ” named entity recognition (NER), ❓ question answering, and 🌍 machine translation.
  • πŸ€– Models: Covers state-of-the-art NLP models such as BERT, GPT-3, RoBERTa, T5, and many others, providing links to research papers and implementations.
  • πŸ“‚ Datasets: A list of public datasets commonly used in NLP research, categorized by task (e.g., 🏷️ text classification, πŸ” NER, 🌍 machine translation).
  • πŸ‡»πŸ‡³ NLP in Vietnamese: Focuses on Vietnamese NLP research, including πŸ”„ text preprocessing, πŸ”€ embeddings, 🏷️ sentiment analysis, and 🌍 translation.

This structured collection makes it easier to πŸ“– understand fundamental NLP concepts, πŸš€ explore the latest research, and βš™οΈ apply NLP techniques to real-world problems.

2. How to Use

This repository is designed to be a comprehensive reference for NLP research and applications. Here’s how you can make the most of it:

1️⃣ Learn the Basics

If you're new to NLP, start with the Fundamentals of Deep Learning section. It provides a foundation in deep learning concepts that are essential for understanding modern NLP techniques.

2️⃣ Explore NLP Architectures

Read about different sequence modeling techniques in the Sequence Modeling section. This will introduce you to RNNs, LSTMs, the Attention Mechanism, and the Transformer model, which forms the basis of most modern NLP models.

3️⃣ Understand Word Representations

Check out the Word Representations section to learn how text is transformed into numerical vectors, including static embeddings (Word2Vec, GloVe) and contextualized embeddings (BERT, ELMo, GPT).

4️⃣ Assess Model Performance

Visit the Evaluation section to understand how NLP models are evaluated. This section covers common metrics such as BLEU for translation, ROUGE for summarization, and fairness metrics.

5️⃣ Find NLP Research Papers by Task

Browse the Tasks section for papers related to text classification, question answering, machine translation, and more.

6️⃣ Explore State-of-the-Art NLP Models

Visit the Models section to find research papers on models like BERT, GPT-3, RoBERTa, T5, and others.

7️⃣ Discover NLP Datasets

If you're looking for training datasets, check out the Datasets section, which categorizes datasets based on NLP tasks.

8️⃣ Explore Vietnamese NLP Research

For researchers focusing on Vietnamese NLP, the NLP in Vietnamese section includes papers and resources on Vietnamese text preprocessing, NER, sentiment analysis, and machine translation.

9️⃣ Stay Updated

The field of NLP is evolving rapidly. Keep an eye on new research papers and updates to this repository.

πŸ”Ÿ Contribute and Collaborate

If you have found a useful NLP paper or tool, consider contributing! See the Contributing section for details.


3. Contributing

We welcome contributions to make this repository better! Here’s how you can help:

  1. Suggest Papers or Resources:
    Found an important NLP paper, dataset, or tool? Open an issue or submit a pull request.

  2. Report Issues:
    Noticed a broken link or incorrect information? Let us know by opening an issue.

  3. Enhance Documentation:
    Help improve descriptions, summaries, or structure.

  4. Submit Pull Requests:

    • Fork the repository.
    • Create a new branch for your changes.
    • Commit your updates, ensuring they follow the existing format.
    • Submit a pull request with a clear description of your contribution.

Check out our Contribution Guidelines for detailed instructions.


Next Steps

  • If you find this repository helpful, star ⭐ it on GitHub and share it with the NLP community.
  • Start exploring topics from the table of contents.
  • Feel free to contribute by adding new papers, tools, or datasets.

Happy Learning! πŸš€

4. Fundamentals of Deep Learning

This section covers the foundational concepts of deep learning, including neural networks, activation functions, backpropagation, gradient descent, and optimization techniques. Each subsection includes links to important research papers and descriptions for further reading.

4.1 Neural Networks and Deep Learning

Explore the fundamental building blocks of deep learning and their applications across various domains.


4.2 Activation Functions

Learn about the key role of activation functions in neural networks and their impact on model performance.


4.3 Backpropagation and Gradient Descent

Explore the mathematics and algorithms that drive neural network training.

  • A Mathematical Theory of Communication
    πŸ–ŠοΈ Authors: Claude Shannon
    πŸ“– Description: This seminal work laid the foundation for information theory, which is crucial for neural networks.

  • Learning Internal Representations by Error Propagation
    πŸ–ŠοΈ Authors: David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
    πŸ“– Description: Introduced the backpropagation algorithm, a powerful method for training multi-layer perceptrons.

  • On the Convergence Properties of the Back-Propagation Algorithm
    πŸ–ŠοΈ Authors: Y. LeCun, L. D. Jackel, L. Bottou
    πŸ“– Description: Investigates the convergence properties of the backpropagation algorithm, providing insights into its strengths and limitations.

  • An overview of gradient descent optimization algorithms
    πŸ–ŠοΈ Authors: Sebastian Ruder
    πŸ“– Description: Compares various gradient descent optimization algorithms, including standard gradient descent, Momentum, Adagrad, RMSprop, and Adam. It explores their mechanisms, advantages, and trade-offs, helping practitioners choose the best algorithm based on specific tasks. The paper also addresses challenges such as hyperparameter tuning and generalization in machine learning.

  • Efficient Backprop
    πŸ–ŠοΈ Authors: Yann LeCun, LΓ©on Bottou, Yoshua Bengio, Patrick Haffner
    πŸ“– Description: Explores techniques for improving the efficiency of backpropagation, which is crucial for training large neural networks.

  • Asynchronous stochastic gradient descent with decoupled backpropagation and layer-wise updates
    πŸ–ŠοΈ Authors: Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas KΓΆnig, David Kappel, Anand Subramoney
    πŸ“– Description: Presents a novel asynchronous approach to stochastic gradient descent, which decouples backpropagation across layers to improve efficiency in deep networks.

  • Generalizing Backpropagation for Gradient-Based Interpretability
    πŸ–ŠοΈ Authors: Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell
    πŸ“– Description: Explores the concept of backpropagation and its generalization to understand gradient-based interpretability in machine learning models.

  • Gradient Descent based Optimization Algorithms for Deep Learning Models Training
    πŸ–ŠοΈ Authors: Jiawei Zhang
    πŸ“– Description: Explores gradient descent optimization techniques for training deep learning models, highlighting common methods like Momentum, Adagrad, Adam, and Gadam. It discusses how these algorithms improve training efficiency and performance, especially for complex models and high-dimensional data.
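The update rules surveyed above can be sketched in a few lines of plain Python. This is a minimal illustration only: the quadratic loss and the hand-picked learning rate and momentum coefficient are toy assumptions, not taken from any of the papers.

```python
def sgd_step(w, grad, lr=0.1):
    """Plain gradient descent: move each parameter against its gradient."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """Momentum: accumulate a decaying running sum of gradients, then step."""
    velocity = [beta * vi + gi for vi, gi in zip(velocity, grad)]
    w = [wi - lr * vi for wi, vi in zip(w, velocity)]
    return w, velocity

def loss_and_grad(w):
    """Toy quadratic loss f(w) = sum(w_i^2); its gradient is 2*w."""
    return sum(wi * wi for wi in w), [2 * wi for wi in w]

w = [3.0, -2.0]
for _ in range(50):
    _, g = loss_and_grad(w)
    w = sgd_step(w, g)
loss, _ = loss_and_grad(w)
print(loss)  # close to 0: the iterates shrink toward the minimum at the origin
```

Adaptive methods such as Adagrad, RMSprop, and Adam (discussed in Ruder's overview above) extend this same loop with per-parameter scaling of the step size.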


4.4 Optimization Techniques

Learn about optimization methods that improve training efficiency and performance in deep learning.


5. Sequence Modeling

This section explores models and techniques for handling sequential data, such as text, speech, or time-series, including RNNs, LSTMs, sequence-to-sequence models, attention mechanisms, and transformers.


5.1 RNNs and LSTMs

Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are widely used for processing sequential data. Below are key papers on their development and applications:
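As a rough illustration of the recurrence these papers study, here is one vanilla-RNN step in plain Python. The 2-dimensional sizes and hand-picked weights are arbitrary toy values; real models learn them by backpropagation through time.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One vanilla-RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    pre = [a + c + d for a, c, d in zip(matvec(W_xh, x), matvec(W_hh, h_prev), b)]
    return [math.tanh(p) for p in pre]

# Toy example: 2-dim input, 2-dim hidden state, hand-picked weights.
W_xh = [[0.5, 0.0], [0.0, 0.5]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:  # process a length-2 sequence
    h = rnn_step(x, h, W_xh, W_hh, b)
print(h)  # the hidden state now summarizes the whole sequence
```

LSTMs replace this single tanh update with gated cell-state updates, which is what lets them carry information over much longer spans.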


5.2 Sequence Models

Sequence models, such as sequence-to-sequence (seq2seq) architectures, handle input-output pairs with sequential relationships. Below are key papers on this topic:


5.3 Attention Mechanism

Attention mechanisms enable models to focus on the most relevant parts of the input when making predictions. This subsection includes research on various attention techniques:


5.4 Transformers

Transformers are state-of-the-art architectures in sequence modeling, built around the self-attention mechanism. Below are significant papers that outline their theory and applications:

  • Attention Is All You Need
    πŸ–ŠοΈ Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
    πŸ“– Description: The foundational paper introducing the transformer architecture. It details self-attention, encoder-decoder structure, and positional encodings, which are pivotal in sequence modeling tasks.

  • Understanding How Positional Encodings Work in Transformer Models
    πŸ–ŠοΈ Authors: T Miyazaki, H Mino, H Kaneko
    πŸ“– Description: Examines the functionality of positional encodings in self-attention and cross-attention blocks of transformer architectures, exploring their integration in encoder-decoder models.

  • Universal Transformers
    πŸ–ŠοΈ Authors: M Dehghani, S Gouws, O Vinyals, J Uszkoreit
    πŸ“– Description: Introduces a universal transformer that extends the standard model by incorporating recurrence in the self-attention mechanism, enhancing its theoretical depth and reasoning capabilities.

  • Position Information in Transformers: An Overview
    πŸ–ŠοΈ Authors: P Dufter, M Schmitt, H SchΓΌtze
    πŸ“– Description: Systematically reviews positional encoding techniques in transformers, analyzing over 30 models to understand their role in encoding positional information for attention mechanisms.

  • Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
    πŸ–ŠοΈ Authors: TC Chi, TH Fan, AI Rudnicky, PJ Ramadge
    πŸ“– Description: Explores how transformer working memory interacts with self-attention to enable reasoning in regular languages and length extrapolation in NLP tasks.

  • Understanding the Failure of Batch Normalization for Transformers in NLP
    πŸ–ŠοΈ Authors: J Wang, J Wu, L Huang
    πŸ“– Description: Investigates the challenges batch normalization introduces to self-attention and proposes alternatives for stabilizing transformer training in NLP tasks.

  • Activating Self-Attention for Multi-Scene Absolute Pose Regression
    πŸ–ŠοΈ Authors: M Lee, J Kim, JP Heo
    πŸ“– Description: Details the functionality of self-attention and positional encoding in transformer encoders and cross-attention modules, applied to multi-scene regression tasks.

  • Aiatrack: Attention in Attention for Transformer Visual Tracking
    πŸ–ŠοΈ Authors: S Gao, C Zhou, C Ma, X Wang, J Yuan
    πŸ“– Description: Explores self-attention and cross-attention mechanisms within the encoder-decoder structure of transformers, focusing on applications in tracking tasks.

  • Why Transformers Are Obviously Good Models of Language
    πŸ–ŠοΈ Authors: F Hill
    πŸ“– Description: Discusses theoretical justifications for transformers' success in NLP, emphasizing the role of self-attention and cross-attention in language modeling.

  • Learning Deep Learning: Theory and Practice of Neural Networks, Transformers, and NLP
    πŸ–ŠοΈ Authors: M Ekman
    πŸ“– Description: Provides a comprehensive overview of transformers' components, including detailed discussions on self-attention, cross-attention, and encoder-decoder interactions in NLP.
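As a companion to the positional-encoding papers above, here is a minimal sketch of the sinusoidal encoding from "Attention Is All You Need", where PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). The sequence length and model dimension below are arbitrary toy values.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: even dims use sin, odd dims use cos,
    with wavelengths forming a geometric progression over dimension pairs."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # each (sin, cos) pair shares the exponent 2i/d_model
            angle = pos / (10000 ** ((i // 2 * 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

pe = positional_encoding(4, 8)
print(pe[0])  # position 0: all sin terms are 0, all cos terms are 1
```

These encodings are simply added to the token embeddings, giving the otherwise order-agnostic self-attention layers access to position information.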


6. Word Representations

Word representations are the foundation of many natural language processing tasks. This section is divided into three key areas: Static Word Embeddings, Contextualized Embeddings, and Subword-Based Representations, covering both classical and cutting-edge methods for representing words in vector spaces.


6.1 Static Word Embeddings

Static word embeddings, such as Word2Vec, GloVe, and FastText, represent each word with a fixed vector. Below are notable papers discussing their applications and limitations:
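To make the "fixed vector per word" idea concrete, here is a toy cosine-similarity comparison. The 3-dimensional vectors are invented stand-ins for trained Word2Vec/GloVe embeddings, which are typically 100 to 300 dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product normalized by both vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical embeddings; related words point in similar directions.
emb = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine(emb["king"], emb["queen"]))  # high: related words
print(cosine(emb["king"], emb["apple"]))  # low: unrelated words
```

Because each word gets exactly one vector, static embeddings cannot distinguish senses ("bank" of a river vs. a financial bank), which motivates the contextualized embeddings in the next subsection.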


6.2 Contextualized Embeddings

Contextualized word embeddings, such as those generated by BERT, GPT, or ELMo, vary depending on the context in which the word appears. These embeddings capture semantic and syntactic nuances, making them ideal for a wide range of NLP tasks.


6.3 Subword-Based Representations

Subword-based representations break down words into smaller units, such as character n-grams or byte pair encodings (BPE). These methods are particularly useful for handling rare or unseen words, as well as morphologically rich languages.
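A minimal sketch of the BPE merge loop, using the classic low/lower/newest/widest toy corpus from the BPE literature. This simplified version merges via string replacement; real implementations track symbol boundaries more carefully and store the learned merge table for tokenizing new text.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the (space-separated) corpus words."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Fuse every occurrence of the pair into a single new symbol."""
    merged = " ".join(pair)
    joined = "".join(pair)
    return {word.replace(merged, joined): freq for word, freq in words.items()}

# Words as space-separated characters, with corpus frequencies.
words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):  # learn three merges
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
    print(pair, "->", "".join(pair))  # e.g. the suffix "est" emerges quickly
```

Frequent fragments like "est" become single symbols, so rare words such as "lowest" can still be tokenized from known pieces.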


7. Evaluation

Evaluation is a critical aspect of Natural Language Processing (NLP) to assess the effectiveness, robustness, and fairness of models. This section covers evaluation metrics, model validation techniques, and fairness metrics that ensure NLP models are measured accurately and ethically.


7.1 Evaluation Metrics (Accuracy, BLEU, ROUGE, etc.)

Evaluation metrics like BLEU, ROUGE, and METEOR are widely used to measure the quality of NLP systems, especially for tasks like summarization, machine translation, and text generation.
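As a concrete illustration, here is a minimal ROUGE-1 (unigram-overlap F1) computation in plain Python. This is a sketch of the core idea only; production evaluation should use an established scoring library, and BLEU additionally combines clipped n-gram precisions with a brevity penalty.

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Unigram overlap: clipped match counts turned into precision, recall, F1."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(cand[w], ref[w]) for w in cand)  # clip repeated words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("the cat sat on the mat", "the cat is on the mat"))  # 5 of 6 unigrams match
```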


7.2 Model Validation and Cross-validation in NLP

Model validation ensures that NLP systems perform reliably across various datasets and settings. Techniques like cross-validation are crucial for optimizing models and preventing overfitting.
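A minimal sketch of k-fold splitting in plain Python. Contiguous folds over already-shuffled data is an assumption of this sketch; library implementations usually handle shuffling and stratification for you.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; yield (train, test) pairs.
    Each example lands in the test fold exactly once across the k splits."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 10 examples, 5 folds: train on 8, evaluate on 2, then rotate.
for train, test in k_fold_indices(10, 5):
    print(test)
```

Averaging a metric over the k held-out folds gives a more reliable estimate of generalization than a single train/test split, which is why cross-validation helps detect overfitting.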


7.3 Bias and Fairness Metrics

Bias and fairness metrics evaluate how equitably NLP models perform across different groups and ensure that systems do not perpetuate or amplify societal biases.
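One widely used quantity is the demographic parity gap: the difference in positive-prediction rates between groups. A minimal sketch, assuming exactly two groups and toy 0/1 predictions (fairness auditing in practice involves many complementary metrics, such as equalized odds):

```python
def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rate between two groups.
    predictions: 0/1 model outputs; groups: parallel list of group labels.
    Assumes exactly two distinct group labels."""
    rates = {}
    for g in set(groups):
        preds_g = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(preds_g) / len(preds_g)
    vals = list(rates.values())
    return abs(vals[0] - vals[1])

preds  = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, groups))  # |0.75 - 0.25| = 0.5
```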


8. Tasks

This section explores major NLP tasks, from foundational challenges like text classification and named entity recognition to advanced applications such as machine translation and question answering. Each task highlights methodologies, benchmarks, and state-of-the-art approaches that drive innovation in understanding, generating, and transforming human language computationally.

8.1 Text Generation

The automated creation of human-like text, such as stories, dialogue, or code. Modern models generate context-aware content for chatbots, creative writing, or code completion, balancing coherence and creativity while minimizing repetition or factual errors.

  • Generation - A New Frontier of Natural Language Processing?
    πŸ–ŠοΈ Authors: A. Joshi
    πŸ“– Description: Discusses the theoretical underpinnings of text generation in NLP, exploring its significance as a foundational component of linguistic processing.

  • Automated Title Generation in English Language Using NLP
    πŸ–ŠοΈ Authors: N. Sethi, P. Agrawal, V. Madaan, S.K. Singh
    πŸ“– Description: Presents a methodological framework for generating concise and relevant titles from English text using NLP techniques.

  • Applied Text Generation
    πŸ–ŠοΈ Authors: O. Rambow, T. Korelsky
    πŸ“– Description: Introduces a system for applying text generation to practical tasks, offering insights into its flexibility and adaptability across applications.

  • The Survey: Text Generation Models in Deep Learning
    πŸ–ŠοΈ Authors: T. Iqbal, S. Qureshi
    πŸ“– Description: Provides an in-depth analysis of text generation models, discussing deep learning-based methods and their theoretical advancements.

  • Controlled Text Generation with Adversarial Learning
    πŸ–ŠοΈ Authors: F. Betti
    πŸ“– Description: Explores conditional and controlled text generation, leveraging adversarial learning to refine outputs for specific contexts.

  • Neural Text Generation: Past, Present, and Beyond
    πŸ–ŠοΈ Authors: S. Lu, Y. Zhu, W. Zhang, J. Wang, Y. Yu
    πŸ“– Description: Surveys neural text generation, highlighting historical advancements, current methodologies, and future challenges.

  • A Theoretical Analysis of the Repetition Problem in Text Generation
    πŸ–ŠοΈ Authors: Z. Fu, W. Lam, A.M.C. So, B. Shi
    πŸ“– Description: Presents a theoretical framework for addressing repetition in generated text, a common issue in neural language models.

  • Natural Language Generation
    πŸ–ŠοΈ Authors: E. Reiter
    πŸ“– Description: Explores the fundamentals of natural language generation, detailing its applications and challenges in connecting linguistic theory with practical systems.

  • Evaluation of Text Generation: A Survey
    πŸ–ŠοΈ Authors: A. Celikyilmaz, E. Clark, J. Gao
    πŸ“– Description: Analyzes evaluation metrics for text generation, providing theoretical insights into how generated text quality is assessed in NLP.

  • Pre-trained Language Models for Text Generation: A Survey
    πŸ–ŠοΈ Authors: J. Li, T. Tang, W.X. Zhao, J.Y. Nie, J.R. Wen
    πŸ“– Description: Examines pre-trained language models for text generation, focusing on their underlying mechanisms and theoretical implications.

8.2 Text Classification

Assigning labels (e.g., sentiment, topic) to text segments. Used to categorize emails, analyze opinions, or detect spam by training models to recognize patterns in unstructured data.

8.3 Named Entity Recognition (NER)

Identifying and classifying entities (e.g., people, locations) in text. Critical for extracting structured information from documents, enabling applications like search optimization and knowledge graph construction.

8.4 Question Answering

Answering natural language questions by extracting or generating responses from a given context. Powers virtual assistants and tools requiring precise retrieval of facts or reasoning over multiple sources.

8.5 Fill Mask

A pre-training task where models predict masked words in sentences. Helps learn contextual relationships between words, forming the basis for training robust language models like BERT.
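A toy illustration of the fill-mask idea, substituting simple left-neighbor bigram counts for a real masked language model. The mini-corpus is invented; models like BERT instead score every vocabulary word using deep bidirectional context on both sides of the mask.

```python
from collections import Counter

def fill_mask(sentence, corpus):
    """Predict the [MASK] token from bigram counts over a toy corpus.
    Simplification: only the word to the left of the mask is used, and the
    mask is assumed not to be the first token."""
    tokens = sentence.split()
    i = tokens.index("[MASK]")
    left = tokens[i - 1]
    bigrams = Counter()
    for sent in corpus:
        words = sent.split()
        for a, b in zip(words, words[1:]):
            if a == left:
                bigrams[b] += 1
    return bigrams.most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]
print(fill_mask("the [MASK] sat", corpus))  # "dog" follows "the" most often here
```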

8.6 Machine Translation

Translating text between languages while preserving meaning. Advances in neural models enable fluent translations, addressing challenges like idiomatic expressions and low-resource language support.


9. Models

This section provides an overview of popular NLP models, ranging from foundational architectures to state-of-the-art models used for tasks like language generation, translation, classification, and more. Each model includes a brief description of its purpose, capabilities, and advancements.

9.1 BERT

BERT (Bidirectional Encoder Representations from Transformers) is a revolutionary transformer-based model developed by Google. Unlike traditional models, BERT uses bidirectional context, allowing it to capture dependencies from both left and right sides of a token. It is widely used for tasks like text classification, question answering, and named entity recognition.

9.2 GPT-3 (GPT)

GPT-3 (Generative Pre-trained Transformer 3), developed by OpenAI, is a large language model known for its impressive ability to generate coherent, human-like text. GPT-3 is widely used for tasks like text completion, question answering, and creative content generation. It builds on the generative pre-training approach introduced in the original GPT and scaled up in GPT-2.

  • Language Models Are Few-Shot Learners
    πŸ–ŠοΈ Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
    πŸ“– Description: This seminal paper introduces GPT-3, a large-scale transformer-based language model. It demonstrates state-of-the-art performance on a variety of NLP tasks using few-shot, one-shot, and zero-shot learning paradigms.

  • What Makes Good In-Context Examples for GPT-3?
    πŸ–ŠοΈ Authors: J. Liu, D. Shen, Y. Zhang, B. Dolan, L. Carin
    πŸ“– Description: Investigates the effectiveness of example selection in few-shot settings for GPT-3, offering theoretical insights and practical strategies for better performance.

  • Who is GPT-3? An Exploration of Personality, Values, and Demographics
    πŸ–ŠοΈ Authors: M. Miotto, N. Rossberg, B. Kleinberg
    πŸ“– Description: Explores the personality and ethical considerations of GPT-3 by analyzing its outputs and implicit biases.

  • GPT-3: Implications and Challenges for Machine Text
    πŸ–ŠοΈ Authors: Y. Dou, M. Forbes, R. Koncel-Kedziorski
    πŸ“– Description: Evaluates the text generated by GPT-3 for linguistic and stylistic coherence, and highlights challenges in distinguishing machine-generated text from human-written content.

9.3 GPT-2

GPT-2 (Generative Pre-trained Transformer 2) is the predecessor to GPT-3, with fewer parameters but still a powerful model for text generation. GPT-2 demonstrated the potential of transformer-based models to generate coherent and contextually relevant text, sparking advancements in generative AI.

9.4 RoBERTa

RoBERTa (Robustly Optimized BERT Pretraining Approach) is an improved version of BERT developed by Facebook AI. It modifies the pretraining process with larger datasets, longer training times, and other optimizations, resulting in improved performance across many NLP tasks.

  • RoBERTa: A Robustly Optimized BERT Pretraining Approach
    πŸ–ŠοΈ Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
    πŸ“– Description: This paper enhances the BERT model by optimizing pretraining strategies, such as dynamic masking, increased training data, and larger batch sizes. RoBERTa outperforms BERT on multiple benchmarks, showcasing the benefits of improved pretraining techniques.

  • Sentiment Classification with Modified RoBERTa and RNNs
    πŸ–ŠοΈ Authors: R. Cheruku, K. Hussain, I. Kavati, A.M. Reddy
    πŸ“– Description: Demonstrates the use of RoBERTa in combination with recurrent neural networks to improve sentiment analysis.

  • Robust Multilingual NLU with RoBERTa
    πŸ–ŠοΈ Authors: A. Conneau, A. Lample
    πŸ“– Description: Extends RoBERTa's capabilities to multilingual natural language understanding tasks, showing its flexibility across languages.

  • Aspect-Based Sentiment Analysis Using RoBERTa
    πŸ–ŠοΈ Authors: G.R. Narayanaswamy
    πŸ“– Description: Explores how RoBERTa can enhance sentiment classification with a focus on aspect-based analysis.

9.5 T5

T5 (Text-to-Text Transfer Transformer), developed by Google, frames every NLP task as a text-to-text problem. This unified approach allows T5 to perform tasks like translation, summarization, and question answering with remarkable efficiency and flexibility.

9.6 DistilBERT

DistilBERT is a smaller, faster, and more lightweight version of BERT. Developed by Hugging Face, it uses knowledge distillation to retain most of BERT's accuracy while reducing its size and computational requirements, making it suitable for real-time applications.

9.7 ALBERT

ALBERT (A Lite BERT) is a smaller and more efficient variant of BERT. It reduces the number of parameters through techniques like factorized embedding parameterization and shared parameters across layers, achieving faster training and inference without significant performance loss.

  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
    πŸ–ŠοΈ Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
    πŸ“– Description: This paper introduces ALBERT, a lightweight and efficient variant of BERT. ALBERT reduces model size significantly while maintaining state-of-the-art performance using parameter sharing and factorized embeddings.

  • Performance and Scalability of ALBERT in Question Answering Tasks
    πŸ–ŠοΈ Authors: J. Liu, Z. Zhao, T. Chen
    πŸ“– Description: Explores the use of ALBERT in question-answering tasks, highlighting its efficiency and scalability across diverse datasets.

  • ALBERT for Biomedical Named Entity Recognition
    πŸ–ŠοΈ Authors: H. Wang, S. Wu, R. Zhang
    πŸ“– Description: Adapts ALBERT to biomedical NLP tasks, demonstrating its effectiveness in named entity recognition for domain-specific datasets.

  • Efficient Fine-tuning with ALBERT
    πŸ–ŠοΈ Authors: Y. Chen, F. Zhang, S. Guo
    πŸ“– Description: Proposes strategies for efficient fine-tuning of ALBERT, showcasing reduced computational costs and improved adaptability.

9.8 BART

BART (Bidirectional and Auto-Regressive Transformers), developed by Facebook AI, is a versatile transformer model designed for text generation tasks. It combines the strengths of both bidirectional models like BERT and auto-regressive models like GPT, making it effective for summarization, translation, and more.

9.9 ELECTRA

ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) is an alternative to masked language modeling. Instead of masking tokens, it trains a model to detect replaced tokens, resulting in faster and more efficient pretraining with strong downstream performance.

  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
    πŸ–ŠοΈ Authors: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
    πŸ“– Description: Introduces ELECTRA, which replaces masked language modeling with a generator-discriminator setup: a small generator corrupts tokens and the main model learns to detect the replacements. It achieves higher pretraining efficiency than BERT while maintaining strong performance on NLP tasks.

  • An Analysis of ELECTRA for Sentiment Classification
    πŸ–ŠοΈ Authors: S. Zhang, H. Yu, G. Zhu
    πŸ“– Description: Explores ELECTRA’s application in sentiment classification of Chinese text, emphasizing its efficiency in handling short comments.

  • ELECTRA-Based Neural Coreference Resolution
    πŸ–ŠοΈ Authors: F. Gargiulo, A. Minutolo, R. Guarasci, E. Damiano
    πŸ“– Description: Leverages ELECTRA for coreference resolution tasks, demonstrating its potential in improving coreference accuracy in text.

  • ELECTRA for Biomedical Named Entity Recognition
    πŸ–ŠοΈ Authors: S. Wang, T. Zhang
    πŸ“– Description: Adapts ELECTRA for biomedical text processing, focusing on named entity recognition in domain-specific corpora.

  • Fine-Tuning ELECTRA for Efficient Text Summarization
    πŸ–ŠοΈ Authors: A. Banerjee, L. White
    πŸ“– Description: Presents fine-tuning methods for ELECTRA to improve its performance on text summarization tasks efficiently.

9.10 XLNet

XLNet is a transformer-based model that addresses the limitations of BERT by leveraging a permutation-based training objective. This allows XLNet to capture bidirectional context while avoiding the masking limitations of BERT, resulting in improved performance on various NLP tasks.

  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
    πŸ–ŠοΈ Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
    πŸ“– Description: Introduces XLNet, which integrates autoregressive and autoencoding objectives to overcome limitations in BERT. It uses permutation-based training to improve context understanding.

  • XLNet for Text Classification
    πŸ–ŠοΈ Authors: F. Shi, S. Kai, J. Zheng, Y. Zhong
    πŸ“– Description: Explores fine-tuning XLNet for text classification tasks, demonstrating significant improvements over baseline models.

  • Comparing XLNet and BERT for Computational Characteristics
    πŸ–ŠοΈ Authors: H. Li, J. Choi, S. Lee, J.H. Ahn
    πŸ“– Description: Compares XLNet and BERT from the perspective of computational efficiency, emphasizing training speed and resource utilization.

  • XLNet-CNN: Combining Global Context with Local Context for Text Classification
    πŸ–ŠοΈ Authors: A. Shahriar, D. Pandit, M.S. Rahman
    πŸ“– Description: Combines XLNet with convolutional neural networks to capture both global and local contexts, enhancing text classification accuracy.

  • DialogXL: Emotion Recognition in Conversations
    πŸ–ŠοΈ Authors: W. Shen, J. Chen, X. Quan, Z. Xie
    πŸ“– Description: Proposes DialogXL, an extended XLNet framework tailored for emotion recognition in multi-party conversations.

9.11 BERTweet

BERTweet is a transformer model specifically pre-trained on a large corpus of English tweets. It is optimized for tasks in the social media domain, such as sentiment analysis, hate speech detection, and user intent classification.

9.12 BlenderBot

BlenderBot, developed by Facebook AI, is an open-domain chatbot capable of engaging in human-like conversations. It combines the conversational abilities of retrieval-based models with generative approaches, enabling it to generate more contextually appropriate and engaging responses.

  • BlenderBot: Towards a More Open-Domain, Conversational AI Model
    πŸ–ŠοΈ Authors: Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston
    πŸ“– Description: Introduces BlenderBot, an open-domain chatbot designed to deliver engaging and knowledgeable conversations by fine-tuning conversational datasets with enhanced generative capabilities.

  • BlenderBot 3: A Conversational Agent for Responsible Engagement
    πŸ–ŠοΈ Authors: Kurt Shuster, Jing Xu, Morteza Komeili, Emily Smith, Jason Weston
    πŸ“– Description: Details the advancements in BlenderBot 3, focusing on continual learning, safety mechanisms, and the model’s ability to adapt to user feedback in real-time.

  • Empirical Analysis of BlenderBot 2.0 for Open-Domain Conversations
    πŸ–ŠοΈ Authors: J Lee, M Shim, S Son, Y Kim, H Lim
    πŸ“– Description: Examines the shortcomings of BlenderBot 2.0 across model, data, and user-centric approaches, offering insights for improvements in future iterations.

  • GE-Blender: Graph-Based Knowledge Enhancement for Blender
    πŸ–ŠοΈ Authors: X Lian, X Tang, Y Wang
    πŸ“– Description: Proposes a graph-based knowledge-enhancement framework to improve BlenderBot’s ability to provide more accurate and contextually enriched responses.

  • Enhancing Commonsense Knowledge in BlenderBot
    πŸ–ŠοΈ Authors: O Kobza, D Herel, J Cuhel, T Gargiani, J Pichl, P Marek
    πŸ“– Description: Explores methods to augment commonsense knowledge in BlenderBot, improving conversational consistency and user engagement.

9.13 DeBERTa

DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves upon BERT and RoBERTa by introducing disentangled attention mechanisms and an enhanced mask decoder. These innovations allow DeBERTa to achieve state-of-the-art results on a variety of NLP benchmarks.
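As a rough numerical illustration of the disentangled attention described above, the score for each token pair decomposes into content-to-content, content-to-position, and position-to-content terms. This is a toy sketch with random vectors and a simplified relative-position index, not the official implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim, max_rel = 4, 8, 4

H = rng.normal(size=(seq_len, dim))       # content vectors for each token
P = rng.normal(size=(2 * max_rel, dim))   # relative-position embeddings

def rel_idx(i, j):
    """Clip the relative distance j - i into the embedding table range."""
    return int(np.clip(j - i + max_rel, 0, 2 * max_rel - 1))

scores = np.zeros((seq_len, seq_len))
for i in range(seq_len):
    for j in range(seq_len):
        c2c = H[i] @ H[j]              # content-to-content
        c2p = H[i] @ P[rel_idx(i, j)]  # content-to-position
        p2c = P[rel_idx(j, i)] @ H[j]  # position-to-content
        # DeBERTa scales by sqrt(3d) because three score terms are summed
        scores[i, j] = (c2c + c2p + p2c) / np.sqrt(3 * dim)

print(scores.shape)
```

In the full model these terms come from separate content and position projections of the queries and keys; the point here is only that position information stays disentangled from content rather than being added into the input embeddings.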

9.14 BigBird

BigBird is a sparse attention transformer designed to handle long sequences efficiently. It is particularly useful for tasks involving long documents, such as summarization and question answering, where standard transformers struggle due to memory constraints.

  • Big Bird: Transformers for Longer Sequences
    🖊️ Authors: Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
    📖 Description: This paper introduces BigBird, a transformer model designed for efficient handling of longer sequences using a sparse attention mechanism, reducing computational complexity from quadratic to linear in sequence length.

  • ICDBigBird: A Contextual Embedding Model for ICD Code Classification
    🖊️ Authors: G. Michalopoulos, M. Malyska, N. Sahar, A. Wong
    📖 Description: Proposes a BigBird-based contextual embedding model tailored for ICD code classification in medical records, showcasing the model's capacity for domain-specific applications.

  • Clinical-Longformer and Clinical-BigBird: Transformers for Long Clinical Sequences
    🖊️ Authors: Y. Li, R. Wehbe, F. Ahmad, H. Wang, Y. Luo
    📖 Description: Develops Clinical-BigBird for processing long clinical text sequences, highlighting its performance improvements compared to other transformer models.

  • Attention-Free BigBird Transformer for Long Document Text Summarization
    🖊️ Authors: G. Mishra, N. Sethi, A. Loganathan
    📖 Description: Introduces a modified BigBird transformer for document summarization, replacing attention-based mechanisms for better efficiency.

  • Vision BigBird: Random Sparsification for Full Attention
    🖊️ Authors: Z. Zhang, X. Gong
    📖 Description: Applies BigBird concepts to vision transformers, proposing a random sparsification mechanism to optimize full attention for vision tasks.
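The sparse pattern these papers describe can be sketched as a boolean attention mask combining a sliding window, a few global tokens, and random long-range links. The window, global, and random sizes below are arbitrary toy values:

```python
import random

def bigbird_mask(n, window=1, n_global=1, n_random=1, seed=0):
    """Boolean mask: True where token i may attend to token j."""
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        # sliding window around the diagonal
        for j in range(max(0, i - window), min(n, i + window + 1)):
            mask[i][j] = True
        # the first n_global tokens attend everywhere and are attended by all
        for g in range(n_global):
            mask[i][g] = mask[g][i] = True
        # a few random long-range links per row
        for j in rng.sample(range(n), n_random):
            mask[i][j] = True
    return mask

m = bigbird_mask(8)
attended = sum(sum(row) for row in m)
print(attended)  # far fewer than the 64 pairs of full attention
```

Per row the number of attended positions is bounded by a constant (window width plus global plus random counts), which is what makes the total cost linear rather than quadratic in sequence length.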

9.15 PEGASUS

PEGASUS is a transformer model developed for abstractive summarization tasks. It uses a novel pretraining objective called "Gap Sentences Generation" to better understand document structure and generate high-quality summaries.
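A minimal sketch of the Gap Sentences Generation objective: whole sentences are removed from the input and concatenated as the target, so pre-training resembles abstractive summarization. Sentence "importance" here is a naive length heuristic standing in for the ROUGE-based scoring used in the paper:

```python
def gap_sentences(sentences, ratio=0.3):
    """Mask the top `ratio` of sentences; return (masked source, target)."""
    n_mask = max(1, int(len(sentences) * ratio))
    # pick the longest sentences as a stand-in for "most informative"
    picked = set(sorted(range(len(sentences)),
                        key=lambda i: len(sentences[i]), reverse=True)[:n_mask])
    source = [s if i not in picked else "<mask>" for i, s in enumerate(sentences)]
    target = [sentences[i] for i in sorted(picked)]
    return " ".join(source), " ".join(target)

doc = ["NLP is fun.", "Transformers changed the field dramatically.", "The end."]
src, tgt = gap_sentences(doc)
print(src)  # 'NLP is fun. <mask> The end.'
print(tgt)  # 'Transformers changed the field dramatically.'
```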

9.16 FLAN-T5

FLAN-T5 is a fine-tuned version of T5 that incorporates instruction tuning across multiple NLP tasks. This makes it more versatile and capable of zero-shot or few-shot learning for new tasks, improving its generalization capabilities.

9.17 MobileBERT

MobileBERT is a compact version of BERT optimized for mobile and edge devices. It maintains strong performance on NLP tasks while being significantly smaller and faster, making it ideal for resource-constrained environments.

9.18 GPT-Neo

GPT-Neo is an open-source alternative to GPT-3, developed by EleutherAI. It uses a similar decoder-only architecture and is pre-trained on the Pile, a large curated text corpus, enabling it to perform generative NLP tasks like text completion and summarization.

9.19 Longformer

Longformer addresses the quadratic memory cost of standard self-attention by combining a sliding-window pattern with task-specific global attention, enabling it to process long sequences efficiently. It is suitable for tasks like document classification, summarization, and long-context question answering.

  • Longformer: The Long-Document Transformer
    🖊️ Authors: Iz Beltagy, Matthew E. Peters, Arman Cohan
    📖 Description: This paper introduces Longformer, a transformer model optimized for long documents. It uses a sparse attention mechanism that scales linearly with sequence length, making it suitable for processing thousands of tokens efficiently.

  • Long Range Arena: A Benchmark for Efficient Transformers
    🖊️ Authors: Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
    📖 Description: Provides a systematic benchmark to evaluate transformer models, including Longformer, for long-range attention tasks, emphasizing efficiency and performance.

  • Longformer for Multi-Document Summarization
    🖊️ Authors: F. Yang, S. Liu
    📖 Description: Applies Longformer to extractive summarization of multiple documents, showcasing its ability to handle large-scale text summarization tasks effectively.

  • Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
    🖊️ Authors: P. Zhang, X. Dai, J. Yang
    📖 Description: Adapts Longformer concepts for vision tasks, focusing on encoding high-resolution images with sparse attention for computational efficiency.

  • Longformer for Dense Document Retrieval
    🖊️ Authors: J. Yang, Z. Liu, G. Sun
    📖 Description: Explores Longformer as a dense document retrieval model, demonstrating its ability to process and retrieve information from long-form text effectively.
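To make the efficiency claim concrete, the sketch below counts attended (query, key) pairs under full self-attention versus a sliding window of half-width w. Global attention, which Longformer adds for a handful of task tokens, would only contribute O(n) extra pairs:

```python
def full_pairs(n):
    """Attended (query, key) pairs under full self-attention."""
    return n * n

def window_pairs(n, w):
    """Pairs when each token sees at most w neighbors per side plus itself."""
    return sum(min(n, i + w + 1) - max(0, i - w) for i in range(n))

for n in (512, 4096):
    print(n, full_pairs(n), window_pairs(n, w=256))
# the windowed count grows linearly in n, the full count quadratically
```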

9.20 XLM-RoBERTa

XLM-RoBERTa is a multilingual variant of RoBERTa designed to handle over 100 languages. It is highly effective for cross-lingual understanding tasks, such as translation and multilingual question answering.

9.21 DialoGPT

DialoGPT, developed by Microsoft, is a conversational version of GPT-2 trained on 147M conversation-like exchanges extracted from Reddit. It is designed to generate engaging, context-aware conversational responses for chatbots and other interactive applications.

9.22 MarianMT

MarianMT refers to neural machine translation models built on the Marian framework, an efficient NMT toolkit developed primarily by the Microsoft Translator team. The widely used Hugging Face MarianMT checkpoints, trained by the Helsinki-NLP group on OPUS data, cover over a thousand language pairs, including many low-resource ones, making them an excellent tool for translation tasks.

9.23 Falcon

Falcon is a family of open-weight generative language models developed by the Technology Innovation Institute (TII). Trained largely on the curated RefinedWeb corpus, it achieves strong performance relative to its training cost and uses multi-query attention for efficient inference.

9.24 CodeGen

CodeGen, developed by Salesforce Research, is a transformer model optimized for code generation tasks. Trained on large programming corpora, it can write code snippets in languages like Python, JavaScript, and more.

9.25 ByT5

ByT5 is a byte-level variant of the T5 model. It removes the need for a learned subword tokenizer by operating directly on raw UTF-8 bytes, making it robust to noisy text, rare scripts, and languages that subword vocabularies serve poorly.
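Byte-level encoding needs no learned vocabulary at all. The sketch below follows the convention of shifting UTF-8 byte values past a few reserved special-token IDs; the exact ID layout is an assumption for illustration:

```python
NUM_SPECIAL = 3  # assumed reserved IDs, e.g. <pad>=0, </s>=1, <unk>=2

def encode(text):
    """Map each UTF-8 byte to an ID, shifted past the special tokens."""
    return [b + NUM_SPECIAL for b in text.encode("utf-8")]

def decode(ids):
    """Invert the shift and reassemble the UTF-8 string."""
    return bytes(i - NUM_SPECIAL for i in ids).decode("utf-8")

ids = encode("Việt")
print(ids)           # one ID per UTF-8 byte, so accented characters cost extra IDs
print(decode(ids))   # 'Việt'
```

The trade-off is visible in the example: sequences get longer (here 6 IDs for 4 characters), which is why byte-level models pair this scheme with architectures tuned for longer inputs.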

9.26 PhoBERT

PhoBERT is a pre-trained language model tailored for Vietnamese. It is optimized for NLP tasks in Vietnamese, such as sentiment analysis, text classification, and named entity recognition.

9.27 Funnel Transformer

Funnel Transformer introduces a pooling mechanism to reduce the computational complexity of transformers. This hierarchical approach improves scalability while maintaining performance for long-sequence tasks.
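The pooling idea can be sketched by mean-pooling adjacent hidden states between blocks, halving the sequence length each time. This is a simplification: the actual model uses strided pooling on the query side only, but the length-reduction effect is the same:

```python
def pool_halve(hidden):
    """Mean-pool adjacent pairs of vectors; an odd tail is kept as-is."""
    pooled = []
    for i in range(0, len(hidden) - 1, 2):
        pooled.append([(a + b) / 2 for a, b in zip(hidden[i], hidden[i + 1])])
    if len(hidden) % 2:
        pooled.append(hidden[-1])
    return pooled

seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(pool_halve(seq))  # [[2.0, 3.0], [6.0, 7.0]]
```

Because self-attention cost is quadratic in length, each halving cuts the attention cost of subsequent blocks by roughly four.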

9.28 T5v1.1

T5v1.1 is an improved version of the original T5 model. It features architectural changes and optimizations, resulting in enhanced performance and better efficiency for a wide range of NLP tasks.

  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    🖊️ Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
    📖 Description: This foundational paper introduces the T5 framework, which forms the basis for T5v1.1. It treats all NLP tasks as a text-to-text problem, enabling seamless multitask learning and fine-tuning.

  • Improved Fine-Tuning and Parameter Sharing in T5 Models
    🖊️ Authors: V. Lialin, K. Zhao, N. Shivagunde
    📖 Description: Proposes refinements for the T5 architecture, including T5v1.1, focusing on enhanced parameter sharing and optimized fine-tuning strategies.

  • T5v1.1 for Low-Resource Language Understanding
    🖊️ Authors: D. Mehra, L. Xie, E. Hofmann-Coyle
    📖 Description: Explores the use of T5v1.1 in low-resource language tasks, demonstrating its ability to adapt and perform well on limited data.

  • Enhanced Dialogue State Tracking Using T5v1.1
    🖊️ Authors: P. Lesci, Y. Fujinuma, M. Hardalov, C. Shang
    📖 Description: Demonstrates the efficiency of T5v1.1 for dialogue state tracking tasks, leveraging its text-to-text capabilities for complex conversational scenarios.

  • T5v1.1 in Scientific Document Summarization
    🖊️ Authors: R. Uppaal, Y. Li, J. Hu
    📖 Description: Applies T5v1.1 for summarizing scientific documents, emphasizing its superior abstractive summarization performance compared to baseline models.

9.29 RoFormer

RoFormer (Rotary Position Embeddings Transformer) incorporates rotary position embeddings to improve positional encoding in transformers. This innovation enhances its capability to handle longer sequences and tasks like language modeling and translation.
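A minimal sketch of rotary position embeddings: each pair of feature dimensions is rotated by a position-dependent angle, so dot products between rotated queries and keys depend only on the relative offset between positions, not their absolute values:

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each (even, odd) feature pair by a position-dependent angle."""
    out = []
    for k in range(0, len(vec), 2):
        theta = pos / (base ** (k / len(vec)))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[k], vec[k + 1]
        out += [x * c - y * s, x * s + y * c]   # 2-D rotation of the pair
    return out

q = [1.0, 0.0, 1.0, 0.0]
# relative-position property: <rope(q, m), rope(q, n)> depends on m - n only
a = sum(x * y for x, y in zip(rope(q, 3), rope(q, 1)))
b = sum(x * y for x, y in zip(rope(q, 7), rope(q, 5)))
print(round(a, 6) == round(b, 6))  # True
```

This relative-offset property is why RoPE generalizes more gracefully to positions beyond those seen in training than learned absolute position embeddings.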

9.30 MBart and MBart-50

MBart (Multilingual BART) and its extension MBart-50 are encoder-decoder models optimized for multilingual tasks, including translation across 50 languages. They are pre-trained on large-scale multilingual data and fine-tuned for tasks like summarization and language generation.

10. Datasets

Datasets play a crucial role in training and evaluating NLP models. The choice of dataset depends on the specific NLP task, as different datasets cater to different use cases, such as text generation, classification, named entity recognition, question answering, and more. Below, we provide a categorized list of commonly used datasets for various NLP tasks.

10.1 Text Generation Datasets

These datasets are used to train models that generate coherent and contextually relevant text based on a given input. Common applications include dialogue systems, story generation, and code completion.

10.2 Text Classification Datasets

Text classification datasets help train models to categorize text into predefined labels. These datasets are used in applications like sentiment analysis, spam detection, and topic classification.

10.3 Named Entity Recognition Datasets

Named Entity Recognition (NER) datasets are used for extracting named entities such as persons, locations, organizations, and dates from text. These datasets are crucial for tasks like information retrieval and knowledge extraction.
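Most NER datasets annotate entities with BIO tags (B- begins an entity, I- continues it, O is outside). A small helper like the following (illustrative, not tied to any particular dataset's tooling) decodes tag sequences back into entity spans:

```python
def bio_to_spans(tokens, tags):
    """Decode parallel token/BIO-tag lists into (label, text) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes last span
        if tag.startswith("B-") or tag == "O" or (
                tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((label, " ".join(tokens[start:i])))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, label = i, tag[2:]               # tolerate stray I- tags
    return spans

toks = ["Barack", "Obama", "visited", "Hanoi"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(toks, tags))  # [('PER', 'Barack Obama'), ('LOC', 'Hanoi')]
```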

10.4 Question Answering Datasets

Question Answering (QA) datasets enable models to generate answers based on a given question and context. These datasets are widely used in search engines, virtual assistants, and automated customer support systems.

10.5 Fill Mask Datasets

Fill Mask datasets are used for training masked language models (MLMs) where a model learns to predict missing words in a given sentence. These datasets help improve contextualized word representations.
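The procedure that turns raw text into fill-mask training examples can be sketched as follows, using the BERT-style 80/10/10 split among [MASK], random replacement, and keep-unchanged. The selection rate is raised above the usual 15% here only so this tiny demo masks something:

```python
import random

def mask_tokens(tokens, vocab, p=0.15, seed=0):
    """Return (corrupted inputs, labels); labels are None except where masked."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < p:
            labels.append(tok)                    # model must recover the original
            r = rng.random()
            if r < 0.8:
                inputs.append("[MASK]")           # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels

orig = "the cat sat on the mat".split()
inp, lab = mask_tokens(orig, vocab=["dog", "sun"], p=0.3)
print(inp)
print(lab)
```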

10.6 Machine Translation Datasets

Machine translation datasets provide parallel corpora for training models to translate text between different languages. These datasets are fundamental in developing multilingual NLP systems.

11. NLP in Vietnamese

Vietnamese NLP presents unique challenges: whitespace separates syllables rather than words, the language is tonal and diacritic-heavy, and its grammar is analytic (isolating), relying on word order and function words rather than inflection. This section provides a collection of papers, tools, and datasets specifically tailored for Vietnamese NLP research and applications.

11.1 Vietnamese Text Preprocessing

Vietnamese text preprocessing involves tasks such as tokenization, stopword removal, and diacritic normalization. Due to the lack of explicit word boundaries, word segmentation is a critical preprocessing step in Vietnamese NLP.
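A classic baseline for Vietnamese word segmentation is greedy longest matching against a lexicon. The dictionary below is a tiny hypothetical sample, and production tools rely on learned models instead, but the sketch shows why segmentation matters: "học sinh" (student) is one word spanning two syllables:

```python
# Hypothetical toy lexicon; real segmenters use large dictionaries or ML models.
LEXICON = {"học sinh", "học", "sinh", "giỏi", "là"}

def max_match(syllables, lexicon, max_len=3):
    """Greedy longest-match segmentation; joins multi-syllable words with '_'."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            cand = " ".join(syllables[i:i + n])
            if cand in lexicon or n == 1:   # fall back to a single syllable
                words.append(cand.replace(" ", "_"))
                i += n
                break
    return words

print(max_match("học sinh giỏi".split(), LEXICON))  # ['học_sinh', 'giỏi']
```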

11.2 Vietnamese Word Representations

Word embeddings and contextualized word representations trained specifically for Vietnamese text improve NLP performance. This includes models like Word2Vec, FastText, and transformer-based embeddings such as PhoBERT.

11.3 Vietnamese Named Entity Recognition (NER)

Named Entity Recognition (NER) identifies entities such as names, organizations, and locations within Vietnamese text. Challenges include handling ambiguous entity boundaries and diacritic variations.

11.4 Vietnamese Part-of-Speech Tagging

Part-of-Speech (POS) tagging in Vietnamese requires models to classify words into grammatical categories correctly despite pervasive category ambiguity and the word segmentation issues described above; as an isolating language, Vietnamese offers few inflectional cues to help the tagger.

11.5 Vietnamese Syntax and Parsing

Vietnamese dependency parsing and constituency parsing help analyze sentence structures, enabling downstream applications like machine translation and question answering.

11.6 Machine Translation for Vietnamese

Machine translation between Vietnamese and other languages (e.g., English, French, Chinese) is an active research area. Transformer-based models like MarianMT and multilingual BERT-based models improve translation quality.

11.7 Vietnamese Question Answering

Question Answering (QA) systems in Vietnamese involve answering questions based on structured or unstructured text. QA models require high-quality annotated datasets for accurate responses.

11.8 Vietnamese Text Summarization

Text summarization generates concise and informative summaries from long Vietnamese documents. Extractive and abstractive summarization techniques are commonly used for this task.

11.9 Resources for Vietnamese NLP

A collection of open-source tools, frameworks, and datasets for Vietnamese NLP, including word segmentation tools, language models, and benchmark datasets.

11.10 Challenges in Vietnamese NLP

Discusses the key challenges in Vietnamese NLP, such as handling tonal variations, segmentation difficulties, data scarcity, and the need for high-quality annotated datasets.
