Awesome NLP is a curated collection of high-quality resources, papers, libraries, tools, and datasets for Natural Language Processing (NLP). Whether you're a beginner exploring the basics or an expert diving into cutting-edge research, this repository has something for everyone.
- 1. Introduction
- 2. How to Use
- 3. Contributing
- 4. Fundamentals of Deep Learning
- 5. Sequence Modeling
  - 5.1 RNNs and LSTMs
  - 5.2 Sequence Models
  - 5.3 Attention Mechanism
  - 5.4 Transformers
- 6. Word Representations
- 7. Evaluation
- 8. Tasks
  - 8.1 Text Generation
  - 8.2 Text Classification
  - 8.3 Named Entity Recognition
  - 8.4 Question Answering
  - 8.5 Fill Mask
  - 8.6 Machine Translation
- 9. Models
  - 9.1 BERT
  - 9.2 GPT-3 (GPT)
  - 9.3 GPT-2
  - 9.4 RoBERTa
  - 9.5 T5
  - 9.6 DistilBERT
  - 9.7 ALBERT
  - 9.8 BART
  - 9.9 ELECTRA
  - 9.10 XLNet
  - 9.11 BERTweet
  - 9.12 BlenderBot
  - 9.13 DeBERTa
  - 9.14 BigBird
  - 9.15 PEGASUS
  - 9.16 FLAN-T5
  - 9.17 MobileBERT
  - 9.18 GPT-Neo
  - 9.19 Longformer
  - 9.20 XLM-RoBERTa
  - 9.21 DialoGPT
  - 9.22 MarianMT
  - 9.23 Falcon
  - 9.24 CodeGen
  - 9.25 ByT5
  - 9.26 PhoBERT
  - 9.27 Funnel Transformer
  - 9.28 T5v1.1
  - 9.29 RoFormer
  - 9.30 MBart and MBart-50
- 10. Datasets
- 11. NLP in Vietnamese
  - 11.1 Vietnamese Text Preprocessing
  - 11.2 Vietnamese Word Representations
  - 11.3 Vietnamese Named Entity Recognition (NER)
  - 11.4 Vietnamese Part-of-Speech Tagging
  - 11.5 Vietnamese Syntax and Parsing
  - 11.7 Machine Translation for Vietnamese
  - 11.8 Vietnamese Question Answering
  - 11.9 Vietnamese Text Summarization
  - 11.10 Resources for Vietnamese NLP
  - 11.11 Challenges in Vietnamese NLP
Natural Language Processing (NLP) is a fast-evolving field at the intersection of linguistics, artificial intelligence, and deep learning. It powers applications ranging from chatbots and machine translation to automated text generation and information retrieval.
This repository organizes NLP research into key areas, making it easier for students, researchers, and practitioners to find relevant papers, tools, and datasets. Below is an overview of the main sections:
- Fundamentals of Deep Learning: Covers the core concepts of deep learning, including neural networks, activation functions, backpropagation, and optimization techniques.
- Sequence Modeling: Focuses on sequential data processing, including Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Transformer-based architectures.
- Word Representations: Explores word embedding techniques, including static embeddings (Word2Vec, GloVe) and contextualized embeddings (BERT, ELMo).
- Evaluation: Discusses how to measure NLP model performance, including accuracy, BLEU, ROUGE, and fairness metrics.
- Tasks: A collection of research papers on key NLP applications such as text generation, classification, named entity recognition (NER), question answering, and machine translation.
- Models: Covers state-of-the-art NLP models such as BERT, GPT-3, RoBERTa, T5, and many others, providing links to research papers and implementations.
- Datasets: A list of public datasets commonly used in NLP research, categorized by task (e.g., text classification, NER, machine translation).
- NLP in Vietnamese: Focuses on Vietnamese NLP research, including text preprocessing, embeddings, sentiment analysis, and translation.
This structured collection makes it easier to understand fundamental NLP concepts, explore the latest research, and apply NLP techniques to real-world problems.
This repository is designed to be a comprehensive reference for NLP research and applications. Here's how you can make the most of it:
If you're new to NLP, start with the Fundamentals of Deep Learning section. It provides a foundation in deep learning concepts that are essential for understanding modern NLP techniques.
Read about different sequence modeling techniques in the Sequence Modeling section. This will introduce you to RNNs, LSTMs, the Attention Mechanism, and the Transformer model, which forms the basis of most modern NLP models.
Check out the Word Representations section to learn how text is transformed into numerical vectors, including static embeddings (Word2Vec, GloVe) and contextualized embeddings (BERT, ELMo, GPT).
Visit the Evaluation section to understand how NLP models are evaluated. This section covers common metrics such as BLEU for translation, ROUGE for summarization, and fairness metrics.
Browse the Tasks section for papers related to text classification, question answering, machine translation, and more.
Visit the Models section to find research papers on models like BERT, GPT-3, RoBERTa, T5, and others.
If you're looking for training datasets, check out the Datasets section, which categorizes datasets based on NLP tasks.
For researchers focusing on Vietnamese NLP, the NLP in Vietnamese section includes papers and resources on Vietnamese text preprocessing, NER, sentiment analysis, and machine translation.
The field of NLP is evolving rapidly. Keep an eye on new research papers and updates to this repository.
If you have found a useful NLP paper or tool, consider contributing! See the Contributing section for details.
We welcome contributions to make this repository better! Here's how you can help:

- Suggest Papers or Resources: Found an important NLP paper, dataset, or tool? Open an issue or submit a pull request.
- Report Issues: Noticed a broken link or incorrect information? Let us know by opening an issue.
- Enhance Documentation: Help improve descriptions, summaries, or structure.
- Submit Pull Requests:
  - Fork the repository.
  - Create a new branch for your changes.
  - Commit your updates, ensuring they follow the existing format.
  - Submit a pull request with a clear description of your contribution.
Check out our Contribution Guidelines for detailed instructions.
- If you find this repository helpful, star it on GitHub and share it with the NLP community.
- Start exploring topics from the table of contents.
- Feel free to contribute by adding new papers, tools, or datasets.
Happy Learning!
This section covers the foundational concepts of deep learning, including neural networks, activation functions, backpropagation, gradient descent, and optimization techniques. Each subsection includes links to important research papers and descriptions for further reading.
Explore the fundamental building blocks of deep learning and their applications across various domains.
- Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective
  Authors: Sarker, Iqbal H
  Description: This paper provides an in-depth exploration of neural networks and deep learning applications in cybersecurity. It discusses frameworks, challenges, and future research directions, emphasizing adaptability in cyber defense.
- Introduction to machine learning, neural networks, and deep learning
  Authors: Choi, Rene Y; Coyner, Aaron S; Kalpathy-Cramer, Jayashree; Chiang, Michael F; Campbell, J Peter
  Description: A foundational overview of machine learning principles, focusing on neural networks and deep learning methodologies applied in medical imaging and diagnostics.
- An introduction to neural networks and deep learning
  Authors: Suk, Heung-Il
  Description: A comprehensive introduction to neural network structures and their progression into deep learning systems, focusing on practical medical applications.
- Survey on neural network architectures with deep learning
  Authors: Smys, S; Chen, J; Shakya, S
  Description: A taxonomy of neural network architectures and their design paradigms, highlighting optimization techniques and use cases across industries.
- Fundamentals of artificial neural networks and deep learning
  Authors: Montesinos López, O. A.; Montesinos López, A.
  Description: A theoretical exploration of artificial neural networks, detailing their evolution into advanced deep learning systems.
- Conceptual understanding of convolutional neural network: A deep learning approach
  Authors: Indolia, S; Goswami, A K; Asopa, P
  Description: Insights into CNNs as a cornerstone of deep learning, showcasing their advantages for high-dimensional data.
- Application of meta-heuristic algorithms for training neural networks and deep learning architectures
  Authors: Kaveh, M; Mesgari, M S
  Description: A review of optimization algorithms applied to neural networks, emphasizing hyperparameter tuning and performance enhancement.
- Neural networks and deep learning in urban geography: A systematic review and meta-analysis
  Authors: Grekousis, G
  Description: An analysis of deep learning applications in urban studies, offering insights into spatial modeling using neural networks.
- Deep learning neural networks: Design and case studies
  Authors: Graupe, D
  Description: A textbook exploring neural network design, training methods, and real-world case studies.
- Deep learning in neural networks: An overview
  Authors: Schmidhuber, J
  Description: A highly cited review covering the history, methodologies, and applications of deep learning.
Learn about the key role of activation functions in neural networks and their impact on model performance.
- A Universal Activation Function for Deep Learning
  Authors: Hwang, S. Y. & Kim, J. J.
  Description: Proposes a novel activation function adaptable across tasks, enhancing model performance and reducing training complexity.
- Enhancing Brain Tumor Detection: A Novel CNN Approach with Advanced Activation Functions
  Authors: Kaifi, R.
  Description: Develops a specialized activation function tailored for medical imaging, significantly improving accuracy in tumor detection.
- An Overview of the Activation Functions Used in Deep Learning Algorithms
  Authors: Kılıçarslan, S., Adem, K., & Çelik, M.
  Description: Reviews a broad spectrum of fixed and trainable activation functions, discussing their computational properties and impacts.
- Smish: A Novel Activation Function for Deep Learning Methods
  Authors: Wang, X., Ren, H., & Wang, A.
  Description: Introduces 'Smish,' a smooth, non-monotonic activation function that outperforms traditional functions in various scenarios.
- Learning Specialized Activation Functions for Physics-Informed Neural Networks
  Authors: Wang, H., Lu, L., Song, S., & Huang, G.
  Description: Focuses on customized activation functions designed for solving physics-informed problems with neural networks.
- Rmaf: ReLU-Memristor-like Activation Function for Deep Learning
  Authors: Yu, Y., Adu, K., & Wang, X.
  Description: Proposes an activation function inspired by memristive properties to enhance network flexibility and learning.
- Catalysis of Neural Activation Functions: Adaptive Feed-forward Training for Big Data Applications
  Authors: Sarkar, S., Agrawal, S., & Baker, T.
  Description: Explores dynamic activation functions that adapt during training, optimizing performance for large-scale datasets.
- The Most Used Activation Functions: Classic Versus Current
  Authors: Mercioni, M. A., & Holban, S.
  Description: Compares traditional and modern activation functions, identifying trends and shifts in their usage.
- Deep Learning Activation Functions: Fixed-Shape, Parametric, Adaptive, Stochastic
  Authors: Hammad, M. M.
  Description: Categorizes activation functions into diverse classes and evaluates their roles in neural network training.
- Parametric Activation Functions for Neural Networks: A Tutorial Survey
  Authors: Pusztaházi, L. S., Eigner, G., & Csiszár, O.
  Description: A detailed tutorial on parametric activation functions, highlighting their adaptability and advantages over static counterparts.
Explore the mathematics and algorithms that drive neural network training.
- A Mathematical Theory of Communication
  Authors: Claude Shannon
  Description: This seminal work laid the foundation for information theory, which is crucial for neural networks.
- Learning Internal Representations by Error Propagation
  Authors: David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
  Description: Introduced the backpropagation algorithm, a powerful method for training multi-layer perceptrons.
- On the Convergence Properties of the Back-Propagation Algorithm
  Authors: Y. LeCun, L. D. Jackel, L. Bottou
  Description: Investigates the convergence properties of the backpropagation algorithm, providing insights into its strengths and limitations.
- An overview of gradient descent optimization algorithms
  Authors: Sebastian Ruder
  Description: Compares various gradient descent optimization algorithms, including standard gradient descent, Momentum, Adagrad, RMSprop, and Adam. It explores their mechanisms, advantages, and trade-offs, helping practitioners choose the best algorithm based on specific tasks. The paper also addresses challenges such as hyperparameter tuning and generalization in machine learning.
- Efficient Backprop
  Authors: Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
  Description: Explores techniques for improving the efficiency of backpropagation, which is crucial for training large neural networks.
- Asynchronous stochastic gradient descent with decoupled backpropagation and layer-wise updates
  Authors: Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney
  Description: Presents a novel asynchronous approach to stochastic gradient descent, which decouples backpropagation across layers to improve efficiency in deep networks.
- Generalizing Backpropagation for Gradient-Based Interpretability
  Authors: Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell
  Description: Explores the concept of backpropagation and its generalization to understand gradient-based interpretability in machine learning models.
- Gradient Descent based Optimization Algorithms for Deep Learning Models Training
  Authors: Jiawei Zhang
  Description: Explores gradient descent optimization techniques for training deep learning models, highlighting common methods like Momentum, Adagrad, Adam, and Gadam. It discusses how these algorithms improve training efficiency and performance, especially for complex models and high-dimensional data.
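The update rule these papers build on, repeatedly stepping opposite the gradient, fits in a few lines of plain Python. The toy objective f(w) = (w - 3)^2 and the learning rate below are arbitrary choices for illustration:

```python
def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def gradient_descent(w0, lr=0.1, steps=100):
    # Vanilla gradient descent: w <- w - lr * grad(w).
    # For this convex objective it converges to the minimum at w = 3.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_star = gradient_descent(0.0)
```

Momentum, Adagrad, RMSprop, and Adam, surveyed in the papers above, all modify how the raw gradient is turned into an update step, but keep this same iterative structure.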
Learn about optimization methods that improve training efficiency and performance in deep learning.
- Optimization Techniques in Machine Learning and Deep Learning
  Authors: Ashutosh V. Patil, Gayatri Y. Bhangle
  Description: Explores optimization techniques like gradient descent, its variants, and convergence properties.
- Optimization for deep learning: theory and algorithms
  Authors: Ruoyu Sun
  Description: Discusses optimization techniques for deep learning, with a focus on gradient descent and stochastic gradient descent (SGD).
- Optimization Methods in Deep Learning: A Comprehensive Overview
  Authors: David Shulman
  Description: Offers an extensive review of optimization techniques for deep learning, covering methods like gradient descent, SGD, and their variants. Provides insights into their mathematical foundations and practical applications.
- Advanced metaheuristic optimization techniques in applications of deep neural networks: a review
  Authors: Abd Elaziz, Mohamed; Dahou, Abdelghani; Abualigah, Laith; Yu, Liyang; Alshinwan, Mohammad; Khasawneh, Ahmad M; Lu, Songfeng
  Description: Reviews advanced metaheuristic optimization techniques applied to deep neural networks, focusing on methods like genetic algorithms, particle swarm optimization, and simulated annealing to enhance DNN training efficiency.
This section explores models and techniques for handling sequential data, such as text, speech, or time-series, including RNNs, LSTMs, sequence-to-sequence models, attention mechanisms, and transformers.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are widely used for processing sequential data. Below are key papers on their development and applications:
- Recurrent neural network and LSTM models for lexical utterance classification
  Authors: SV Ravuri, A Stolcke
  Description: This paper explores the application of RNN and LSTM models for lexical utterance classification, highlighting the effectiveness of LSTMs for long utterances and RNNs for shorter ones.
- Introduction to sequence learning models: RNN, LSTM, GRU
  Authors: S Zargar
  Description: An introduction to sequence learning models including RNN, LSTM, and GRU, focusing on their architectures and applications in sequence-based tasks.
- Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM, and GRU
  Authors: A Shewalkar, D Nyavanandi, SA Ludwig
  Description: This paper evaluates the performance of RNNs, LSTMs, and GRUs in speech recognition tasks, emphasizing LSTM's superior word error rate.
- TTS synthesis with bidirectional LSTM-based recurrent neural networks
  Authors: Y Fan, Y Qian, FL Xie, FK Soong
  Description: A study on text-to-speech synthesis using bidirectional LSTM networks, demonstrating improved modeling of sequential data.
- Learning precise timing with LSTM recurrent networks
  Authors: FA Gers, NN Schraudolph, J Schmidhuber
  Description: This paper introduces LSTM networks with peepholes and forget gates, showcasing their ability to handle precise timing in sequential data.
- A critical review of RNN and LSTM variants in hydrological time series predictions
  Authors: M Waqas, UW Humphries
  Description: A review of RNN and LSTM models applied to hydrological time series data, analyzing their strengths and limitations.
- RNN-LSTM: From applications to modeling techniques and beyond: Systematic review
  Authors: SM Al-Selwi, MF Hassan, SJ Abdulkadir
  Description: A systematic review of RNN-LSTM applications and modeling techniques across various domains.
- Understanding LSTM: A tutorial into long short-term memory recurrent neural networks
  Authors: RC Staudemeyer, ER Morris
  Description: A tutorial offering a detailed explanation of LSTM networks and their role in addressing long-term dependency challenges in RNNs.
- A review of recurrent neural networks: LSTM cells and network architectures
  Authors: Y Yu, X Si, C Hu, J Zhang
  Description: This review categorizes various LSTM architectures and their applications, highlighting improvements over standard RNNs.
- Learning to diagnose with LSTM recurrent neural networks
  Authors: ZC Lipton
  Description: This paper demonstrates the use of LSTM networks for medical diagnosis tasks, showing their capability to process sequential patient data effectively.
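For readers new to the architecture, a single LSTM forward step can be sketched in NumPy. This uses the standard gate equations; the gate ordering inside the stacked weight matrix is a convention chosen here, and the dimensions and random weights are toy values, not from any of the papers above:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    # One LSTM forward step. W: (4H, D), U: (4H, H), b: (4H,).
    # Gate order in the stacked weights: input, forget, output, candidate.
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c_prev + i * g                  # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                                 # toy input and hidden sizes
x = rng.normal(size=D)
h, c = np.zeros(H), np.zeros(H)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(x, h, c, W, U, b)
```

The forget gate `f` deciding how much of `c_prev` to keep is what lets the cell state carry information over long sequences, the property most of the papers above exploit.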
Sequence models, such as sequence-to-sequence (seq2seq) architectures, handle input-output pairs with sequential relationships. Below are key papers on this topic:
- An Analysis of 'Attention' in Sequence-to-Sequence Models
  Authors: R Prabhavalkar, TN Sainath, B Li, K Rao, N Jaitly
  Description: This paper examines the role of attention mechanisms in sequence-to-sequence models, focusing on their impact on tasks like speech recognition and translation.
- Sequence Modeling with CTC
  Authors: A Hannun
  Description: This work introduces connectionist temporal classification (CTC) for sequence modeling, illustrating its use in aligning sequences like audio-to-text without explicit alignments.
- Neural Machine Translation and Sequence-to-Sequence Models: A Tutorial
  Authors: G Neubig
  Description: A comprehensive tutorial covering sequence-to-sequence models in machine translation, including encoder-decoder structures and attention mechanisms.
- Deep Reinforcement Learning for Sequence-to-Sequence Models
  Authors: Y Keneshloo, T Shi, N Ramakrishnan
  Description: The paper explores the integration of reinforcement learning techniques with sequence-to-sequence models for improved performance.
- Seq2sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
  Authors: M Cheng, J Yi, PY Chen, H Zhang, CJ Hsieh
  Description: This work analyzes the robustness of sequence-to-sequence models under adversarial attacks, proposing frameworks to evaluate their stability.
- A Causal Framework for Explaining the Predictions of Black-Box Sequence-to-Sequence Models
  Authors: D Alvarez-Melis, TS Jaakkola
  Description: The paper introduces a causal framework to understand and explain decisions made by black-box sequence-to-sequence models.
- Lingvo: A Modular and Scalable Framework for Sequence-to-Sequence Modeling
  Authors: J Shen, P Nguyen, Y Wu, Z Chen, MX Chen
  Description: Lingvo, an open-source framework by Google, enables scalable training of sequence-to-sequence models for tasks like speech recognition and translation.
Attention mechanisms enable models to focus on the most relevant parts of the input when making predictions. This subsection includes research on various attention techniques:
- Gaussian Prediction Based Attention for Online End-to-End Speech Recognition
  Authors: J Hou, S Zhang, LR Dai
  Description: This paper introduces a Gaussian prediction-based attention mechanism to improve online end-to-end speech recognition by refining sequence alignment.
- Pose-conditioned Spatio-temporal Attention for Human Action Recognition
  Authors: F Baradel, C Wolf, J Mille
  Description: Proposes a spatio-temporal attention mechanism conditioned on pose features for effective human action recognition from RGB video sequences.
- Recurrent Attention Network on Memory for Aspect Sentiment Analysis
  Authors: P Chen, Z Sun, L Bing, W Yang
  Description: This paper explores a recurrent attention network for aspect-level sentiment analysis by leveraging multiple attention mechanisms to focus on sentiment features.
- SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
  Authors: L Chen, H Zhang, J Xiao, L Nie
  Description: A new attention mechanism that combines spatial and channel-wise attention is presented for improved image captioning performance.
- Gated Self-Matching Networks for Reading Comprehension and Question Answering
  Authors: W Wang, N Yang, F Wei, B Chang
  Description: This paper introduces gated self-matching attention for question answering, leveraging passage and question alignment to refine representations.
- Residual Attention Network for Image Classification
  Authors: F Wang, M Jiang, C Qian, S Yang
  Description: Residual attention networks enhance image classification by incorporating a novel attention mechanism into a deep residual network.
- Paying More Attention to Attention: Improving Performance of Convolutional Neural Networks via Attention Transfer
  Authors: S Zagoruyko, N Komodakis
  Description: This paper improves convolutional neural network performance by utilizing attention transfer between teacher and student models during training.
- Topic Aware Neural Response Generation
  Authors: C Xing, W Wu, Y Wu, J Liu
  Description: This study develops a topic-aware attention mechanism for generating conversational responses, effectively aligning dialogue content with contextual topics.
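The scaled dot-product form that underlies most of the mechanisms above can be sketched in a few lines of NumPy. This is a bare single-head version with no masking; the shapes are toy values chosen for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the keys for each query.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))    # 2 queries, d_k = 8
K = rng.normal(size=(5, 8))    # 5 keys
V = rng.normal(size=(5, 16))   # 5 values, d_v = 16
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` is a probability distribution over the keys, which is exactly the "focus on the most relevant parts of the input" behavior described above: the output for each query is a weighted average of the values.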
Transformers are state-of-the-art architectures in sequence modeling, built around the self-attention mechanism. Below are significant papers that outline their theory and applications:
- Attention Is All You Need
  Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  Description: The foundational paper introducing the transformer architecture. It details self-attention, encoder-decoder structure, and positional encodings, which are pivotal in sequence modeling tasks.
- Understanding How Positional Encodings Work in Transformer Models
  Authors: T Miyazaki, H Mino, H Kaneko
  Description: Examines the functionality of positional encodings in self-attention and cross-attention blocks of transformer architectures, exploring their integration in encoder-decoder models.
- Universal Transformers
  Authors: M Dehghani, S Gouws, O Vinyals, J Uszkoreit
  Description: Introduces a universal transformer that extends the standard model by incorporating recurrence in the self-attention mechanism, enhancing its theoretical depth and reasoning capabilities.
- Position Information in Transformers: An Overview
  Authors: P Dufter, M Schmitt, H Schütze
  Description: Systematically reviews positional encoding techniques in transformers, analyzing over 30 models to understand their role in encoding positional information for attention mechanisms.
- Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
  Authors: TC Chi, TH Fan, AI Rudnicky, PJ Ramadge
  Description: Explores how transformer working memory interacts with self-attention to enable reasoning in regular languages and length extrapolation in NLP tasks.
- Understanding the Failure of Batch Normalization for Transformers in NLP
  Authors: J Wang, J Wu, L Huang
  Description: Investigates the challenges batch normalization introduces to self-attention and proposes alternatives for stabilizing transformer training in NLP tasks.
- Activating Self-Attention for Multi-Scene Absolute Pose Regression
  Authors: M Lee, J Kim, JP Heo
  Description: Details the functionality of self-attention and positional encoding in transformer encoders and cross-attention modules, applied to multi-scene regression tasks.
- Aiatrack: Attention in Attention for Transformer Visual Tracking
  Authors: S Gao, C Zhou, C Ma, X Wang, J Yuan
  Description: Explores self-attention and cross-attention mechanisms within the encoder-decoder structure of transformers, focusing on applications in tracking tasks.
- Why Transformers Are Obviously Good Models of Language
  Authors: F Hill
  Description: Discusses theoretical justifications for transformers' success in NLP, emphasizing the role of self-attention and cross-attention in language modeling.
- Learning Deep Learning: Theory and Practice of Neural Networks, Transformers, and NLP
  Authors: M Ekman
  Description: Provides a comprehensive overview of transformers' components, including detailed discussions on self-attention, cross-attention, and encoder-decoder interactions in NLP.
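As a concrete companion to the positional-encoding papers above, here is the sinusoidal scheme from "Attention Is All You Need" in NumPy (the sequence length and model dimension below are arbitrary toy values):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2.0 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

pe = positional_encoding(50, 16)
```

Because self-attention is permutation-invariant, these values are added to the token embeddings so the model can distinguish positions; the geometrically spaced wavelengths let it attend by relative offset.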
Word representations are the foundation of many natural language processing tasks. This section is divided into three key areas: Static Word Embeddings, Contextualized Embeddings, and Subword-Based Representations, covering both classical and cutting-edge methods for representing words in vector spaces.
Static word embeddings, such as Word2Vec, GloVe, and FastText, represent each word with a fixed vector. Below are notable papers discussing their applications and limitations:
- Evaluating the Effectiveness of Static Word Embeddings on the Classification of IT Support Tickets
  Authors: Y. Wahba, N. H. Madhavji
  Description: This paper evaluates the performance of static word embeddings in IT ticket classification, focusing on their semantic capturing capabilities and limitations in dynamic contexts.
- Static Detection of Malicious PowerShell Based on Word Embeddings
  Authors: M. Mimura, Y. Tajiri
  Description: Proposes a method for detecting malicious PowerShell scripts using static word embeddings, demonstrating their application in cybersecurity.
- Examining the Effect of Whitening on Static and Contextualized Word Embeddings
  Authors: S. Sasaki, B. Heinzerling, J. Suzuki, K. Inui
  Description: Analyzes how whitening techniques affect the quality and utility of static word embeddings compared to contextual embeddings.
- Obtaining Better Static Word Embeddings Using Contextual Embedding Models
  Authors: P. Gupta, M. Jaggi
  Description: Introduces a method to improve static word embeddings by distilling knowledge from contextual embedding models like BERT.
- A Survey on Training and Evaluation of Word Embeddings
  Authors: F. Torregrossa, R. Allesiardo, V. Claveau, N. Kooli
  Description: Provides a comprehensive overview of the training, evaluation, and application of static word embeddings across various NLP tasks.
- Dynamic Word Embeddings for Evolving Semantic Discovery
  Authors: Z. Yao, Y. Sun, N. Rao, H. Xiong
  Description: Discusses the evolution from static to dynamic embeddings, highlighting the limitations of static methods in capturing semantic changes over time.
- A Comprehensive Analysis of Static Word Embeddings for Turkish
  Authors: K. Sarıtaş, C. A. Öz, T. Güngör
  Description: Analyzes static word embeddings for Turkish language processing, exploring their performance and limitations compared to contextual embeddings.
- On Measuring and Mitigating Bias in Static Word Embeddings
  Authors: S. Dev, T. Li, J. M. Phillips, V. Srikumar
  Description: Investigates biases in static word embeddings and proposes mitigation strategies to reduce stereotypical inferences in NLP applications.
- Learning Sense-Specific Static Embeddings Using Contextualized Word Embeddings as a Proxy
  Authors: Y. Zhou, D. Bollegala
  Description: Explores creating sense-specific static embeddings by leveraging contextual embeddings to overcome polysemy in static models.
- Static Embeddings as Efficient Knowledge Bases?
  Authors: P. Dufter, N. Kassner, H. Schütze
  Description: Evaluates whether static word embeddings can serve as efficient knowledge bases, especially in low-resource scenarios.
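The "one fixed vector per word" idea rests on the distributional hypothesis: words in similar contexts get similar vectors. A toy pure-Python sketch using raw co-occurrence counts and cosine similarity makes this concrete (real systems like Word2Vec or GloVe learn dense low-dimensional vectors instead; the corpus here is invented):

```python
from collections import Counter
from itertools import combinations
import math

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the market",
]

# Count how often each pair of words co-occurs within a sentence.
cooc = Counter()
vocab = set()
for sentence in corpus:
    words = sentence.split()
    vocab.update(words)
    for a, b in combinations(words, 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

vocab = sorted(vocab)

def vector(word):
    # A word's static vector: its co-occurrence counts with every vocab word.
    return [cooc[(word, other)] for other in vocab]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

# "cat" and "dog" appear in near-identical contexts, so their vectors are close.
sim_cat_dog = cosine(vector("cat"), vector("dog"))
sim_cat_stocks = cosine(vector("cat"), vector("stocks"))
```

The vector for each word is fixed regardless of context, which is precisely the limitation (e.g., polysemy) that several of the papers above contrast with contextualized embeddings.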
Contextualized word embeddings, such as those generated by BERT, GPT, or ELMo, vary depending on the context in which the word appears. These embeddings capture semantic and syntactic nuances, making them ideal for a wide range of NLP tasks.
-
Combining contextualized embeddings and prior knowledge for clinical named entity recognition
ποΈ Authors: M Jiang, T Sanger, X Liu
π Description: This study integrates contextualized embeddings like BERT with domain-specific knowledge for clinical named entity recognition, showcasing its enhanced performance in the medical domain. -
How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings
ποΈ Authors: K Ethayarajh
π Description: Explores the degree of contextualization in BERT, ELMo, and GPT-2 embeddings by analyzing their geometry and comparing their ability to model semantic and syntactic nuances. -
What do you learn from context? Probing for sentence structure in contextualized word representations
ποΈ Authors: I Tenney, P Xia, B Chen, A Wang, A Poliak
π Description: Investigates how contextualized embeddings encode sentence structure, demonstrating their potential in diverse NLP tasks, such as part-of-speech tagging and syntax analysis. -
Evaluating the underlying gender bias in contextualized word embeddings
ποΈ Authors: C Basta, MR Costa-JussΓ , N Casas
π Description: Analyzes biases in contextualized embeddings like BERT and ELMo, revealing their implicit gender biases and proposing mitigation strategies. -
Med-BERT: Pretrained contextualized embeddings for electronic health records
ποΈ Authors: L Rasmy, Y Xiang, Z Xie, C Tao, D Zhi
π Description: Introduces Med-BERT, a contextualized embedding model trained on large-scale health records for disease prediction, enhancing performance in medical NLP tasks. -
Contextualized embeddings based transformer encoder for sentence similarity modeling
ποΈ Authors: MTR Laskar, X Huang, E Hoque
π Description: Applies contextualized embeddings in a transformer-based encoder architecture for sentence similarity tasks, yielding state-of-the-art results. -
A survey on contextual embeddings
ποΈ Authors: Q Liu, MJ Kusner, P Blunsom
π Description: Provides an extensive survey on contextualized embeddings, discussing their evolution, underlying mechanisms, and applications in NLP tasks. -
Does BERT make any sense? Interpretable word sense disambiguation with contextualized embeddings
ποΈ Authors: G Wiedemann, S Remus, A Chawla
π Description: Investigates BERT's ability to disambiguate word senses, comparing it to other contextualized embeddings and revealing its superior performance in capturing polysemy. -
Interpreting pretrained contextualized representations via reductions to static embeddings
ποΈ Authors: R Bommasani, K Davis, C Cardie
π Description: Analyzes pretrained contextualized embeddings like BERT by reducing them to static representations, providing insights into their semantic structure. -
BERTRAM: Improved word embeddings have a big impact on contextualized model performance
ποΈ Authors: T Schick, H Schütze
π Description: Proposes BERTRAM, a technique for enhancing word embeddings, and examines its impact on improving the performance of contextualized models.
Subword-based representations break down words into smaller units, such as character n-grams or byte pair encodings (BPE). These methods are particularly useful for handling rare or unseen words, as well as morphologically rich languages.
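The merge loop at the heart of byte pair encoding can be sketched in a few lines of Python. This is a toy illustration of the algorithm (using the classic low/lower/newest/widest example from the BPE literature), not any tokenizer library's actual implementation:

```python
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs across the vocabulary, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, vocab):
    """Rewrite every word, fusing the chosen pair into a single symbol."""
    a, b = pair
    return {word.replace(f"{a} {b}", a + b): freq for word, freq in vocab.items()}

# Words are space-separated symbol sequences (individual characters to start with).
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges = []
for _ in range(3):
    pair = most_frequent_pair(vocab)
    merges.append(pair)
    vocab = merge_pair(pair, vocab)

print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

Real tokenizers learn tens of thousands of merges and replay the ordered merge list to segment unseen words; also note `str.replace` is fine for this toy vocabulary but can over-merge when one symbol is a suffix of another.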
-
Studies on Subword-based Low-Resource Neural Machine Translation: Segmentation, Encoding, and Decoding
ποΈ Authors: S Haiyue
π Description: Explores the role of subword segmentation and encoding in low-resource machine translation, focusing on efficient training strategies for neural models. -
Effective Subword Segmentation for Text Comprehension
ποΈ Authors: Z Zhang, H Zhao, J Li, Z Li
π Description: Examines how subword-based frameworks improve robustness across languages for text comprehension tasks in NLP. -
Subword Attention and Post-Processing for Rare and Unknown Contextualized Embeddings
ποΈ Authors: R Patel, C Domeniconi
π Description: Proposes a novel subword attention mechanism to enhance rare and unknown token embeddings in contextualized representations. -
Learning to Generate Word Representations Using Subword Information
ποΈ Authors: Y Kim, KM Kim, JM Lee, SK Lee
π Description: Introduces a framework for generating word representations by leveraging subword-level information to enhance downstream tasks. -
Entropy-Based Subword Mining with an Application to Word Embeddings
ποΈ Authors: A El-Kishky, FF Xu, A Zhang, S Macke
π Description: Presents a method to mine subword units using entropy-based segmentation, improving embeddings for low-resource languages. -
Subword-Based Compact Reconstruction for Open-Vocabulary Neural Word Embeddings
ποΈ Authors: S Sasaki, J Suzuki, K Inui
π Description: Proposes a reconstruction technique for subword-based embeddings, enabling efficient modeling of open-vocabulary tasks in NLP. -
Patterns Versus Characters in Subword-Aware Neural Language Modeling
ποΈ Authors: R Takhanov, Z Assylbekov
π Description: Compares subword-level modeling techniques with character-based approaches, focusing on their effectiveness in language modeling. -
Lexically Grounded Subword Segmentation
ποΈ Authors: J Libovický, J Helcl
π Description: Proposes a lexically grounded subword segmentation method to optimize subword tokenization for diverse NLP applications. -
The Use of Subwords for Automatic Speech Recognition
ποΈ Authors: DE Mollberg
π Description: Applies subword-based approaches to automatic speech recognition, evaluating their performance in Icelandic language processing. -
Analysis of Word Dependency Relations and Subword Models in Abstractive Text Summarization
ποΈ Authors: AB Özkan, T Güngör
π Description: Analyzes the impact of subword models on abstractive text summarization tasks, particularly in morphologically complex languages.
Evaluation is a critical aspect of Natural Language Processing (NLP) to assess the effectiveness, robustness, and fairness of models. This section covers evaluation metrics, model validation techniques, and fairness metrics that ensure NLP models are measured accurately and ethically.
Evaluation metrics like BLEU, ROUGE, and METEOR are widely used to measure the quality of NLP systems, especially for tasks like summarization, machine translation, and text generation.
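At its core, BLEU combines clipped (modified) n-gram precision with a brevity penalty. The following is a minimal single-reference sketch of that idea, not the full multi-reference, smoothed metric used in practice:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        clipped = sum(min(c, ref[g]) for g, c in cand.items())  # clip by reference counts
        precisions.append(clipped / max(1, sum(cand.values())))
    if min(precisions) == 0.0:
        return 0.0
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "the cat is on the mat".split()
print(bleu("the cat sat on the mat".split(), reference))  # ~0.707
```

Clipping prevents a candidate from being rewarded for repeating a reference word more often than the reference contains it; the brevity penalty discourages trivially short outputs.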
-
Comparing automatic and human evaluation of NLG systems
ποΈ Authors: A Belz, E Reiter
π Description: This paper explores the strengths and weaknesses of automatic evaluation metrics such as BLEU and ROUGE in natural language generation (NLG) systems compared to human judgments. -
Re-evaluating automatic summarization with BLEU and 192 shades of ROUGE
ποΈ Authors: Y Graham
π Description: Investigates the performance of BLEU and ROUGE metrics in evaluating summarization tasks, with a focus on improving their correlation with human evaluations. -
Beyond ROUGE: A comprehensive evaluation metric for abstractive summarization leveraging similarity, entailment, and acceptability
ποΈ Authors: MKH Briman, B Yildiz
π Description: Proposes a new evaluation framework for abstractive summarization by incorporating similarity, entailment, and acceptability metrics beyond traditional n-gram-based metrics like ROUGE. -
An investigation into the validity of some metrics for automatically evaluating natural language generation systems
ποΈ Authors: E Reiter, A Belz
π Description: Critically evaluates several metrics like BLEU and ROUGE, revealing their limitations as predictors of human judgment in NLG systems. -
A survey of evaluation metrics used for NLG systems
ποΈ Authors: AB Sai, AK Mohankumar, MM Khapra
π Description: A comprehensive survey that compares commonly used evaluation metrics such as BLEU, ROUGE, and METEOR, providing insights into their use cases and limitations in natural language generation. -
Adaptations of ROUGE and BLEU to better evaluate machine reading comprehension tasks
ποΈ Authors: A Yang, K Liu, J Liu, Y Lyu, S Li
π Description: Proposes modifications to traditional ROUGE and BLEU metrics to better assess performance in machine reading comprehension tasks. -
A critical analysis of metrics used for measuring progress in artificial intelligence
ποΈ Authors: K Blagec, G Dorffner, M Moradi, M Samwald
π Description: Analyzes the metrics used to measure progress in NLP, with a focus on BLEU, ROUGE, and other widely used evaluation methods. -
Evaluation of NLP systems
ποΈ Authors: P Resnik, J Lin
π Description: Discusses the theoretical and practical aspects of evaluating NLP systems using metrics like BLEU, ROUGE, precision, and recall. -
Comparison of evaluation metrics for short story generation
ποΈ Authors: P Netisopakul, U Taoto
π Description: Compares BLEU, ROUGE-L, and BERTScore as metrics for short story generation, providing insights into their effectiveness and limitations. -
Revisiting automatic evaluation of extractive summarization tasks: Can we do better than ROUGE?
ποΈ Authors: M Akter, N Bansal, SK Karmaker
π Description: Analyzes the limitations of ROUGE in extractive summarization tasks and explores alternative metrics that better correlate with human judgments.
Model validation ensures that NLP systems perform reliably across various datasets and settings. Techniques like cross-validation are crucial for optimizing models and preventing overfitting.
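The splitting logic behind k-fold cross-validation can be sketched in plain Python; in practice libraries such as scikit-learn provide equivalents (e.g. `KFold`) with shuffling and stratification:

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        test_set = set(test)
        train = [i for i in range(n_samples) if i not in test_set]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))  # fold sizes: 4, 3, 3
```

Every sample lands in exactly one test fold, so each model evaluation uses data the model never saw during training.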
-
Improving the classification accuracy using recursive feature elimination with cross-validation
ποΈ Authors: P. Misra, A.S. Yadav
π Description: Discusses the effectiveness of recursive feature elimination with cross-validation for optimizing feature selection and classification accuracy in NLP models. -
Natural language processing and machine learning methods to characterize unstructured patient-reported outcomes: validation study
ποΈ Authors: Z. Lu, J.A. Sim, J.X. Wang, C.B. Forrest, K.R. Krull
π Description: Applies five-fold nested cross-validation to validate NLP models in analyzing patient-reported outcomes, comparing their predictive performance. -
On the need of cross-validation for discourse relation classification
ποΈ Authors: W. Shi, V. Demberg
π Description: Explores the necessity of cross-validation in discourse relation classification, demonstrating its role in stabilizing performance in small evaluation datasets. -
Resumate: A prototype to enhance recruitment process with NLP-based resume parsing
ποΈ Authors: S. Mishra
π Description: Presents an NLP-based recruitment tool using k-fold cross-validation for robust evaluation of parsing models, ensuring improved generalization. -
Cross-validation visualized: a narrative guide to advanced methods
ποΈ Authors: J. Allgaier, R. Pryss
π Description: Provides a comprehensive guide to advanced cross-validation techniques, focusing on time-split methods for NLP applications. -
Is my stance the same as your stance? A cross-validation study of stance detection datasets
ποΈ Authors: L.H.X. Ng, K.M. Carley
π Description: Analyzes cross-validation techniques for stance detection in NLP, exploring dataset-specific challenges and their impact on model performance. -
Using JK fold cross-validation to reduce variance when tuning NLP models
ποΈ Authors: H.B. Moss, D.S. Leslie, P. Rayson
π Description: Proposes JK-fold cross-validation as a method to reduce variance and improve robustness during hyperparameter tuning for NLP models. -
Validation of prediction models for critical care outcomes using natural language processing of electronic health record data
ποΈ Authors: B.J. Marafino, M. Park, J.M. Davies, R. Thombley
π Description: Evaluates prediction models using nested cross-validation to minimize bias, applying NLP to extract features from clinical text. -
Development and validation of machine models using natural language processing to classify substances involved in overdose deaths
ποΈ Authors: D. Goodman-Meza, C.L. Shover, J.A. Medina
π Description: Utilizes 10-fold cross-validation to validate NLP models that classify substances mentioned in overdose death reports. -
PhageAI-bacteriophage life cycle recognition with machine learning and natural language processing
ποΈ Authors: P. Tynecki, A. Guziński, J. Kazimierczak, M. Jadczuk
π Description: Integrates NLP and machine learning with stratified shuffle and 10-fold cross-validation to predict bacteriophage life cycles.
Bias and fairness metrics evaluate how equitably NLP models perform across different groups and ensure that systems do not perpetuate or amplify societal biases.
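One simple extrinsic fairness check is the per-group accuracy gap: how much worse the model performs for one group than another. A minimal sketch, with illustrative labels and groups (real audits use several such metrics, as the surveys below discuss):

```python
from collections import defaultdict

def group_accuracy_gap(y_true, y_pred, groups):
    """Per-group accuracy and the largest gap between any two groups."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    accuracy = {g: correct[g] / total[g] for g in total}
    return max(accuracy.values()) - min(accuracy.values()), accuracy

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
gap, per_group = group_accuracy_gap(y_true, y_pred, groups)
print(per_group, gap)  # group "a" is misclassified more often than group "b"
```

A gap near zero is necessary but not sufficient for fairness; other criteria (equal opportunity, demographic parity) compare different quantities across groups.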
-
Measuring fairness with biased rulers: A comparative study on bias metrics for pre-trained language models
ποΈ Authors: P. Delobelle, E.K. Tokpo, T. Calders
π Description: This paper examines various bias metrics applied to pre-trained NLP models, highlighting their strengths, limitations, and experimental evaluations. -
Bipol: A novel multi-axes bias evaluation metric with explainability for NLP
ποΈ Authors: L. Alkhaled, T. Adewumi, S.S. Sabry
π Description: Introduces a novel metric to evaluate multiple dimensions of bias in NLP models while incorporating explainability for better transparency. -
Bias and fairness in large language models: A survey
ποΈ Authors: I.O. Gallegos, R.A. Rossi, J. Barrow, M.M. Tanjim
π Description: Provides an extensive survey on fairness and bias in large language models, with emphasis on definitions, metrics, and their applications. -
Quantifying social biases in NLP: A generalization and empirical comparison of extrinsic fairness metrics
ποΈ Authors: P. Czarnowska, Y. Vyas, K. Shah
π Description: Examines social bias metrics in NLP, unifying various fairness metrics under a generalized framework for better empirical understanding. -
On Measurements of Bias and Fairness in NLP
ποΈ Authors: S. Dev, E. Sheng, J. Zhao
π Description: A survey discussing bias measures in NLP, covering metrics, datasets, and societal implications of biases in language models. -
Advancing Fairness in Natural Language Processing: From Traditional Methods to Explainability
ποΈ Authors: F. Jourdan
π Description: Explores how explainability methods can address biases in NLP systems while assessing the effectiveness of standard fairness metrics. -
A survey on bias and fairness in natural language processing
ποΈ Authors: R. Bansal
π Description: Discusses sources of bias in NLP models and highlights fairness metrics and mitigation strategies tailored for NLP tasks. -
Bias Exposed: The BiaXposer Framework for NLP Fairness
ποΈ Authors: Y. Gaci, B. Benatallah, F. Casati
π Description: Proposes a new framework for detecting and quantifying biases in NLP, focusing on disparities in task-specific model performance. -
Bold: Dataset and metrics for measuring biases in open-ended language generation
ποΈ Authors: J. Dhamala, T. Sun, V. Kumar, S. Krishna
π Description: Introduces a dataset and metrics for analyzing biases in language generation models, focusing on their societal implications. -
Should fairness be a metric or a model? A model-based framework for assessing bias in machine learning pipelines
ποΈ Authors: J.P. Lalor, A. Abbasi, K. Oketch
π Description: Proposes a model-based framework for bias assessment, comparing its effectiveness to traditional fairness metrics in NLP pipelines.
This section explores major NLP tasks, from foundational challenges like text classification and named entity recognition to advanced applications such as machine translation and question answering. Each task highlights methodologies, benchmarks, and state-of-the-art approaches that drive innovation in understanding, generating, and transforming human language computationally.
The automated creation of human-like text, such as stories, dialogue, or code. Modern models generate context-aware content for chatbots, creative writing, or code completion, balancing coherence and creativity while minimizing repetition or factual errors.
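A recurring design choice in text generation is how to pick the next token from the model's output distribution. A minimal temperature-sampling sketch over made-up logits (the vocabulary and scores are illustrative, not from any real model):

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample an index from a softmax over temperature-scaled logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # subtract max for numerical stability
    probs = [e / sum(exps) for e in exps]
    return rng.choices(range(len(probs)), weights=probs)[0]

rng = random.Random(0)
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, 0.1]
# A low temperature sharpens the distribution toward the highest-scoring token;
# a high temperature flattens it, trading coherence for diversity.
tokens = [vocab[sample_next(logits, temperature=0.2, rng=rng)] for _ in range(5)]
print(tokens)
```

This trade-off between coherence (low temperature, risk of repetition) and creativity (high temperature, risk of incoherence) is exactly the balance the paragraph above describes.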
-
Generation - A New Frontier of Natural Language Processing?
ποΈ Authors: A. Joshi
π Description: Discusses the theoretical underpinnings of text generation in NLP, exploring its significance as a foundational component of linguistic processing. -
Automated Title Generation in English Language Using NLP
ποΈ Authors: N. Sethi, P. Agrawal, V. Madaan, S.K. Singh
π Description: Presents a methodological framework for generating concise and relevant titles from English text using NLP techniques. -
Applied Text Generation
ποΈ Authors: O. Rambow, T. Korelsky
π Description: Introduces a system for applying text generation to practical tasks, offering insights into its flexibility and adaptability across applications. -
The Survey: Text Generation Models in Deep Learning
ποΈ Authors: T. Iqbal, S. Qureshi
π Description: Provides an in-depth analysis of text generation models, discussing deep learning-based methods and their theoretical advancements. -
Controlled Text Generation with Adversarial Learning
ποΈ Authors: F. Betti
π Description: Explores conditional and controlled text generation, leveraging adversarial learning to refine outputs for specific contexts. -
Neural Text Generation: Past, Present, and Beyond
ποΈ Authors: S. Lu, Y. Zhu, W. Zhang, J. Wang, Y. Yu
π Description: Surveys neural text generation, highlighting historical advancements, current methodologies, and future challenges. -
A Theoretical Analysis of the Repetition Problem in Text Generation
ποΈ Authors: Z. Fu, W. Lam, A.M.C. So, B. Shi
π Description: Presents a theoretical framework for addressing repetition in generated text, a common issue in neural language models. -
Natural Language Generation
ποΈ Authors: E. Reiter
π Description: Explores the fundamentals of natural language generation, detailing its applications and challenges in connecting linguistic theory with practical systems. -
Evaluation of Text Generation: A Survey
ποΈ Authors: A. Celikyilmaz, E. Clark, J. Gao
π Description: Analyzes evaluation metrics for text generation, providing theoretical insights into how generated text quality is assessed in NLP. -
Pre-trained Language Models for Text Generation: A Survey
ποΈ Authors: J. Li, T. Tang, W.X. Zhao, J.Y. Nie, J.R. Wen
π Description: Examines pre-trained language models for text generation, focusing on their underlying mechanisms and theoretical implications.
Assigning labels (e.g., sentiment, topic) to text segments. Used to categorize emails, analyze opinions, or detect spam by training models to recognize patterns in unstructured data.
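A multinomial Naive Bayes classifier, one of the classical approaches covered below, can be sketched in pure Python; the tiny training set is illustrative only:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / sum(self.class_counts.values()))  # class prior
            total = sum(self.word_counts[label].values())
            for word in doc.split():
                # Add-one smoothing avoids zero probabilities for unseen words.
                p = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                score += math.log(p)
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayes().fit(
    ["great movie loved it", "terrible plot awful acting", "loved the acting"],
    ["pos", "neg", "pos"],
)
print(nb.predict("loved this awful movie"))
```

Log probabilities are summed rather than probabilities multiplied to avoid floating-point underflow on longer documents.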
-
Type of supervised text classification system for unstructured text comments using probability theory technique
ποΈ Authors: S Sreedhar Kumar, ST Ahmed
π Description: Introduces a probability-based text classifier designed for unstructured text, offering theoretical insights into text classification frameworks using probabilistic models. -
Graph-theoretic approaches to text classification
ποΈ Authors: N Shanavas
π Description: Explores graph-theoretic models for text classification, integrating concepts from data mining, machine learning, and NLP to enhance classification accuracy. -
Text classification algorithms: A survey
ποΈ Authors: K Kowsari, K Jafari Meimandi, M Heidarysafa, S Mendu
π Description: Provides a detailed survey of text classification algorithms, covering foundational theories, challenges, and the latest trends in NLP applications. -
Deep learning-based text classification: A comprehensive review
ποΈ Authors: S Minaee, N Kalchbrenner, E Cambria
π Description: Reviews deep learning methods for text classification, highlighting theoretical advancements and the transition from traditional machine learning techniques. -
A discourse-aware neural network-based text model for document-level text classification
ποΈ Authors: K Lee, S Han, SH Myaeng
π Description: Examines the role of discourse structures in text classification using neural networks, leveraging rhetorical structure theory for document-level analysis. -
Semantic text classification: A survey of past and recent advances
ποΈ Authors: B Altınel, MC Ganiz
π Description: Discusses semantic-based text classification techniques, comparing traditional methods with semantic-aware models for improved context handling. -
An introduction to a new text classification and visualization for natural language processing using topological data analysis
ποΈ Authors: N Elyasi, MH Moghadam
π Description: Proposes a novel approach to text classification using topological data analysis, offering unique visualizations for text categorization. -
Comparing BERT against traditional machine learning text classification
ποΈ Authors: S González-Carvajal, EC Garrido-Merchán
π Description: Evaluates BERT's effectiveness in text classification compared to traditional models, providing insights into its theoretical and practical implications. -
Theory-guided multiclass text classification in online academic discussions
ποΈ Authors: E Eryilmaz, B Thoms, Z Ahmed
π Description: Combines theoretical frameworks with practical applications to enhance multiclass text classification in academic discussions. -
Naive Bayes and text classification: Introduction and theory
ποΈ Authors: S Raschka
π Description: Provides a comprehensive overview of the Naive Bayes classifier, focusing on its theoretical underpinnings and applications in text categorization.
Identifying and classifying entities (e.g., people, locations) in text. Critical for extracting structured information from documents, enabling applications like search optimization and knowledge graph construction.
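NER model outputs are commonly encoded as BIO tags (B- begins an entity, I- continues it, O is outside). Decoding tags back into entity spans can be sketched as follows; this simple version assumes well-formed tag sequences and does not check that I- types match the preceding B-:

```python
def bio_to_entities(tokens, tags):
    """Decode BIO tags into (entity text, entity type) pairs."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the final entity
        if tag[:2] != "I-" and start is not None:  # entity ends at O or a new B-
            entities.append((" ".join(tokens[start:i]), etype))
            start, etype = None, None
        if tag[:2] == "B-":
            start, etype = i, tag[2:]
    return entities

tokens = ["Barack", "Obama", "visited", "Hanoi", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_entities(tokens, tags))  # [('Barack Obama', 'PER'), ('Hanoi', 'LOC')]
```

These decoded spans are what feed downstream consumers such as search indexes and knowledge graphs.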
-
Named entity recognition using support vector machine: A language independent approach
ποΈ Authors: A. Ekbal, S. Bandyopadhyay
π Description: Explores a language-independent approach to NER using support vector machines, emphasizing the theoretical basis of statistical learning for NLP tasks. -
Named entity recognition by using maximum entropy
ποΈ Authors: I. Ahmed, R. Sathyaraj
π Description: Demonstrates the application of maximum entropy modeling for NER, providing insights into probabilistic approaches to text classification. -
Named entity recognition: Fallacies, challenges, and opportunities
ποΈ Authors: M. Marrero, J. Urbano, S. Sánchez-Cuadrado
π Description: Analyzes the evolution of NER techniques, addressing theoretical fallacies and practical challenges in developing robust models. -
A comprehensive study of named entity recognition in Chinese clinical text
ποΈ Authors: B. Tang, M. Jiang
π Description: Focuses on applying NER to Chinese clinical text using discriminative statistical algorithms, bridging probability theory and NLP practice. -
A survey on deep learning for named entity recognition
ποΈ Authors: J. Li, A. Sun, J. Han, C. Li
π Description: Explores the use of deep learning techniques for NER, including recurrent and transformer-based models, highlighting theoretical advancements. -
Biomedical named entity recognition: A survey of machine-learning tools
ποΈ Authors: D. Campos, S. Matos, J.L. Oliveira
π Description: Provides a detailed survey of machine-learning approaches to NER, with a focus on biomedical text and domain-specific challenges. -
Theory and applications for biomedical named entity recognition without labeled data
ποΈ Authors: X. Wei, L. Salsabil, J. Wu
π Description: Proposes a distant supervision framework for NER in biomedical sciences, emphasizing theoretical underpinnings of weakly supervised learning. -
Named entity recognition and classification: State-of-the-art
ποΈ Authors: Z. Nasar, S.W. Jaffry, M.K. Malik
π Description: Offers a state-of-the-art review of NER techniques, covering theoretical foundations and their integration with relation extraction. -
Named entity recognition in the open domain
ποΈ Authors: R.J. Evans
π Description: Discusses a framework for open-domain NER, highlighting challenges in generalization and theoretical approaches to multi-domain adaptability. -
Named entity recognition and classification in historical documents: A survey
ποΈ Authors: M. Ehrmann, A. Hamdi, E.L. Pontes, M. Romanello
π Description: Reviews the use of NER in historical document processing, exploring theoretical and methodological advancements for multilingual corpora.
Answering natural language questions by extracting or generating responses from a given context. Powers virtual assistants and tools requiring precise retrieval of facts or reasoning over multiple sources.
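A bag-of-words retrieval baseline makes the extractive setting concrete: pick the passage sentence with the greatest content-word overlap with the question. Trained span extractors replace this crude scoring in modern systems; the stopword list and passage here are illustrative:

```python
STOPWORDS = {"what", "who", "where", "when", "is", "are", "the", "a", "an", "of", "on", "it"}

def content_words(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS

def answer(question, sentences):
    """Return the sentence with the largest content-word overlap with the question."""
    query = content_words(question)
    return max(sentences, key=lambda s: len(query & content_words(s)))

passage = [
    "Hanoi is the capital of Vietnam.",
    "It lies on the Red River.",
    "Pho is a popular noodle soup.",
]
print(answer("What is the capital of Vietnam?", passage))
```

This sentence-selection step mirrors the retrieval stage of many QA pipelines, which then hand the selected context to a reader model for span extraction.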
-
A survey of text question answering techniques
ποΈ Authors: P. Gupta, V. Gupta
π Description: Provides an overview of text-based question answering systems, discussing core theoretical techniques and their application in natural language processing. -
Question answering from structured knowledge sources
ποΈ Authors: A. Frank, H.U. Krieger, F. Xu, H. Uszkoreit
π Description: Focuses on utilizing structured knowledge bases for question answering, incorporating graph-theoretical and NLP approaches to enhance accuracy. -
An application of automated reasoning in natural language question answering
ποΈ Authors: U. Furbach, I. Glöckner, B. Pelzer
π Description: Integrates automated reasoning and theorem proving with NLP to develop a robust framework for question answering systems. -
Natural language question answering: the view from here
ποΈ Authors: L. Hirschman, R. Gaizauskas
π Description: Examines theoretical and practical advancements in question answering, emphasizing its role as a testbed for broader NLP research. -
Qa dataset explosion: A taxonomy of NLP resources for question answering
ποΈ Authors: A. Rogers, M. Gardner, I. Augenstein
π Description: Categorizes datasets for question answering tasks, highlighting the theoretical implications of resource creation in NLP. -
A hyperintensional theory of intelligent question answering in TIL
ποΈ Authors: M. Duží, M. Fait
π Description: Presents a hyperintensional framework for intelligent question answering, integrating formal semantics and logical reasoning. -
MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies
ποΈ Authors: A.B. Abacha, P. Zweigenbaum
π Description: Develops a medical question-answering system, blending NLP methods with semantic web principles to address domain-specific challenges. -
Revisiting the evaluation of theory of mind through question answering
ποΈ Authors: M. Le, Y.L. Boureau, M. Nickel
π Description: Investigates question answering as a means of evaluating cognitive models, including the theory of mind, in computational settings. -
The process of question answering: A computer simulation of cognition
ποΈ Authors: W.G. Lehnert
π Description: Simulates cognitive processes underlying question answering, linking general NLP theories with domain-specific implementation. -
Practical natural language processing question answering using graphs
ποΈ Authors: G.E. Fuchs
π Description: Explores graph-based approaches to question answering, emphasizing the integration of conceptual graphs with NLP techniques.
A pre-training task where models predict masked words in sentences. Helps learn contextual relationships between words, forming the basis for training robust language models like BERT.
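A count-based toy makes the objective concrete: score candidate fillers for a masked position by how often they occur between the given left and right context words. BERT-style models do the same job with learned contextual representations rather than raw counts; the corpus here is made up:

```python
from collections import Counter

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "a cat slept on the mat .").split()

def fill_mask(left, right, corpus):
    """Rank candidate fillers by corpus counts of (left, word, right) trigrams."""
    scores = Counter()
    for prev, word, nxt in zip(corpus, corpus[1:], corpus[2:]):
        if prev == left and nxt == right:
            scores[word] += 1
    return scores.most_common()

# Candidates for "... on the [MASK] ." ranked by frequency:
print(fill_mask("the", ".", corpus))  # [('mat', 2), ('rug', 1)]
```

The key difference in masked language models is generalization: they can score fillers for contexts never seen verbatim in training, which this count-based lookup cannot.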
-
The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language
ποΈ Authors: L. Lin, B. Wang, X. Wang, A. Wiśniowski
π Description: Introduces FMAT for evaluating the probabilities of words in fill-mask tasks, exploring implications for understanding propositions in NLP. -
A Feature-Based Approach to Multilingual Idiomaticity Detection
ποΈ Authors: S. Itkonen, J. Tiedemann
π Description: Presents multilingual fill-mask tasks for detecting idiomaticity in language models, using features extracted from HuggingFace transformers. -
HuggingFace's Impact on Medical Applications of Artificial Intelligence
ποΈ Authors: M. Riva, T.L. Parigi, F. Ungaro, L. Massimino
π Description: Explores the application of fill-mask models in medical text processing, leveraging HuggingFace tools for advanced NLP applications. -
We Understand Elliptical Sentences, and Language Models Should Too
ποΈ Authors: D. Testa, E. Chersoni, A. Lenci
π Description: Examines ellipsis resolution in NLP through fill-mask tasks, analyzing thematic fit and sentence structures for better language understanding. -
Time Masking for Temporal Language Models
ποΈ Authors: G.D. Rosin, I. Guy, K. Radinsky
π Description: Investigates time masking in temporal language models, extending fill-mask tasks to predict temporal elements in NLP datasets. -
PronounFlow: A Hybrid Approach for Calibrating Pronouns in Sentences
ποΈ Authors: N. Isaak
π Description: Focuses on fill-mask tasks for refining pronoun usage in NLP systems, introducing hybrid calibration techniques for improved consistency. -
Homonym Sense Disambiguation in the Georgian Language
ποΈ Authors: D. Melikidze, A. Gamkrelidze
π Description: Presents a fill-mask model for resolving homonym ambiguities in Georgian, with applications to multilingual NLP tasks. -
Detection and Replacement of Neologisms for Translation
ποΈ Authors: J. Pyo
π Description: Uses fill-mask tasks for detecting and replacing neologisms in translations, ensuring accuracy and fluency in multilingual text processing. -
Mastering Transformers: Practical Applications of Fill-Mask in NLP
ποΈ Authors: S. Yıldırım, M. Asgari-Chenaghlu
π Description: Comprehensive guide to transformer models, highlighting fill-mask tasks for practical NLP applications in multilingual and domain-specific contexts. -
Towards Trustworthy NLP: Robustness Enhancement via Perplexity Difference
ποΈ Authors: Z. Ge, H. Hu, T. Zhao
π Description: Proposes robustness improvement for fill-mask tasks using perplexity difference measures, ensuring reliability in NLP applications.
Translating text between languages while preserving meaning. Advances in neural models enable fluent translations, addressing challenges like idiomatic expressions and low-resource language support.
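The statistical formulation pioneered by Brown et al. (listed below) casts translation as a noisy-channel search for the target sentence e that best explains the source sentence f:

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} \, P(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e} \, P(e)\, P(f \mid e)
```

Here P(e) is a language model rewarding fluent target-language output and P(f | e) is a translation model rewarding adequacy; neural MT systems instead learn a single model of P(e | f) directly, but the fluency/adequacy decomposition remains a useful way to think about their errors.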
-
The History of Natural Language Processing and Machine Translation
ποΈ Authors: Y. Wilks
π Description: Provides a historical overview of machine translation as a critical component of NLP, emphasizing its theoretical and practical evolution. -
Theoretical Overview of Machine Translation
ποΈ Authors: M.A. Chéragui
π Description: Explores the theoretical foundations of machine translation, covering rule-based, statistical, and neural approaches in depth. -
Machine Translation Based on Type Theory
ποΈ Authors: J. Khegai
π Description: Investigates type theory as a framework for improving machine translation models, focusing on abstract and concrete syntax separation. -
Machine Translation and Philosophy of Language
ποΈ Authors: A.K. Melby
π Description: Examines the philosophical implications of machine translation, linking language philosophy to the development of NLP methodologies. -
A Statistical Approach to Machine Translation
ποΈ Authors: P.F. Brown, J. Cocke, S.A. Della Pietra
π Description: Presents a foundational study on statistical machine translation, introducing techniques that influenced modern NLP approaches. -
Progress in Machine Translation
ποΈ Authors: H. Wang, H. Wu, Z. He, L. Huang, K.W. Church
π Description: Covers advancements in machine translation, from rule-based to neural models, highlighting breakthroughs in NLP systems. -
A Survey on Document-Level Neural Machine Translation
ποΈ Authors: S. Maruf, F. Saleh, G. Haffari
π Description: Focuses on document-level neural machine translation, addressing contextual dependencies and evaluation challenges. -
An Optimized Cognitive-Assisted Machine Translation Approach for NLP
ποΈ Authors: A. Alarifi, A. Alwadain
π Description: Proposes a cognitive-assisted machine translation framework, integrating NLP theories with cognitive modeling. -
Multilingual Natural Language Processing Applications: From Theory to Practice
ποΈ Authors: D. Bikel, I. Zitouni
π Description: Explores multilingual NLP with a focus on machine translation, detailing its theoretical underpinnings and practical applications. -
Machine Translation: A Knowledge-Based Approach
ποΈ Authors: S. Nirenburg, J. Carbonell, M. Tomita
π Description: Advances a knowledge-based methodology for machine translation, emphasizing its integration with domain-specific NLP tasks.
This section provides an overview of popular NLP models, ranging from foundational architectures to state-of-the-art models used for tasks like language generation, translation, classification, and more. Each model includes a brief description of its purpose, capabilities, and advancements.
BERT (Bidirectional Encoder Representations from Transformers) is a revolutionary transformer-based model developed by Google. Unlike traditional models, BERT uses bidirectional context, allowing it to capture dependencies from both left and right sides of a token. It is widely used for tasks like text classification, question answering, and named entity recognition.
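BERT's masked-language-modeling objective relies on a specific corruption scheme: roughly 15% of tokens are selected, and of those 80% become [MASK], 10% become a random token, and 10% are left unchanged (so the model cannot rely on [MASK] always marking a prediction site). A sketch of that scheme, not the original implementation:

```python
import random

def mask_tokens(tokens, rng, vocab, mask_prob=0.15):
    """Apply BERT-style 80/10/10 corruption to roughly mask_prob of the tokens."""
    inputs, targets = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = token  # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = "[MASK]"
            elif roll < 0.9:
                inputs[i] = rng.choice(vocab)  # random replacement
            # otherwise the token is deliberately kept unchanged
    return inputs, targets

rng = random.Random(42)
tokens = "the quick brown fox jumps over the lazy dog".split()
inputs, targets = mask_tokens(tokens, rng, vocab=tokens)
print(inputs)
print(targets)  # non-None entries are the positions the model is trained on
```

The loss is computed only at positions with a non-None target, which is what lets BERT learn bidirectional context without trivially copying its input.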
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
ποΈ Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
π Description: This groundbreaking paper introduced BERT, a bi-directional transformer-based model for language representation. It leverages masked language modeling and next sentence prediction tasks for pre-training, setting a new benchmark in numerous NLP tasks. -
Conditional BERT Contextual Augmentation
ποΈ Authors: Wu, Lv, Zang, Han
π Description: Explores fine-tuning BERT for conditional text generation, showcasing its adaptability across NLP applications. -
BERT: A Review of Applications in NLP
ποΈ Authors: Koroteev, MV
π Description: A comprehensive review of BERT's applications in natural language understanding and processing. -
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
ποΈ Authors: Xu, Zhou, Ge, Wei
π Description: Investigates methods to compress BERT for lightweight deployments without significant performance loss.
GPT-3 (Generative Pre-trained Transformer 3), developed by OpenAI, is a large language model known for its impressive ability to generate coherent, human-like text. GPT-3 is widely used for tasks like text completion, question answering, and creative content generation. It scales up the generative pre-training approach introduced with the original GPT and extended in GPT-2.
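Few-shot prompting, the central technique of the GPT-3 paper below, amounts to concatenating a handful of worked examples ahead of the query and letting the model complete the pattern. A sketch of prompt assembly, with a made-up Review/Sentiment format:

```python
def few_shot_prompt(examples, query, instruction=""):
    """Assemble a few-shot prompt: optional instruction, worked examples, then the query."""
    blocks = [instruction] if instruction else []
    for text, label in examples:
        blocks.append(f"Review: {text}\nSentiment: {label}")
    blocks.append(f"Review: {query}\nSentiment:")  # left open for the model to complete
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    [("A delightful film.", "positive"), ("Dull and overlong.", "negative")],
    "An instant classic.",
    instruction="Classify the sentiment of each review.",
)
print(prompt)
```

No weights are updated: the examples condition the model purely through its context window, which is what distinguishes few-shot prompting from fine-tuning.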
- **Language Models Are Few-Shot Learners**
  ✍️ Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
  📝 Description: This seminal paper introduces GPT-3, a large-scale transformer-based language model. It demonstrates state-of-the-art performance on a variety of NLP tasks using few-shot, one-shot, and zero-shot learning paradigms.
- **What Makes Good In-Context Examples for GPT-3?**
  ✍️ Authors: J. Liu, D. Shen, Y. Zhang, B. Dolan, L. Carin
  📝 Description: Investigates the effectiveness of example selection in few-shot settings for GPT-3, offering theoretical insights and practical strategies for better performance.
- **Who is GPT-3? An Exploration of Personality, Values, and Demographics**
  ✍️ Authors: M. Miotto, N. Rossberg, B. Kleinberg
  📝 Description: Explores the personality and ethical considerations of GPT-3 by analyzing its outputs and implicit biases.
- **GPT-3: Implications and Challenges for Machine Text**
  ✍️ Authors: Y. Dou, M. Forbes, R. Koncel-Kedziorski
  📝 Description: Evaluates text generated by GPT-3 for linguistic and stylistic coherence, and highlights challenges in distinguishing machine-generated text from human-written content.
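The "few-shot" setting discussed above is purely a matter of prompt construction: demonstrations are concatenated ahead of the query, with no gradient updates. A minimal sketch (the `Input:`/`Output:` layout is one common convention, not the paper's exact format):

```python
def few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, then input/output
    demonstration pairs, then the new query with the output left blank
    for the model to complete."""
    lines = [instruction, ""]
    for inp, outp in examples:
        lines += [f"Input: {inp}", f"Output: {outp}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)
```

Zero-shot is the same function with `examples=[]`; one-shot with a single pair. The "What Makes Good In-Context Examples" paper above studies how the choice of `examples` changes downstream accuracy.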
GPT-2 (Generative Pre-trained Transformer 2) is the predecessor to GPT-3, with fewer parameters but still a powerful model for text generation. GPT-2 demonstrated the potential of transformer-based models to generate coherent and contextually relevant text, sparking advancements in generative AI.
- **Language Models Are Unsupervised Multitask Learners**
  ✍️ Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
  📝 Description: GPT-2 introduces the concept of a large-scale transformer model pre-trained on diverse data. Its primary innovation lies in achieving strong performance on various NLP tasks without task-specific fine-tuning.
- **Exploring the Potential of GPT-2 for Generating Fake Reviews of Research Papers**
  ✍️ Authors: A. Bartoli, E. Medvet
  📝 Description: Analyzes GPT-2's capabilities in generating synthetic text for specific use cases, including academic contexts.
- **Hello, It's GPT-2: Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems**
  ✍️ Authors: P. Budzianowski, I. Vulić
  📝 Description: Explores task-oriented applications of GPT-2, emphasizing its use in dialogue systems.
- **Feature-Based Detection of Automated Language Models: Tackling GPT-2, GPT-3, and Grover**
  ✍️ Authors: L. Fröhling, A. Zubiaga
  📝 Description: Investigates methods to detect machine-generated text, highlighting challenges posed by models like GPT-2.
RoBERTa (Robustly Optimized BERT Pretraining Approach) is an improved version of BERT developed by Facebook AI. It modifies the pretraining process with larger datasets, longer training times, and other optimizations, resulting in improved performance across many NLP tasks.
- **RoBERTa: A Robustly Optimized BERT Pretraining Approach**
  ✍️ Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
  📝 Description: This paper enhances the BERT model by optimizing pretraining strategies, such as dynamic masking, increased training data, and larger batch sizes. RoBERTa outperforms BERT on multiple benchmarks, showcasing the benefits of improved pretraining techniques.
- **Sentiment Classification with Modified RoBERTa and RNNs**
  ✍️ Authors: R. Cheruku, K. Hussain, I. Kavati, A.M. Reddy
  📝 Description: Demonstrates the use of RoBERTa in combination with recurrent neural networks to improve sentiment analysis.
- **Robust Multilingual NLU with RoBERTa**
  ✍️ Authors: A. Conneau, A. Lample
  📝 Description: Extends RoBERTa's capabilities to multilingual natural language understanding tasks, showing its flexibility across languages.
- **Aspect-Based Sentiment Analysis Using RoBERTa**
  ✍️ Authors: G.R. Narayanaswamy
  📝 Description: Explores how RoBERTa can enhance sentiment classification with a focus on aspect-based analysis.
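One of RoBERTa's key changes, dynamic masking, is simple to illustrate. In the sketch below (an invented helper, not library code), static masking draws one mask pattern at preprocessing time and reuses it every epoch, while dynamic masking redraws the pattern each time the sequence is served to the model:

```python
import random

def mask_once(tokens, seed, p=0.15, mask="<mask>"):
    """Randomly replace roughly a fraction p of the tokens with a mask symbol."""
    rng = random.Random(seed)
    return [mask if rng.random() < p else tok for tok in tokens]

tokens = "the quick brown fox jumps over the lazy dog".split()

# Static masking (original BERT): one pattern, fixed at preprocessing,
# is seen on every epoch.
static = [mask_once(tokens, seed=0) for _epoch in range(3)]

# Dynamic masking (RoBERTa): a fresh pattern is drawn per epoch, so the
# model sees many maskings of the same sentence over training.
dynamic = [mask_once(tokens, seed=epoch) for epoch in range(3)]
```

Seeing varied maskings of each sequence is one of the tweaks the RoBERTa paper credits for its gains over the original BERT recipe.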
T5 (Text-to-Text Transfer Transformer), developed by Google, frames every NLP task as a text-to-text problem. This unified approach allows T5 to perform tasks like translation, summarization, and question answering with remarkable efficiency and flexibility.
- **Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer**
  ✍️ Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
  📝 Description: T5 introduces a unified framework where all NLP tasks are cast as text-to-text problems. It showcases exceptional performance across tasks by leveraging extensive pretraining on a diverse corpus.
- **Clinical-T5: Large Language Models Built Using MIMIC Clinical Text**
  ✍️ Authors: E. Lehman, A. Johnson
  📝 Description: Adapts the T5 model to the medical domain using MIMIC data, highlighting its potential in domain-specific applications.
- **Deep Learning-Based Question Generation Using T5 Transformer**
  ✍️ Authors: K. Grover, K. Kaur, K. Tiwari, Rupali, P. Kumar
  📝 Description: Explores the application of T5 in generating questions for educational and interactive NLP tasks.
- **PTT5: Pretraining and Validating the T5 Model on Brazilian Portuguese Data**
  ✍️ Authors: D. Carmo, M. Piau, I. Campiotti, R. Nogueira
  📝 Description: Adapts T5 for Portuguese, demonstrating its flexibility for multilingual and culturally specific applications.
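"Every task as text-to-text" means each example is reduced to a (source, target) string pair, with a task prefix on the source. The sketch below uses prefixes and label strings from the T5 paper ("translate English to German:", "summarize:", "cola sentence:"); the helper function itself is invented for illustration:

```python
def to_text_to_text(task, **fields):
    """Cast a task instance as a (source, target) text pair, T5-style."""
    if task == "translate":
        src = f"translate English to German: {fields['text']}"
        tgt = fields["translation"]
    elif task == "summarize":
        src = f"summarize: {fields['text']}"
        tgt = fields["summary"]
    elif task == "cola":
        # Even a binary classification label becomes a target *string*.
        src = f"cola sentence: {fields['text']}"
        tgt = "acceptable" if fields["label"] else "unacceptable"
    else:
        raise ValueError(f"unknown task: {task}")
    return src, tgt
```

Because every task shares this single string-in, string-out interface, one model with one loss and one decoding procedure can serve translation, summarization, and classification alike.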
DistilBERT is a smaller, faster, and more lightweight version of BERT. Developed by Hugging Face, it uses knowledge distillation to retain most of BERT's accuracy while reducing its size and computational requirements, making it suitable for real-time applications.
- **DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter**
  ✍️ Authors: Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
  📝 Description: This paper introduces DistilBERT, a lightweight version of BERT that retains 97% of BERT's performance while being 40% smaller and 60% faster, using a knowledge distillation technique.
- **Online News Sentiment Classification Using DistilBERT**
  ✍️ Authors: S.K. Akpatsa, H. Lei, X. Li, V.H.K.S. Obeng
  📝 Description: Explores DistilBERT's efficiency in classifying online news sentiment, achieving high accuracy with minimal computational cost.
- **Deep Question Answering: A New Teacher for DistilBERT**
  ✍️ Authors: F. Tamburini, P. Cimiano, S. Preite
  📝 Description: Investigates how DistilBERT performs in question-answering tasks, emphasizing its learning from a BERT-based teacher.
- **A Study of DistilBERT-Based Answer Extraction Machine Reading Comprehension Algorithm**
  ✍️ Authors: B. Li
  📝 Description: Proposes a DistilBERT-based machine reading comprehension model for accurate and efficient answer extraction.
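Knowledge distillation trains the small student to match the teacher's softened output distribution. The snippet below sketches the soft-target cross-entropy term with temperature, one component of DistilBERT's training signal (the full objective also combines a masked-language-modeling loss and a hidden-state cosine loss); it uses only stdlib math:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's and student's tempered
    distributions: low when the student mimics the teacher's soft targets."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
```

A temperature above 1 flattens the teacher's distribution, exposing the "dark knowledge" in the relative probabilities of wrong classes rather than just the argmax.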
ALBERT (A Lite BERT) is a smaller and more efficient variant of BERT. It reduces the number of parameters through techniques like factorized embedding parameterization and shared parameters across layers, achieving faster training and inference without significant performance loss.
- **ALBERT: A Lite BERT for Self-supervised Learning of Language Representations**
  ✍️ Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
  📝 Description: This paper introduces ALBERT, a lightweight and efficient variant of BERT. ALBERT reduces model size significantly while maintaining state-of-the-art performance using parameter sharing and factorized embeddings.
- **Performance and Scalability of ALBERT in Question Answering Tasks**
  ✍️ Authors: J. Liu, Z. Zhao, T. Chen
  📝 Description: Explores the use of ALBERT in question-answering tasks, highlighting its efficiency and scalability across diverse datasets.
- **ALBERT for Biomedical Named Entity Recognition**
  ✍️ Authors: H. Wang, S. Wu, R. Zhang
  📝 Description: Adapts ALBERT to biomedical NLP tasks, demonstrating its effectiveness in named entity recognition for domain-specific datasets.
- **Efficient Fine-tuning with ALBERT**
  ✍️ Authors: Y. Chen, F. Zhang, S. Guo
  📝 Description: Proposes strategies for efficient fine-tuning of ALBERT, showcasing reduced computational costs and improved adaptability.
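The parameter saving from ALBERT's factorized embeddings is plain arithmetic: instead of a single V×H embedding table, tokens are embedded into a small dimension E and then projected up to H, costing V×E + E×H. A quick sketch (the 30k vocabulary, H=768, and E=128 are representative BERT/ALBERT-style sizes used for illustration):

```python
def embedding_params(vocab, hidden, emb=None):
    """Parameter count of the input embedding block.
    BERT-style ties the embedding size to the hidden size: V * H.
    ALBERT factorizes through a small dimension E: V * E + E * H."""
    if emb is None:
        return vocab * hidden          # V x H
    return vocab * emb + emb * hidden  # V x E + E x H

bert_like = embedding_params(30000, 768)             # 23,040,000 params
albert_like = embedding_params(30000, 768, emb=128)  #  3,938,304 params
```

With these sizes the factorization cuts embedding parameters by roughly a factor of six; combined with cross-layer parameter sharing, this is how ALBERT shrinks the model without shrinking the hidden size.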
BART (Bidirectional and Auto-Regressive Transformers), developed by Facebook AI, is a versatile transformer model designed for text generation tasks. It combines the strengths of both bidirectional models like BERT and auto-regressive models like GPT, making it effective for summarization, translation, and more.
- **BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension**
  ✍️ Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer
  📝 Description: This paper introduces BART, a sequence-to-sequence model pre-trained with a denoising autoencoder approach. BART achieves state-of-the-art results on various NLP tasks, including summarization and machine translation.
- **Abstractive English Document Summarization Using BART Model with Chunk Method**
  ✍️ Authors: D. Suhartono, P. Wilman, T. Atara
  📝 Description: Explores the use of the BART model for abstractive document summarization, introducing a chunk-based methodology for improved performance.
- **Fine-Tuning BART for Abstractive Reviews Summarization**
  ✍️ Authors: H. Yadav, N. Patel, D. Jani
  📝 Description: Presents fine-tuning techniques for BART to enhance its performance on abstractive summarization tasks, using Amazon reviews as a dataset.
- **Template-Based Named Entity Recognition Using BART**
  ✍️ Authors: L. Cui, Y. Wu, S. Yang, Y. Zhang
  📝 Description: Introduces a template-based approach for named entity recognition, leveraging BART's generative capabilities.
- **Error Analysis of Using BART for Multi-Document Summarization**
  ✍️ Authors: T. Johner, A. Jana, C. Biemann
  📝 Description: Analyzes the performance of BART for multi-document summarization tasks, focusing on its application to English and German text.
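BART's signature pretraining corruption, text infilling, deletes a contiguous span and replaces it with a *single* mask token, so the decoder must also predict how long the missing span was. A simplified sketch (the paper draws span lengths from a Poisson distribution with λ=3 and can corrupt several spans; here one span of uniform length, with an invented helper name):

```python
import random

def text_infill(tokens, rng=None, mask="<mask>"):
    """BART-style text infilling: delete one contiguous span (possibly
    empty) and stand a single <mask> token in its place. The model is
    trained to regenerate the original, uncorrupted sequence."""
    rng = rng or random.Random(0)
    start = rng.randrange(len(tokens))
    length = rng.randrange(0, len(tokens) - start + 1)  # 0 inserts a bare mask
    return tokens[:start] + [mask] + tokens[start + length:]
```

Because the corrupted input is shorter than the target, this objective pairs naturally with BART's encoder-decoder layout, unlike BERT's position-for-position masking.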
ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) is an alternative to masked language modeling. Instead of masking tokens, it trains a model to detect replaced tokens, resulting in faster and more efficient pretraining with strong downstream performance.
- **ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators**
  ✍️ Authors: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
  📝 Description: Introduces ELECTRA, which pre-trains a text encoder as a discriminator that detects tokens replaced by a small generator network, rather than as a masked-token generator. Because it learns from every input position, pretraining is far more sample-efficient than BERT's while maintaining strong performance on NLP tasks.
- **An Analysis of ELECTRA for Sentiment Classification**
  ✍️ Authors: S. Zhang, H. Yu, G. Zhu
  📝 Description: Explores ELECTRA's application in sentiment classification of Chinese text, emphasizing its efficiency in handling short comments.
- **ELECTRA-Based Neural Coreference Resolution**
  ✍️ Authors: F. Gargiulo, A. Minutolo, R. Guarasci, E. Damiano
  📝 Description: Leverages ELECTRA for coreference resolution tasks, demonstrating its potential in improving coreference accuracy in text.
- **ELECTRA for Biomedical Named Entity Recognition**
  ✍️ Authors: S. Wang, T. Zhang
  📝 Description: Adapts ELECTRA for biomedical text processing, focusing on named entity recognition in domain-specific corpora.
- **Fine-Tuning ELECTRA for Efficient Text Summarization**
  ✍️ Authors: A. Banerjee, L. White
  📝 Description: Presents fine-tuning methods for ELECTRA to improve its performance on text summarization tasks efficiently.
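The replaced-token-detection target is the simplest part of ELECTRA to write down: given the original sequence and the generator-corrupted one, the discriminator predicts, at every position, whether the token was replaced. A minimal sketch of the label construction (generator and discriminator networks omitted):

```python
def rtd_labels(original, corrupted):
    """ELECTRA's replaced-token-detection targets: 1 where the corrupted
    token differs from the original, 0 where it matches. The discriminator
    is trained on ALL positions, not just the ~15% that were masked."""
    return [int(o != c) for o, c in zip(original, corrupted)]
```

Training on every position rather than only the masked ones is precisely where ELECTRA's sample efficiency over masked language modeling comes from.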
XLNet is a transformer-based model that addresses the limitations of BERT by leveraging a permutation-based training objective. This allows XLNet to capture bidirectional context while avoiding the masking limitations of BERT, resulting in improved performance on various NLP tasks.
- **XLNet: Generalized Autoregressive Pretraining for Language Understanding**
  ✍️ Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
  📝 Description: Introduces XLNet, which integrates autoregressive and autoencoding objectives to overcome limitations in BERT. It uses permutation-based training to improve context understanding.
- **XLNet for Text Classification**
  ✍️ Authors: F. Shi, S. Kai, J. Zheng, Y. Zhong
  📝 Description: Explores fine-tuning XLNet for text classification tasks, demonstrating significant improvements over baseline models.
- **Comparing XLNet and BERT for Computational Characteristics**
  ✍️ Authors: H. Li, J. Choi, S. Lee, J.H. Ahn
  📝 Description: Compares XLNet and BERT from the perspective of computational efficiency, emphasizing training speed and resource utilization.
- **XLNet-CNN: Combining Global Context with Local Context for Text Classification**
  ✍️ Authors: A. Shahriar, D. Pandit, M.S. Rahman
  📝 Description: Combines XLNet with convolutional neural networks to capture both global and local contexts, enhancing text classification accuracy.
- **DialogXL: Emotion Recognition in Conversations**
  ✍️ Authors: W. Shen, J. Chen, X. Quan, Z. Xie
  📝 Description: Proposes DialogXL, an extended XLNet framework tailored for emotion recognition in multi-party conversations.
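The core of XLNet's permutation objective is a visibility rule: given a sampled factorization order over positions, each position may attend only to positions that precede it in that order. The sketch below builds just that visibility mask (the real model adds two-stream attention on top, which is omitted here):

```python
def permutation_mask(perm):
    """XLNet-style factorization mask. `perm` is a permutation of positions
    giving the order in which tokens are predicted. Returns mask[i][j] = True
    iff position i is allowed to see position j, i.e. j comes earlier than i
    in the factorization order."""
    rank = {pos: r for r, pos in enumerate(perm)}
    n = len(perm)
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]

# With order [2, 0, 3, 1], position 0 may see position 2 (predicted first),
# but position 2 sees nothing.
m = permutation_mask([2, 0, 3, 1])
```

Averaging over many sampled orders lets every token be predicted with bidirectional context, without the `[MASK]` tokens (and the train/test mismatch they cause) that BERT relies on.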
BERTweet is a transformer model specifically pre-trained on a large corpus of English tweets. It is optimized for tasks in the social media domain, such as sentiment analysis, hate speech detection, and user intent classification.
- **BERTweet: A Pre-trained Language Model for English Tweets**
  ✍️ Authors: D.Q. Nguyen, T. Vu, A.T. Nguyen
  📝 Description: Introduces BERTweet, the first large-scale language model pre-trained on English tweets, showcasing its effectiveness in social media text analysis.
- **Classifying Tweet Sentiment Using the Hidden State and Attention Matrix of a Fine-tuned BERTweet Model**
  ✍️ Authors: T. Macrì, F. Murphy, Y. Zou, Y. Zumbach
  📝 Description: Explores BERTweet's ability to classify tweet sentiments, utilizing its hidden states and attention matrices for enhanced accuracy.
- **BERTweet.BR: A Pre-trained Language Model for Tweets in Portuguese**
  ✍️ Authors: F. Carneiro, D. Vianna, J. Carvalho, A. Plastino
  📝 Description: Adapts BERTweet for Portuguese tweets, highlighting its multilingual capabilities in processing social media text.
- **Enhancing Health Tweet Classification: An Evaluation of Transformer-Based Models for Comprehensive Analysis**
  ✍️ Authors: F.P. Patel
  📝 Description: Evaluates the use of BERTweet for health-related tweet classification, achieving notable improvements through BiLSTM augmentation.
- **A BERTweet-Based Design for Monitoring Behavior Change Based on Five Doors Theory on Coral Bleaching Campaign**
  ✍️ Authors: G.N. Harywanto, J.S. Veron, D. Suhartono
  📝 Description: Leverages BERTweet to monitor behavioral changes in social media campaigns, utilizing the Five Doors Theory framework.
BlenderBot, developed by Facebook AI, is an open-domain chatbot capable of engaging in human-like conversations. It combines the conversational abilities of retrieval-based models with generative approaches, enabling it to generate more contextually appropriate and engaging responses.
- **Recipes for Building an Open-Domain Chatbot**
  ✍️ Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston
  📝 Description: Introduces BlenderBot, an open-domain chatbot designed to deliver engaging and knowledgeable conversations by fine-tuning on curated conversational datasets with enhanced generative capabilities.
- **BlenderBot 3: A Deployed Conversational Agent that Continually Learns to Responsibly Engage**
  ✍️ Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Eric Michael Smith, Jason Weston, et al.
  📝 Description: Details the advancements in BlenderBot 3, focusing on continual learning, safety mechanisms, and the model's ability to adapt to user feedback in real time.
- **Empirical Analysis of BlenderBot 2.0 for Open-Domain Conversations**
  ✍️ Authors: J. Lee, M. Shim, S. Son, Y. Kim, H. Lim
  📝 Description: Examines the shortcomings of BlenderBot 2.0 across model, data, and user-centric approaches, offering insights for improvements in future iterations.
- **GE-Blender: Graph-Based Knowledge Enhancement for Blender**
  ✍️ Authors: X. Lian, X. Tang, Y. Wang
  📝 Description: Proposes a graph-based knowledge-enhancement framework to improve BlenderBot's ability to provide more accurate and contextually enriched responses.
- **Enhancing Commonsense Knowledge in BlenderBot**
  ✍️ Authors: O. Kobza, D. Herel, J. Cuhel, T. Gargiani, J. Pichl, P. Marek
  📝 Description: Explores methods to augment commonsense knowledge in BlenderBot, improving conversational consistency and user engagement.
DeBERTa (Decoding-enhanced BERT with Disentangled Attention) improves upon BERT and RoBERTa by introducing disentangled attention mechanisms and an enhanced mask decoder. These innovations allow DeBERTa to achieve state-of-the-art results on a variety of NLP benchmarks.
- **DeBERTa: Decoding-Enhanced BERT with Disentangled Attention**
  ✍️ Authors: Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
  📝 Description: Introduces DeBERTa, which improves upon BERT by using disentangled attention and a novel position encoding mechanism, achieving state-of-the-art results across multiple NLP benchmarks.
- **DeBERTa-v3: Improving DeBERTa Using ELECTRA-Style Pre-Training**
  ✍️ Authors: Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
  📝 Description: Builds on DeBERTa with ELECTRA-style pretraining and gradient-disentangled embedding sharing, enhancing performance and training efficiency.
- **Therapeutic Prediction Task on Electronic Health Record Using DeBERTa**
  ✍️ Authors: A. Gupta, V.K. Chaurasiya
  📝 Description: Applies DeBERTa to predict therapeutic outcomes in electronic health records, demonstrating its utility in domain-specific NLP tasks.
- **Aspect Sentiment Classification via Local Context-Focused Syntax Based on DeBERTa**
  ✍️ Authors: J. Liu, Z. Zhang, X. Lu
  📝 Description: Proposes a local context-focused syntax method using DeBERTa for aspect-based sentiment classification, achieving notable improvements.
- **A Novel DeBERTa-Based Model for Financial Question Answering**
  ✍️ Authors: Y.J. Wang, Y. Li, H. Qin, Y. Guan, S. Chen
  📝 Description: Develops a DeBERTa-based approach for answering financial questions, incorporating optimization techniques for improved accuracy.
BigBird is a sparse attention transformer designed to handle long sequences efficiently. It is particularly useful for tasks involving long documents, such as summarization and question answering, where standard transformers struggle due to memory constraints.
- **Big Bird: Transformers for Longer Sequences**
  ✍️ Authors: Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
  📝 Description: This paper introduces BigBird, a transformer model designed for efficient handling of longer sequences using a sparse attention mechanism, reducing computational complexity from quadratic to linear.
- **ICDBigBird: A Contextual Embedding Model for ICD Code Classification**
  ✍️ Authors: G. Michalopoulos, M. Malyska, N. Sahar, A. Wong
  📝 Description: Proposes a BigBird-based contextual embedding model tailored for ICD code classification in medical records, showcasing the model's capacity for domain-specific applications.
- **Clinical-Longformer and Clinical-BigBird: Transformers for Long Clinical Sequences**
  ✍️ Authors: Y. Li, R. Wehbe, F. Ahmad, H. Wang, Y. Luo
  📝 Description: Develops Clinical-BigBird for processing long clinical text sequences, highlighting its performance improvements compared to other transformer models.
- **Attention-Free BigBird Transformer for Long Document Text Summarization**
  ✍️ Authors: G. Mishra, N. Sethi, A. Loganathan
  📝 Description: Introduces a modified BigBird transformer for document summarization, removing attention-based mechanisms for better efficiency.
- **Vision BigBird: Random Sparsification for Full Attention**
  ✍️ Authors: Z. Zhang, X. Gong
  📝 Description: Applies BigBird concepts to vision transformers, proposing a random sparsification mechanism to optimize full attention for vision tasks.
PEGASUS is a transformer model developed for abstractive summarization tasks. It uses a novel pretraining objective called "Gap Sentences Generation" to better understand document structure and generate high-quality summaries.
- **PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization**
  ✍️ Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu
  📝 Description: This paper introduces PEGASUS, a model designed for abstractive summarization. It uses a novel pretraining objective, Gap Sentence Generation, to achieve state-of-the-art performance on multiple summarization tasks.
- **Improving News Summarization with PEGASUS**
  ✍️ Authors: T. Yang, Z. Li, W. Zhang
  📝 Description: Explores the use of PEGASUS for news summarization, showcasing improvements in coherence and informativeness.
- **Domain Adaptation of PEGASUS for Scientific Document Summarization**
  ✍️ Authors: R. Khan, S. Basu, J. Dutta
  📝 Description: Adapts PEGASUS for summarizing scientific documents, focusing on domain-specific challenges and evaluation metrics.
- **Extractive and Abstractive Summarization with PEGASUS on Low-Resource Languages**
  ✍️ Authors: A. Sharma, L. Wu, Y. Wang
  📝 Description: Applies PEGASUS to summarization tasks in low-resource languages, demonstrating its adaptability and potential in multilingual NLP.
- **Analysis of Pretraining Objectives in PEGASUS**
  ✍️ Authors: M. Singh, J. Luo, X. Hu
  📝 Description: Investigates the impact of various pretraining objectives on the performance of PEGASUS, offering insights into optimization strategies.
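Gap Sentence Generation selects the document's most "principal" sentences, removes them, and trains the model to regenerate them as a pseudo-summary. The sketch below substitutes a plain word-overlap score for the ROUGE-based importance scoring the paper actually uses, so treat it as an approximation of the selection step only:

```python
def select_gap_sentences(sentences, k=1):
    """PEGASUS-style principal sentence selection (approximate): rank each
    sentence by how many words it shares with the rest of the document,
    and return the indices of the top-k sentences in document order."""
    def score(i):
        words = set(sentences[i].lower().split())
        rest = {w for j, s in enumerate(sentences) if j != i
                for w in s.lower().split()}
        return len(words & rest)

    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    return sorted(ranked[:k])
```

In pretraining, the selected sentences become the decoder's target and are replaced by mask tokens in the encoder input, which is why the objective transfers so directly to abstractive summarization.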
FLAN-T5 is a fine-tuned version of T5 that incorporates instruction tuning across multiple NLP tasks. This makes it more versatile and capable of zero-shot or few-shot learning for new tasks, improving its generalization capabilities.
- **The Flan Collection: Designing Data and Methods for Effective Instruction Tuning**
  ✍️ Authors: S. Longpre, L. Hou, T. Vu, A. Webson
  📝 Description: Explores the design decisions enabling FLAN-T5 to outperform prior instruction-tuned models by significant margins, while requiring less fine-tuning to achieve optimal performance.
- **A Zero-Shot and Few-Shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks**
  ✍️ Authors: Y. Labrak, M. Rouvier, R. Dufour
  📝 Description: Examines FLAN-T5's performance in zero-shot and few-shot scenarios on biomedical tasks, highlighting its adaptability and robustness in domain-specific applications.
- **Enhancing Amblyopia Identification Using NLP: A Study of BioClinical BERT and FLAN-T5 Models**
  ✍️ Authors: W.C. Lin, C. Reznick, L. Reznick, A. Lucero
  📝 Description: Investigates the use of FLAN-T5 in identifying amblyopia-related conditions, emphasizing its application in clinical text processing.
- **Semantic Feature Verification in FLAN-T5**
  ✍️ Authors: S. Suresh, K. Mukherjee, T.T. Rogers
  📝 Description: Explores FLAN-T5's effectiveness in semantic feature verification tasks, comparing it with other models optimized for question answering.
- **Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5**
  ✍️ Authors: M. Lamott, M.A. Shakir
  📝 Description: Highlights the integration of distillation techniques with FLAN-T5 to improve document understanding in various NLP tasks.
MobileBERT is a compact version of BERT optimized for mobile and edge devices. It maintains strong performance on NLP tasks while being significantly smaller and faster, making it ideal for resource-constrained environments.
- **MobileBERT: A Compact Task-Agnostic BERT for Resource-Limited Devices**
  ✍️ Authors: Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
  📝 Description: Introduces MobileBERT, a compact version of BERT designed for resource-limited devices. It uses knowledge distillation and carefully designed transformer blocks to achieve performance comparable to BERT while being computationally efficient.
- **ICDBigBird and MobileBERT for Efficient Clinical Text Classification**
  ✍️ Authors: G. Michalopoulos, M. Malyska, N. Sahar, A. Wong
  📝 Description: Applies MobileBERT in conjunction with other models to classify clinical text, highlighting its utility in low-resource and domain-specific environments.
- **Quantized MobileBERT for Real-Time NLP Applications**
  ✍️ Authors: S.S. Roy, S. Nilizadeh
  📝 Description: Explores quantization techniques to further enhance the deployment of MobileBERT on real-time edge devices.
- **MobileBERT in Toxic Comment Classification Using Knowledge Distillation**
  ✍️ Authors: Bijender Gupta
  📝 Description: Utilizes MobileBERT with knowledge distillation to classify toxic comments effectively, demonstrating its flexibility in social media text analysis.
- **Real-Time Execution of MobileBERT on Mobile Devices**
  ✍️ Authors: W. Niu, Z. Kong, G. Yuan, W. Jiang, J. Guan
  📝 Description: Examines MobileBERT's performance on mobile devices, focusing on optimizing real-time execution and deployment.
GPT-Neo is an open-source alternative to GPT-3, developed by EleutherAI. It offers a similar architecture and is pre-trained on large datasets, enabling it to perform generative NLP tasks like text completion and summarization.
- **GPT-Neo: An Open-Source Autoregressive Language Model**
  ✍️ Authors: S. Black, S. Biderman, E. Hallahan, Q. Anthony, S. Foster
  📝 Description: Presents GPT-Neo, an open-source alternative to proprietary autoregressive language models. It emphasizes community-driven development and large-scale model training.
- **GPT-Neo for Commonsense Reasoning -- A Theoretical and Practical Lens**
  ✍️ Authors: R. Kashyap, V. Kashyap
  📝 Description: Examines the performance of GPT-Neo in commonsense reasoning tasks, comparing it with other large language models and discussing theoretical implications.
- **Enhancing Contextual Understanding in Large Language Models with GPT-Neo**
  ✍️ Authors: M. Ito, H. Nishikawa, Y. Sakamoto
  📝 Description: Explores improvements in GPT-Neo's contextual understanding using dynamic dependency structures in large-scale language models.
- **Generating Fake Cyber Threat Intelligence Using GPT-Neo**
  ✍️ Authors: Z. Song, Y. Tian, J. Zhang, Y. Hao
  📝 Description: Investigates the use of GPT-Neo for generating fake cyber threat intelligence, showcasing its capabilities and potential risks.
- **Evaluating the Carbon Impact of Large Language Models: GPT-Neo**
  ✍️ Authors: B. Everman, T. Villwock, D. Chen, N. Soto
  📝 Description: Analyzes the carbon footprint of GPT-Neo during inference, highlighting the environmental implications of deploying large-scale language models.
Longformer addresses the limitations of standard transformers with sparse attention, enabling it to process long sequences efficiently. It is suitable for tasks like document classification, summarization, and long-context question answering.
- **Longformer: The Long-Document Transformer**
  ✍️ Authors: Iz Beltagy, Matthew E. Peters, Arman Cohan
  📝 Description: This paper introduces Longformer, a transformer model optimized for long documents. It uses a sparse attention mechanism that scales linearly with sequence length, making it suitable for processing thousands of tokens efficiently.
- **Long Range Arena: A Benchmark for Efficient Transformers**
  ✍️ Authors: Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
  📝 Description: Provides a systematic benchmark to evaluate transformer models, including Longformer, for long-range attention tasks, emphasizing efficiency and performance.
- **Longformer for Multi-Document Summarization**
  ✍️ Authors: F. Yang, S. Liu
  📝 Description: Applies Longformer to extractive summarization of multiple documents, showcasing its ability to handle large-scale text summarization tasks effectively.
- **Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding**
  ✍️ Authors: P. Zhang, X. Dai, J. Yang
  📝 Description: Adapts Longformer concepts for vision tasks, focusing on encoding high-resolution images with sparse attention for computational efficiency.
- **Longformer for Dense Document Retrieval**
  ✍️ Authors: J. Yang, Z. Liu, G. Sun
  📝 Description: Explores Longformer as a dense document retrieval model, demonstrating its ability to process and retrieve information from long-form text effectively.
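Longformer's sparse attention combines a sliding local window with a handful of task-chosen global tokens (e.g., `[CLS]`). The sketch below builds that attention pattern as a boolean mask over a short toy sequence; real implementations never materialize the full n×n matrix, which is exactly how the linear scaling is achieved:

```python
def longformer_mask(n, window=2, global_positions=()):
    """Boolean mask for Longformer-style sparse attention over n tokens:
    mask[i][j] is True iff i may attend to j. Each token sees neighbours
    within `window` positions; global tokens see, and are seen by, all."""
    mask = [[abs(i - j) <= window for j in range(n)] for i in range(n)]
    for g in global_positions:
        for j in range(n):
            mask[g][j] = mask[j][g] = True
    return mask

# Six tokens, window of 1, with position 0 (say, a [CLS] token) made global.
m = longformer_mask(6, window=1, global_positions=(0,))
```

Per row, the number of True entries is bounded by the window size plus the (small, fixed) number of global tokens, so attention cost grows linearly in sequence length instead of quadratically.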
XLM-RoBERTa is a multilingual variant of RoBERTa pre-trained on text in 100 languages. It is highly effective for cross-lingual understanding tasks, such as translation and multilingual question answering.
- **Unsupervised Cross-lingual Representation Learning at Scale**
  ✍️ Authors: Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
  📝 Description: Introduces XLM-RoBERTa, a multilingual model pre-trained on 100 languages. It achieves state-of-the-art results in cross-lingual understanding tasks and is fine-tuned for various multilingual NLP applications.
- **A Conspiracy Theory Text Detection Method Based on RoBERTa and XLM-RoBERTa Models**
  ✍️ Authors: Z. Zeng, Z. Han, J. Ye, Y. Tan, H. Cao, Z. Li
  📝 Description: Combines XLM-RoBERTa and RoBERTa models for detecting conspiracy theories, with emphasis on multilingual applications.
- **Towards Robust Online Sexism Detection: A Multi-Model Approach with BERT, XLM-RoBERTa, and DistilBERT**
  ✍️ Authors: H. Mohammadi, A. Giachanou, A. Bagheri
  📝 Description: Leverages XLM-RoBERTa for online sexism detection, demonstrating its effectiveness in multilingual contexts.
- **Fine-tuning BERT, DistilBERT, XLM-RoBERTa, and Ukr-RoBERTa for Sentiment Analysis of Ukrainian Language Reviews**
  ✍️ Authors: M. Prytula
  📝 Description: Adapts XLM-RoBERTa for sentiment analysis of Ukrainian text, highlighting its cross-lingual capabilities.
- **NER in Hindi Language Using Transformer Model: XLM-RoBERTa**
  ✍️ Authors: A. Choure, R.B. Adhao
  📝 Description: Utilizes XLM-RoBERTa for named entity recognition in Hindi, showcasing its performance in low-resource languages.
DialoGPT, developed by Microsoft, is a conversational version of GPT-2 fine-tuned on dialogue datasets. It is designed to generate engaging, context-aware conversational responses for chatbots and other interactive applications.
- **DialoGPT: Large-Scale Generative Pre-training for Dialogue**
  ✍️ Authors: Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan
  📝 Description: DialoGPT extends GPT-2 for conversational AI by fine-tuning on large-scale dialogue datasets. It achieves state-of-the-art results in open-domain dialogue generation with engaging and coherent outputs.
- **Småprat: DialoGPT for Natural Language Generation of Swedish Dialogue by Transfer Learning**
  ✍️ Authors: T. Adewumi, R. Brännvall, N. Abid, M. Pahlavan
  📝 Description: Applies DialoGPT to Swedish dialogue generation, showcasing the model's adaptability to new languages through transfer learning.
- **AuGPT: Dialogue with Pre-Trained Language Models and Data Augmentation**
  ✍️ Authors: J. Kulhánek, V. Hudeček, T. Nekvinda
  📝 Description: Enhances DialoGPT's conversational capabilities with data augmentation techniques for multi-domain task-oriented dialogue systems.
- **Generating Emotional Responses with DialoGPT-Based Multi-Task Learning**
  ✍️ Authors: S. Cao, Y. Jia, C. Niu, H. Zan, Y. Ma
  📝 Description: Introduces a multi-task learning architecture for DialoGPT to generate emotionally grounded responses in conversations.
- **On the Generation of Medical Dialogues for COVID-19 Using DialoGPT**
  ✍️ Authors: W. Yang, G. Zeng, B. Tan, Z. Ju
  📝 Description: Explores DialoGPT for generating medical dialogues related to COVID-19, demonstrating its effectiveness in healthcare applications.
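DialoGPT's data format is worth spelling out, since it is what turns a plain GPT-2 into a dialogue model: all turns of a conversation are concatenated into one token stream, each terminated by GPT-2's end-of-text token. A minimal sketch (the helper name is invented; the `<|endoftext|>` string is GPT-2's actual EOS token):

```python
EOS = "<|endoftext|>"  # GPT-2 / DialoGPT end-of-text token

def format_dialogue(history, reply=None):
    """Concatenate dialogue turns DialoGPT-style, each terminated by EOS.
    For training, pass the gold reply; at inference time, the model
    continues the string after the final EOS to produce the next turn."""
    text = EOS.join(history) + EOS
    if reply is not None:
        text += reply + EOS
    return text
```

Because the whole history sits in the context window, the model conditions each response on the preceding turns, which is what makes its replies context-aware rather than stateless.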
MarianMT refers to neural machine translation models built on the Marian framework, developed primarily by the Microsoft Translator team with academic partners; the widely used Hugging Face checkpoints were trained by the Helsinki-NLP group on OPUS data. It covers a large number of language pairs, including many low-resource ones, making it an excellent tool for translation tasks.
-
Marian: Fast Neural Machine Translation in C++
ποΈ Authors: M. Junczys-Dowmunt, R. Grundkiewicz, T. Dwojak, H. Hoang, K. Heafield, et al.
π Description: Introduces MarianMT, a fast and efficient neural machine translation framework implemented in C++, optimized for production-scale translation tasks with high speed and accuracy. -
University of Amsterdam at the CLEF 2024 Joker Track
ποΈ Authors: E. Schuurman, M. Cazemier, L. Buijs
π Description: Presents an application of MarianMT for multilingual machine translation tasks, highlighting its performance in competitive evaluation tracks. -
Controllability for English-Ukrainian Machine Translation Based on Specialized Corpora
ποΈ Authors: D. Maksymenko, O. Turuta, N. Saichyshyna
π Description: Explores methods to enhance controllability in machine translation using MarianMT, focusing on adapting translation outputs to specific requirements. -
MarianCG: A Code Generation Transformer Model Inspired by Machine Translation
ποΈ Authors: A. Soliman, M. Hadhoud, S. Shaheen
π Description: Demonstrates the versatility of MarianMT for tasks beyond language translation, including code generation. -
A Novel Effective Combinatorial Framework for Sign Language Translation
ποΈ Authors: S. Lin, J. You, Z. He, H. Jia, L. Chen
π Description: Uses MarianMT in a hybrid framework for translating sign language into text, emphasizing its adaptability to multimodal input.
Falcon is a family of open-source generative language models developed by the Technology Innovation Institute (TII). Built with multi-query attention for efficient inference and trained largely on the curated RefinedWeb corpus, its smaller variants are well suited to text generation under constrained computational resources.
-
The Falcon Series of Open Language Models
ποΈ Authors: E. Almazrouei, H. Alobeidli, A. Alshamsi
π Description: This paper introduces the Falcon language models, emphasizing pretraining on large-scale datasets to deliver superior performance in generative and comprehension tasks. -
Falcon: Faster and Parallel Inference of Large Language Models
ποΈ Authors: X. Gao, W. Xie, Y. Xiang, F. Ji
π Description: Proposes a speculative decoding framework for Falcon models, designed to enhance inference speed and output quality through semi-autoregressive drafting. -
Falcon 2.0: An Entity and Relation Linking Tool over Wikidata
ποΈ Authors: A. Sakor, K. Singh, A. Patel, M.E. Vidal
π Description: Presents Falcon 2.0, a resource for linking entities and relations to Wikidata, optimized for applications requiring structured data linking. -
FALCON: A New Approach for the Evaluation of Opportunistic Networks
ποΈ Authors: E. Hernández-Orallo, J.C. Cano, C.T. Calafate, P. Manzoni
π Description: Develops FALCON as a model for evaluating the performance and scalability of opportunistic networks using advanced simulation techniques. -
Falcon: Rapid Statistical Fault Coverage Estimation for Complex Designs
ποΈ Authors: S. Mirkhani, J.A. Abraham
π Description: Introduces a statistical model to estimate fault coverage in complex design architectures using the Falcon framework.
CodeGen is a family of transformer models for program synthesis developed by Salesforce Research. Trained on large natural-language and source-code corpora, it can generate code snippets in languages such as Python, JavaScript, and more.
-
Codereval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
ποΈ Authors: H. Yu, B. Shen, D. Ran, J. Zhang, Q. Zhang, Y. Ma
π Description: Presents a comprehensive benchmark evaluating CodeGen and similar models for practical code generation tasks, emphasizing pretraining on domain-specific data. -
Deep Learning for Source Code Modeling and Generation
ποΈ Authors: T.H.M. Le, H. Chen, M.A. Babar
π Description: Analyzes deep learning techniques, including CodeGen, for source code generation and modeling, addressing applications and challenges in the field. -
Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation
ποΈ Authors: C. Wang, J. Zhang, Y. Feng, T. Li, W. Sun, Y. Liu
π Description: Introduces techniques for enhancing CodeGenβs performance using repository-level data and autocompletion tools. -
CodeGen-Search: A Code Generation Model Incorporating Similar Sample Information
ποΈ Authors: H.W. Li, J.L. Kuang, M.S. Zhong, Z.X. Wang
π Description: Proposes a variant of CodeGen integrating similar sample information to improve accuracy in code generation. -
CodeP: Grammatical Seq2Seq Model for General-Purpose Code Generation
ποΈ Authors: Y. Dong, G. Li, Z. Jin
π Description: Explores grammar-based improvements to CodeGen for enhancing its general-purpose code generation capabilities.
ByT5 is a byte-level version of the T5 model. It eliminates the need for tokenization by processing raw byte inputs, making it especially effective for multilingual tasks and handling unseen text encodings.
-
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
ποΈ Authors: L Xue, A Barua, N Constant, R Al-Rfou
π Description: Introduces ByT5, a token-free pre-trained model that processes text directly as raw bytes. This novel approach eliminates tokenization, enabling better handling of rare and unseen text. -
Post-OCR Correction of Digitized Swedish Newspapers with ByT5
ποΈ Authors: V. Löfgren, D. Dannélls
π Description: Explores the use of ByT5 for correcting OCR errors in digitized historical Swedish newspapers, highlighting its ability to generalize across noisy text. -
One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks
ποΈ Authors: S Nehrdich, O Hellwig, K Keutzer
π Description: Adapts ByT5 for Sanskrit NLP tasks, showcasing its flexibility in handling morphologically rich languages with byte-level encoding. -
Fine-Tashkeel: Fine-Tuning Byte-Level Models for Accurate Arabic Text Diacritization
ποΈ Authors: B Al-Rfooh, G Abandah
π Description: Applies ByT5 to Arabic text diacritization, demonstrating its effectiveness in handling the intricacies of script-based languages. -
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5
ποΈ Authors: TA Dang, L Raviv, L Galke
π Description: Compares ByT5 and mT5 in multilingual tasks, emphasizing the advantages of byte-level processing for languages with complex morphology.
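ByT5's token-free input pipeline is easy to sketch: text becomes its raw UTF-8 bytes, shifted by a small offset so the lowest ids stay reserved for special tokens (in the released tokenizer, 0/1/2 are pad/eos/unk and byte values start at 3). A minimal sketch of that encoding, not the full tokenizer:

```python
# Minimal sketch of ByT5-style byte-level encoding: UTF-8 bytes shifted by
# an offset that keeps the lowest ids free for special tokens.
OFFSET = 3   # ids 0-2 reserved for <pad>, </s>, <unk>
EOS_ID = 1

def encode(text):
    """Text -> byte-level token ids, terminated by EOS."""
    return [b + OFFSET for b in text.encode("utf-8")] + [EOS_ID]

def decode(ids):
    """Token ids -> text, skipping special ids."""
    return bytes(i - OFFSET for i in ids if i >= OFFSET).decode("utf-8")

ids = encode("xin chào")       # the diacritic becomes a multi-byte sequence
assert decode(ids) == "xin chào"
```

Because every string round-trips through bytes, there is no out-of-vocabulary problem, which is exactly what makes ByT5 robust to noisy or unseen text.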
PhoBERT is a pre-trained language model tailored for Vietnamese. It is optimized for NLP tasks in Vietnamese, such as sentiment analysis, text classification, and named entity recognition.
-
PhoBERT: Pre-trained language models for Vietnamese
ποΈ Authors: Dat Quoc Nguyen, Anh Tuan Nguyen
π Description: Introduces PhoBERT, the first large-scale monolingual BERT-based language model pre-trained for Vietnamese. It outperforms multilingual models on various Vietnamese NLP tasks, highlighting the importance of monolingual pretraining. -
Stock Article Title Sentiment-Based Classification Using PhoBERT
ποΈ Authors: NS Tun, NN Long, T Tran, NT Thao
π Description: Utilizes PhoBERT for sentiment classification of stock-related article titles, demonstrating its effectiveness in financial text analysis. -
PhoBERT: Application in Disease Classification Based on Vietnamese Symptom Analysis
ποΈ Authors: HT Nguyen, TN Huynh, NTN Mai, KDD Le
π Description: Applies PhoBERT to classify diseases from Vietnamese symptom descriptions, showcasing its adaptability for medical NLP tasks. -
A Text Classification for Vietnamese Feedback via PhoBERT-Based Deep Learning
ποΈ Authors: CV Loc, TX Viet, TH Viet, LH Thao, NH Viet
π Description: Proposes a PhoBERT-based deep learning framework for Vietnamese text classification tasks, improving performance on customer feedback analysis. -
Fine-Tuned PhoBERT for Sentiment Analysis of Vietnamese Phone Reviews
ποΈ Authors: TM Ngo, BH Ngo, SV Valerievich
π Description: Examines the application of PhoBERT for sentiment analysis on Vietnamese phone reviews, focusing on fine-tuning techniques.
Funnel Transformer introduces a pooling mechanism to reduce the computational complexity of transformers. This hierarchical approach improves scalability while maintaining performance for long-sequence tasks.
-
Funnel-Transformer: Filtering Out Sequential Redundancy for Efficient Language Processing
ποΈ Authors: Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le
π Description: This paper introduces Funnel Transformer, which reduces computational redundancy in sequence processing through a funnel-shaped architecture. It balances efficiency and performance in language understanding tasks. -
Do Transformer Modifications Transfer Across Implementations and Applications?
ποΈ Authors: Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry
π Description: Analyzes various transformer modifications, including Funnel Transformer, to evaluate their adaptability and performance across applications. -
Condenser: A Pre-training Architecture for Dense Retrieval
ποΈ Authors: Luyu Gao, Jamie Callan
π Description: Explores Condenser, a variant inspired by Funnel Transformer, optimized for dense text retrieval tasks with enhanced efficiency. -
ArabicTransformer: Efficient Large Arabic Language Model with Funnel Transformer
ποΈ Authors: Saad Alrowili, K. Vijay-Shanker
π Description: Adapts Funnel Transformer for Arabic NLP tasks, focusing on improving efficiency while maintaining accuracy for resource-intensive language models. -
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
ποΈ Authors: Ziyang He, Ming Feng, Jun Leng
π Description: Proposes Fourier Transformer, inspired by Funnel Transformer, for efficient modeling of long-range dependencies using Fourier transforms.
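The funnel idea, compressing the sequence between encoder blocks by pooling adjacent hidden states, can be sketched as a strided mean-pool in NumPy (a toy illustration of the pooling step, not the full architecture):

```python
import numpy as np

def mean_pool_halve(hidden):
    """Halve sequence length by averaging adjacent positions.

    hidden: (seq_len, d_model); seq_len is assumed even here, while the
    real model handles padding for odd lengths.
    """
    seq_len, d_model = hidden.shape
    return hidden.reshape(seq_len // 2, 2, d_model).mean(axis=1)

h = np.arange(12, dtype=float).reshape(6, 2)   # toy 6-token sequence
pooled = mean_pool_halve(h)                    # -> 3 "tokens"
assert pooled.shape == (3, 2)
```

Each pooling step halves the cost of subsequent self-attention, which is where the funnel's efficiency gain comes from.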
T5v1.1 is an improved version of the original T5 model. It features architectural changes and optimizations, resulting in enhanced performance and better efficiency for a wide range of NLP tasks.
-
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
ποΈ Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
π Description: This foundational paper introduces the T5 framework, which forms the basis for T5v1.1. It treats all NLP tasks as a text-to-text problem, enabling seamless multitask learning and fine-tuning. -
Improved Fine-Tuning and Parameter Sharing in T5 Models
ποΈ Authors: V. Lialin, K. Zhao, N. Shivagunde
π Description: Proposes refinements for the T5 architecture, including T5v1.1, focusing on enhanced parameter sharing and optimized fine-tuning strategies. -
T5v1.1 for Low-Resource Language Understanding
ποΈ Authors: D. Mehra, L. Xie, E. Hofmann-Coyle
π Description: Explores the use of T5v1.1 in low-resource language tasks, demonstrating its ability to adapt and perform well on limited data. -
Enhanced Dialogue State Tracking Using T5v1.1
ποΈ Authors: P. Lesci, Y. Fujinuma, M. Hardalov, C. Shang
π Description: Demonstrates the efficiency of T5v1.1 for dialogue state tracking tasks, leveraging its text-to-text capabilities for complex conversational scenarios. -
T5v1.1 in Scientific Document Summarization
ποΈ Authors: R. Uppaal, Y. Li, J. Hu
π Description: Applies T5v1.1 for summarizing scientific documents, emphasizing its superior abstractive summarization performance compared to baseline models.
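The text-to-text framing that T5 and T5v1.1 share reduces every task to string formatting: a task prefix is prepended to the input, and the target is plain text. (Note that T5v1.1, unlike the original T5, is pre-trained without supervised task mixtures, so such prefixes only appear at fine-tuning time.) A trivial sketch:

```python
# Sketch of the text-to-text convention: every task becomes "prefix: input".
def to_text_to_text(task_prefix, text):
    """Format an input the T5 way; the target is likewise plain text."""
    return f"{task_prefix}: {text}"

example = to_text_to_text("summarize", "The quick brown fox ...")
# example == "summarize: The quick brown fox ..."
```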
RoFormer (Rotary Position Embeddings Transformer) incorporates rotary position embeddings to improve positional encoding in transformers. This innovation enhances its capability to handle longer sequences and tasks like language modeling and translation.
-
RoFormer: Enhanced Transformer with Rotary Position Embedding
ποΈ Authors: J. Su, M. Ahmed, Y. Lu, S. Pan, B. Wen, Y. Liu
π Description: Introduces RoFormer, a transformer model with rotary position embeddings designed to efficiently handle positional information. It improves performance across tasks requiring long-range dependencies. -
RoFormer for Position-Aware Multiple Instance Learning in Whole Slide Image Classification
ποΈ Authors: E. Pochet, R. Maroun, R. Trullo
π Description: Adapts RoFormer for position-aware multiple instance learning in medical image classification, emphasizing its flexibility for multimodal tasks. -
RoGraphER: Enhanced Extraction of Chinese Medical Entity Relationships Using RoFormer
ποΈ Authors: Q. Zhang, Y. Sun, P. Lv, L. Lu, M. Zhang, J. Wang, C. Wan
π Description: Leverages RoFormer for extracting medical entity relationships, showcasing its application in healthcare NLP tasks. -
Chinese Event Extraction Method Based on RoFormer
ποΈ Authors: B. Qiang, X. Zhou, Y. Wang, X. Yang
π Description: Presents a Chinese event extraction framework using RoFormer with FGM and CRF for enhanced performance. -
Entity Linking Based on RoFormer-Sim for Chinese Short Texts
ποΈ Authors: W. Xie
π Description: Proposes an entity linking model based on RoFormer-Sim for improving accuracy in Chinese short-text processing.
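Rotary position embedding itself is compact: each 2-D feature pair of a query or key is rotated by an angle proportional to its position, which makes the attention score depend only on the relative offset between positions. A NumPy sketch for a single head dimension:

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    """Apply rotary position embedding to a vector x at position pos.

    x: (d,) with d even; pair (x[2i], x[2i+1]) is rotated by pos * theta_i.
    """
    d = x.shape[0]
    pairs = x.reshape(d // 2, 2)
    theta = base ** (-2.0 * np.arange(d // 2) / d)   # per-pair frequency
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    rotated = np.stack([pairs[:, 0] * cos - pairs[:, 1] * sin,
                        pairs[:, 0] * sin + pairs[:, 1] * cos], axis=1)
    return rotated.reshape(d)

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Key property: the score depends only on the relative offset (here, 2).
assert np.isclose(rotate(q, 3) @ rotate(k, 5), rotate(q, 10) @ rotate(k, 12))
```

The final assertion is the point of RoPE: shifting both positions by the same amount leaves the dot-product attention score unchanged.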
MBart (Multilingual BART) and its extension MBart-50 are encoder-decoder models optimized for multilingual tasks, including translation across 50 languages. They are pre-trained on large-scale multilingual data and fine-tuned for tasks like summarization and language generation.
-
mBART: Multilingual Denoising Pretraining for Neural Machine Translation
ποΈ Authors: Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer
π Description: This foundational paper introduces mBART, a multilingual sequence-to-sequence model pre-trained with denoising objectives. It demonstrates strong performance on machine translation and cross-lingual tasks. -
mBART-50: Multilingual Translation with a Fine-Tuned mBART Model
ποΈ Authors: Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan
π Description: Presents mBART-50, an extension of mBART pre-trained on 50 languages. It achieves state-of-the-art performance in zero-shot translation tasks. -
Fine-Tuning mBART for Low-Resource Machine Translation
ποΈ Authors: R. Dabre, A. Chakrabarty
π Description: Discusses fine-tuning techniques for mBART on Indic languages, showing significant improvements in low-resource translation scenarios. -
ZmBART: An Unsupervised Cross-Lingual Transfer Framework for Language Generation
ποΈ Authors: K. K. Maurya, M. S. Desarkar, Y. Kano
π Description: Proposes ZmBART, a variant of mBART adapted for unsupervised cross-lingual generation, highlighting its potential for broader NLP applications. -
Fine-Tuning mBART-50 for Domain-Specific Neural Machine Translation
ποΈ Authors: B. Namdarzadeh, S. Mohseni, L. Zhu
π Description: Explores the application of mBART-50 for domain-specific translations, such as legal and medical text, showcasing its adaptability. -
DMSeqNet-mBART: Enhancing mBART for Chinese Short News Text Summarization
ποΈ Authors: K. Cao, Y. Hao, W. Cheng
π Description: Presents DMSeqNet-mBART, a specialized adaptation of mBART for summarizing Chinese short news, enhancing performance on specific linguistic challenges. -
Cross-Lingual Reverse Dictionary Using Multilingual mBART
ποΈ Authors: A. Mangal, S. S. Rathore, K. V. Arya
π Description: Demonstrates the use of mBART for cross-lingual reverse dictionary tasks, highlighting its effectiveness in multilingual semantic understanding.
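mBART-50's multilingual interface comes down to language-id tokens: the source sentence is suffixed with `</s>` and its language code (e.g. `en_XX`), and decoding is forced to begin with the target language code (e.g. `vi_VN`). A string-level sketch of that convention (real tokenizers handle this internally; the helper name is ours):

```python
# String-level sketch of mBART-50-style language tagging: the source ends
# with "</s> <src_lang>", and generation is forced to start with the
# target language code.
def format_mbart_pair(src_text, src_lang, tgt_lang):
    """Return (encoder input, forced decoder start token) as strings."""
    return f"{src_text} </s> {src_lang}", tgt_lang

enc_input, forced_bos = format_mbart_pair("Hello world", "en_XX", "vi_VN")
assert forced_bos == "vi_VN"
```

Forcing the first decoder token to the target language code is what lets a single model translate between any of its 50 languages, including zero-shot directions.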
Datasets play a crucial role in training and evaluating NLP models. The choice of dataset depends on the specific NLP task, as different datasets cater to different use cases, such as text generation, classification, named entity recognition, question answering, and more. Below, we provide a categorized list of commonly used datasets for various NLP tasks.
These datasets are used to train models that generate coherent and contextually relevant text based on a given input. Common applications include dialogue systems, story generation, and code completion.
-
Scigen: A Dataset for Reasoning-Aware Text Generation from Scientific Tables
ποΈ Authors: N.S. Moosavi, A. Rücklé, D. Roth
π Description: Introduces SciGen, a dataset designed for text generation tasks requiring reasoning capabilities using scientific tables. It enables the evaluation of reasoning-aware generation models. -
MRED: A Meta-Review Dataset for Structure-Controllable Text Generation
ποΈ Authors: C. Shen, L. Cheng, R. Zhou, L. Bing, Y. You
π Description: Presents MRED, a dataset aimed at enabling controllable text generation, particularly for summarizing and generating structured meta-reviews. -
ToTTo: A Controlled Table-to-Text Generation Dataset
ποΈ Authors: A.P. Parikh, X. Wang, S. Gehrmann, M. Faruqui
π Description: Proposes ToTTo, a dataset designed for controlled table-to-text generation tasks. It emphasizes generating text grounded on structured data. -
SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation
ποΈ Authors: H. Chen, H. Takamura, H. Nakayama
π Description: Introduces SciXGen, a dataset that facilitates the development of models for context-aware scientific paper generation. -
DART: Open-Domain Structured Data Record to Text Generation
ποΈ Authors: L. Nan, D. Radev, R. Zhang, A. Rau, A. Sivaprasad
π Description: Presents DART, a dataset for transforming structured data records into coherent text, applicable in open-domain tasks. -
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
ποΈ Authors: B.Y. Lin, W. Zhou, M. Shen, P. Zhou
π Description: Introduces CommonGen, a dataset for testing constrained generative commonsense reasoning by generating coherent sentences grounded on given concepts. -
Evaluation of Text Generation: A Survey
ποΈ Authors: A. Celikyilmaz, E. Clark, J. Gao
π Description: Surveys various text generation datasets, models, and evaluation methods, offering insights into the current state and challenges of text generation.
Text classification datasets help train models to categorize text into predefined labels. These datasets are used in applications like sentiment analysis, spam detection, and topic classification.
-
NADA: New Arabic Dataset for Text Classification
ποΈ Authors: N. Alalyani, S. L. Marie-Sainte
π Description: Introduces NADA, a structured and standardized dataset for Arabic text classification, addressing gaps in Arabic NLP datasets. -
Incremental Few-Shot Text Classification with Multi-Round New Classes: Formulation, Dataset and System
ποΈ Authors: C. Xia, W. Yin, Y. Feng, P. Yu
π Description: Proposes a new benchmark dataset for incremental few-shot text classification, enabling evaluation of multi-round new class additions. -
Large-Scale Multi-Label Text Classification on EU Legislation
ποΈ Authors: I. Chalkidis, M. Fergadiotis, P. Malakasiotis
π Description: Releases a new dataset of 57k legislative documents from EUR-LEX annotated with ~4.3k labels for multi-label classification tasks. -
LSHTC: A Benchmark for Large-Scale Text Classification
ποΈ Authors: I. Partalas, A. Kosmopoulos, N. Baskiotis
π Description: Introduces LSHTC, a benchmark dataset for hierarchical text classification, supporting tasks with hundreds of thousands of classes. -
Benchmarking Zero-Shot Text Classification: Datasets, Evaluation and Entailment Approach
ποΈ Authors: W. Yin, J. Hay, D. Roth
π Description: Presents datasets tailored for zero-shot text classification with a standardized evaluation framework and entailment-based methods.
Named Entity Recognition (NER) datasets are used for extracting named entities such as persons, locations, organizations, and dates from text. These datasets are crucial for tasks like information retrieval and knowledge extraction.
-
Multimodal Named Entity Recognition for Short Social Media Posts
ποΈ Authors: S. Moon, L. Neves, V. Carvalho
π Description: Introduces a dataset for multimodal named entity recognition (MNER) in social media, leveraging both text and visual data for more robust recognition. -
MultiCoNER: A Large-Scale Multilingual Dataset for Complex Named Entity Recognition
ποΈ Authors: S. Malmasi, A. Fang, B. Fetahu
π Description: Presents MultiCoNER, a dataset designed to challenge NER models with fine-grained and complex entity recognition in a multilingual context. -
Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
ποΈ Authors: L. Derczynski, E. Nichols, M. Van Erp
π Description: Proposes a dataset for recognizing novel and emerging entities, emphasizing adaptability in dynamic domains like social media. -
Creating a Dataset for Named Entity Recognition in the Archaeology Domain
ποΈ Authors: A. Brandsen, S. Verberne, M. Wansleeben
π Description: Develops a domain-specific NER dataset tailored to archaeological texts, annotated with six custom entity types. -
NNE: A Dataset for Nested Named Entity Recognition in English Newswire
ποΈ Authors: N. Ringland, X. Dai, B. Hachey, S. Karimi, C. Paris
π Description: Introduces NNE, a large-scale dataset for nested entity recognition, pushing models to handle hierarchical structures in newswire data. -
CLUENER2020: Fine-Grained Named Entity Recognition Dataset and Benchmark for Chinese
ποΈ Authors: L. Xu, Q. Dong, Y. Liao, C. Yu
π Description: Presents CLUENER2020, a challenging dataset for fine-grained NER in Chinese, incorporating new entity types and samples. -
Crosslingual Named Entity Recognition for Clinical De-Identification Applied to a COVID-19 Italian Dataset
ποΈ Authors: R. Catelli, F. Gargiulo, V. Casola, G. De Pietro
π Description: Creates a new dataset of Italian COVID-19 clinical records for cross-lingual NER, focusing on de-identification and anonymization.
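Most NER corpora above annotate entities as token spans, which models consume as BIO tag sequences. A minimal sketch of span-to-BIO conversion (function name and label set are ours):

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans into BIO tags.

    Spans use token indices with exclusive ends and must not overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label               # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label               # inside the entity
    return tags

tokens = ["Barack", "Obama", "visited", "Hanoi"]
assert spans_to_bio(tokens, [(0, 2, "PER"), (3, 4, "LOC")]) == \
    ["B-PER", "I-PER", "O", "B-LOC"]
```

Nested-NER datasets such as NNE relax the non-overlap assumption, which is precisely what makes them harder for flat BIO taggers.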
Question Answering (QA) datasets enable models to generate answers based on a given question and context. These datasets are widely used in search engines, virtual assistants, and automated customer support systems.
-
WikiQA: A Challenge Dataset for Open-Domain Question Answering
ποΈ Authors: Y. Yang, W. Yih, C. Meek
π Description: Introduces WikiQA, a dataset for open-domain question answering, constructed from natural and realistic queries on Wikipedia. -
GQA: A New Dataset for Compositional Question Answering Over Real-World Images
ποΈ Authors: D.A. Hudson, C.D. Manning
π Description: Proposes GQA, a dataset for visual reasoning and compositional question answering, designed to address key shortcomings of visual QA datasets. -
HotpotQA: A Dataset for Diverse, Explainable Multi-Hop Question Answering
ποΈ Authors: Z. Yang, P. Qi, S. Zhang, Y. Bengio, W.W. Cohen
π Description: Introduces HotpotQA, a dataset emphasizing diverse and explainable multi-hop reasoning tasks using Wikipedia as its knowledge base. -
ToolQA: A Dataset for Question Answering with External Tools
ποΈ Authors: Y. Zhuang, Y. Yu, K. Wang, H. Sun
π Description: Proposes ToolQA, a dataset for exploring the integration of external tools with question answering systems. -
QASC: A Dataset for Question Answering via Sentence Composition
ποΈ Authors: T. Khot, P. Clark, M. Guerquin, P. Jansen
π Description: Introduces QASC, a dataset focusing on multi-hop reasoning through sentence composition to answer multiple-choice questions. -
What Do Models Learn from Question Answering Datasets?
ποΈ Authors: P. Sen, A. Saffari
π Description: Explores generalizability across question answering datasets and highlights challenges with impossible questions in dataset design. -
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering
ποΈ Authors: A. Rogers, M. Gardner, I. Augenstein
π Description: Analyzes the proliferation of question answering datasets, providing a taxonomy of more than 80 resources in QA and reading comprehension.
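Extractive QA models trained on datasets like these emit per-token start and end scores over the context; decoding picks the highest-scoring valid span. A NumPy sketch of that step (a brute-force search, fine for short contexts):

```python
import numpy as np

def best_span(start_logits, end_logits, max_len=15):
    """Pick (start, end) maximizing start+end score, with start <= end."""
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_logits)):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

start = np.array([0.1, 2.0, 0.3, 0.0])   # toy per-token scores
end = np.array([0.0, 0.1, 3.0, 0.2])
assert best_span(start, end) == (1, 2)   # answer spans tokens 1..2
```

The `max_len` cap mirrors the common trick of rejecting implausibly long answer spans during decoding.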
Fill Mask datasets are used for training masked language models (MLMs) where a model learns to predict missing words in a given sentence. These datasets help improve contextualized word representations.
-
The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language
ποΈ Authors: L. Lin, B. Wang, X. Wang, Z.X. Wang, A. Wiśniowski
π Description: Introduces FMAT, a dataset designed to measure semantic probabilities in natural language using fill-mask tasks for evaluating language models. -
Performance Implications of Using Unrepresentative Corpora in Arabic NLP
ποΈ Authors: S. Alshahrani, N. Alshahrani, S. Dey
π Description: Creates a dataset for evaluating fill-mask tasks in Arabic, addressing the challenges posed by unrepresentative corpora in language modeling. -
Automated Distractor Generation for Fill-in-the-Blank Items Using a Prompt-Based Learning Approach
ποΈ Authors: J. Zu, I. Choi, J. Hao
π Description: Proposes a new dataset for fill-in-the-blank tasks, leveraging prompt-based learning to generate distractors automatically. -
DarkBERT: A Language Model for the Dark Side of the Internet
ποΈ Authors: Y. Jin, E. Jang, J. Cui, J.W. Chung, Y. Lee
π Description: Presents a dataset tailored for cybersecurity tasks, with evaluations on fill-mask and synonym inference capabilities. -
We Understand Elliptical Sentences, and Language Models Should Too
ποΈ Authors: D. Testa, E. Chersoni, A. Lenci
π Description: Creates a dataset for studying ellipsis and its interaction with thematic fit, focusing on fill-mask tasks to predict missing verbs. -
Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language Modeling
ποΈ Authors: H.T. Kesgin, M.F. Amasyali
π Description: Proposes a dataset and methodology for iterative mask filling, designed to augment text effectively through masked language modeling. -
Efficient and Thorough Anonymizing of Dutch Electronic Health Records
ποΈ Authors: S. Verkijk, P. Vossen
π Description: Develops a dataset for anonymizing Dutch electronic health records using fill-mask tasks as part of the de-identification process.
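The corruption scheme behind most fill-mask datasets follows BERT's 80/10/10 rule: roughly 15% of tokens are selected, and of those, 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged (the model must still predict them). A seeded sketch of that procedure:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt tokens BERT-style; labels hold originals at selected spots."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                        # must be recovered
            roll = rng.random()
            if roll < 0.8:
                corrupted.append("[MASK]")            # 80%: mask
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))   # 10%: random token
            else:
                corrupted.append(tok)                 # 10%: keep original
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens)
assert len(corrupted) == len(labels) == len(tokens)
```

The random-replacement and keep-original cases prevent the model from assuming that every non-`[MASK]` token is correct.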
Machine translation datasets provide parallel corpora for training models to translate text between different languages. These datasets are fundamental in developing multilingual NLP systems.
Vietnamese NLP presents unique challenges due to the language's lack of word boundaries, tonal nature, and rich morphology. This section provides a collection of papers, tools, and datasets specifically tailored for Vietnamese NLP research and applications.
Vietnamese text preprocessing involves tasks such as tokenization, stopword removal, and diacritic normalization. Due to the lack of explicit word boundaries, word segmentation is a critical preprocessing step in Vietnamese NLP.
-
Vietnamese Text Classification with Textrank and Jaccard Similarity Coefficient
ποΈ Authors: HT Huynh, N Duong-Trung, DQ Truong
π Description: Proposes a preprocessing pipeline for Vietnamese text classification using Textrank for keyword extraction and Jaccard similarity for feature selection. -
Vietnamese Short Text Classification via Distributed Computation
ποΈ Authors: HX Huynh, LX Dang, N Duong-Trung
π Description: Explores preprocessing techniques for Vietnamese short text classification, focusing on distributed computation approaches. -
DaNangNLP Toolkit for Vietnamese Text Preprocessing and Word Segmentation
ποΈ Authors: KD Nguyen, TT Nguyen, DB Nguyen
π Description: Develops a comprehensive toolkit for Vietnamese text preprocessing, including tokenization, word segmentation, and normalization. -
Feature Extraction Using Neural Networks for Vietnamese Text Classification
ποΈ Authors: HH Kha
π Description: Proposes feature extraction techniques for Vietnamese text preprocessing using neural networks to enhance classification accuracy. -
ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing
ποΈ Authors: QN Nguyen, TC Phan, DV Nguyen
π Description: Introduces ViSoBERT, a pre-trained model tailored for Vietnamese social media text, focusing on robust preprocessing pipelines. -
SVSD: A Comprehensive Framework for Vietnamese Sentiment Analysis
ποΈ Authors: LT Nhi, DHA Vu, VDT Phong
π Description: Presents preprocessing steps and sentiment analysis methods for Vietnamese text to ensure data uniformity and effective modeling. -
An Empirical Study on POS Tagging for Vietnamese Social Media Text
ποΈ Authors: NX Bach, ND Linh, TM Phuong
π Description: Focuses on part-of-speech tagging as a preprocessing task for Vietnamese social media text, creating a dataset for this task.
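Because a Vietnamese word often spans several space-separated syllables, segmenters conventionally join a word's syllables with underscores (e.g. "học sinh" → "học_sinh") before downstream modeling. A greedy longest-match sketch over a toy lexicon (the lexicon and function are illustrative only; real segmenters use much larger dictionaries and statistical models):

```python
def segment(syllables, lexicon, max_word_len=3):
    """Greedy longest-match word segmentation over space-split syllables."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_word_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:   # lone syllables always pass
                words.append(candidate.replace(" ", "_"))
                i += n
                break
    return words

lexicon = {"học sinh", "đi học"}                 # toy lexicon
assert segment("học sinh đi học".split(), lexicon) == ["học_sinh", "đi_học"]
```

This underscore convention is exactly the input format expected by word-level Vietnamese models such as PhoBERT.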
Word embeddings and contextualized word representations trained specifically for Vietnamese text improve NLP performance. This includes models like Word2Vec, FastText, and transformer-based embeddings such as PhoBERT.
- Construction of a VerbNet Style Lexicon for Vietnamese
  ποΈ Authors: H.M. Linh, N.T.M. Huyen
  π Description: Develops a lexicon for Vietnamese verbs using word2vec representations on a large corpus, enabling applications in parsing and semantic tasks.
- Comparing Different Criteria for Vietnamese Word Segmentation
  ποΈ Authors: Q. Nguyen, N.L.T. Nguyen, Y. Miyao
  π Description: Explores criteria for Vietnamese word segmentation and its impact on the quality of word representations in downstream tasks.
- Improving Vietnamese Dependency Parsing Using Distributed Word Representations
  ποΈ Authors: C. Vu-Manh, A.T. Luong, P. Le-Hong
  π Description: Investigates how distributed word embeddings improve dependency parsing for Vietnamese, achieving significant accuracy improvements.
- A Study of Word Representation in Vietnamese Sentiment Analysis
  ποΈ Authors: H.Q. Nguyen, L. Vu, Q.U. Nguyen
  π Description: Evaluates various word representation methods for sentiment analysis, focusing on Vietnamese corpora and sentiment tasks.
- Leveraging Semantic Representations Combined with Contextual Word Representations for Vietnamese Textual Entailment
  ποΈ Authors: Q.L. Duong, D.V. Nguyen
  π Description: Combines semantic and contextual representations to improve performance on Vietnamese textual entailment tasks.
- Vietnamese Document Representation and Classification
  ποΈ Authors: G.S. Nguyen, X. Gao, P. Andreae
  π Description: Proposes document-level representation techniques for Vietnamese, including bag-of-words and semantic embeddings.
- Deep Neural Networks Algorithm for Vietnamese Word Segmentation
  ποΈ Authors: K. Zheng, W. Zheng
  π Description: Presents a deep neural network-based approach for Vietnamese word segmentation, leveraging contextualized embeddings for improved accuracy.
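Transformer embeddings such as PhoBERT require the `transformers` library and a model download, but the distributional idea underneath can be sketched library-free. Below is a minimal sketch that builds count-based co-occurrence vectors over a toy pre-segmented corpus (multi-syllable words joined by underscores, the convention Vietnamese word segmenters use). The corpus, window size, and similarity pair are illustrative assumptions, not taken from any paper above.

```python
from collections import Counter, defaultdict
import math

# Toy pre-segmented Vietnamese corpus: multi-syllable words are joined
# with underscores, as common Vietnamese segmenters output them.
corpus = [
    "tôi yêu xử_lý ngôn_ngữ tự_nhiên",
    "chúng_tôi nghiên_cứu ngôn_ngữ tự_nhiên",
    "tôi yêu học máy",
]

def cooccurrence_vectors(sentences, window=2):
    """Build simple count-based word vectors from co-occurrence counts."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = cooccurrence_vectors(corpus)
# Words sharing contexts get similar vectors: both verbs appear
# next to "ngôn_ngữ" and "tự_nhiên".
sim = cosine(vecs["xử_lý"], vecs["nghiên_cứu"])
```

Real systems replace these sparse counts with dense learned vectors (Word2Vec, FastText) or contextual representations (PhoBERT), but the similarity computation is the same in spirit.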
Named Entity Recognition (NER) identifies entities such as names, organizations, and locations within Vietnamese text. Challenges include handling ambiguous entity boundaries and diacritic variations.
- Named Entity Recognition in Vietnamese Documents
  ποΈ Authors: QT Tran, TXT Pham, QH Ngo, D Dinh
  π Description: Explores techniques for recognizing named entities in Vietnamese documents with a focus on extracting relations and tracking entities across texts.
- A Feature-Rich Vietnamese Named Entity Recognition Model
  ποΈ Authors: PQ Nhat Minh
  π Description: Presents a feature-rich NER model for Vietnamese that achieves state-of-the-art accuracy by combining multiple NLP toolkits and advanced chunking methods.
- On the Vietnamese Name Entity Recognition: A Deep Learning Method Approach
  ποΈ Authors: NC LΓͺ, NY Nguyen, AD Trinh
  π Description: Investigates the application of deep learning methods to Vietnamese NER, demonstrating state-of-the-art performance using contextual embeddings.
- The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition
  ποΈ Authors: TH Pham, P Le-Hong
  π Description: Highlights the role of syntactic features in improving Vietnamese NER, utilizing an automatic feature extraction approach.
- Vietnamese Named Entity Recognition on Medical Topics
  ποΈ Authors: DP Van, DN Tien, TTT Minh, TD Minh
  π Description: Proposes a new NER dataset for Vietnamese medical texts, including newly defined entity types and extensive annotations.
- COVID-19 Named Entity Recognition for Vietnamese
  ποΈ Authors: TH Truong, MH Dao, DQ Nguyen
  π Description: Develops a COVID-19 domain-specific dataset for Vietnamese NER, incorporating novel entity types and robust annotations.
- ViMedNER: A Medical Named Entity Recognition Dataset for Vietnamese
  ποΈ Authors: P Van Duong, TD Trinh, MT Nguyen
  π Description: Introduces ViMedNER, a dataset focused on medical entity recognition, specifically tailored for Vietnamese texts.
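Most NER models in the papers above emit BIO tag sequences; converting those tags back into entity spans is a standard post-processing step regardless of the model. A minimal sketch, assuming BIO labels and tolerating a stray `I-` tag without a preceding `B-` (the example sentence and tags are invented):

```python
def bio_to_spans(tokens, tags):
    """Convert a BIO tag sequence to (entity_type, start, end) spans,
    with `end` exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):      # sentinel "O" flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((etype, start, i))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, etype = i, tag[2:]           # tolerate I- without a preceding B-
    return spans

tokens = ["Ông", "Nguyễn_Văn_A", "sống", "tại", "Hà_Nội"]
tags   = ["O",   "B-PER",        "O",    "O",   "B-LOC"]
spans = bio_to_spans(tokens, tags)
```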
Part-of-Speech (POS) tagging in Vietnamese requires models to correctly classify words into grammatical categories despite the language's complex morphology and word segmentation issues.
- A Semi-Supervised Learning Method for Vietnamese Part-of-Speech Tagging
  ποΈ Authors: BN Xuan, CN Viet, MPQ Nhat
  π Description: Proposes a semi-supervised learning approach for Vietnamese POS tagging, combining perceptron-based and tagging-style models.
- Comparative Study of Vietnamese Part-of-Speech Tagging Tools
  ποΈ Authors: LD Quach, D Do Thanh, DC Tran
  π Description: Presents a comparative analysis of existing Vietnamese POS tagging tools and evaluates their accuracy and efficiency.
- An Empirical Study of Maximum Entropy Approach for Part-of-Speech Tagging of Vietnamese Texts
  ποΈ Authors: P Le-Hong, A Roussanaly, TMH Nguyen
  π Description: Explores the application of the maximum entropy model for Vietnamese POS tagging, leveraging a wide range of linguistic features.
- PhoNLP: A Joint Multi-Task Learning Model for Vietnamese POS Tagging, NER, and Dependency Parsing
  ποΈ Authors: LT Nguyen, DQ Nguyen
  π Description: Introduces PhoNLP, a joint model for POS tagging, named entity recognition, and dependency parsing, demonstrating state-of-the-art performance.
- An Empirical Study on POS Tagging for Vietnamese Social Media Text
  ποΈ Authors: NX Bach, ND Linh, TM Phuong
  π Description: Focuses on adapting POS tagging to handle the unique challenges of Vietnamese social media text.
- A Hybrid Approach to Vietnamese Word Segmentation Using POS Tags
  ποΈ Authors: GB Tran, SB Pham
  π Description: Develops a hybrid approach integrating POS tagging to improve Vietnamese word segmentation techniques.
- Dual Decomposition for Vietnamese Part-of-Speech Tagging
  ποΈ Authors: NX Bach, K Hiraishi, N Le Minh, A Shimazu
  π Description: Proposes a dual decomposition method for Vietnamese POS tagging, addressing limitations in existing models.
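A useful reference point when reading the POS papers above is the most-frequent-tag baseline, which real taggers must beat. A minimal sketch on an invented toy training set; the tag names loosely follow VietTreebank-style labels (N = noun, V = verb, P = pronoun, A = adjective), and defaulting unseen words to noun is an assumption:

```python
from collections import Counter, defaultdict

# Tiny invented training set of pre-segmented (word, tag) pairs.
train = [
    [("tôi", "P"), ("ăn", "V"), ("cơm", "N")],
    [("tôi", "P"), ("đọc", "V"), ("sách", "N")],
    [("cơm", "N"), ("ngon", "A")],
]

def fit_baseline(sentences):
    """Learn the most frequent tag for each word seen in training."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, words, default="N"):
    """Tag words with their most frequent training tag; unseen -> default."""
    return [model.get(w, default) for w in words]

model = fit_baseline(train)
pred = tag(model, ["tôi", "ăn", "sách"])
```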
Vietnamese dependency parsing and constituency parsing help analyze sentence structures, enabling downstream applications like machine translation and question answering.
- Prosodic Phrasing Modeling for Vietnamese TTS Using Syntactic Information
  ποΈ Authors: NTT Trang, A Rilliard, T Do Dat
  π Description: Explores the interface between syntax and prosody in Vietnamese text-to-speech (TTS) systems, leveraging syntactic information to improve phrasing.
- Semantic Parsing for Vietnamese: A Cross-Lingual Approach
  ποΈ Authors: T Pham
  π Description: Presents a cross-lingual approach to semantic parsing for Vietnamese, focusing on syntactic and semantic challenges.
- Vietnamese Parsing Applying the PCFG Model
  ποΈ Authors: HA Viet, DTP Thu, HQ Thang
  π Description: Investigates the use of probabilistic context-free grammar (PCFG) for Vietnamese syntax parsing, enhancing parsing accuracy.
- Building a Treebank for Vietnamese Syntactic Parsing
  ποΈ Authors: NT Quy
  π Description: Develops a Vietnamese treebank and evaluates different parsing methods, identifying sources of parsing errors.
- Semantic Parsing of Simple Sentences in Unification-Based Vietnamese Grammar
  ποΈ Authors: DT Nguyen, KD Nguyen, HT Le
  π Description: Explores unification-based grammar for semantic parsing of simple Vietnamese sentences, emphasizing taxonomy and grammar development.
- An Experimental Study on Constituency Parsing for Vietnamese
  ποΈ Authors: L Nguyen-Thi, P Le-Hong
  π Description: Analyzes constituency parsing for Vietnamese using syntax-annotated corpora, presenting empirical results and model performance.
- Using Syntax and Shallow Semantic Analysis for Vietnamese Question Generation
  ποΈ Authors: P Tran, DK Nguyen, T Tran, B Vo
  π Description: Applies syntax and shallow semantic analysis to Vietnamese question generation, addressing limitations in existing models.
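Dependency parsers such as those evaluated above typically output one head index per token. A quick structural sanity check, independent of any parser, is that the head array forms a single-rooted, acyclic tree. A minimal sketch, assuming 1-based token indices with 0 as the artificial root (the example sentence is invented):

```python
def is_valid_tree(heads):
    """Check that heads[i] (the head of token i+1, with 0 = artificial root)
    describes a single-rooted tree with no cycles."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False                      # exactly one token may attach to root
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:                  # walk up to the root from every token
            if node in seen:
                return False              # revisiting a node means a cycle
            seen.add(node)
            node = heads[node - 1]
    return True

# "Tôi yêu Việt_Nam": the verb 'yêu' (token 2) is the root;
# both its arguments attach to it.
heads = [2, 0, 2]
ok = is_valid_tree(heads)
```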
Machine translation between Vietnamese and other languages (e.g., English, French, Chinese) is an active research area. Transformer-based models like MarianMT and multilingual BERT-based models improve translation quality.
- ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pairs
  ποΈ Authors: HV Tran, MQ Nguyen, VV Nguyen
  π Description: Evaluates bidirectional machine translation between Vietnamese-Chinese and Vietnamese-Lao, focusing on fluency and accuracy.
- Are LLMs Good for Low-resource Vietnamese and Other Translations?
  ποΈ Authors: VV Nguyen, H Nguyen-Tien, P Nguyen-Ngoc
  π Description: Investigates the performance of large language models (LLMs) in low-resource translation tasks, including Vietnamese.
- Handling Imbalanced Resources and Loanwords in Vietnamese-Bahnaric Neural Machine Translation
  ποΈ Authors: LNH Bui, HTP Nguyen, MK Le
  π Description: Focuses on neural machine translation for the Vietnamese-Bahnaric language pair, tackling issues of imbalanced data and loanwords.
- Constructing a Chinese-Vietnamese Bilingual Corpus from Subtitle Websites
  ποΈ Authors: PN Nguyen, P Tran
  π Description: Explores using subtitle data to build a high-quality Vietnamese-Chinese parallel corpus.
- Exploring Low-Resource Machine Translation: Case Study of Lao-Vietnamese Translation
  ποΈ Authors: QD Tran
  π Description: Develops a machine translation system for the low-resource Vietnamese-Lao language pair.
- Neural Network Translations for Building SentiWordNets
  ποΈ Authors: KN Lam, TP Le, KC Ngu, KT Le, PM Le
  π Description: Uses machine translation to create a Vietnamese version of the SentiWordNet lexical resource.
- Evaluating the Feasibility of Machine Translation for Patient Education in Vietnamese
  ποΈ Authors: M Ugas, MA Calamia, J Tan, B Umakanthan
  π Description: Assesses Google Translate for translating patient education materials into Vietnamese.
- Improving Chinese-Vietnamese Neural Machine Translation with Irrelevant Word Detection
  ποΈ Authors: T Wang, Z Yu, W Yu, W Sun
  π Description: Introduces a method to filter irrelevant words to improve Vietnamese-Chinese machine translation.
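MT systems like the ones above are usually scored with BLEU. The toy sentence-level BLEU-1 below (clipped unigram precision times brevity penalty) only illustrates the shape of the metric; real evaluations use corpus-level BLEU with n-grams up to 4 via tools such as sacreBLEU, and the example sentence pair is invented:

```python
import math
from collections import Counter

def unigram_bleu(candidate, reference):
    """Toy sentence-level BLEU-1: clipped unigram precision
    multiplied by the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    # Counter intersection clips each word's count by the reference count.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = unigram_bleu("tôi yêu Việt Nam", "tôi yêu đất_nước Việt Nam")
```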
Question Answering (QA) systems in Vietnamese involve answering questions based on structured or unstructured text. QA models require high-quality annotated datasets for accurate responses.
- Building a Website to Sell Electronic Devices Store Integrated with Chatbot AI and VNPay Payment Gateway
  ποΈ Authors: TT Nguyen, VN Nguyen
  π Description: Explores the integration of AI chatbots in e-commerce, specifically within Vietnamese electronic stores using VNPay.
- Top 2 at ALQAC 2024: Large Language Models (LLMs) for Legal Question Answering
  ποΈ Authors: HQ Pham, Q Van Nguyen, DQ Tran
  π Description: Analyzes the use of large language models (LLMs) for legal question answering in Vietnamese law.
- Critical Discourse Analysis of Judicial Conversations in Vietnam: A Case Study
  ποΈ Authors: PT Ly
  π Description: Examines the structure and discourse of judicial question-answer interactions in Vietnamese courts.
- Vietnamese Young People and the Reactive Public Sphere
  ποΈ Authors: VT Le, TM Ly-Le, L Ha
  π Description: Investigates how young Vietnamese individuals engage in public discourse and answer political questions in online spaces.
- [Four Important Characteristics of Women in Confucianism and Its Contribution to the Implementation of Gender Equality in Vietnam](https://ejournals.epublishing.ekt.gr/index.php/Conatus/article/view/35243)
  ποΈ Authors: D Van Vo
  π Description: Discusses how Confucianism has shaped gender roles and question-answer dynamics in Vietnamese society.
- Man in a Hurry: Murray MacLehose and Colonial Autonomy in Hong Kong
  ποΈ Authors: P Roberts
  π Description: Explores how Vietnamese refugees' legal and political questions were addressed in colonial Hong Kong.
- Integrating Theatrical Arts into Storytelling Instruction in Primary Education
  ποΈ Authors: QV Tran, YN Tran
  π Description: Examines how question-answer techniques in storytelling can be improved with theatrical methods in Vietnamese schools.
- Buddhism: A Journey through History
  ποΈ Authors: DS Lopez
  π Description: Explores how Buddhism has historically answered philosophical and religious questions in Vietnam.
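Extractive QA systems are conventionally scored with exact match and token-level F1 (the SQuAD evaluation convention). A minimal token-F1 sketch; the whitespace tokenization and the example answer strings are illustrative assumptions:

```python
from collections import Counter

def qa_token_f1(prediction, gold):
    """Token-overlap F1, the standard lenient metric for extractive QA:
    harmonic mean of token precision and recall against the gold answer."""
    pred, ref = prediction.split(), gold.split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

# Partial credit: the prediction misses one gold token.
f1 = qa_token_f1("Hà Nội", "thủ_đô Hà Nội")
```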
Text summarization generates concise and informative summaries from long Vietnamese documents. Extractive and abstractive summarization techniques are commonly used for this task.
- Vietnamese Online Newspapers Summarization Using Pre-Trained Model
  ποΈ Authors: T Le Ngoc
  π Description: Presents a model for summarizing Vietnamese online newspapers using pre-trained deep learning techniques.
- Graph-based and Generative Approaches to Multi-Document Summarization
  ποΈ Authors: TD Thanh, TM Nguyen, TB Nguyen, HT Nguyen
  π Description: Introduces a hybrid approach combining graph-based and generative methods for Vietnamese multi-document summarization.
- THASUM: Transformer for High-Performance Abstractive Summarizing Vietnamese Large-scale Dataset
  ποΈ Authors: TH Nguyen, TN Do
  π Description: Develops a transformer-based abstractive summarization model trained on a large-scale Vietnamese dataset.
- Pre-Training Clustering Models to Summarize Vietnamese Texts
  ποΈ Authors: TH Nguyen, TN Do
  π Description: Proposes a clustering-based pre-training approach for single-document extractive summarization in Vietnamese.
- Vietnamese Online Newspapers Summarization Using LexRank
  ποΈ Authors: LEN THANG, LEQ MINH
  π Description: Applies the LexRank algorithm for Vietnamese news summarization using graph-based sentence ranking.
- Feature-Based Unsupervised Method for Salient Sentence Ranking in Text Summarization Task
  ποΈ Authors: MP Nguyen, TA Le
  π Description: Develops an unsupervised sentence ranking model for Vietnamese text summarization.
- Paraphrasing with Large Language Models
  ποΈ Authors: CT Nguyen, DHP Pham, CT Dang, TH Le
  π Description: Explores the use of large language models for Vietnamese text paraphrasing and summarization.
- Resource-Efficient Vietnamese Text Summarization
  ποΈ Authors: HD Nguyen Pham, DT Nguyen
  π Description: Enhances the efficiency of Vietnamese text summarization using data filtering and low-memory deep learning techniques.
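Graph-based rankers such as LexRank (used in one of the papers above) score sentences by centrality in a similarity graph. The much-simplified sketch below ranks sentences by their average Jaccard word overlap with the rest of the document; the toy document is invented and real LexRank additionally uses TF-IDF cosine similarity and PageRank-style iteration:

```python
def overlap(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(1, len(sa | sb))

def extractive_summary(sentences, k=1):
    """Pick the k most 'central' sentences: those most similar,
    on aggregate, to every other sentence in the document."""
    scores = [
        sum(overlap(s, t) for j, t in enumerate(sentences) if j != i)
        for i, s in enumerate(sentences)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:k])]   # keep document order

doc = [
    "Hà_Nội là thủ_đô của Việt_Nam",
    "Hà_Nội là trung_tâm chính_trị của Việt_Nam",
    "hôm_nay trời mưa ở thủ_đô",
]
summary = extractive_summary(doc, k=1)
```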
A collection of open-source tools, frameworks, and datasets for Vietnamese NLP, including word segmentation tools, language models, and benchmark datasets.
- Automatically Generating a Dataset for Natural Language Inference Systems from a Knowledge Graph
  ποΈ Authors: DV Vo, P Do
  π Description: Presents a dataset for Vietnamese Natural Language Inference (NLI) built from a knowledge graph, contributing to NLP research and model evaluation.
- Neural Network Translations for Building SentiWordNets
  ποΈ Authors: KN Lam, TP Le, KC Ngu, KT Le, PM Le
  π Description: Explores neural network-based translation for creating Vietnamese SentiWordNet, enhancing sentiment analysis resources.
- Updated Activities on Resources Development for Vietnamese Speech and NLP
  ποΈ Authors: LC Mai
  π Description: Reviews recent developments in Vietnamese NLP and speech resources, including government initiatives and industry collaborations.
This section covers key challenges in Vietnamese NLP, such as handling tonal variations, word segmentation difficulties, data scarcity, and the need for high-quality annotated datasets.
- Evaluating the Effectiveness of Commonly Used Sentiment Analysis Models for the Second Indochina War
  ποΈ Authors: A Chakraborty
  π Description: Examines challenges in applying sentiment analysis models to Vietnamese historical texts, highlighting limitations in existing NLP approaches.
- Machine Learning Approach for Suicide and Depression Identification with Corrected Unsupervised Labels
  ποΈ Authors: M Badki
  π Description: Discusses the challenges of identifying mental health-related text in Vietnamese using machine learning models with unsupervised labels.
- Building A Job Portal Website Integrating AI Technology
  ποΈ Authors: PT Nguyen, THH Nguyen
  π Description: Explores NLP challenges in building AI-powered job search platforms for Vietnamese users.
- ViSoLex: An Open-Source Repository for Vietnamese Social Media Lexical Normalization
  ποΈ Authors: ATH Nguyen, DH Nguyen, K Van Nguyen
  π Description: Addresses lexical normalization issues in Vietnamese social media text processing.
- A Weakly Supervised Data Labeling Framework for Machine Lexical Normalization in Vietnamese Social Media
  ποΈ Authors: DH Nguyen, ATH Nguyen, K Van Nguyen
  π Description: Proposes a weakly supervised labeling approach to tackle low-resource challenges in Vietnamese NLP.
- VNLegalEase: A Vietnamese Legal Query Chatbot
  ποΈ Authors: PTX Hien, NTT Vy, HD Ngo
  π Description: Discusses NLP difficulties in legal document understanding and chatbot development for Vietnamese law.
- Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges
  ποΈ Authors: N Dinh, T Dang, LT Nguyen
  π Description: Investigates dialectal variation in Vietnamese and its impact on NLP tasks and model performance.
- Contextual Emotional Transformer-Based Model for Comment Analysis in Mental Health Case Prediction
  ποΈ Authors: AOJ Ibitoye, OO Oladosu
  π Description: Explores the challenges of contextual emotion detection in Vietnamese NLP for mental health prediction.
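One concrete instance of the diacritic challenge mentioned above: noisy social-media text is often typed without tone marks, so systems frequently need to match it against canonical, fully accented forms. A minimal normalization sketch using Unicode decomposition; note that đ/Đ are distinct base letters rather than combining marks, so they must be mapped by hand (the example strings are invented):

```python
import unicodedata

def strip_diacritics(text):
    """Remove Vietnamese tone and vowel diacritics.

    NFD decomposition splits accented letters into a base letter plus
    combining marks (Unicode category 'Mn'), which are then dropped.
    """
    text = text.replace("đ", "d").replace("Đ", "D")   # not combining marks
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

plain = strip_diacritics("Tiếng Việt rất đẹp")
```

Stripping diacritics loses information (many Vietnamese words differ only by tone), which is precisely why restoring or normalizing them is treated as a research problem rather than a one-liner.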