A comprehensive list of ML and AI acronyms and abbreviations. Feel free to ⭐ it!
Machine learning is growing rapidly, producing ever more cryptic acronyms and abbreviations that can be hard to follow, especially for beginners. This list began when I collected all the acronyms from my Ph.D. thesis. Surprised by their sheer number, I searched the web hoping to copy and paste them to save time. I found a few lists, but none covered everything I needed, so I decided to gather all this information in a single table to make life easier for fellow ML enthusiasts.
Sources:
- Contributors' knowledge
- A Comprehensive Survey on Machine Learning for Networking Evolution Applications and Research Opportunities
- Deep learning acronym cheatsheet
- Machine learning acronyms list
- Awesome deep learning music
- Hearai.pl/paperslang/
Feel free to:
- add any ML-related abbreviation,
- add the definition alone,
- open an issue.
Currently, only ~30% of the abbreviations have definitions, so feel free to add them! A definition should be a brief, concise one-liner rather than an explanation of the whole subject. The purpose is to quickly find the meaning of an abbreviation, and the definition helps you check whether it matches the context. Abbreviations should be kept in alphabetical order.
I have added a link to the online doc with all abbreviations to make it easier for you to contribute. Feel free to add a new one and sort the table automatically. You can copy the table from Google Sheets to the markdown table generator: https://www.tablesgenerator.com/markdown_tables.
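Since the table must stay alphabetized, re-sorting it after a contribution can be automated. A minimal sketch in Python (the function name `sort_table_rows` is hypothetical, not part of this repo's tooling) that sorts the data rows of a pipe-delimited markdown table by the acronym in the first column:

```python
def sort_table_rows(table: str) -> str:
    """Sort the data rows of a pipe-delimited Markdown table
    alphabetically (case-insensitively) by the first column,
    keeping the header and separator rows in place."""
    lines = table.strip().splitlines()
    header, separator, rows = lines[0], lines[1], lines[2:]

    # First cell of each row: "| ACC | ... |" -> "acc"
    def key(row: str) -> str:
        return row.split("|")[1].strip().lower()

    return "\n".join([header, separator] + sorted(rows, key=key))
```

Paste the table in, and the rows come back in alphabetical order with the header untouched; copying the table through Google Sheets and the markdown table generator works just as well.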
| Acronym | Description | Definition |
|---|---|---|
| ACC | ACCuracy | Accuracy is a metric for evaluating classification models. |
| ACE | Alternating conditional expectation (ACE) algorithm | An algorithm to find the optimal transformations between the response variable and predictor variables in regression analysis. |
| ADA | AdaBoosted Decision Trees | Using AdaBoost to improve performance in decision trees. |
| AdaBoost | Adaptive Boosting | A statistical classification meta-algorithm that can be used in conjunction with many other types of learning algorithms to improve performance. |
| AdR | AdaBoostRegressor | Using AdaBoost to improve performance in regression. |
| ADT | Automatic Drum Transcription | Methods that aim to detect drum events in polyphonic music |
| AE | AutoEncoder | A type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning) |
| AGI | Artificial General Intelligence | The hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can |
| AI | Artificial Intelligence | The simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. |
| AIWPSO | Adaptive Inertia Weight Particle Swarm Optimization | An optimization algorithm using an individual search ability (ISA) to indicate whether each particle lacks global exploration or local exploitation abilities in each dimension. |
| AM | Activation Maximization | A method to visualize neural networks and aims to maximize the activation of certain neurons. |
| AMT | Automatic Music Transcription | Computational algorithms that convert acoustic music signals into some form of music notation |
| ANN | Artificial Neural Network | A collection of connected computational units or nodes called neurons arranged in multiple computational layers. |
| AR | Augmented Reality | An interactive experience of a real-world environment where the objects that reside in the real world are enhanced by computer-generated perceptual information, sometimes across multiple sensory modalities. |
| ARNN | Anticipation Recurrent Neural Network | A type of RNN designed to predict future inputs or states in sequential data. |
| AUC | Area Under the (ROC) Curve | The probability that the model ranks a randomly chosen positive instance higher than a randomly chosen negative one; summarizes the ROC curve in a single number. |
| BDT | Boosted Decision Tree | An ensemble learning method combining multiple decision trees, typically using boosting algorithms like AdaBoost or Gradient Boosting. |
| BERT | Bidirectional Encoder Representation from Transformers | Commonly used transformer-based language model. |
| BiFPN | Bidirectional Feature Pyramid Network | An efficient multi-scale feature fusion method used in object detection, allowing bidirectional (top-down and bottom-up) information flow. |
| BILSTM | Bidirectional Long Short-Term Memory | A bidirectional recurrent neural network architecture utilizing LSTM units (see LSTM). |
| BLEU | Bilingual Evaluation Understudy | A score measuring the quality of machine translation from one language into another. |
| BN | Bayesian Network | A probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). |
| BNN | Bayesian Neural Network | A type of artificial neural network built by introducing random variations into the network either by giving the network's artificial neurons stochastic transfer functions or by giving them stochastic weights |
| BP | BackPropagation | A widely used algorithm for training feedforward neural networks by propagating errors backward through the network. |
| BPMF | Bayesian Probabilistic Matrix Factorization | A probabilistic approach to matrix factorization, often used in recommender systems, incorporating Bayesian inference. |
| BPTT | Backpropagation Through Time | A gradient-based technique for training certain types of recurrent neural networks (e.g., LSTMs) by unrolling the network through time steps. |
| BQML | BigQuery Machine Learning | Google Cloud service enabling creation and execution of ML models in BigQuery using standard SQL queries. |
| BRNN | Bidirectional Recurrent Neural Network | An RNN variant that processes sequence data in both forward and backward directions, capturing context from past and future elements. |
| BRR | Bayesian Ridge Regression | A regression technique that incorporates Bayesian methods with Ridge Regression (L2 regularization). |
| CAE | Contractive AutoEncoder | An autoencoder variant that adds a penalty term to the loss function to encourage robustness of the learned representation to small input variations. |
| CALA | Continuous Action-set Learning Automata | A type of reinforcement learning agent operating in environments with continuous (non-discrete) action spaces. |
| CART | Classification And Regression Tree | An algorithm used to build decision trees for both classification and regression tasks by recursively partitioning the data space. |
| CAV | Concept Activation Vectors | Explainability method that provides an interpretation of a neural net's internal state in terms of human-friendly concepts. |
| CBI | Counterfactual Bias Insertion | A technique potentially used in fairness research to test model robustness against specific biases by inserting counterfactual examples. |
| CBOW | Continuous Bag of Words | A neural network model architecture (part of Word2Vec) used for learning word embeddings by predicting a target word from its surrounding context words. |
| CDBN | Convolutional Deep Belief Networks | A type of deep artificial neural network composed of multiple layers of convolutional restricted Boltzmann machines stacked together. |
| CE | Cross-Entropy | A common loss function used in classification tasks, measuring the difference between predicted probability distributions and the true distribution. |
| CEC | Constant Error Carousel | A key component within LSTM units that allows error signals to propagate back through time without vanishing or exploding gradient issues. |
| CF | Collaborative Filtering | Technique used in recommendation systems predicting user preferences based on patterns from similar users or items. |
| CLNN | ConditionaL Neural Networks | Neural networks whose output or internal processing is dependent on an auxiliary conditional input. |
| CMAC | Cerebellar Model Articulation Controller | A type of neural network inspired by the mammalian cerebellum, often used for function approximation and control tasks, using associative memory principles. |
| CMMs | Conditional Markov Model | A graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. Also known as maximum-entropy Markov model (MEMM). |
| CNN | Convolutional Neural Network | A class of artificial neural network (ANN), typically using convolutional layers, most commonly applied to analyze visual imagery. |
| ConvNet | Convolutional Neural Network | A class of artificial neural network (ANN), typically using convolutional layers, most commonly applied to analyze visual imagery. (Synonym for CNN) |
| CRBM | Conditional Restricted Boltzmann Machine | An extension of the Restricted Boltzmann Machine where the visible and/or hidden units are conditioned on additional input variables. |
| CRFs | Conditional Random Fields | A class of statistical modeling methods often used for structured prediction tasks like sequence labeling (e.g., in NLP), modeling conditional probabilities. |
| CRNN | Convolutional Recurrent Neural Network | A hybrid neural network architecture combining Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), typically for spatio-temporal data. |
| CTC | Connectionist Temporal Classification | A loss function used for training sequence models (like RNNs) on tasks where the alignment between input and output sequences is variable or unknown (e.g., speech). |
| CTR | Collaborative Topic Regression | A recommendation model that integrates collaborative filtering with topic modeling (like LDA) to leverage item content information. |
| CV | Coefficient of Variation | The ratio of the standard deviation to the mean; used, for example, as an intra-cluster similarity measure when evaluating clustering models. |
| CV | Computer Vision | A field of AI enabling computers to "see" and interpret information from digital images or videos. |
| CV | Cross Validation | Resampling method for training, validation and testing a model across different iterations on portions of the full data set. |
| CSLR | Continuous Sign Language Recognition | Recognition and understanding of continuous sign language (whole phrases rather than isolated signs), capturing the meaning of signs essential for sign language translation (SLT). |
| DAAF | Data Augmentation and Auxiliary Feature | A technique possibly involving using auxiliary features alongside data augmentation to improve model training. |
| DAE | Denoising AutoEncoder or Deep AutoEncoder | An autoencoder trained to reconstruct clean input from corrupted versions (Denoising AE), often with multiple hidden layers (Deep AE). |
| DBM | Deep Boltzmann Machine | An undirected probabilistic graphical model (like RBM) with multiple layers of hidden variables, allowing for more complex representations. |
| DBN | Deep Belief Network | A generative graphical model composed of multiple layers of latent variables ("beliefs"), typically trained greedily layer-by-layer using RBMs. |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise | A density-based clustering algorithm that groups together points closely packed together, marking outliers as noise. |
| DCGAN | Deep Convolutional Generative Adversarial Network | A type of GAN that uses convolutional and convolutional-transpose layers in its discriminator and generator, respectively, primarily for image generation. |
| DCMDN | Deep Convolutional Mixture Density Network | Combines CNNs with Mixture Density Networks to model complex conditional probability distributions, often for image generation or regression tasks with uncertainty. |
| DE | Differential Evolution | A metaheuristic optimization algorithm belonging to the family of evolutionary algorithms, used for finding global optima, particularly in continuous spaces. |
| DeconvNet | DeConvolutional Neural Network | A neural network architecture often utilizing transposed convolutions (sometimes called deconvolutions) for tasks like image segmentation or visualization of CNN features. |
| DeepLIFT | Deep Learning Important FeaTures | An explainability method for deep learning models that attributes prediction differences to input feature differences based on a reference input. |
| DL | Deep Learning | A subfield of machine learning based on artificial neural networks with multiple layers (deep architectures) enabling learning of complex patterns. |
| DNN | Deep Neural Network | An artificial neural network (ANN) with multiple hidden layers between the input and output layers. |
| DQN | Deep Q-Network | A reinforcement learning algorithm that uses a deep neural network to approximate the Q-value (action-value) function. |
| DR | Detection Rate | The proportion of actual positives correctly identified (synonym for True Positive Rate or Recall). |
| DSN | Deep Stacking Network | A deep learning architecture based on stacking blocks of simple modules (like MLPs) trained sequentially, layer by layer. |
| DT | Decision Tree | A supervised learning model using a tree-like structure of decisions and their possible consequences to classify or regress data. |
| DTD | Deep Taylor Decomposition | An explainability technique that decomposes the prediction of a neural network based on Taylor series expansion, related to Layer-wise Relevance Propagation (LRP). |
| DWT | Discrete Wavelet Transform | A mathematical transform used for signal processing and feature extraction, decomposing signals into different frequency components at multiple scales. |
| ELECTRA | Efficiently Learning an Encoder that Classifies Token Replacements Accurately | A transformer-based pre-training method that learns by distinguishing real input tokens from plausible fake tokens generated by another small network (discriminator task). |
| ELM | Extreme Learning Machine | A feedforward neural network training algorithm where hidden node parameters are randomly assigned and only output weights are learned analytically, often very fast. |
| ELMo | Embeddings from Language Models | Contextual word embedding technique generating deep, character-based representations that vary based on the sentence context. |
| ELU | Exponential Linear Unit | An activation function similar to ReLU but with negative values, which can help push mean activations closer to zero, potentially speeding up learning. |
| EM | Expectation maximization | An iterative method for finding maximum likelihood or MAP estimates of parameters in statistical models with latent (unobserved) variables. |
| EMD | Entropy Minimization Discretization | A method for discretizing continuous features by finding split points that minimize the class information entropy within the resulting intervals. |
| ERNIE | Enhanced Representation through kNowledge IntEgration | A transformer-based language model (often associated with Baidu) that incorporates external knowledge (e.g., knowledge graph facts) during pre-training. |
| ETL Pipeline | Extract Transform Load Pipeline | A data integration process involving extracting data from sources, transforming it into a proper format, and loading it into a target system (like a data warehouse). |
| EXT | Extremely Randomized Trees | An ensemble learning method similar to Random Forests, but introduces more randomness in selecting node splits (both attribute and split point). |
| F1 Score | Harmonic Precision-Recall Mean | The harmonic mean of precision and recall, used as a performance metric for classification tasks, especially with imbalanced datasets. |
| FALA | Finite Action-set Learning Automata | A type of reinforcement learning agent operating in environments with a finite number of discrete actions. |
| FC | Fully-Connected | Layers where all the inputs from one layer are connected to every activation unit of the next layer. |
| FC-CNN | Fully Convolutional Convolutional Neural Network | A neural network architecture consisting entirely of convolutional layers (and pooling/upsampling), without any fully-connected layers. |
| FC-LSTM | Fully Connected Long Short-Term Memory | An LSTM network where connections between time steps or layers might involve fully connected transformations, combining sequential and dense processing. |
| FCM | Fuzzy C-Means | A clustering algorithm allowing data points to belong to multiple clusters with varying degrees of membership (fuzziness). |
| FCN | Fully Convolutional Network | A neural network that only performs convolution (and subsampling or upsampling) operations, often used for semantic segmentation. (Similar to FC-CNN) |
| FFT | Fast Fourier transform | An efficient algorithm to compute the Discrete Fourier Transform (DFT) and its inverse, widely used in signal processing and feature engineering. |
| FLOP | Floating Point Operations | A unit of measure of the amount of mathematical computations (like additions, multiplications) often used to describe the complexity of a neural network model. |
| FLOPS | Floating Point Operations Per Second | A unit of measure of computer performance, indicating how many floating-point operations a processor can perform per second. |
| FNN | Feedforward Neural Network | An artificial neural network where connections between nodes do not form a cycle; information moves only forward from input to output layers. |
| FNR | False Negative Rate | Proportion of actual positives predicted as negatives (1 - Recall/TPR). |
| FPN | Feature Pyramid Network | A neural network component, common in object detection, that builds multi-scale feature representations with rich semantics at all levels via lateral connections. |
| FPR | False Positive Rate | Proportion of actual negatives predicted as positives. |
| FST | Finite state transducer | A finite automaton with two tapes (input and output), used for modeling sequence-to-sequence transformations (e.g., in NLP/speech). |
| FWIoU | Frequency Weighted Intersection over Union | Metric in segmentation/object detection tasks. Weighted average of per-class IoUs, with weights proportional to class frequency. |
| GA | Genetic Algorithm | A metaheuristic optimization algorithm inspired by natural selection, using concepts like mutation, crossover, and selection to evolve solutions. |
| GALE | Global Aggregations of Local Explanations | An explainability technique that aims to derive global insights about a model's behavior by aggregating multiple local explanations (e.g., SHAP, LIME) from individual predictions. |
| GAM | Generalized Additive Model | A regression model where the output variable depends linearly on unknown smooth functions of predictor variables, allowing for non-linear relationships. |
| GAM | Global Attribution Mapping | An explainability method, often used with CNNs, to identify which input regions (e.g., pixels in an image) contribute most significantly to a specific output class. |
| GAMLSS | Generalized Additive Models for Location, Scale and Shape | An extension of GAMs allowing not just the mean (location) but also other distribution parameters (like scale/variance and shape/skewness) to be modeled with additive predictors. |
| GAN | Generative Adversarial Network | A deep-learning-based generative model trained "indirectly" through a discriminator: a second neural network that judges how "realistic" an input looks and is itself updated dynamically during training. |
| GAP | Global Average Pooling | A pooling operation often used in CNNs before the final classification layer, reducing each feature map to a single value by averaging, which helps reduce overfitting and enforces correspondence between feature maps and categories. |
| GBRCN | Gradient-Boosting Random Convolutional Network | A model likely combining gradient boosting techniques with randomly initialized convolutional features, possibly for time-series or image analysis. |
| GD | Gradient Descent | An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. |
| GEBI | Global Explanation for Bias Identification | Explainability method that aggregates local explanations (of single prediction) into a global explanation with the goal of finding biases and systematic errors in decision making. |
| GFNN | Gradient Frequency Neural Networks | Neural networks possibly designed to better learn or represent high-frequency components in data, potentially by manipulating gradients during training. |
| GLCM | Gray Level Co-occurrence Matrix | A statistical method for examining texture that considers the spatial relationship of pixels, used for feature extraction in image analysis. |
| Gloss2Text | A task of transforming raw glosses into meaningful sentences. | In sign language processing, the task of converting a sequence of sign glosses (word-level representations) into a grammatically correct spoken language sentence. |
| GloVe | Global Vectors | An unsupervised learning algorithm for obtaining vector representations for words, trained on aggregated global word-word co-occurrence statistics from a corpus. |
| GMM | Gaussian mixture model | A probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. |
| GPR | Gaussian Process Regression | A non-parametric, Bayesian approach to regression where the model learns a distribution over functions, providing uncertainty estimates along with predictions. |
| GPT | Generative Pre-trained Transformer | An autoregressive language model that uses deep learning to produce human-like text. |
| GradCAM | GRADient-weighted Class Activation Mapping | A visualization technique for CNNs that uses the gradients flowing into the final convolutional layer to produce a coarse localization map highlighting important regions in the input image for predicting the concept. |
| HamNoSys | Hamburg Sign Language Notation System | An annotation system that describes sign language symbols. |
| HAN | Hierarchical Attention Network | A neural network architecture, typically used for document classification, employing attention mechanisms at both word and sentence levels to capture important information hierarchically. |
| HCA | Hierarchical Clustering Analysis | A method of cluster analysis which seeks to build a hierarchy of clusters, either agglomerative (bottom-up) or divisive (top-down). |
| HDP | Hierarchical Dirichlet process | A non-parametric Bayesian approach for modeling grouped data, often used in topic modeling to allow for an infinite number of topics shared across groups. |
| HHDS | HipHop Dataset | Likely refers to a specific dataset focused on Hip Hop music, used for tasks like music information retrieval (MIR), genre classification, or beat tracking. |
| hLDA | Hierarchical Latent Dirichlet allocation | An extension of LDA that organizes topics into a hierarchy, allowing documents to be associated with paths of topics at different levels of granularity. |
| HMM | Hidden Markov Model | A statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states, commonly used for sequential data like speech or NLP. |
| HNN | Hopfield Neural Network | A form of recurrent artificial neural network popularized by John Hopfield, serving as content-addressable ("associative") memory systems with binary threshold nodes. |
| i.i.d | Independent and Identically Distributed | A fundamental assumption in many statistical and machine learning models, stating that random variables in a sequence have the same probability distribution and are mutually independent. |
| ID3 | Iterative Dichotomiser 3 | An early algorithm used to generate a decision tree from a dataset, using information gain to select the best attribute at each step. |
| IDR | Input dependence rate | A metric possibly measuring how much a model's output or internal state depends on its input features, potentially used in explainability or sensitivity analysis. |
| IIR | Input independence rate | A metric likely measuring the degree to which a model's output is independent of its input features, possibly related to robustness or fairness evaluation. |
| INFD | Explanation Infidelity | A metric used in XAI to measure how poorly an explanation (e.g., feature attributions) reflects the actual behavior of the model when inputs are perturbed. |
| IoU | Jaccard index (intersection over union) | Metric in segmentation/object detection tasks. Ratio of areas of intersection and union of two (segmentation) boxes, corresponding to e.g. prediction and label. |
| ISIC | International Skin Imaging Collaboration | An academia-industry partnership focused on creating digital skin imaging standards and datasets for melanoma research, often used in computer vision challenges. |
| k-NN | k-Nearest Neighbor | A non-parametric, instance-based learning algorithm where classification or regression is based on the majority vote or average of the 'k' nearest neighbors in the feature space. |
| KAN | Kolmogorov-Arnold Networks | Ref. https://arxiv.org/abs/2404.19756v1 - A novel neural network architecture inspired by the Kolmogorov-Arnold representation theorem, potentially offering better interpretability and scaling properties compared to MLPs by using learnable activation functions on edges instead of fixed ones on nodes. |
| KDE | Kernel Density Estimation | A non-parametric way to estimate the probability density function of a random variable by placing kernels (usually Gaussian) over each data point. |
| KL | Kullback Leibler (KL) divergence | A measure of how one probability distribution diverges from a second, expected probability distribution; often used as a loss or regularization term (e.g., in VAEs). |
| kNN | k-Nearest Neighbours | A non-parametric supervised learning method used for classification and regression. (Synonym for k-NN) |
| KRR | Kernel Ridge Regression | A combination of Ridge Regression (L2-regularized linear regression) with the kernel trick, allowing it to learn non-linear functions in high-dimensional spaces. |
| LDA | Latent Dirichlet Allocation | A generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. |
| LDA | Linear Discriminant Analysis | A dimensionality reduction technique also used for classification, which aims to find a linear combination of features that characterizes or separates two or more classes. |
| LDADE | Latent Dirichlet Allocation Differential Evolution | Likely a hybrid approach combining LDA for topic modeling with Differential Evolution, possibly for optimizing LDA parameters or using topics within the DE process. |
| LightGBM | Light Gradient-Boosting Machine | Gradient boosting framework that uses tree based learning algorithms, originally developed by Microsoft. Known for efficiency and speed. |
| LIME | Local Interpretable Model-agnostic Explanations | An XAI technique that explains individual predictions of any black-box classifier by learning a simpler, interpretable model locally around the prediction. |
| LLM | Large Language Model | A deep learning model trained on vast amounts of text data, capable of understanding and generating human-like text for various NLP tasks. |
| LRP | Layer-wise Relevance Propagation | An XAI technique for deep neural networks that decomposes the output prediction backward through the layers to assign relevance scores to input features. |
| LSA | Latent semantic analysis | A technique in NLP using singular value decomposition (SVD) to analyze relationships between documents and terms, identifying latent semantic structures. |
| LSI | Latent Semantic Indexing | An indexing and retrieval method using LSA (SVD) to identify patterns in term-document relationships, improving information retrieval by handling synonymy and polysemy. (Often used interchangeably with LSA). |
| LSTM | Long Short-Term Memory | A recurrent neural network architecture that can process not only single data points (such as images) but also entire sequences of data (such as speech or video). |
| LTR | Learning To Rank | Application of machine learning to construct ranking models for information retrieval systems, ordering items based on relevance. |
| LVQ | Learning Vector Quantization | A prototype-based supervised classification algorithm, related to Self-Organizing Maps (SOM), that uses competitive learning to move prototypes towards or away from training instances based on class labels. |
| MADE | Masked Autoencoder for Distribution Estimation | An autoregressive model based on autoencoders, using carefully constructed masks to ensure that reconstructions respect autoregressive constraints, allowing for tractable density estimation. |
| MAE | Mean Absolute Error | Average of the absolute error between the actual and predicted values. |
| MAF | Masked Autoregressive Flows | A type of normalizing flow model for density estimation that uses masked autoregressive transformations (like MADE) to ensure invertibility and efficient computation. |
| MAP | Maximum A Posteriori (MAP) Estimation | A method for estimating unknown parameters in Bayesian statistics, finding the mode (peak) of the posterior distribution, incorporating prior knowledge. |
| MAPE | Mean Absolute Percentage Error | Average of the absolute errors between actual and predicted values, expressed as a percentage of the actual values. |
| MARS | Multivariate Adaptive Regression Spline | Non-parametric regression technique that extends linear models. Note that the name is trademarked; open-source implementations are often called "Earth". |
| MART | Multiple Additive Regression Tree | Another name for Gradient Boosted Decision Trees (GBDT), particularly associated with Friedman's original work, emphasizing the additive nature of the tree ensemble. |
| MaxEnt | Maximum Entropy | A modeling principle that selects, among all probability distributions consistent with the observed constraints, the one with maximum entropy; the basis of MaxEnt classifiers in NLP. |
| MCLNN | Masked ConditionaL Neural Networks | Conditional neural networks where masking techniques might be applied, possibly to control information flow or enforce specific dependencies based on the condition. |
| MCMC | Markov Chain Monte Carlo | A class of algorithms for sampling from a probability distribution by constructing a Markov chain that has the desired distribution |
| MCS | Model contrast score | |
| MDL | Minimum description length (MDL) principle | |
| MDN | Mixture Density Network | |
| MDP | Markov Decision Process | |
| MDRNN | Multidimensional recurrent neural network | |
| MER | Music Emotion Recognition | |
| MINT | Mutual Information based Transductive Feature Selection | |
| MIoU | Mean Intersection over Union | Metric in segmentation/object detection tasks. Mean of per-class IoUs. |
| ML | Machine Learning | The study of computer algorithms that can improve automatically through experience and by the use of data. |
| MLE | Maximum Likelihood Estimation | |
| MLM | Music Language Models | |
| MLP | Multi-Layer Perceptron | A fully connected class of feedforward artificial neural network |
| MPA | Mean Pixel Accuracy | Metric in segmentation/object detection tasks. Average ratio of correctly classified pixels by class. |
| MRR | Mean Reciprocal Rank | |
| MRS | Music Recommender System | |
| MSDAE | Modified Sparse Denoising Autoencoder | |
| MSE | Mean Squared Error | Average of the squares of the error between the actual and predicted values |
| MSR | Music Style Recognition | |
| NAS | Neural Architecture Search | A technique for automating the design of artificial neural networks. |
| NB | Naïve Bayes | |
| NBKE | Naïve Bayes with Kernel Estimation | |
| NER | Named Entity Recognition | |
| NERQ | Named Entity Recognition in Query | |
| NF | Normalizing Flow | |
| NFL | No Free Lunch (NFL) theorem | |
| NLP | Natural Language Processing | |
| NMS | Non Maximum Suppression | A technique used in object detection to remove redundant overlapping bounding boxes. |
| NMT | Neural Machine Translation | An approach to machine translation that uses a neural network to predict a sequence of words. |
| NN | Neural Network | |
| NNMODFF | Neural Network based Multi-Onset Detection Function Fusion | |
| NPE | Neural Physical Engine | |
| NRMSE | Normalized RMSE | RMSE normalized, e.g. by the range or mean of the observed values, making the error comparable across different scales. |
| NST | Neural Style Transfer | A method that uses deep neural networks to transfer the style of one image onto another. |
| NTM | Neural Turing Machine | |
| ODF | Onset Detection Function | |
| OLR | Ordinary Linear Regression | |
| OLS | Ordinary Least Squares | |
| PA | Pixel Accuracy | Metric in segmentation/object detection tasks. Ratio of correctly classified over total number of pixels. |
| PACO | Poisson Additive Co-Clustering | |
| PCA | Principal Component Analysis | The process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest. |
| PEGASUS | Pre-training with Extracted Gap-Sentences for Abstractive Summarization | |
| PLSI | Probabilistic Latent Semantic Indexing | |
| PM | Project Manager | |
| PMF | Probabilistic Matrix Factorization | |
| PMI | Pointwise Mutual Information | |
| PNN | Probabilistic Neural Network | |
| POC | Proof of Concept | |
| POMDP | Partially Observable Markov Decision Process | |
| POS | Part-of-Speech Tagging | |
| PPMI | Positive Pointwise Mutual Information | |
| PReLU | Parametric Rectified Linear Unit | |
| PU | Positive Unlabeled | Machine learning paradigm for learning from only positive and unlabeled data. |
| PYTM | Pitman-Yor Topic Modeling | |
| RandNN | Random Neural Network | |
| RANSAC | RANdom SAmple Consensus | |
| RBF | Radial Basis Function | |
| RBFNN | Radial Basis Function Neural Network | |
| RBM | Restricted Boltzmann Machine | |
| ReLU | Rectified Linear Unit | An activation function, max(0, x), that allows fast and effective training of deep neural architectures on large and complex datasets. |
| REPTree | Reduced Error Pruning Tree | |
| RF | Random Forest | |
| RGB | Red Green Blue color model | An additive color model used for display of images |
| RICNN | Rotation Invariant Convolutional Neural Network | |
| RIM | Recurrent Inference Machines | |
| RIPPER | Repeated Incremental Pruning to Produce Error Reduction | |
| RL | Reinforcement Learning | |
| RLFM | Regression based latent factors | |
| RLHF | Reinforcement learning from human feedback | |
| RMSE | Root MSE | Square root of MSE. |
| RNN | Recurrent Neural Network | |
| RNNLM | Recurrent Neural Network Language Model | |
| RoBERTa | Robustly Optimized BERT Pretraining Approach | Commonly used transformer-based language model. |
| ROC | Receiver Operating Characteristic | Curve that plots TPR versus FPR at various threshold settings. |
| ROI | Region Of Interest | |
| RR | Ridge Regression | |
| RTRL | Real-Time Recurrent Learning | |
| SAE | Stacked AE | |
| SARSA | State-Action-Reward-State-Action | |
| SBM | Stochastic block model | |
| SBO | Structured Bayesian optimization | |
| SBSE | Search-based software engineering | |
| SCH | Stochastic convex hull | |
| SDAE | Stacked DAE | |
| seq2seq | Sequence to Sequence Learning | Describes a training approach for converting sequences from one domain (e.g. sentences in English) into sequences in another domain (e.g. the same sentences translated to French). |
| SER | Sentence Error Rate | |
| SGBoost | Stochastic Gradient Boosting | |
| SGD | Stochastic Gradient Descent | |
| SGVB | Stochastic Gradient Variational Bayes | |
| SHAP | SHapley Additive exPlanation | |
| SHLLE | Supervised Hessian Locally Linear Embedding | |
| Sign2(Gloss+Text) | Sign to Gloss and Text | A two-step process that requires joint learning of sign language recognition and translation. |
| Sign2Gloss | Sign to Gloss | A one-to-one translation from a single sign to a single gloss. |
| Sign2Text | Sign to Text | A task of full translation from sign language into a spoken one; grammar and syntax are included. |
| SLP | Single-Layer Perceptron | |
| SLRT | Sign Language Recognition Transformer | An encoder transformer model trained to predict sign gloss sequences; it takes spatial embeddings and learns spatio-temporal representations. |
| SLT | Sign Language Translation | A full translation of signs to a spoken language. |
| SLTT | Sign Language Translation Transformer | An autoregressive transformer decoder model, trained on SLRT outputs, that predicts one word at a time to generate the corresponding spoken-language sentence. |
| SMBO | Sequential Model-Based Optimization | |
| SOM | Self-Organizing Map | A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data. |
| SpRay | Spectral Relevance Analysis | Global explainability method using spectral clustering and local explanations (LRP). |
| SSD | Single-Shot Detector | A type of object detector that consists of a single stage. Examples include YOLO, RetinaNet, and EfficientDet. |
| SSL | Self-Supervised Learning | |
| SSVM | Smooth support vector machine | |
| ST | Style Transfer | An algorithm that transfers properties of one object onto another (e.g. transferring a painting's style onto a photograph). |
| STDA | Style Transfer Data Augmentation | A method that uses style transfer to augment a dataset. |
| STL | Self-Taught Learning | |
| SVD | Singing Voice Detection | |
| SVD | Singular Value Decomposition | |
| SVM | Support Vector Machine | Supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. |
| SVR | Support Vector Regression | Supervised learning models with associated learning algorithms that analyze data for regression analysis. |
| SVS | Singing Voice Separation | |
| t-SNE | t-distributed stochastic neighbor embedding | |
| T5 | Text-To-Text Transfer Transformer | Transformer based language model that uses a text-to-text approach. |
| TD | Temporal Difference | |
| TDA | Targeted Data Augmentation | |
| TGAN | Temporal Generative Adversarial Network | |
| THAID | THeta Automatic Interaction Detection | |
| TINT | Tree-Interpreter | |
| TLFN | Time-Lagged Feedforward Neural Network | |
| TNR | True Negative Rate | Proportion of actual negatives that are correctly predicted |
| TPR | True Positive Rate | Proportion of actual positives that are correctly predicted |
| TRPO | Trust Region Policy Optimization | |
| ULMFiT | Universal Language Model Fine-Tuning | |
| V-Net | Volumetric Convolutional Neural Network | 3D image segmentation based on a volumetric fully convolutional neural network. |
| VAD | Voice Activity Detection | |
| VAE | Variational AutoEncoder | An artificial neural network architecture belonging to the families of probabilistic graphical models and variational Bayesian methods. |
| VGG | Visual Geometry Group | Popular deep convolutional model designed for classification. |
| VPNN | Vector Product Neural Network | |
| VQ-VAE | Vector Quantized Variational Autoencoders | |
| VR | Virtual Reality | |
| WER | Word Error Rate | Metric measuring performance in NLP solutions, e.g. in automatic speech recognition (ASR). |
| WFST | Weighted Finite-State Transducer | |
| WMA | Weighted Majority Algorithm | |
| WPE | Weighted Prediction Error | |
| XAI | Explainable Artificial Intelligence | A set of processes and methods that make machine learning algorithms and their results more interpretable. |
| XGBoost | eXtreme Gradient Boosting | |
| YOLO | You Only Look Once | Fast object detection algorithm. |
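
As a quick illustration of how a few of the regression metrics above relate (MSE, RMSE, NRMSE), here is a minimal sketch in plain Python. The function names are my own, and the range-based normalization in `nrmse` is only one common convention among several (dividing by the mean or standard deviation is also used):

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of MSE.
    return math.sqrt(mse(y_true, y_pred))

def nrmse(y_true, y_pred):
    # Normalized RMSE: here, RMSE divided by the range of observed values.
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mse(y_true, y_pred))   # 0.875
```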
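
The segmentation metrics in the table (PA, MIoU) likewise reduce to simple pixel counting. A hedged sketch over flat label lists, with hypothetical function names of my own (real implementations operate on image tensors and handle absent classes):

```python
def iou(pred, target, cls):
    # Intersection over Union for a single class over flat label lists.
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else float("nan")

def miou(pred, target, classes):
    # Mean IoU: average of per-class IoU scores.
    scores = [iou(pred, target, c) for c in classes]
    return sum(scores) / len(scores)

def pixel_accuracy(pred, target):
    # PA: ratio of correctly classified pixels to total pixels.
    return sum(1 for p, t in zip(pred, target) if p == t) / len(target)

pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(miou(pred, target, [0, 1]))       # 0.5
print(pixel_accuracy(pred, target))
```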
