This repository contains:
- A regularly updated paper list for Diffusion Large Language Models.
- A tutorial for Diffusion Large Language Models.
- A nano code snippet for Diffusion Large Language Models.
We have released a simple implementation in the code folder. It contains two types of diffusion language models, presented as Jupyter notebooks:
- continuous_diff.ipynb: This notebook demonstrates a Continuous Diffusion Language Model, illustrating how diffusion modeling and sampling operate in a continuous space, such as a word vector space. It is a good starting point for grasping the fundamental principles of continuous diffusion language models.
- masked_diff.ipynb: This notebook implements a Mask-based Discrete Diffusion Language Model, which works in the discrete token space and supports text generation tasks. Parts of our implementation draw inspiration from GUIDELINES.md of LLaDA. Thanks for their valuable contributions!
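To make the difference between the two notebooks concrete, here is a minimal sketch of the forward (noising) process in each setting. It is not taken from the notebooks: the `MASK` symbol, the function names, and the use of plain Python lists in place of tensors are all assumptions made for illustration. The discrete variant masks each token independently with probability `t`, and the continuous variant mixes a vector with Gaussian noise according to a cumulative signal level `alpha_bar`.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the notebooks


def mask_tokens(tokens, t, rng):
    """Discrete forward process: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def noise_vector(x0, alpha_bar, rng):
    """Continuous forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
tokens = ["diffusion", "models", "generate", "text"]
print(mask_tokens(tokens, 0.0, rng))  # t = 0 leaves the sequence untouched
print(mask_tokens(tokens, 1.0, rng))  # t = 1 masks every token
print(mask_tokens(tokens, 0.5, rng))  # intermediate t masks a random subset
print(noise_vector([1.0, -2.0, 0.5], 1.0, rng))  # alpha_bar = 1 adds no noise
```

In both cases a model would be trained to invert this corruption: predicting the masked tokens in the discrete case, and the clean vector (or the added noise) in the continuous case. Sampling then runs the process in reverse, starting from a fully masked sequence or from pure Gaussian noise.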
- Gemini Diffusion blog
- Mercury: Ultra-Fast Language Models Based on Diffusion tech report
- Dream7B blog
- LaViDa: A Large Diffusion Language Model for Multimodal Understanding
- MMaDA: Multimodal Large Diffusion Language Models
- LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
- Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding (arxiv)
- Unified Multimodal Discrete Diffusion (arxiv) [code]
- DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models (arxiv) [code]
- Esoteric Language Models (arxiv) [code]
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding (arxiv)
- DINGO: Constrained Inference for Diffusion LLMs (arxiv)
- Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding (arxiv) [code]
- LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models (arxiv)
- Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion (arxiv)
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective (arxiv)
- dKV-Cache: The Cache for Diffusion Language Models (arxiv)
- CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation (arxiv)
- Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions (arxiv)
- d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning (arxiv)
- Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion (arxiv)
- Remasking Discrete Diffusion Models with Inference-Time Scaling (arxiv)
- Large Language Diffusion Models (arxiv) [code]
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (ICLR 2025 Oral) [code]
- Beyond Autoregression: Fast LLMs via Self-Distillation Through Time (ICLR 2025)
- Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data (ICLR 2025)
- Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling (ICLR 2025)
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models (ICLR 2025)
- Scaling up Masked Diffusion Models on Text (ICLR 2025)
- Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion (NAACL 2025)
- Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (ICML 2024 best paper)
- Discrete Flow Matching (NeurIPS 2024)
- Simple and Effective Masked Diffusion Language Models (NeurIPS 2024)
- Simplified and Generalized Masked Diffusion for Discrete Data (NeurIPS 2024)
- A Reparameterized Discrete Diffusion Model for Text Generation (COLM 2024)
- Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation (EACL 2024)
- Diffusion Glancing Transformer for Parallel Sequence-to-Sequence Learning (NAACL 2024)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning (arxiv 2024)
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models (ACL 2023)
- DiffusER: Discrete Diffusion via Edit-based Reconstruction (ICLR 2023)
- Likelihood-Based Diffusion Language Models (NeurIPS 2023)
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise (EMNLP 2023)
- A Continuous Time Framework for Discrete Denoising Models (NeurIPS 2022)
- Structured Denoising Diffusion Models in Discrete State-Spaces (NeurIPS 2021)
- Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions (NeurIPS 2021)
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes (ACL 2025)
- TESS 2: A Large-Scale Generalist Diffusion Language Model (arxiv)
- Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation (NAACL 2024)
- Diffusion Guided Language Modeling (ACL 2024)
- TESS: Text-to-Text Self-Conditioned Simplex Diffusion (EACL 2024)
- Transfer Learning for Text Diffusion Models (arxiv 2024)
- Latent Diffusion for Language Generation (NeurIPS 2023)
- AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation (NeurIPS 2023)
- Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise (ICML 2023)
- DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models (EMNLP 2023)
- DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM (EMNLP 2023)
- How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data? (ECAI 2023)
- SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control (ACL 2023)
- GlyphDiffusion: Text Generation as Image Generation (arxiv 2023)
- DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises (arxiv 2023)
- Diffusion-LM Improves Controllable Text Generation (NeurIPS 2022)
- Latent Diffusion Energy-Based Model for Interpretable Text Modeling (ICML 2022)
- DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models (ICLR 2023)
- Continuous Diffusion for Categorical Data (arxiv 2022)
- SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers (arxiv 2022)
- Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models (arxiv)
- Table-to-Text Generation with Pretrained Diffusion Models (IEEE 2024)
- Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models (NeurIPS 2024)
- DiffusionNER: Boundary Diffusion for Named Entity Recognition (ACL 2023)
- Fine-grained Text Style Transfer with Diffusion-Based Language Models (RepL4NLP 2023)