If you like our project, please give us a star ⭐ on GitHub for the latest updates.
This repository provides the papers mentioned in the survey "A Survey on Latent Reasoning".
If you find our survey useful for your research, please consider citing the following paper:
@article{map2025latent,
  title={A Survey on Latent Reasoning},
  author={M-A-P},
  journal={arxiv},
  year={2025}
}
We also have other generative projects that may interest you.
Scaling Latent Reasoning via Looped Language Models
Rui-Jie Zhu, Zixuan Wang, Kai Hua, et al. Page - Model
[2025-07-08] We have released the arXiv paper: A Survey on Latent Reasoning.
[2025-07-04] We have initialized the repository.
Explicit Reasoning vs. Latent Reasoning
We welcome feedback, suggestions, and contributions that can help improve this survey and repository and make them valuable resources for the entire community.
We will actively maintain this repository by incorporating new research as it emerges. If you have suggestions about our taxonomy, notice any missed papers, or know of arXiv preprints that have since been accepted to a venue, please let us know.
If you want to add your work or model to this list, please do not hesitate to email ridger@ucsc.edu or open a pull request.
Markdown format:
* | **Paper Name** | Name of Conference or Journal + Year | Release Date | [Paper](link) - [Code](link) |
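For contributors who prefer to generate the row rather than type it by hand, here is a minimal sketch of a helper that builds one entry in the format above. The function name `format_entry`, its parameters, and the example URLs are hypothetical and not part of this repository; an optional code link is assumed.

```python
def format_entry(title, venue, date, paper_url, code_url=None):
    """Build one table row in the contribution format described above.

    All names here are illustrative assumptions, not repository tooling.
    """
    links = f"[Paper]({paper_url})"
    if code_url:
        # The Code link is optional; some entries only have a paper.
        links += f" - [Code]({code_url})"
    return f"| **{title}** | {venue} | {date} | {links} |"


if __name__ == "__main__":
    # Placeholder values; replace with your paper's details.
    print(format_entry(
        "Paper Name",
        "ICLR 2025",
        "Feb 2025",
        "https://example.org/paper",
        "https://example.org/code",
    ))
```

Paste the printed row into the relevant table, keeping entries in date order.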
Activation-based Recurrent Methods
Architectural Recurrence

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Universal Transformers | ICLR 2019 | Jul 2018 | Paper - Code |
| CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference | ICLR 2025 | Oct 2023 | Paper |
| AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures | TMLR 2025 | Feb 2024 | Paper - Code |
| Relaxed recursive transformers: Effective parameter sharing with layer-wise Lora | ICLR 2025 | Oct 2024 | Paper |
| Byte Latent Transformer: Patches Scale Better Than Tokens | ACL 2025 Outstanding Paper | Dec 2024 | Paper - Code |
| Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | ICLR 2025 | Feb 2025 | Paper - Code |
| LLM Pretraining with Continuous Concepts | arXiv | Feb 2025 | Paper - Code |
| Pretraining Language Models to Ponder in Continuous Space | arXiv | May 2025 | Paper - Code |
| Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation | arXiv | Jul 2025 | Paper - Code |

Training-induced Recurrence

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Think before you speak: Training Language Models With Pause Tokens | ICLR 2024 | Oct 2023 | Paper |
| Guiding Language Model Reasoning with Planning Tokens | COLM 2024 | Oct 2023 | Paper - Code |
| Let's Think Dot by Dot: Hidden computation in transformer language models | COLM 2024 | Apr 2024 | Paper - Code |
| Disentangling memory and reasoning ability in large language models | ACL 2025 (main) | Nov 2024 | Paper - Code |
| Training Large Language Models to Reason in a Continuous Latent Space | arXiv | Dec 2024 | Paper - Code |
| Compressed chain of thought: Efficient reasoning through dense representations | arXiv | Dec 2024 | Paper |
| Multimodal Latent Language Modeling with Next-Token Diffusion | arXiv | Dec 2024 | Paper - Page |
| Efficient Reasoning with Hidden Thinking | arXiv | Jan 2025 | Paper - Code |
| Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | ICML 2025 | Feb 2025 | Paper |
| Lightthinker: Thinking step-by-step compression | arXiv | Feb 2025 | Paper - Code |
| Codi: Compressing chain-of-thought into continuous space via self-distillation | arXiv | Feb 2025 | Paper - Code |
| System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | arXiv | May 2025 | Paper |
| Hybrid Latent Reasoning via Reinforcement Learning | NeurIPS 2025 | May 2025 | Paper - Code |
| Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | arXiv | Jun 2025 | Paper - Code |
| Parallel Continuous Chain-of-Thought with Jacobi Iteration | arXiv | Jun 2025 | Paper - Code |
| Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains | arXiv | Jun 2025 | Paper - Code |
| SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought | arXiv | Aug 2025 | Paper |
| Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts | arXiv | Oct 2025 | Paper |
| Parallel Loop Transformer for Efficient Test-Time Computation Scaling | arXiv | Oct 2025 | Paper |
| Scaling Latent Reasoning via Looped Language Models | arXiv | Oct 2025 | Paper - Page - Code |

Training Strategies for Recurrent Reasoning

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| From explicit cot to implicit cot: Learning to internalize cot step by step | arXiv | May 2024 | Paper |
| On the inductive bias of stacking towards improving reasoning | NeurIPS 2024 | Jun 2024 | Paper |
| Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | arXiv | Nov 2024 | Paper - Code |
| Training large language models to reason in a continuous latent space | COLM 2025 | Dec 2024 | Paper |
| Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning | arXiv | Feb 2025 | Paper |
| Reasoning with latent thoughts: On the power of looped transformers | arXiv | Feb 2025 | Paper |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | arXiv | May 2025 | Paper - Code - Project |
| Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | arXiv | May 2025 | Paper - Code |
| SIM-CoT: Supervised Implicit Chain-of-Thought | arXiv | Sep 2025 | Paper - Code |
| SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs | arXiv | Oct 2025 | Paper - Code - Project |

Applications and Capabilities

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Can you learn an algorithm? generalizing from easy to hard problems with recurrent networks | NeurIPS 2021 | Oct 2021 | Paper - Code |
| Looped transformers as programmable computers | ICML 2023 | Jun 2023 | Paper - Code |
| Simulation of graph algorithms with looped transformers | arXiv | Feb 2024 | Paper - Code |
| Guiding Language Model Reasoning with Planning Tokens | COLM 2024 | Feb 2024 | Paper - Code |
| Can looped transformers learn to implement multi-step gradient descent for in-context learning? | arXiv | Oct 2024 | Paper |
| Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent | arXiv | Oct 2024 | Paper |
| Disentangling memory and reasoning ability in large language models | arXiv | Nov 2024 | Paper |
| LatentPrompt: Optimizing Promts in Latent Space | arXiv | Aug 2025 | Paper |

Temporal Hidden-state Methods
Hidden-state based methods

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Gated linear attention transformers with hardware-efficient training | arXiv | Dec 2023 | Paper - Code |
| Eagle and finch: Rwkv with matrix-valued states and dynamic recurrence | arXiv | Apr 2024 | Paper - Code |
| Hgrn2: Gated linear rnns with state expansion | arXiv | Apr 2024 | Paper - Code |
| Transformers are ssms: Generalized models and efficient algorithms through structured state space duality | arXiv | May 2024 | Paper - Code |
| Parallelizing linear transformers with the delta rule over sequence length | arXiv | Jun 2024 | Paper - Code |

Optimization-based State Evolution

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Learning to (learn at test time): Rnns with expressive hidden states | arXiv | Jul 2024 | Paper |
| Gated Delta Networks: Improving Mamba2 with Delta Rule | arXiv | Dec 2024 | Paper - Code |
| Titans: Learning to memorize at test time | arXiv | Jan 2025 | Paper |
| Lattice: Learning to efficiently compress the memory | arXiv | Apr 2025 | Paper |
| It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization | arXiv | Apr 2025 | Paper |
| Atlas: Learning to optimally memorize the context at test time | arXiv | May 2025 | Paper |
| Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration | arXiv | May 2025 | Paper |
| Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers | SSRN | May 2025 | Paper - Code |

Training-induced Hidden-State Conversion

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Linearizing large language models | arXiv | May 2024 | Paper |
| Transformers to ssms: Distilling quadratic knowledge to subquadratic models | NeurIPS 2024 | Jun 2024 | Paper |
| LoLCATs: On Low-Rank Linearizing of Large Language Models | ICLR 2025 | Oct 2024 | Paper |
| Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing | arXiv | Feb 2025 | Paper - Code |
| Liger: Linearizing Large Language Models to Gated Recurrent Structures | arXiv | Mar 2025 | Paper - Code |
| Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows | arXiv | Jul 2025 | Paper |

Mechanistic Interpretability
Do Layer Stacks Reflect Latent CoT?

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Towards a mechanistic interpretation of multi-step reasoning capabilities of language models | arXiv | Oct 2023 | Paper - Code |
| Iteration head: A mechanistic study of chain-of-thought | NeurIPS 2024 | Jun 2024 | Paper |
| Towards understanding how transformer perform multi-step reasoning with matching operation | arXiv | Jun 2024 | Paper |
| Do LLMs Really Think Step-by-step In Implicit Reasoning? | arXiv | Nov 2024 | Paper |
| Back attention: Understanding and enhancing multi-hop reasoning in large language models | arXiv | Feb 2025 | Paper |
| How Do LLMs Perform Two-Hop Reasoning in Context? | arXiv | Feb 2025 | Paper |
| Reasoning with latent thoughts: On the power of looped transformers | arXiv | Feb 2025 | Paper |
| A little depth goes a long way: The expressive power of log-depth transformers | arXiv | Mar 2025 | Paper |
| Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer | arXiv | Jul 2025 | Paper - Code |

Mechanisms of Latent CoT in Layer Representation

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting | NeurIPS 2019 | Jul 2019 | Paper |
| Transformer feed-forward layers are key-value memories | EMNLP 2021 | Dec 2020 | Paper |
| Interpretability in the wild: a circuit for indirect object identification in GPT-2 small | arXiv | Nov 2022 | Paper |
| micse: Mutual information contrastive learning for low-shot sentence embeddings | arXiv | Nov 2022 | Paper - Code |
| How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model | NeurIPS 2023 | May 2023 | Paper |
| A mechanistic interpretation of arithmetic reasoning in language models using causal mediation analysis | EMNLP 2023 | May 2023 | Paper - Code |
| Why lift so heavy? slimming large language models by cutting off the layers | arXiv | Feb 2024 | Paper |
| Do large language models latently perform multi-hop reasoning? | EACL 2024 | Feb 2024 | Paper |
| Understanding and Patching Compositional Reasoning in LLMs | ACL 2024 (Findings) | Feb 2024 | Paper - Code |
| How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning | ICLR 2024 | Feb 2024 | Paper |
| The Unreasonable Ineffectiveness of the Deeper Layers | arXiv | Mar 2024 | Paper |
| Inheritune: Training Smaller Yet More Attentive Language Models | arXiv | Apr 2024 | Paper - Code |
| Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization | ICML 2024 | May 2024 | Paper - Code |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | NeurIPS 2024 | May 2024 | Paper - Code |
| Loss landscape geometry reveals stagewise development of transformers | Hi-DL 2024 | Jun 2024 | Paper |
| Hopping too late: Exploring the limitations of large language models on multi-hop queries | arXiv | Jun 2024 | Paper |
| Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning | arXiv | Jun 2024 | Paper |
| Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | arXiv | Aug 2024 | Paper |
| Unveiling induction heads: Provable training dynamics and feature learning in transformers | arXiv | Sep 2024 | Paper |
| Investigating layer importance in large language models | arXiv | Sep 2024 | Paper |
| Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations | arXiv | Oct 2024 | Paper - Code |
| Understanding Layer Significance in LLM Alignment | arXiv | Oct 2024 | Paper |
| Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation | ICLR 2025 | Oct 2024 | Paper - Code |
| Does representation matter? exploring intermediate layers in large language models | arXiv | Dec 2024 | Paper |
| Layer by Layer: Uncovering Hidden Representations in Language Models | ICML 2025 (oral) | Feb 2025 | Paper - Code |
| Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | arXiv | Feb 2025 | Paper - Code |
| The Curse of Depth in Large Language Models | arXiv | Feb 2025 | Paper - Code |
| Back attention: Understanding and enhancing multi-hop reasoning in large language models | arXiv | Feb 2025 | Paper |
| The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis | arXiv | Feb 2025 | Paper |
| An explainable transformer circuit for compositional generalization | arXiv | Feb 2025 | Paper |
| Emergent Abilities in Large Language Models: A Survey | arXiv | Mar 2025 | Paper |
| Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights | arXiv | May 2025 | Paper |
| Do Language Models Use Their Depth Efficiently? | arXiv | May 2025 | Paper |
| Void in Language Models | arXiv | May 2025 | Paper |
| LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking | arXiv | Aug 2025 | Paper |
| Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer | COLM Workshop | Sep 2025 | Paper - Code |

Turing Completeness of Layer-Based Latent CoT

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| On the computational power of neural nets | JCSS | 1995 | Paper |
| Long Short-Term Memory | Neural Computation | 1997 | Paper |
| Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | EMNLP 2014 | Jun 2014 | Paper |
| On the turing completeness of modern neural network architectures | IJCNN 2021 | Jan 2019 | Paper |
| Recurrent memory transformer | NeurIPS 2022 | Jul 2022 | Paper |
| Looped transformers as programmable computers | ICML 2023 | Jun 2023 | Paper |
| On limitations of the transformer architecture | COLM 2024 | Nov 2023 | Paper |
| Investigating Recurrent Transformers with Dynamic Halt | arXiv | Feb 2024 | Paper |
| Chain of thought empowers transformers to solve inherently serial problems | ICLR 2024 | Feb 2024 | Paper |
| Quiet-star: Language models can teach themselves to think before speaking | arXiv | Mar 2024 | Paper |
| Ask, and it shall be given: On the Turing completeness of prompting | arXiv | Nov 2024 | Paper |
| Reinforcement Pre-Training | arXiv | Jun 2025 | Paper |
| Constant Bit-size Transformers Are Turing Complete | arXiv | Jun 2025 | Paper |
| Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought | arXiv | Jul 2025 | Paper - Code |

Towards Infinite-depth Reasoning
Spatial Infinite Reasoning: Text Diffusion Models
Masked Diffusion Models (Temporal-only)

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Structured denoising diffusion models in discrete state-spaces | NeurIPS 2021 | Jul 2021 | Paper |
| Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | Jun 2024 | Paper |
| Your absorbing discrete diffusion secretly models the conditional distributions of clean data | arXiv | Jun 2024 | Paper |
| Learning Iterative Reasoning through Energy Diffusion | ICML 2024 | Jun 2024 | Paper - Project |
| Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | Jun 2024 | Paper - Project |
| Simple and effective masked diffusion language models | NeurIPS 2024 | Jun 2024 | Paper - Code |
| Scaling up Masked Diffusion Models on Text | arXiv | Oct 2024 | Paper - Project |
| MMaDa: Multimodal large diffusion language models | arXiv | May 2025 | Paper - Project |
| Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models | arXiv | Aug 2025 | Paper - Code |

Masked Diffusion Models (With Cache)

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | ICLR 2024 | Feb 2024 | Paper - Project |
| Large Language Diffusion Models | ICLR 2025 Workshop | Feb 2025 | Paper - Project |
| Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning | ICLR 2025 | Feb 2025 | Paper - Project |
| dKV-Cache: The Cache for Diffusion Language Models | arXiv | May 2025 | Paper - Project |
| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | arXiv | May 2025 | Paper - Project |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | arXiv | May 2025 | Paper |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | arXiv | May 2025 | Paper - Project |
| d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | arXiv | Jun 2025 | Paper - Project |
| Diffusion Beats Autoregressive in Data-Constrained Settings | arXiv | Jul 2025 | Paper - Project - Code |

Embedding-based Diffusion Models

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Diffusion-LM Improves Controllable Text Generation | NeurIPS 2022 | May 2022 | Paper - Project |
| Continuous diffusion for categorical data | arXiv | Dec 2022 | Paper |
| Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | ICLR 2023 | Mar 2023 | Paper - Project |
| Likelihood-Based Diffusion Language Models | NeurIPS 2023 | May 2023 | Paper - Project |
| Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | ICLR 2024 | Feb 2024 | Paper - Project |
| TESS: Text-to-Text Self-Conditioned Simplex Diffusion | EACL 2024 | Feb 2024 | Paper - Project |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | arXiv | Feb 2025 | Paper |

Hybrid AR-Diffusion Models

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | ICLR 2025 | Oct 2024 | Paper - Project |
| Large Language Models to Diffusion Finetuning | ICML 2025 | Jan 2025 | Paper - Code |
| Dream 7B: a large diffusion language model | Blog | Apr 2025 | Paper - Code |
| Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities | Technical Report | May 2025 | Paper |
| Mercury: Ultra-Fast Language Models Based on Diffusion | arXiv | Jun 2025 | Paper - Page |

Towards an 'Infinitely Long' Optimiser Network

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| MEMORYLLM: Towards Self-Updatable Large Language Models | ICML 2024 | Feb 2024 | Paper - Code |
| Leave No Context Behind: Efficient infinite context transformers with infini-attention | arXiv | Apr 2024 | Paper - Project |
| Learning to (learn at test time): Rnns with expressive hidden states | arXiv | Jul 2024 | Paper |
| Titans: Learning to memorize at test time | arXiv | Jan 2025 | Paper |
| Atlas: Learning to optimally memorize the context at test time | arXiv | May 2025 | Paper |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | ICML 2025 | May 2025 | Paper - Code |

Implicit Fixed Point RNNs

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Implicit Language Models are RNNs: Balancing Parallelization and Expressivity | ICML 2025 | Feb 2025 | Paper - Code |

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| A Survey of diffusion models in natural language processing | TACL | May 2023 | Paper |
| Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis | CVPR 2025 (oral) | Dec 2024 | Paper - Code |
| Large Language Diffusion Models | ICLR 2025 Workshop | Feb 2025 | Paper - Project - Code |