GitHub - multimodal-art-projection/LatentCoT-Horizon: 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.

LatentCoT-Horizon

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

This repository provides the papers mentioned in the survey "A Survey on Latent Reasoning".

📑 Citation

If you find our survey useful for your research, please consider citing the following paper:

@article{map2025latent,
  title={A Survey on Latent Reasoning},
  author={M-A-P},
  journal={arxiv},
  year={2025}
}

💡 We also have other generative projects that may interest you ✨.

🚀 Scaling Latent Reasoning via Looped Language Models
Rui-Jie Zhu, Zixuan Wang, Kai Hua, et al. Page - Model

📣 Update News

[2025-07-08] We have released the survey on arXiv: A Survey on Latent Reasoning.

[2025-07-04] We have initialized the repository.

🆚 Explicit Reasoning vs. Latent Reasoning

⚡ Contributing

We welcome feedback, suggestions, and contributions that help improve this survey and repository and make them valuable resources for the entire community. We will actively maintain this repository by incorporating new research as it emerges. If you have suggestions about our taxonomy, notice papers we have missed, or know of arXiv preprints listed here that have since been accepted to a venue, please let us know.

If you want to add your work or model to this list, please do not hesitate to email ridger@ucsc.edu or open a pull request.

Markdown format:

* | **Paper Name** | Name of Conference or Journal + Year | Release Date | [Paper](link) - [Code](link) |
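
As a rough illustration of the row format above, the following sketch builds one entry from its fields. The helper name `format_entry` and the example URLs are hypothetical placeholders, not part of this repository, and the leading list marker is left out on the assumption that the row goes directly into a table.

```python
def format_entry(title, venue, date, paper_url, code_url=None):
    """Build one table row in the contribution format shown above."""
    # The Code link is optional: many entries in the list only link a paper.
    links = f"[Paper]({paper_url})"
    if code_url:
        links += f" - [Code]({code_url})"
    return f"| **{title}** | {venue} | {date} | {links} |"


row = format_entry(
    title="Universal Transformers",
    venue="ICLR 2019",
    date="Jul 2018",
    paper_url="https://example.org/paper",  # placeholder URL (assumption)
    code_url="https://example.org/code",    # placeholder URL (assumption)
)
print(row)
```

The sketch does not escape `|` characters inside a title, so a title containing one would need manual handling.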

📜 Papers

🧠 Latent CoT Reasoning

🔄 Activation-based Recurrent Methods

🧱 Architectural Recurrence

Title	Venue	Date	Links
Universal Transformers	ICLR 2019	Jul 2018	Paper - Code
CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference	ICLR 2025	Oct 2023	Paper
AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures	TMLR 2025	Feb 2024	Paper - Code
Relaxed recursive transformers: Effective parameter sharing with layer-wise Lora	ICLR 2025	Oct 2024	Paper
Byte Latent Transformer: Patches Scale Better Than Tokens	ACL 2025 Outstanding Paper	Dec 2024	Paper - Code
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach	ICLR 2025	Feb 2025	Paper - Code
LLM Pretraining with Continuous Concepts	arXiv	Feb 2025	Paper - Code
Pretraining Language Models to Ponder in Continuous Space	arXiv	May 2025	Paper - Code
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation	arXiv	Jul 2025	Paper - Code

🏋️ Training-induced Recurrence

Title	Venue	Date	Links
Think before you speak: Training Language Models With Pause Tokens	ICLR 2024	Oct 2023	Paper
Guiding Language Model Reasoning with Planning Tokens	COLM 2024	Oct 2023	Paper - Code
Let's Think Dot by Dot: Hidden computation in transformer language models	COLM 2024	Apr 2024	Paper - Code
Disentangling memory and reasoning ability in large language models	ACL 2025 (main)	Nov 2024	Paper - Code
Training Large Language Models to Reason in a Continuous Latent Space	arXiv	Dec 2024	Paper - Code
Compressed chain of thought: Efficient reasoning through dense representations	arXiv	Dec 2024	Paper
Multimodal Latent Language Modeling with Next-Token Diffusion	arXiv	Dec 2024	Paper - Page
Efficient Reasoning with Hidden Thinking	arXiv	Jan 2025	Paper - Code
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning	ICML 2025	Feb 2025	Paper
Lightthinker: Thinking step-by-step compression	arXiv	Feb 2025	Paper - Code
Codi: Compressing chain-of-thought into continuous space via self-distillation	arXiv	Feb 2025	Paper - Code
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts	arXiv	May 2025	Paper
Hybrid Latent Reasoning via Reinforcement Learning	NeurIPS 2025	May 2025	Paper - Code
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens	arXiv	Jun 2025	Paper - Code
Parallel Continuous Chain-of-Thought with Jacobi Iteration	arXiv	Jun 2025	Paper - Code
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains	arXiv	Jun 2025	Paper - Code
SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought	arXiv	Aug 2025	Paper
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts	arXiv	Oct 2025	Paper
Parallel Loop Transformer for Efficient Test-Time Computation Scaling	arXiv	Oct 2025	Paper
Scaling Latent Reasoning via Looped Language Models	arXiv	Oct 2025	Paper - Page - Code

🎯 Training Strategies for Recurrent Reasoning

Title	Venue	Date	Links
From explicit cot to implicit cot: Learning to internalize cot step by step	arXiv	May 2024	Paper
On the inductive bias of stacking towards improving reasoning	NeurIPS 2024	Jun 2024	Paper
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding	arXiv	Nov 2024	Paper - Code
Training large language models to reason in a continuous latent space	COLM 2025	Dec 2024	Paper
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning	arXiv	Feb 2025	Paper
Reasoning with latent thoughts: On the power of looped transformers	arXiv	Feb 2025	Paper
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space	arXiv	May 2025	Paper - Code - Project
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space	arXiv	May 2025	Paper - Code
SIM-CoT: Supervised Implicit Chain-of-Thought	arXiv	Sep 2025	Paper - Code
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs	arXiv	Oct 2025	Paper - Code - Project

✨ Applications and Capabilities

Title	Venue	Date	Links
Can you learn an algorithm? generalizing from easy to hard problems with recurrent networks	NeurIPS 2021	Oct 2021	Paper - Code
Looped transformers as programmable computers	ICML 2023	Jun 2023	Paper - Code
Simulation of graph algorithms with looped transformers	arXiv	Feb 2024	Paper - Code
Guiding Language Model Reasoning with Planning Tokens	COLM 2024	Feb 2024	Paper - Code
Can looped transformers learn to implement multi-step gradient descent for in-context learning?	arXiv	Oct 2024	Paper
Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent	arXiv	Oct 2024	Paper
Disentangling memory and reasoning ability in large language models	arXiv	Nov 2024	Paper
LatentPrompt: Optimizing Promts in Latent Space	arXiv	Aug 2025	Paper

⏳ Temporal Hidden-state Methods

📦 Hidden-state based methods

Title	Venue	Date	Links
Gated linear attention transformers with hardware-efficient training	arXiv	Dec 2023	Paper - Code
Eagle and finch: Rwkv with matrix-valued states and dynamic recurrence	arXiv	Apr 2024	Paper - Code
Hgrn2: Gated linear rnns with state expansion	arXiv	Apr 2024	Paper - Code
Transformers are ssms: Generalized models and efficient algorithms through structured state space duality	arXiv	May 2024	Paper - Code
Parallelizing linear transformers with the delta rule over sequence length	arXiv	Jun 2024	Paper - Code

⚙️ Optimization-based State Evolution

Title	Venue	Date	Links
Learning to (learn at test time): Rnns with expressive hidden states	arXiv	Jul 2024	Paper
Gated Delta Networks: Improving Mamba2 with Delta Rule	arXiv	Dec 2024	Paper - Code
Titans: Learning to memorize at test time	arXiv	Jan 2025	Paper
Lattice: Learning to efficiently compress the memory	arXiv	Apr 2025	Paper
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization	arXiv	Apr 2025	Paper
Atlas: Learning to optimally memorize the context at test time	arXiv	May 2025	Paper
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration	arXiv	May 2025	Paper
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers	SSRN	May 2025	Paper - Code

🎭 Training-induced Hidden-State Conversion

Title	Venue	Date	Links
Linearizing large language models	arXiv	May 2024	Paper
Transformers to ssms: Distilling quadratic knowledge to subquadratic models	NeurIPS 2024	Jun 2024	Paper
LoLCATs: On Low-Rank Linearizing of Large Language Models	ICLR 2025	Oct 2024	Paper
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing	arXiv	Feb 2025	Paper - Code
Liger: Linearizing Large Language Models to Gated Recurrent Structures	arXiv	Mar 2025	Paper - Code
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows	arXiv	Jul 2025	Paper

🔬 Mechanistic Interpretability

🧐 Do Layer Stacks Reflect Latent CoT?

Title	Venue	Date	Links
Towards a mechanistic interpretation of multi-step reasoning capabilities of language models	arXiv	Oct 2023	Paper - Code
Iteration head: A mechanistic study of chain-of-thought	NeurIPS 2024	Jun 2024	Paper
Towards understanding how transformer perform multi-step reasoning with matching operation	arXiv	Jun 2024	Paper
Do LLMs Really Think Step-by-step In Implicit Reasoning?	arXiv	Nov 2024	Paper
Back attention: Understanding and enhancing multi-hop reasoning in large language models	arXiv	Feb 2025	Paper
How Do LLMs Perform Two-Hop Reasoning in Context?	arXiv	Feb 2025	Paper
Reasoning with latent thoughts: On the power of looped transformers	arXiv	Feb 2025	Paper
A little depth goes a long way: The expressive power of log-depth transformers	arXiv	Mar 2025	Paper
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer	arXiv	Jul 2025	Paper - Code

🛠️ Mechanisms of Latent CoT in Layer Representation

Title	Venue	Date	Links
Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting	NeurIPS 2019	Jul 2019	Paper
Transformer feed-forward layers are key-value memories	EMNLP 2021	Dec 2020	Paper
Interpretability in the wild: a circuit for indirect object identification in GPT-2 small	arXiv	Nov 2022	Paper
micse: Mutual information contrastive learning for low-shot sentence embeddings	arXiv	Nov 2022	Paper - Code
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model	NeurIPS 2023	May 2023	Paper
A mechanistic interpretation of arithmetic reasoning in language models using causal mediation analysis	EMNLP 2023	May 2023	Paper - Code
Why lift so heavy? slimming large language models by cutting off the layers	arXiv	Feb 2024	Paper
Do large language models latently perform multi-hop reasoning?	EACL 2024	Feb 2024	Paper
Understanding and Patching Compositional Reasoning in LLMs	ACL 2024 (Finding)	Feb 2024	Paper - Code
How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning	ICLR 2024	Feb 2024	Paper
The Unreasonable Ineffectiveness of the Deeper Layers	arXiv	Mar 2024	Paper
Inheritune: Training Smaller Yet More Attentive Language Models	arXiv	Apr 2024	Paper - Code
Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization	ICML 2024	May 2024	Paper - Code
Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning	NeurIPS 2024	May 2024	Paper - Code
Loss landscape geometry reveals stagewise development of transformers	Hi-DL 2024	Jun 2024	Paper
Hopping too late: Exploring the limitations of large language models on multi-hop queries	arXiv	Jun 2024	Paper
Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning	arXiv	Jun 2024	Paper
Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons	arXiv	Aug 2024	Paper
Unveiling induction heads: Provable training dynamics and feature learning in transformers	arXiv	Sep 2024	Paper
Investigating layer importance in large language models	arXiv	Sep 2024	Paper
Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations	arXiv	Oct 2024	Paper - Code
Understanding Layer Significance in LLM Alignment	arXiv	Oct 2024	Paper
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation	ICLR 2025	Oct 2024	Paper - Code
Does representation matter? exploring intermediate layers in large language models	arXiv	Dec 2024	Paper
Layer by Layer: Uncovering Hidden Representations in Language Models	ICML 2025 (oral)	Feb 2025	Paper - Code
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach	arXiv	Feb 2025	Paper - Code
The Curse of Depth in Large Language Models	arXiv	Feb 2025	Paper - Code
Back attention: Understanding and enhancing multi-hop reasoning in large language models	arXiv	Feb 2025	Paper
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis	arXiv	Feb 2025	Paper
An explainable transformer circuit for compositional generalization	arXiv	Feb 2025	Paper
Emergent Abilities in Large Language Models: A Survey	arXiv	Mar 2025	Paper
Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights	arXiv	May 2025	Paper
Do Language Models Use Their Depth Efficiently?	arXiv	May 2025	Paper
Void in Language Models	arXiv	May 2025	Paper
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking	arXiv	Aug 2025	Paper
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer	COLM Workshop	Sep 2025	Paper - Code

💻 Turing Completeness of Layer-Based Latent CoT

Title	Venue	Date	Links
On the computational power of neural nets	JCSS	1995	Paper
Long Short-Term Memory	Neural Computation	1997	Paper
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation	EMNLP 2014	Jun 2014	Paper
On the turing completeness of modern neural network architectures	IJCNN 2021	Jan 2019	Paper
Recurrent memory transformer	NeurIPS 2022	Jul 2022	Paper
Looped transformers as programmable computers	ICML 2023	Jun 2023	Paper
On limitations of the transformer architecture	COLM 2024	Nov 2023	Paper
Investigating Recurrent Transformers with Dynamic Halt	arXiv	Feb 2024	Paper
Chain of thought empowers transformers to solve inherently serial problems	ICLR 2024	Feb 2024	Paper
Quiet-star: Language models can teach themselves to think before speaking	arXiv	Mar 2024	Paper
Ask, and it shall be given: On the Turing completeness of prompting	arXiv	Nov 2024	Paper
Reinforcement Pre-Training	arXiv	Jun 2025	Paper
Constant Bit-size Transformers Are Turing Complete	arXiv	Jun 2025	Paper
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought	arXiv	Jul 2025	Paper - Code

♾️ Towards Infinite-depth Reasoning

🌀 Spatial Infinite Reasoning: Text Diffusion Models

⬛ Masked Diffusion Models (Temporal-only)

Title	Venue	Date	Links
Structured denoising diffusion models in discrete state-spaces	NeurIPS 2021	Jul 2021	Paper
Discrete diffusion modeling by estimating the ratios of the data distribution	ICML 2024	Jun 2024	Paper
Your absorbing discrete diffusion secretly models the conditional distributions of clean data	arXiv	Jun 2024	Paper
Learning Iterative Reasoning through Energy Diffusion	ICML 2024	Jun 2024	Paper - Project
Simplified and generalized masked diffusion for discrete data	NeurIPS 2024	Jun 2024	Paper - Project
Simple and effective masked diffusion language models	NeurIPS 2024	Jun 2024	Paper - Code
Scaling up Masked Diffusion Models on Text	arXiv	Oct 2024	Paper - Project
MMaDa: Multimodal large diffusion language models	arXiv	May 2025	Paper - Project
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models	arXiv	Aug 2025	Paper - Code

⬛ Masked Diffusion Models (With Cache)

Title	Venue	Date	Links
Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models	ICLR 2024	Feb 2024	Paper - Project
Large Language Diffusion Models	ICLR 2025 Workshop	Feb 2025	Paper - Project
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning	ICLR 2025	Feb 2025	Paper - Project
dKV-Cache: The Cache for Diffusion Language Models	arXiv	May 2025	Paper - Project
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching	arXiv	May 2025	Paper - Project
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	arXiv	May 2025	Paper
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	arXiv	May 2025	Paper - Project
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning	arXiv	Jun 2025	Paper - Project
Diffusion Beats Autoregressive in Data-Constrained Settings	arXiv	Jul 2025	Paper - Project - Code

🔗 Embedding-based Diffusion Models

Title	Venue	Date	Links
Diffusion-LM Improves Controllable Text Generation	NeurIPS 2022	May 2022	Paper - Project
Continuous diffusion for categorical data	arXiv	Dec 2022	Paper
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning	ICLR 2023	Mar 2023	Paper - Project
Likelihood-Based Diffusion Language Models	NeurIPS 2023	May 2023	Paper - Project
Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models	ICLR 2024	Feb 2024	Paper - Project
TESS: Text-to-Text Self-Conditioned Simplex Diffusion	EACL 2024	Feb 2024	Paper - Project
TESS 2: A Large-Scale Generalist Diffusion Language Model	arXiv	Feb 2025	Paper

🧬 Hybrid AR-Diffusion Models

Title	Venue	Date	Links
Scaling Diffusion Language Models via Adaptation from Autoregressive Models	ICLR 2025	Oct 2024	Paper - Project
Large Language Models to Diffusion Finetuning	ICML 2025	Jan 2025	Paper - Code
Dream 7B: a large diffusion language model	Blog	Apr 2025	Paper - Code
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities	Technical Report	May 2025	Paper
Mercury: Ultra-Fast Language Models Based on Diffusion	arXiv	Jun 2025	Paper - Page

🕸️ Towards an 'Infinitely Long' Optimiser Network

Title	Venue	Date	Links
MEMORYLLM: Towards Self-Updatable Large Language Models	ICML 2024	Feb 2024	Paper - Code
Leave No Context Behind: Efficient infinite context transformers with infini-attention	arXiv	Apr 2024	Paper - Project
Learning to (learn at test time): Rnns with expressive hidden states	arXiv	Jul 2024	Paper
Titans: Learning to memorize at test time	arXiv	Jan 2025	Paper
Atlas: Learning to optimally memorize the context at test time	arXiv	May 2025	Paper
M+: Extending MemoryLLM with Scalable Long-Term Memory	ICML 2025	May 2025	Paper - Code

📌 Implicit Fixed Point RNNs

Title	Venue	Date	Links
Implicit Language Models are RNNs: Balancing Parallelization and Expressivity	ICML 2025	Feb 2025	Paper - Code

💬 Discussion

Title	Venue	Date	Links
A Survey of diffusion models in natural language processing	TACL	May 2023	Paper
Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis	CVPR 2025 (oral)	Dec 2024	Paper - Code
Large Language Diffusion Models	ICLR 2025 Workshop	Feb 2025	Paper - Project - Code

👍 Acknowledgement

Awesome-Latent-CoT: a curated list of papers exploring latent chain-of-thought reasoning in LLMs.
Awesome-Efficient-Reasoning: a curated list of works on making LLM reasoning cheaper and faster.
Efficient Reasoning Models: A Survey: the companion repo to the survey, aggregating methods/benchmarks for “shorter, smaller, faster” reasoning models.
Implicit Reasoning in Large Language Models: A Comprehensive Survey: a curated list of works on implicit reasoning in LLMs.
