RL Theory in Lean

Installation

It is recommended to use this project with the Lean 4 extension of VSCode. After cloning this project to a local folder and opening the folder inside VSCode, open RLTheory.lean and select Restart Server and Restart File in the Lean 4 extension. The project should then compile. See https://lean-lang.org/install/ for a detailed tutorial on setting up Lean.

Paper

Towards Formalizing Reinforcement Learning Theory

@article{zhang2025formalizing,
    title={Towards Formalizing Reinforcement Learning Theory},
    author={Shangtong Zhang},
    year={2025},
    journal={arXiv preprint arXiv:2511.03618}
}

Main Results

This project is organized by mirroring the structure of Mathlib so the counterparts can be upstreamed to Mathlib easily. Below are some main results.

Algorithm
- LinearTD.lean
  - [LinearTDIterates] - linear TD algorithm.
  - [DecreaseAlong] - $\ell_2$ norm qualifies a Lyapunov function for linear TD.
  - ae_tendsto_of_linearTD_iid — almost sure convergence of linear TD with i.i.d. samples.
  - ae_tendsto_of_linearTD_markov — almost sure convergence of linear TD with Markovian samples.
- QLearning.lean
  - [QLearningIterates] — Q-learning algorithm.
  - [DecreaseAlong] - $\ell_p$ norm with a sufficiently large $p$ qualifies a Lyapunov function for Q-learning.
  - ae_tendsto_of_QLearning_iid — almost sure convergence of Q-learning with i.i.d. samples.
  - ae_tendsto_of_QLearning_markov — almost sure convergence of Q-learning with Markovian samples.
StochasticApproximation
- Iterates.lean - stochastic approximation (SA) iterates and non-uniform bounds.
  - [Iterates] — a general form of SA iterates with Martingale difference noise and additional noise.
  - [IteratesOfResidual] — a specific form of SA iterates driven by residual of an operator.
  - [AdaptedOnSamplePath] — SA iterates are completely determined by available information until current time step on every sample path.
- CondExp.lean
  - condExp_iterates_update — computes the conditional expectation of SA iterates in the Ionescu-Tulcea probability space.
- Lyapunov.lean
  - [LyapunovFunction] — desired properties for a function to be used as a Lyapunov function to analyze SA iterates.
- LpSpace.lean - squared $\ell_p$ norm qualifies a Lyapunov function
  - smooth_half_sq_Lp — squared $\ell_p$ norm is $p - 1$ smooth.
- Pathwise.lean
  - fundamental_inequality — recursive bounds of errors based on a Lyapunov function on an individual sample path.
- DiscreteGronwall.lean — discrete Gronwall inequalities.
- MartingaleDifference.lean
  - ae_tendsto_of_iterates_mds_noise — convergence of SA iterates with Martingale difference noise.
- IIDSamples.lean
  - ae_tendsto_of_iterates_iid_samples — convergence of SA iterates with i.i.d. samples.
- StepSize.lean
  - anchors_of_inv_poly — inverse polynomial step size qualifies the skeleton iterates technique that can convert Markov noise to Martingale difference noise.
- MarkovSamples.lean
  - ae_tendsto_of_iterates_markov_samples — convergence of SA iterates with Markovian samples.
  - ae_tendsto_of_iterates_markov_samples_of_inv_poly — specializes to inverse polynomial step size schedules.
Data
- Int/GCD.lean
  - all_but_finite_of_closed_under_add_and_gcd_one — if a set has gcd 1 and is close under addition, it then contains all large enough integers.
- Matrix/PosDef.lean — positive definiteness (p.d.) for asymmetric real matrices.
  - posDefAsymm_iff' — a lower bound for the quadratic form of a p.d. matrix.
- Matrix/Stochastic.lean — stochastic vectors and row stochastic matrices.
  - smat_minorizable_with_large_pow — an irreducible and aperiodic stochastic matrix is Doeblin minorizable after sufficient powers.
  - smat_contraction_in_simplex - Doeblin minorization implies contraction in simplex
  - stationary_distribution_exists - produces one stationary distribution via Cesaro limit.
  - stationary_distribution_uniquely_exists — stationary distribution uniquely exists under irreducibility and aperiodicity.
  - [GeometricMixing] - geometric mixing under irreducibility and aperiodicity.
Probability
- MarkovChain/Defs.lean — basics of time homogeneous Markov chains.
- MarkovChain/Finite/Defs.lean — matrix representation of finite Markov chains
- MarkovChain/Trajectory.lean
  - traj_prob — constructs the sample path space probability measure for Markov chains by the Ionescu-Tulcea theorem.
- Martingale/Convergence.lean
  - ae_tendsto_zero_of_almost_supermartingale — Robbins-Siegmund theorem for the convergence of almost supermartingales.
MarkovDecisionProcess
- MarkovRewardProcess.lean — basics of finite MRPs.
  - [NegDefAsymm MRP.K] — negative definiteness of the matrix $D(\gamma P - I)$
- MarkovDecisionProcess.lean — basics of finite MDPs.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
RLTheory		RLTheory
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
RLTheory.lean		RLTheory.lean
lake-manifest.json		lake-manifest.json
lakefile.toml		lakefile.toml
lean-toolchain		lean-toolchain

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RL Theory in Lean

Installation

Paper

Main Results

About

Uh oh!

Releases

Packages

Languages

License

ShangtongZhang/rl-theory-in-lean

Folders and files

Latest commit

History

Repository files navigation

RL Theory in Lean

Installation

Paper

Main Results

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages