Stepik: https://stepik.org/a/272346
Research-oriented course and implementation of the L-BFGS / L-BFGS-B quasi-Newton optimization algorithms.
The project combines:
- Mathematical derivation from first principles
- Convergence theory and Wolfe line search analysis
- Two-loop recursion implementation
- Production-style Python solver
- Large-scale numerical experiments
Designed for researchers, ML engineers, and students studying second-order optimization methods.
Status: Active development.
Core solver implementation and theoretical documentation are stable.
Future improvements include:
- Extended benchmarking framework
- Additional test coverage
- Performance profiling
- Visualization of convergence dynamics
Contributions are welcome. See CONTRIBUTING.md for guidelines.
A mathematically rigorous course on the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm, covering theoretical foundations, convergence properties, numerical stability, and large-scale implementations.
The Limited-memory BFGS (L-BFGS) method is a quasi-Newton optimization algorithm designed for large-scale unconstrained and box-constrained smooth optimization problems. It achieves superlinear convergence under standard assumptions while requiring only linear memory in the number of variables.
This course presents:
- A full derivation of the BFGS update from the secant condition
- Proof of positive definiteness preservation
- Two-loop recursion derivation for L-BFGS
- Wolfe line-search theory
- Global convergence analysis
- Practical implementation details (including L-BFGS-B)
The treatment follows the theoretical framework established in:
- Numerical Optimization (Nocedal & Wright)
- Convex Optimization (Boyd & Vandenberghe)
We study smooth optimization problems of the form

$$\min_{x \in \mathbb{R}^n} f(x)$$

where:
- $f \in C^1$ (continuously differentiable),
- $\nabla f$ is Lipschitz continuous,
- optionally, $f$ is strongly convex.
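For reference, the two regularity assumptions read, for all $x, y$:

$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| \quad \text{(Lipschitz gradient)},$$

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \tfrac{\mu}{2}\|y - x\|^2 \quad \text{(strong convexity, } \mu > 0\text{)}.$$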
Extensions to box constraints,

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad \ell \le x \le u,$$

are treated via the L-BFGS-B framework as implemented in:
- SciPy (`scipy.optimize.minimize` with `method="L-BFGS-B"`)
- Second-order Taylor approximation
- Newton direction
- Limitations of exact Hessian computation
- Motivation for Hessian approximation
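In standard notation, these steps follow from the second-order Taylor model

$$f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2} p^T \nabla^2 f(x_k)\, p,$$

whose minimizer (for positive definite $\nabla^2 f(x_k)$) gives the Newton direction

$$p_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k).$$

Forming and factorizing the exact Hessian costs $O(n^2)$ memory and $O(n^3)$ time, which motivates approximating curvature from gradient differences.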
We derive the quasi-Newton (secant) equation

$$B_{k+1} s_k = y_k,$$

where

$$s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$

We show:
- why the update must satisfy symmetry,
- why positive definiteness requires $y_k^T s_k > 0$,
- how the BFGS rank-two update arises from minimal-correction principles.
Derivation of the BFGS update

$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},$$

with the equivalent inverse-Hessian form

$$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k}.$$

We prove:
- Symmetry preservation
- Positive definiteness preservation
- Superlinear convergence under standard assumptions
For large $n$, storing and updating a dense $n \times n$ approximation $H_k$ is infeasible. L-BFGS stores only the last $m$ curvature pairs $(s_i, y_i)$, typically with $m \in [3, 20]$.

The two-loop recursion computes the search direction

$$p_k = -H_k \nabla f(x_k)$$

without ever forming $H_k$ explicitly.

Memory complexity: $O(mn)$.
Time complexity per iteration: $O(mn)$.
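A minimal sketch of the two-loop recursion (the function name and list-based history are illustrative, not the repository's API). It uses the standard initial scaling $H_k^0 = \gamma_k I$ with $\gamma_k = s_{k-1}^T y_{k-1} / y_{k-1}^T y_{k-1}$:

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """Compute p_k = -H_k @ grad via the L-BFGS two-loop recursion.

    s_list, y_list: the last m pairs s_i = x_{i+1} - x_i and
    y_i = grad_{i+1} - grad_i, oldest first. H_k is never formed;
    only O(mn) memory is used.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial Hessian approximation H_k^0 = gamma * I (standard scaling).
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r  # search direction p_k
```

With an empty history the recursion reduces to steepest descent, `-grad`; with one stored pair it reproduces the rank-two inverse update applied to $\gamma I$.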
We provide formal treatment of:
- Weak Wolfe conditions
- Strong Wolfe conditions
- Curvature condition
- Global convergence theorems
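For a step $x_{k+1} = x_k + \alpha_k p_k$ along a descent direction $p_k$, the conditions analyzed are, in standard notation with $0 < c_1 < c_2 < 1$:

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k \quad \text{(sufficient decrease)},$$

$$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2 \nabla f(x_k)^T p_k \quad \text{(weak curvature)},$$

$$|\nabla f(x_k + \alpha_k p_k)^T p_k| \le c_2\, |\nabla f(x_k)^T p_k| \quad \text{(strong curvature)}.$$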
Under the Wolfe conditions the curvature pair satisfies

$$y_k^T s_k > 0,$$

which ensures positive definiteness, and hence stability, of the BFGS update.
Under:
- Lipschitz continuous gradient
- Strong convexity
- Wolfe line search
BFGS and L-BFGS achieve:
- Global convergence
- Local superlinear convergence
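Local superlinear convergence is typically established via the Dennis–Moré characterization: the iterates converge superlinearly if and only if

$$\lim_{k \to \infty} \frac{\|(B_k - \nabla^2 f(x^*))\, s_k\|}{\|s_k\|} = 0,$$

i.e. the approximation $B_k$ becomes exact along the step directions.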
We discuss:
- Failure cases in nonconvex problems
- Saddle point behavior
- Curvature degeneracy
The repository contains:
- Pure Python implementation
- NumPy-based vectorized routines
- Explicit two-loop recursion
- Wolfe line-search implementation
- L-BFGS-B box constraint handling
Reference comparison with:
- SciPy
- Original Fortran L-BFGS-B implementation by Nocedal et al.
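As a reference point, SciPy's `L-BFGS-B` (a wrapper around the Fortran code of Byrd, Lu, Nocedal, and Zhu) can be run on the classic Rosenbrock function; the test function and starting point here are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def rosenbrock_grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

# maxcor is the history size m; gtol bounds the projected gradient norm.
res = minimize(rosenbrock, x0=np.array([-1.2, 1.0]),
               jac=rosenbrock_grad, method="L-BFGS-B",
               options={"maxcor": 10, "gtol": 1e-8})
print(res.x, res.nit)  # converges to the minimizer (1, 1)
```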
```
lbfgs-research/
│
├── theory/
│   ├── newton_method.md
│   ├── secant_condition.md
│   ├── bfgs_proof.md
│   ├── lbfgs_two_loop.md
│   ├── wolfe_conditions.md
│   └── convergence_analysis.md
│
├── implementation/
│   ├── lbfgs.py
│   ├── line_search.py
│   ├── lbfgsb.py
│   └── utils.py
│
├── experiments/
│   ├── quadratic_tests.ipynb
│   ├── logistic_regression.ipynb
│   └── large_scale_benchmark.ipynb
│
└── README.md
```
We benchmark:
- Quadratic functions (known Hessian)
- Ill-conditioned problems
- Logistic regression
- High-dimensional synthetic datasets
Metrics:
- Iteration count
- Gradient norm decay
- Function value reduction
- Wall-clock time
Comparisons include:
- Gradient Descent
- Newton’s Method
- Conjugate Gradient
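An illustrative comparison in the spirit of these benchmarks (the setup is assumed, not the repository's own harness): gradient descent versus SciPy's `L-BFGS-B` on an ill-conditioned quadratic $f(x) = \tfrac{1}{2} x^T A x$ with condition number $10^3$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 20
eigs = np.logspace(0, 3, n)                     # spectrum of A, kappa = 1e3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = rng.standard_normal(n)

# Gradient descent with the optimal fixed step 2 / (L + mu).
L, mu = eigs.max(), eigs.min()
x, gd_iters = x0.copy(), 0
while np.linalg.norm(grad(x)) > 1e-6 and gd_iters < 50_000:
    x -= 2.0 / (L + mu) * grad(x)
    gd_iters += 1

res = minimize(f, x0, jac=grad, method="L-BFGS-B", options={"gtol": 1e-6})
print(f"GD iterations: {gd_iters}, L-BFGS-B iterations: {res.nit}")
```

On this problem L-BFGS-B reaches the same gradient tolerance in orders of magnitude fewer iterations than fixed-step gradient descent, which needs roughly $\kappa \log(1/\varepsilon)$ steps.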
- Linear algebra (spectral decomposition, positive definiteness)
- Multivariable calculus
- Numerical optimization theory
- Familiarity with quasi-Newton methods
L-BFGS is widely used in:
- Statistical estimation
- Maximum likelihood problems
- Inverse problems
- Scientific computing
- Large-scale machine learning
Framework adoption includes:
- SciPy
- PyTorch
- TensorFlow
If you use this repository for academic or research purposes, please cite:
```bibtex
@misc{lbfgs_course,
  author = {Ruslan Senatorov},
  title  = {L-BFGS: Theory, Convergence, and Implementation},
  year   = {2026},
}
```

MIT License.
R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing, 16(5):1190-1208, 1995.
C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal. Algorithm 778: L-BFGS-B, FORTRAN Routines for Large Scale Bound Constrained Optimization. ACM Transactions on Mathematical Software, 23(4):550-560, 1997.
J. L. Morales and J. Nocedal. Remark on Algorithm 778: L-BFGS-B, FORTRAN Routines for Large Scale Bound Constrained Optimization. ACM Transactions on Mathematical Software, 38(1), 2011.