Fisher Flow (FF) is a unified framework for sequential parameter estimation that propagates Fisher information rather than probability distributions. It provides uncertainty quantification with 10-100x speedup over Bayesian methods.
Instead of tracking all possible parameter values and their probabilities (expensive!), Fisher Flow tracks just two things:
- Your best parameter estimate
- The Fisher Information Matrix (how confident you are)
When new data arrives, both update with simple matrix arithmetic—no integration required!
- Unified Framework: Reveals that Adam, Natural Gradient, and Elastic Weight Consolidation are all special cases
- Fast Uncertainty: Get confidence intervals without MCMC or variational inference
- Streaming Ready: Process data sequentially with bounded memory
- Distributed: Information matrices add across workers
- Theoretically Grounded: Proven convergence and efficiency guarantees
import numpy as np
from fisher_flow import DiagonalFF, KroneckerFF, FullFF
# Online logistic regression with uncertainty
model = DiagonalFF(dim=784)
for batch in data_stream:
# Update with new data
estimate, uncertainty = model.update(batch)
# Get confidence intervals
ci_lower, ci_upper = model.confidence_interval(0.95)
# Make predictions with uncertainty
pred_mean, pred_std = model.predict(x_new)Modern ML needs methods that can:
- Process streaming data efficiently
- Quantify uncertainty in predictions
- Scale to billions of parameters
- Combine information from distributed sources
Bayesian inference handles uncertainty but doesn't scale. SGD scales but lacks uncertainty.
Fisher Flow bridges this gap by propagating Fisher information—a quadratic approximation to the log-posterior curvature.
We didn't invent new math—we recognized a pattern. Many successful methods are implicitly doing Fisher Flow:
| Method | What It Actually Is |
|---|---|
| Adam | Diagonal Fisher Flow |
| Natural Gradient | Full Fisher Flow |
| K-FAC | Kronecker Fisher Flow |
| Elastic Weight Consolidation | Fisher Flow with memory |
| Kalman Filter | Linear-Gaussian Fisher Flow |
By naming this pattern, we can:
- Design new algorithms systematically
- Understand why existing methods work
- Choose approximations principled
# From PyPI (coming soon)
pip install fisher-flow
# From source
git clone https://github.com/yourusername/fisher-flow.git
cd fisher-flow
pip install -e .Choose your approximation based on your needs:
ScalarFF: One learning rate for all (SGD-like)DiagonalFF: Per-parameter learning rates (Adam-like)BlockFF: Groups share information (layer-wise)KroneckerFF: For matrix parameters (K-FAC-like)FullFF: Complete information matrix (Natural Gradient)
StationaryFF: Accumulate foreverWindowedFF: Recent data onlyExponentialFF: Gradual forgettingAdaptiveFF: Detect and adapt to changes
Benchmark results on standard tasks:
| Method | Accuracy | Calibration (ECE) | Time (s) | Memory |
|---|---|---|---|---|
| SGD | 75.4% | 0.082 | 1.2 | O(d) |
| Adam | 76.1% | 0.071 | 1.8 | O(d) |
| Fisher Flow (Diagonal) | 76.3% | 0.048 | 2.1 | O(d) |
| Fisher Flow (Block) | 76.8% | 0.041 | 4.5 | O(d) |
| Variational Bayes | 76.5% | 0.045 | 45.3 | O(d²) |
Fisher Flow updates follow the natural gradient on statistical manifolds:
# Information accumulation
I_t = I_{t-1} + F(batch_t)
# Parameter update
θ_t = I_t^{-1} (I_{t-1} θ_{t-1} + F(batch_t) θ_batch)
Where F(batch) is the Fisher Information from the batch. This simple update rule:
- ✅ Is invariant to reparameterization
- ✅ Achieves Cramér-Rao efficiency bound
- ✅ Combines information optimally
- ✅ Scales to streaming settings
- Blog Post: Fisher Flow in Plain English (coming soon)
- Tutorial Notebook: From SGD to Fisher Flow
- Video: The Information Geometry of Learning
- Simple: Online Linear Regression
- Intermediate: Neural Network Training
- Advanced: Continual Learning
- Research: Custom Fisher Flow Variants
We welcome contributions! Fisher Flow is a general pattern with many unexplored variants.
- Sparse Fisher Flow for high-dimensional models
- Fisher Flow for graph neural networks
- Hardware-optimized implementations
- Fisher Flow for reinforcement learning
- Non-parametric extensions
See CONTRIBUTING.md for guidelines.
If you use Fisher Flow in your research, please cite:
@article{towell2025fisherflow,
title={Fisher Flow: Information-Geometric Sequential Inference},
author={Towell, Alex},
journal={arXiv preprint arXiv:2025.xxxxx},
year={2025}
}- Basic Fisher Flow implementations
- Standard benchmarks
- PyTorch/JAX/TensorFlow backends
- Documentation and tutorials
- Integration with popular ML libraries
- Uncertainty quantification toolkit
- Continual learning framework
- Distributed training support
- Moment propagation beyond Fisher
- Causal Fisher Flow
- Fisher Flow for scientific computing
- AutoML for choosing approximations
Fisher Flow isn't just another optimization algorithm—it's a new lens for understanding learning:
All learning is information propagation with different carriers, metrics, dynamics, and objectives.
This perspective unifies:
- Supervised learning → Propagate label information to parameters
- Unsupervised learning → Propagate structure information to representations
- Meta-learning → Propagate task information to priors
- Transfer learning → Propagate domain information across tasks
- Author: Alex Towell ([email protected])
- Issues: GitHub Issues
- Discussions: GitHub Discussions
MIT License - see LICENSE file for details.
"Sometimes the biggest contribution isn't inventing something new—it's recognizing what's already there and giving it a name."