This repository provides a minimal, self-contained reference implementation of the Forward–Momentum Scaling Law proposed in:
Scaling and Transferability of Annealing Strategies in Large Language Model Training
AAAI 2026 (Main Track, Poster)
Siqi Wang, Zhengyu Chen, Teng Xiao, Zheqi Lv, Jinluan Yang, Xunliang Cai, Jingang Wang, Xiaomeng Li
This repository implements the key components introduced in our paper, including:
- Computation of the cumulative forward learning-rate effect S, characterized by the integral of the learning rate over training steps.
- A practical proxy M for the kinetic effect of learning-rate decay, defined via a momentum-style update that captures both the rate and magnitude of decay during annealing.
- A unified scaling formulation that relates the loss to S, M, and model size.
- Robust Huber-loss optimization with L-BFGS-B for stable estimation of scaling parameters.
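
To make the first two components concrete, here is a minimal sketch of how a cumulative learning-rate integral and a momentum-style decay proxy could be computed from a logged schedule. It is an illustration under stated assumptions, not the code in annealing_scaling_law.py: the function names `forward_area` and `momentum_area`, the decay factor `lam`, the trapezoid rule, and the choice to ignore learning-rate increases (such as warmup) are all hypothetical and may differ from the paper's exact definitions.

```python
import numpy as np


def forward_area(steps, lrs):
    """Cumulative integral of the learning rate over steps (trapezoid rule)."""
    steps = np.asarray(steps, dtype=float)
    lrs = np.asarray(lrs, dtype=float)
    # Area of each trapezoid between consecutive logged steps.
    increments = 0.5 * (lrs[1:] + lrs[:-1]) * np.diff(steps)
    return np.concatenate([[0.0], np.cumsum(increments)])


def momentum_area(lrs, lam=0.99):
    """Running sum of a momentum over learning-rate drops.

    Assumes one entry per optimizer step. `lam` is a hypothetical
    decay factor, not a value taken from the paper.
    """
    lrs = np.asarray(lrs, dtype=float)
    # Per-step decrease in learning rate; increases are clipped to zero
    # (an assumption of this sketch so that only decay contributes).
    drops = np.concatenate([[0.0], np.maximum(lrs[:-1] - lrs[1:], 0.0)])
    out = np.empty_like(lrs)
    m = 0.0
    total = 0.0
    for i, d in enumerate(drops):
        m = lam * m + d   # momentum-style update on the decay signal
        total += m        # accumulate the momentum over time
        out[i] = total
    return out


# Example schedule: 50 constant steps, then 50 steps of linear decay to zero.
steps = np.arange(100)
lrs = np.concatenate([np.full(50, 1e-3), np.linspace(1e-3, 0.0, 50)])
S = forward_area(steps, lrs)
M = momentum_area(lrs)
```

In this sketch, S grows throughout training while M stays at zero during the constant phase and only starts to grow once the learning rate begins to decay, which is the qualitative split between the forward effect and the annealing effect described above.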
- annealing_scaling_law.py
- README.md
This code is intentionally minimal. It provides only the components required to reproduce the annealing-related scaling behavior discussed in the paper.
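from annealing_scaling_law import compute_S, compute_M, fit_and_evaluate_lr_mom

S = compute_S(steps, learning_rates)
M = compute_M(steps, learning_rates)

y_fit, r2, mse, params = fit_and_evaluate_lr_mom(
    y=loss_values,  # observed loss values
    x=S,  # cumulative forward learning-rate effect
    t=M,  # momentum-style proxy for learning-rate decay
    n=model_size,  # model size
    initial_params=[...],  # starting values for the fitted parameters
)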
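
The robust fitting step (Huber loss minimized with L-BFGS-B) can also be sketched independently of the repository code. The example below is a hedged illustration only: `toy_law` is a hypothetical stand-in curve (a power law in S minus a linear term in M) and is not the paper's unified formulation, the model-size term is omitted, and `huber`, `fit_toy_law`, and the threshold `delta` are names and values assumed for this sketch. Only `scipy.optimize.minimize` with `method="L-BFGS-B"` is a real library call.

```python
import numpy as np
from scipy.optimize import minimize


def huber(residual, delta=0.1):
    """Huber penalty: quadratic for small residuals, linear for large ones."""
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta,
                    0.5 * residual ** 2,
                    delta * (abs_r - 0.5 * delta))


def toy_law(params, S, M):
    """Hypothetical loss curve, used here only to have something to fit."""
    L0, A, alpha, C = params
    return L0 + A * S ** (-alpha) - C * M


def fit_toy_law(y, S, M, initial_params, delta=0.1):
    """Fit toy_law by minimizing the summed Huber penalty with L-BFGS-B."""
    def objective(params):
        return np.sum(huber(toy_law(params, S, M) - y, delta))

    # Keep all four parameters non-negative (an assumption of this sketch).
    bounds = [(0.0, None)] * 4
    result = minimize(objective, initial_params,
                      method="L-BFGS-B", bounds=bounds)

    y_fit = toy_law(result.x, S, M)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    mse = np.mean((y - y_fit) ** 2)
    return y_fit, r2, mse, result.x


# Synthetic, noise-free data generated from known parameters (S must be > 0).
S = np.linspace(1.0, 100.0, 200)
M = np.concatenate([np.zeros(100), np.linspace(0.0, 1.0, 100)])
y = toy_law([2.0, 1.0, 0.5, 0.3], S, M)

y_fit, r2, mse, params = fit_toy_law(y, S, M,
                                     initial_params=[1.5, 0.8, 0.4, 0.1])
```

The Huber penalty limits the influence of outlying loss measurements compared with plain least squares, and the box constraints supported by L-BFGS-B keep the parameters in a physically sensible range. As with most power-law fits, the result can be sensitive to `initial_params`, so trying several starting points is a reasonable precaution.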