docs/minimization.md: 104 additions & 9 deletions
@@ -7,16 +7,111 @@
7
7
The `solver_name` argument in the `minimize` function accepts the following:
8
8
9
9
### Recommended
10
-
* **`active_set`**: **Best for noisy maps.** Uses a projected gradient method with active set constraints. Robust against noise but might be slower on very clean data.
11
-
* **`optax_lbfgs`**: **Best for noiseless runs.** L-BFGS with zoom linesearch (Strong Wolfe conditions). Very fast and accurate for smooth, noise-free landscapes.
10
+
* **`ADABK0`**: **Best for noisy maps.** Active-set method with AdaBelief direction and Top-K constraint release (K=0, i.e. one constraint released per iteration). Very robust in low-SNR regions. See [How ADABK Works](#how-adabk-works) below.
11
+
* **`optax_lbfgs`**: **Best for noiseless runs (systematics).** L-BFGS with zoom linesearch (Strong Wolfe conditions). Very fast and accurate for smooth, noise-free landscapes.
12
12
13
13
### Other Options
14
-
* `optax_lbfgs`: L-BFGS.
15
-
* `adam`: Simple Adam optimizer (good for stochastic settings).
16
-
* `scipy_tnc`: Wrapper for SciPy's Truncated Newton (TNC).
17
-
* `optimistix_bfgs`: Standard BFGS from Optimistix.
18
-
* `optimistix_lbfgs`: Standard L-BFGS from Optimistix.
* `ADABK{N}`: AdaBelief + Top-K active set. `N × 0.1` is the fraction of active constraints released per step: `ADABK0` releases one constraint per step (most stable), `ADABK5` releases up to 50%. See [How ADABK Works](#how-adabk-works) and the paper for details.
18
+
* `active_set`: Active set with Adam direction.
19
+
* `active_set_sgd`: Active set with SGD direction.
20
+
* `active_set_adabelief`: Active set with AdaBelief direction.
21
+
* `active_set_adaw`: Active set with AdamW direction.
22
+
23
+
**Optax L-BFGS:**
24
+
25
+
* `optax_lbfgs`: L-BFGS with zoom linesearch (default) or backtracking.
ADABK (Adaptive AdaBelief with Top-K Active Set, also called **AdaTopK** in the paper) is a JAX-native optimizer that combines the TNC active-set constraint strategy with the AdaBelief adaptive gradient method.
52
+
53
+
### Internal parameter space
54
+
55
+
Physical parameters **x** (bounded by **l**, **u**) are mapped to a normalized [0, 1] representation via an affine transform:
56
+
57
+
**y** = (**x** − **l**) / (**u** − **l**)
58
+
59
+
This normalizes the optimization landscape and ensures consistent step sizes across parameters with different physical scales.
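As a minimal sketch of the affine map above (written in NumPy for brevity; the same expressions carry over to `jax.numpy`), the helper names `to_unit` and `from_unit` and the example bounds are hypothetical and not part of the library:

```python
import numpy as np

def to_unit(x, lower, upper):
    """Map physical parameters x in [lower, upper] to y in [0, 1]."""
    return (x - lower) / (upper - lower)

def from_unit(y, lower, upper):
    """Inverse map: recover physical parameters from y in [0, 1]."""
    return lower + y * (upper - lower)

# Hypothetical bounds for two parameters with very different physical scales.
lower = np.array([10.0, -3.5])
upper = np.array([40.0, -2.5])
x = np.array([20.0, -3.0])

y = to_unit(x, lower, upper)          # both entries now live in [0, 1]
x_back = from_unit(y, lower, upper)   # round trip back to physical units
```

After the transform, a step of a given size in **y** means the same relative move for every parameter, regardless of its physical range.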
60
+
61
+
### Active set and pivot vector
62
+
63
+
Each parameter has a pivot value p_i:
64
+
65
+
* p_i = −1: parameter is at the lower bound (active constraint)
66
+
* p_i = +1: parameter is at the upper bound (active constraint)
67
+
* p_i = 0: parameter is free
68
+
69
+
Only free parameters (p_i = 0) are optimized at each iteration.
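The pivot convention can be illustrated with a short NumPy sketch. It classifies parameters purely by their position in the normalized [0, 1] space; the function name `pivot_vector` and the `tol` argument are assumptions made for illustration, and the actual solver may apply additional criteria before marking a constraint as active.

```python
import numpy as np

def pivot_vector(y, tol=0.0):
    """Return -1 at the lower bound, +1 at the upper bound, 0 for free parameters."""
    y = np.asarray(y, dtype=float)
    pivot = np.zeros(y.shape, dtype=int)
    pivot[y <= 0.0 + tol] = -1   # sitting on the lower bound
    pivot[y >= 1.0 - tol] = 1    # sitting on the upper bound
    return pivot

# Normalized parameters: first is at the lower bound, third at the upper bound.
y = np.array([0.0, 0.3, 1.0, 0.75])
pivot = pivot_vector(y)
free = pivot == 0   # boolean mask of parameters that are optimized this iteration
```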
70
+
71
+
### Top-K constraint release
72
+
73
+
At each iteration, a release score is computed for every active constraint:
74
+
75
+
score_i = p_i × g_i
76
+
77
+
A positive score means the negative gradient points into the feasible region, so releasing this constraint could decrease the objective: at a lower bound (p_i = −1) this happens when g_i < 0, and at an upper bound (p_i = +1) when g_i > 0. The Top-K fraction K controls how many constraints are released per iteration:
78
+
79
+
* **K = 0** (`ADABK0`): releases 1 constraint at a time. Most stable, consistently reaches the lowest objective values.
80
+
* **K = N** (`ADABK{N}`): releases up to a fraction `N × 0.1` of the active constraints per iteration.
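A release step along these lines might look like the NumPy sketch below. The score is taken as p_i × g_i, which is positive exactly when the negative gradient points into the feasible region under the pivot convention above (−1 at the lower bound, +1 at the upper bound). That sign choice, the function name `release_top_k`, and the `ceil` rounding of the fraction are assumptions for illustration, not the library's confirmed behavior.

```python
import numpy as np

def release_top_k(pivot, grad, fraction):
    """Release the best-scoring active constraints and return a new pivot vector.

    fraction = 0 releases at most one constraint; otherwise up to
    ceil(fraction * n_active) constraints are released.
    """
    pivot = np.asarray(pivot).copy()
    grad = np.asarray(grad, dtype=float)
    score = pivot * grad                      # > 0 when -grad points into the feasible region
    candidates = np.flatnonzero(score > 0)    # free parameters have score 0 and are skipped
    if candidates.size == 0:
        return pivot
    n_active = int(np.count_nonzero(pivot))
    k = max(1, int(np.ceil(fraction * n_active)))
    order = candidates[np.argsort(-score[candidates])]  # best score first
    pivot[order[:k]] = 0                      # released constraints become free
    return pivot

pivot = np.array([-1, 1, 0, -1, 1])
grad = np.array([-2.0, 0.5, 3.0, 1.0, 4.0])

one_at_a_time = release_top_k(pivot, grad, fraction=0.0)   # ADABK0-style
up_to_half = release_top_k(pivot, grad, fraction=0.5)      # ADABK5-style
```

Releasing a single constraint per iteration changes the free set slowly, which is consistent with the stability noted for `ADABK0`.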
81
+
82
+
### Projected gradient and AdaBelief direction
83
+
84
+
Gradients for active constraints are zeroed out: **g_proj** = **g** ⊙ (p = 0), where (p = 0) is an elementwise mask that is 1 for free parameters and 0 otherwise. The projected gradient is then fed to AdaBelief, which adapts step sizes based on gradient variance. This makes it better suited to noisy gradient landscapes (low-SNR regions) than classical quasi-Newton or truncated-Newton methods (L-BFGS, TNC), which tend to reset their curvature history when gradients are unreliable.
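The following NumPy sketch shows one projected-gradient update using the standard AdaBelief rule (moving average of the gradient, plus a moving average of the squared deviation from it). It is a textbook version under assumed hyperparameters, not the solver's actual implementation; the function name `adabelief_step` and the final masking of the step are illustrative choices.

```python
import numpy as np

def adabelief_step(grad, pivot, m, s, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-16):
    """One AdaBelief update on the projected gradient (t counts from 1)."""
    free = (pivot == 0)
    g_proj = grad * free                                  # zero out active constraints
    m = b1 * m + (1.0 - b1) * g_proj                      # moving average of the gradient
    s = b2 * s + (1.0 - b2) * (g_proj - m) ** 2 + eps     # moving average of deviation from m
    m_hat = m / (1.0 - b1 ** t)                           # bias correction
    s_hat = s / (1.0 - b2 ** t)
    step = -lr * m_hat / (np.sqrt(s_hat) + eps)
    return step * free, m, s                              # active parameters never move

grad = np.array([1.0, -2.0, 0.5])
pivot = np.array([0, -1, 0])        # second parameter sits on its lower bound
m = np.zeros(3)
s = np.zeros(3)

step, m, s = adabelief_step(grad, pivot, m, s, t=1)
```

When the gradient fluctuates strongly around its running mean, `s` grows and the step shrinks, which is the variance-based adaptation described above.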
85
+
86
+
### Dynamic state rescaling
87
+
88
+
When the gradient norm falls outside [10⁻¹⁵, 10¹⁵], the cost function and AdaBelief moment estimates are rescaled:
This prevents numerical under/overflow across the extreme dynamic range between the bright Galactic plane and faint high-latitude sky, without resetting the optimizer's momentum.
93
+
94
+
### Bounded line search
95
+
96
+
The step size α is capped at the distance to the nearest bound along the search direction (α_max), then a line search selects α in [0, α_max]. If a parameter hits a bound, it becomes an active constraint.
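To make the cap concrete, the sketch below computes α_max in the normalized [0, 1] space and then runs a simple Armijo backtracking search inside [0, α_max]. The source does not say which line search is used, so backtracking is a stand-in, and the names `max_feasible_step` and `backtracking` as well as the toy objective are hypothetical.

```python
import numpy as np

def max_feasible_step(y, direction):
    """Largest alpha >= 0 keeping 0 <= y + alpha * direction <= 1 in every coordinate."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(direction, dtype=float)
    alpha = np.full(y.shape, np.inf)
    up, down = d > 0, d < 0
    alpha[up] = (1.0 - y[up]) / d[up]        # distance to the upper bound
    alpha[down] = (0.0 - y[down]) / d[down]  # distance to the lower bound
    return float(alpha.min())

def backtracking(f, y, d, g, alpha_max, shrink=0.5, c=1e-4, max_iter=30):
    """Armijo backtracking restricted to [0, alpha_max]."""
    alpha, f0, slope = alpha_max, f(y), float(g @ d)
    for _ in range(max_iter):
        if f(y + alpha * d) <= f0 + c * alpha * slope:
            return alpha
        alpha *= shrink
    return 0.0

# Toy quadratic whose unconstrained minimum lies outside the box in coordinate 1.
target = np.array([0.1, 1.2, 0.5])
f = lambda y: float(np.sum((y - target) ** 2))
y = np.array([0.2, 0.9, 0.5])
g = 2.0 * (y - target)
d = -g                                        # steepest-descent direction

alpha_max = max_feasible_step(y, d)
alpha = backtracking(f, y, d, g, alpha_max)
y_new = np.clip(y + alpha * d, 0.0, 1.0)      # guard against round-off at the bound
```

In this example the full capped step is accepted, so the second parameter lands on its upper bound and would be marked active on the next iteration.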
97
+
98
+
## Conditioning
99
+
100
+
Conditioning (preconditioning) transforms the optimization problem to improve convergence. It applies two transformations before optimization:
101
+
102
+
1. **Parameter scaling**: min-max normalization to [0, 1] based on bounds.
103
+
2. **Gradient scaling**: the objective is scaled by 1/‖∇f‖ at initialization (like SciPy TNC's `fscale`), so the initial gradient norm is ≈ 1.
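The two transformations can be sketched together as a wrapper around an objective and its gradient. This is an illustration only: the function name `condition` is hypothetical, the gradient norm is measured in the normalized space (an assumption, since the text does not say which space is used), and the toy objective is made up.

```python
import numpy as np

def condition(f, grad_f, x0, lower, upper):
    """Return a conditioned objective and gradient acting on y in [0, 1], plus y0."""
    span = upper - lower
    y0 = (x0 - lower) / span                       # 1. parameter scaling
    g0 = grad_f(x0) * span                         # chain rule: df/dy = df/dx * (u - l)
    fscale = 1.0 / max(np.linalg.norm(g0), 1e-300) # 2. gradient scaling (guard against 0)

    def f_cond(y):
        return fscale * f(lower + y * span)

    def grad_cond(y):
        return fscale * grad_f(lower + y * span) * span

    return f_cond, grad_cond, y0

# Poorly scaled toy problem: large prefactor, very different parameter ranges.
center = np.array([25.0, -3.0])
f = lambda x: 1e6 * float(np.sum((x - center) ** 2))
grad_f = lambda x: 2e6 * (x - center)

lower = np.array([10.0, -3.5])
upper = np.array([40.0, -2.5])
x0 = np.array([20.0, -3.2])

f_cond, grad_cond, y0 = condition(f, grad_f, x0, lower, upper)
initial_norm = np.linalg.norm(grad_cond(y0))       # close to 1 by construction
```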
104
+
105
+
### Self-conditioned solvers
106
+
107
+
These solvers handle conditioning internally and ignore the `precondition` flag:
108
+
109
+
* **Active set variants** (`active_set`, `active_set_sgd`, `active_set_adabelief`, `active_set_adaw`, `ADABK{N}`): use internal affine transform + dynamic state rescaling.
All other solvers (`optax_lbfgs`, `adam`, `optimistix_*`, etc.) benefit from external conditioning when dealing with poorly scaled problems. Pass `precondition=True` (or a custom scaling function) to `minimize`.
20
115
21
116
## Minimizing Programmatically
22
117
@@ -34,7 +129,7 @@ final_params, state = minimize(
34
129
)
35
130
```
36
131
37
-
### Advanced: Steping interactively with Solvers
132
+
### Advanced: Stepping Interactively with Solvers
38
133
39
134
Since most solvers (except SciPy) are JAX-compatible, you can step through the optimization process manually. This is useful for custom logging or adaptive strategies.