Merge pull request #26 from CMBSciPol/docs

ASKabalan · web-flow · commit f0475eb6f68b · 2026-03-25T14:19:42.000+01:00
enhance docs
diff --git a/docs/minimization.md b/docs/minimization.md
@@ -7,16 +7,111 @@
 The `solver_name` argument in the `minimize` function accepts the following:
 
 ### Recommended
-*   **`active_set`**: **Best for noisy maps.** Uses a projected gradient method with active set constraints. Robust against noise but might be slower on very clean data.
-*   **`optax_lbfgs`**: **Best for noiseless runs.** L-BFGS with zoom linesearch (Strong Wolfe conditions). Very fast and accurate for smooth, noise-free landscapes.
+*   **`ADABK0`**: **Best for noisy maps.** Active-set method with AdaBelief direction and Top-K constraint release (K=0, i.e. one constraint released per iteration). Very robust in low-SNR regions. See [How ADABK Works](#how-adabk-works) below.
+*   **`optax_lbfgs`**: **Best for noiseless runs (systematics).** L-BFGS with zoom linesearch (Strong Wolfe conditions). Very fast and accurate for smooth, noise-free landscapes.
 
 ### Other Options
-*   `optax_lbfgs`: L-BFGS.
-*   `adam`: Simple Adam optimizer (good for stochastic settings).
-*   `scipy_tnc`: Wrapper for SciPy's Truncated Newton (TNC).
-*   `optimistix_bfgs`: Standard BFGS from Optimistix.
-*   `optimistix_lbfgs`: Standard L-BFGS from Optimistix.
-*   `optimistix_ncg_*`: Nonlinear Conjugate Gradient variants (`pr`, `hs`, `fr`, `dy`).
+
+**Active set variants** (self-conditioned):
+
+*   `ADABK{N}` — AdaBelief + Top-K active set. `N * 0.1` = fraction of constraints released per step. `ADABK0` releases 1 constraint/step (most stable), `ADABK5` releases up to 50%. (see [How ADABK Works](#how-adabk-works) and paper for more info)
+*   `active_set` — Active set with Adam direction.
+*   `active_set_sgd` — Active set with SGD direction.
+*   `active_set_adabelief` — Active set with AdaBelief direction.
+*   `active_set_adaw` — Active set with AdamW direction.
+
+**Optax L-BFGS:**
+
+*   `optax_lbfgs` — L-BFGS with zoom linesearch (default) or backtracking.
+
+**Optax first-order:**
+
+*   `adam` — Adam optimizer.
+*   `sgd` — SGD with backtracking linesearch.
+*   `adabelief` — AdaBelief optimizer.
+*   `adaw` / `adamw` — AdamW optimizer.
+
+**Optimistix:**
+
+*   `optimistix_bfgs` — Full BFGS.
+*   `optimistix_lbfgs` — Limited-memory BFGS.
+*   `optimistix_ncg_pr` — Nonlinear Conjugate Gradient (Polak-Ribière).
+*   `optimistix_ncg_hs` — Nonlinear Conjugate Gradient (Hestenes-Stiefel).
+*   `optimistix_ncg_fr` — Nonlinear Conjugate Gradient (Fletcher-Reeves).
+*   `optimistix_ncg_dy` — Nonlinear Conjugate Gradient (Dai-Yuan).
+
+**SciPy** (self-conditioned):
+
+*   `scipy_tnc` — Truncated Newton (TNC).
+*   `scipy_cobyqa` — COBYQA (derivative-free constrained optimizer).
+
+
+## How ADABK Works
+
+ADABK (Adaptive AdaBelief with Top-K Active Set, also called **AdaTopK** in the paper) is a JAX-native optimizer that combines the TNC active-set constraint strategy with the AdaBelief adaptive gradient method.
+
+### Internal parameter space
+
+Physical parameters **x** (bounded by **l**, **u**) are mapped to a normalized [0, 1] representation via an affine transform:
+
+**y** = (**x** − **l**) / (**u** − **l**)
+
+This normalizes the optimization landscape and ensures consistent step sizes across parameters with different physical scales.
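
The affine map above, its inverse, and the matching chain-rule factor for gradients can be sketched in a few lines. This is a minimal illustration in plain Python lists, not the library's JAX implementation; the function names and the bounds are hypothetical.

```python
def to_unit(x, lower, upper):
    """Map physical parameters x in [lower, upper] to y in [0, 1]."""
    return [(xi - lo) / (hi - lo) for xi, lo, hi in zip(x, lower, upper)]


def from_unit(y, lower, upper):
    """Inverse map: recover physical parameters from y in [0, 1]."""
    return [lo + yi * (hi - lo) for yi, lo, hi in zip(y, lower, upper)]


def grad_to_unit(grad_x, lower, upper):
    """Chain rule: df/dy_i = df/dx_i * (u_i - l_i)."""
    return [g * (hi - lo) for g, lo, hi in zip(grad_x, lower, upper)]


# Hypothetical bounds and starting point, chosen only for illustration.
lower = [0.5, 10.0, -5.0]
upper = [3.0, 40.0, -1.0]
x = [1.54, 20.0, -3.0]

y = to_unit(x, lower, upper)
x_back = from_unit(y, lower, upper)
```

Because every coordinate of `y` spans the same unit interval, a step of a given length means the same relative move for each parameter, whatever its physical range.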
+
+### Active set and pivot vector
+
+Each parameter has a pivot value p_i:
+
+*   p_i = −1: parameter is at the lower bound (active constraint)
+*   p_i = +1: parameter is at the upper bound (active constraint)
+*   p_i = 0: parameter is free
+
+Only free parameters (p_i = 0) are optimized at each iteration.
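
As a rough sketch of the pivot convention listed above, the snippet below derives a pivot vector from normalized parameters and selects the free indices. It assumes a parameter counts as active as soon as it sits on a bound; the actual solver may also look at the gradient sign before activating a constraint, and `pivot_vector` is a hypothetical helper name.

```python
def pivot_vector(y, tol=0.0):
    """Return -1 at the lower bound, +1 at the upper bound, 0 if free.

    Assumes y is already normalized to [0, 1].
    """
    pivots = []
    for yi in y:
        if yi <= tol:
            pivots.append(-1)
        elif yi >= 1.0 - tol:
            pivots.append(1)
        else:
            pivots.append(0)
    return pivots


y = [0.0, 0.37, 1.0, 0.9]
p = pivot_vector(y)

# Only these indices take part in the update at this iteration.
free = [i for i, pi in enumerate(p) if pi == 0]
```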
+
+### Top-K constraint release
+
+At each iteration, a release score is computed for every active constraint:
+
+score_i = p_i × (−g_i)
+
+With the pivot signs above, a negative score means the negative gradient points into the feasible region, so releasing this constraint could decrease the objective. The Top-K fraction K controls how many constraints are released per iteration:
+
+*   **K = 0** (`ADABK0`): releases 1 constraint at a time. Most stable, consistently reaches the lowest objective values.
+*   **K = N** (`ADABK{N}`): releases up to a fraction `N × 0.1` of the active constraints per iteration.
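
The release step can be sketched as follows. Two points are assumptions of this sketch rather than statements about the implementation. First, with the pivot signs listed earlier (−1 at the lower bound, +1 at the upper bound), the descent direction −g_i points into the feasible region exactly when p_i × (−g_i) is negative, so candidates are ranked by most negative score. Second, the number released is taken as `max(1, N * n_active // 10)`, which gives one constraint for `N = 0`. The function names are hypothetical.

```python
def release_scores(pivots, grad):
    """score_i = p_i * (-g_i), computed for active constraints only."""
    return {i: p * (-g) for i, (p, g) in enumerate(zip(pivots, grad)) if p != 0}


def release_top_k(pivots, grad, n):
    """Release up to max(1, n/10 of the active set) inward-pointing constraints."""
    scores = release_scores(pivots, grad)
    # Keep only constraints whose descent direction points inward,
    # most promising (most negative score) first.
    candidates = sorted((i for i in scores if scores[i] < 0), key=scores.get)
    k = max(1, (n * len(scores)) // 10)
    released = candidates[:k]
    new_pivots = [0 if i in released else p for i, p in enumerate(pivots)]
    return new_pivots, released


pivots = [-1, 0, 1, -1, 1]
grad = [-2.0, 0.3, 0.5, 1.0, -0.1]

p0, r0 = release_top_k(pivots, grad, 0)  # ADABK0-style: one release
p5, r5 = release_top_k(pivots, grad, 5)  # ADABK5-style: up to half the active set
```

In this example, parameters 3 and 4 stay pinned for any `N`, because following −g would push them further outside their bounds.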
+
+### Projected gradient and AdaBelief direction
+
+Gradients for active constraints are zeroed out: **g_proj** = **g** ⊙ (p = 0), where (p = 0) is the 0/1 mask of free parameters. The projected gradient is then fed to AdaBelief, which adapts step sizes based on gradient variance. This makes it better suited to noisy gradient landscapes (low-SNR regions) than classical quasi-Newton methods (L-BFGS, TNC), which tend to reset their curvature history when gradients are unreliable.
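
To make the masking and the AdaBelief update concrete, here is a small sketch using the standard AdaBelief recurrences (first moment of the gradient, second moment of the deviation from that first moment, both bias-corrected). The hyperparameter defaults and function names are illustrative assumptions and are not taken from the solver.

```python
import math


def project_gradient(grad, pivots):
    """Zero the gradient of every active constraint (pivot != 0)."""
    return [g if p == 0 else 0.0 for g, p in zip(grad, pivots)]


def adabelief_direction(g_proj, m, s, t, b1=0.9, b2=0.999, eps=1e-16):
    """One AdaBelief update; returns the search direction and new moments.

    m tracks the mean gradient, s tracks the squared deviation (g - m)^2,
    so a gradient that agrees with its running mean yields a larger step.
    """
    new_m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g_proj)]
    new_s = [
        b2 * si + (1 - b2) * (gi - mi) ** 2 + eps
        for si, gi, mi in zip(s, g_proj, new_m)
    ]
    direction = []
    for mi, si in zip(new_m, new_s):
        m_hat = mi / (1 - b1 ** t)  # bias correction, t starts at 1
        s_hat = si / (1 - b2 ** t)
        direction.append(-m_hat / (math.sqrt(s_hat) + eps))
    return direction, new_m, new_s


grad = [0.4, -1.2, 0.7]
pivots = [0, -1, 0]  # the middle parameter is pinned at its lower bound

g_proj = project_gradient(grad, pivots)
direction, m, s = adabelief_direction(g_proj, [0.0] * 3, [0.0] * 3, t=1)
```

The pinned parameter gets a zero direction, so it stays on its bound until the release step frees it.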
+
+### Dynamic state rescaling
+
+When the gradient norm falls outside [10⁻¹⁵, 10¹⁵], the cost function is rescaled by a factor f_scale and the AdaBelief moment estimates are rescaled to match:
+
+**m** ← f_scale · **m** ,  **v** ← f_scale² · **v**
+
+This prevents numerical under/overflow across the extreme dynamic range between the bright Galactic plane and faint high-latitude sky, without resetting the optimizer's momentum.
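
The exponents in the rescaling rule follow from how each quantity depends on the gradient: multiplying the objective by f_scale multiplies gradients and the first moment by f_scale, and the second moment (quadratic in the gradient) by f_scale². A minimal sketch, assuming f_scale is chosen as 1/‖g‖ so the rescaled gradient has unit norm (the source does not state how f_scale is picked):

```python
import math

G_MIN, G_MAX = 1e-15, 1e15  # gradient-norm window from the text


def rescale_state(grad, m, v, scale):
    """Rescale gradient and moments when the gradient norm leaves the window.

    `scale` accumulates the factor applied to the objective so far.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 or G_MIN <= norm <= G_MAX:
        return grad, m, v, scale  # nothing to do
    f_scale = 1.0 / norm  # assumption: renormalize to unit gradient norm
    return (
        [f_scale * g for g in grad],
        [f_scale * mi for mi in m],       # first moment: linear in g
        [f_scale ** 2 * vi for vi in v],  # second moment: quadratic in g
        scale * f_scale,
    )


# A gradient far below the window, as might occur in a very faint region.
grad, m, v, scale = rescale_state(
    [3e-17, 4e-17], [1e-17, 2e-17], [1e-34, 4e-34], 1.0
)
```

Since the ratio m / sqrt(v) is unchanged by this rescaling, the adaptive direction is preserved and no momentum history is discarded.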
+
+### Bounded line search
+
+The step size α is capped at the distance to the nearest bound (α_max), then a line search finds the optimal α in [0, α_max]. If a parameter hits a bound, it becomes an active constraint.
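
The cap α_max is the smallest ratio of remaining distance to step component over all moving parameters. The sketch below computes it in the normalized [0, 1] space and then uses simple backtracking as a stand-in for the solver's line search, whose exact form is not described here. Function names and the test objective are hypothetical.

```python
def max_feasible_step(y, direction):
    """Largest alpha such that y + alpha * direction stays inside [0, 1]."""
    alpha_max = float("inf")
    for yi, di in zip(y, direction):
        if di > 0:
            alpha_max = min(alpha_max, (1.0 - yi) / di)
        elif di < 0:
            alpha_max = min(alpha_max, -yi / di)
    return alpha_max


def bounded_backtracking(f, y, direction, shrink=0.5, max_iter=30):
    """Search alpha in [0, alpha_max], starting from the cap and shrinking."""
    alpha = max_feasible_step(y, direction)
    if alpha == float("inf"):  # zero direction: nothing to search
        return 0.0, list(y)
    f0 = f(y)
    for _ in range(max_iter):
        # Clip to absorb floating-point rounding at the bounds.
        trial = [min(1.0, max(0.0, yi + alpha * di)) for yi, di in zip(y, direction)]
        if f(trial) < f0:
            return alpha, trial
        alpha *= shrink
    return 0.0, list(y)


def f(y):
    # Toy objective whose unconstrained minimum lies outside the box in y[1].
    return (y[0] - 0.2) ** 2 + (y[1] - 1.5) ** 2


y = [0.5, 0.8]
direction = [-0.6, 1.4]  # the negative gradient of f at y

alpha, y_new = bounded_backtracking(f, y, direction)
hit_upper = [i for i, yi in enumerate(y_new) if yi >= 1.0 - 1e-12]
```

Here the full capped step is accepted, the second parameter lands on its upper bound, and it would be marked active (pivot +1) for the next iteration.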
+
+## Conditioning
+
+Conditioning (preconditioning) transforms the optimization problem to improve convergence. It applies two transformations before optimization:
+
+1.  **Parameter scaling**: min-max normalization to [0, 1] based on bounds.
+2.  **Gradient scaling**: the objective is scaled by 1/‖∇f‖ at initialization (like SciPy TNC's `fscale`), so the initial gradient norm is ≈ 1.
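
The two transformations can be illustrated with a small wrapper around an arbitrary objective. This is a sketch under stated assumptions: the gradient is estimated by central finite differences to keep the example dependency-free (a JAX implementation would use automatic differentiation), the initial gradient norm is measured in the normalized space, and `condition` and `fd_grad` are hypothetical names, not part of the `minimize` API.

```python
import math


def fd_grad(f, y, h=1e-6):
    """Central finite-difference gradient (stand-in for autodiff)."""
    grad = []
    for i in range(len(y)):
        yp, ym = list(y), list(y)
        yp[i] += h
        ym[i] -= h
        grad.append((f(yp) - f(ym)) / (2 * h))
    return grad


def condition(f, x0, lower, upper):
    """Return the conditioned objective, the normalized start, and the scale."""
    span = [hi - lo for lo, hi in zip(lower, upper)]

    def f_y(y):  # 1. parameter scaling: evaluate f on min-max normalized inputs
        return f([lo + yi * s for yi, lo, s in zip(y, lower, span)])

    y0 = [(xi - lo) / s for xi, lo, s in zip(x0, lower, span)]
    norm0 = math.sqrt(sum(g * g for g in fd_grad(f_y, y0)))
    fscale = 1.0 / norm0 if norm0 > 0 else 1.0

    def f_cond(y):  # 2. gradient scaling: initial gradient norm becomes ~1
        return fscale * f_y(y)

    return f_cond, y0, fscale


def f(x):
    # Badly scaled toy objective: curvatures differ by eight orders of magnitude.
    return 1e6 * (x[0] - 1.0) ** 2 + 1e-2 * (x[1] - 300.0) ** 2


f_cond, y0, fscale = condition(f, [1.5, 200.0], [0.0, 100.0], [2.0, 500.0])
g_cond = fd_grad(f_cond, y0)
```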
+
+### Self-conditioned solvers
+
+These solvers handle conditioning internally and ignore the `precondition` flag:
+
+*   **Active set variants** (`active_set`, `active_set_sgd`, `active_set_adabelief`, `active_set_adaw`, `ADABK{N}`) — use internal affine transform + dynamic state rescaling.
+*   **SciPy solvers** (`scipy_tnc`, `scipy_cobyqa`) — SciPy handles bounds and scaling internally.
+
+### Externally conditioned solvers
+
+All other solvers (`optax_lbfgs`, `adam`, `optimistix_*`, etc.) benefit from external conditioning when dealing with poorly scaled problems. Pass `precondition=True` (or a custom scaling function) to `minimize`.
 
 ## Minimizing Programmatically
 
@@ -34,7 +129,7 @@ final_params, state = minimize(
 )
 ```
 
-### Advanced: Steping interactively with Solvers
+### Advanced: Stepping Interactively with Solvers
 
 Since most solvers (except SciPy) are JAX-compatible, you can step through the optimization process manually. This is useful for custom logging or adaptive strategies.
 
diff --git a/src/furax_cs/r_analysis/binning.py b/src/furax_cs/r_analysis/binning.py
@@ -28,7 +28,6 @@
 _ALL_PARAM_NAMES = ["beta_dust", "temp_dust", "beta_pl"]
 
 
-
 def _squeeze_patches(arr: np.ndarray) -> np.ndarray:
     """Squeeze n_gridpts=1 leading dim from patch arrays if present."""
     if arr.ndim > 1:
diff --git a/src/furax_cs/r_analysis/caching.py b/src/furax_cs/r_analysis/caching.py
@@ -8,9 +8,8 @@
 from furax import HomothetyOperator
 from furax.obs import negative_log_likelihood, sky_signal
 from furax.obs.stokes import Stokes
-from jaxtyping import Array, Float, Int
-
 from furax_cs.optim import minimize
+from jaxtyping import Array, Float, Int
 
 
 def compute_w(
diff --git a/src/furax_cs/r_analysis/main.py b/src/furax_cs/r_analysis/main.py
@@ -3,7 +3,6 @@
 import datasets
 import matplotlib.pyplot as plt
 import scienceplots  # noqa: F401
-
 from furax_cs.data.instruments import get_instrument
 
 from ..logging_utils import (
diff --git a/src/furax_cs/scripts/bench_bcp.py b/src/furax_cs/scripts/bench_bcp.py
@@ -59,7 +59,6 @@
 from furax import HomothetyOperator
 from furax.obs import negative_log_likelihood
 from furax.obs.landscapes import Stokes
-
 from furax_cs import load_from_cache, minimize, save_to_cache
 from furax_cs.logging_utils import info
 
diff --git a/src/furax_cs/scripts/bench_clusters.py b/src/furax_cs/scripts/bench_clusters.py
@@ -30,7 +30,6 @@
 
 from furax.obs import negative_log_likelihood, spectral_cmb_variance
 from furax.obs.stokes import Stokes
-
 from furax_cs import (
     generate_noise_operator,
     kmeans_clusters,
diff --git a/src/furax_cs/scripts/compute_mr.py b/src/furax_cs/scripts/compute_mr.py
@@ -1,7 +1,6 @@
 import argparse
 
 import jax.numpy as jnp
-
 from furax_cs import get_mask
 from furax_cs.multires_clusters import multires_clusters
 
diff --git a/src/furax_cs/scripts/distributed_gridding.py b/src/furax_cs/scripts/distributed_gridding.py
@@ -70,13 +70,6 @@
 from furax.obs.landscapes import FrequencyLandscape
 from furax.obs.operators import NoiseDiagonalOperator
 from furax.obs.stokes import Stokes
-from jax_grid_search import DistributedGridSearch
-from jax_healpy.clustering import (
-    find_kmeans_clusters,
-    get_cutout_from_mask,
-    normalize_by_first_occurrence,
-)
-
 from furax_cs import (
     MASK_CHOICES,
     dump_default_search_space,
@@ -90,6 +83,12 @@
     sanitize_mask_name,
 )
 from furax_cs.logging_utils import info, success
+from jax_grid_search import DistributedGridSearch
+from jax_healpy.clustering import (
+    find_kmeans_clusters,
+    get_cutout_from_mask,
+    normalize_by_first_occurrence,
+)
 
 jax.config.update("jax_enable_x64", True)
 
diff --git a/src/furax_cs/scripts/fgbuster_model.py b/src/furax_cs/scripts/fgbuster_model.py
@@ -65,8 +65,6 @@
     )
 
 from furax.obs.stokes import Stokes
-from jax_healpy.clustering import get_cutout_from_mask, get_fullmap_from_cutout
-
 from furax_cs import (
     MASK_CHOICES,
     generate_noise_operator,
@@ -79,6 +77,7 @@
     sanitize_mask_name,
 )
 from furax_cs.logging_utils import info, success
+from jax_healpy.clustering import get_cutout_from_mask, get_fullmap_from_cutout
 
 jax.config.update("jax_enable_x64", True)
 
diff --git a/src/furax_cs/scripts/kmeans_model.py b/src/furax_cs/scripts/kmeans_model.py
@@ -72,8 +72,6 @@
     sky_signal,
 )
 from furax.obs.stokes import Stokes
-from jax_healpy.clustering import get_cutout_from_mask, normalize_by_first_occurrence
-
 from furax_cs import (
     MASK_CHOICES,
     generate_noise_operator,
@@ -87,6 +85,7 @@
     sanitize_mask_name,
 )
 from furax_cs.logging_utils import info, success
+from jax_healpy.clustering import get_cutout_from_mask, normalize_by_first_occurrence
 
 jax.config.update("jax_enable_x64", True)
 
diff --git a/src/furax_cs/scripts/ptep_model.py b/src/furax_cs/scripts/ptep_model.py
@@ -13,9 +13,6 @@
 import numpy as np
 from furax.obs import negative_log_likelihood, sky_signal
 from furax.obs.stokes import Stokes
-from jax_healpy.clustering import get_cutout_from_mask
-from tqdm import tqdm
-
 from furax_cs import (
     MASK_CHOICES,
     generate_noise_operator,
@@ -29,6 +26,8 @@
     sanitize_mask_name,
 )
 from furax_cs.logging_utils import info, success
+from jax_healpy.clustering import get_cutout_from_mask
+from tqdm import tqdm
 
 jax.config.update("jax_enable_x64", True)
 
diff --git a/tests/test_binning.py b/tests/test_binning.py
@@ -1,7 +1,6 @@
 """Tests for the binning helper bin_parameter_map."""
 
 import numpy as np
-
 from furax_cs import bin_parameter_map
 
 
