6 changes: 6 additions & 0 deletions docs/api/index.md
@@ -26,6 +26,12 @@
.. autofunction:: furax_cs.optim.solvers.get_solver
```

## Binning

```{eval-rst}
.. autofunction:: furax_cs.binning.bin_parameter_map
```

## Data

```{eval-rst}
28 changes: 28 additions & 0 deletions docs/cli_reference.md
@@ -54,6 +54,7 @@ kmeans-model -n 64 -pc 10000 500 500 -m GAL020 -tag c1d1s1
|---|---|---|---|
| `-n`, `--nside` | `int` | `64` | HEALPix resolution |
| `-pc`, `--patch-count` | `int int int` | `10000 500 500` | Cluster counts for $[\beta_d, T_d, \beta_s]$ |
| `-c`, `--clusters` | `str str str` | none | Cluster source for $[\beta_d, T_d, \beta_s]$. Overrides `-pc`. See [below](#cluster-sources-c). |
| `-m`, `--mask` | `str` | `GAL020_U` | Galactic mask (see Masks section below) |
| `-tag`, `--tag` | `str` | `c1d1s1` | Sky simulation tag |
| `-ns`, `--noise-sim` | `int` | `1` | Number of noise realizations |
@@ -73,6 +74,33 @@ kmeans-model -n 64 -pc 10000 500 500 -m GAL020 -tag c1d1s1
| `-o`, `--output` | `str` | `results` | Output directory |
| `--name` | `str` | auto | Override output folder name |

### Cluster Sources (`-c`)

The `-c` flag provides fine-grained control over how clusters are defined for each spectral parameter. It accepts exactly three values, one for each of $[\beta_d, T_d, \beta_s]$:

| Value | Meaning |
|---|---|
| `true` | Use precomputed pixel subsets derived from true parameter values. Only available for `-tag c1d1s1`. |
| *integer* | Run K-means clustering with this many clusters (same as `-pc`). |
| *path to `.npy`* | Load a full-sky patches file (e.g., produced by [`r_analysis bin`](r_analysis/bin.md)). |

Values can be mixed freely. When `-c` is provided, it overrides `-pc` entirely.

**Examples:**

```bash
# Use precomputed true-parameter subsets for all three parameters
kmeans-model -n 64 -c true true true -m GAL020 -tag c1d1s1

# Use binned patches from r_analysis bin
kmeans-model -n 64 \
-c binned/patches_beta_dust.npy binned/patches_temp_dust.npy binned/patches_beta_pl.npy \
-m GAL020 -tag c1d1s1

# Mix: true for beta_dust, 50 K-means clusters for temp_dust, file for beta_synch
kmeans-model -n 64 -c true 50 binned/patches_beta_pl.npy -m GAL020 -tag c1d1s1
```
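The three kinds of `-c` value described above could be told apart roughly as in the sketch below. This is only an illustration of the documented rules: `classify_cluster_source` is a hypothetical helper, not a function from `furax_cs`, and details such as case sensitivity of `true` are assumptions.

```python
from pathlib import Path


def classify_cluster_source(value: str) -> tuple[str, object]:
    """Classify one `-c` value (hypothetical helper, not part of furax_cs)."""
    if value == "true":
        # Precomputed subsets derived from the true parameter values
        return ("true", None)
    if value.isdigit():
        # An integer requests K-means with that many clusters
        return ("kmeans", int(value))
    if value.endswith(".npy"):
        # Anything ending in .npy is treated as a full-sky patches file
        return ("file", Path(value))
    raise ValueError(f"Unrecognized cluster source: {value!r}")


# Mixed usage, mirroring the last CLI example
sources = [
    classify_cluster_source(v)
    for v in ["true", "50", "binned/patches_beta_pl.npy"]
]
```

Each of the three positions is classified independently, which is what allows the values to be mixed.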

---

## `ptep-model`
Binary file added docs/images/binning_original_patches.png
86 changes: 86 additions & 0 deletions docs/r_analysis/bin.md
@@ -0,0 +1,86 @@
# bin — Post-Clustering Parameter Binning

The `bin` subcommand reads result folders produced by `kmeans-model` (or `ptep-model`), bins each spectral-parameter map into equal-width bins, and writes full-sky `.npy` patch files. These files can be fed directly to `kmeans-model -c` to re-run component separation with the binned clustering.

## Basic Usage

```bash
r_analysis bin \
-n 64 \
-r "kmeans_BD10000_TD500_BS500_GAL020" \
-ird results/ \
-o binned_patches/ \
--bin-bd 50 --bin-td 20 --bin-bs 30
```

## How It Works

1. **Load** result folders matching the `-r` pattern (each must contain `results.npz` and `mask.npy`).
2. **Combine** disjoint masks from multiple folders into a single valid-pixel mask.
3. **Select** a reference noise realization (controlled by `--noise-selection`).
4. **Bin** each parameter's pixel-level values into equal-width bins using `bin_parameter_map`.
5. **Write** full-sky `.npy` files to `--output-dir`:
- `patches_beta_dust.npy`
- `patches_temp_dust.npy`
- `patches_beta_pl.npy`
- `mask.npy`
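Step 4 can be pictured with a small NumPy sketch of equal-width binning. It is not the implementation of `bin_parameter_map`; the helper name `equal_width_bins` and the choice of `np.digitize` are assumptions made for illustration.

```python
import numpy as np


def equal_width_bins(values: np.ndarray, nbins: int):
    """Equal-width binning sketch (hypothetical, not furax_cs code)."""
    # nbins + 1 evenly spaced edges spanning the data range
    edges = np.linspace(values.min(), values.max(), nbins + 1)
    # Digitizing against the interior edges yields indices in 0..nbins-1,
    # so the maximum value falls in the last bin rather than past it
    indices = np.digitize(values, edges[1:-1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return indices, centers, edges


values = np.array([1.0, 1.2, 1.5, 1.9, 2.0])
indices, centers, edges = equal_width_bins(values, nbins=2)
# indices -> [0, 0, 1, 1, 1], centers -> [1.25, 1.75]
```

Because the bin width is fixed by the data range, sparsely populated value ranges can produce empty bins, so the number of distinct patches may be smaller than the requested bin count.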

Each `.npy` file is a `float64` array of shape `(npix,)` where valid pixels contain 0-based bin indices and masked pixels are `hp.UNSEEN`.
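As a rough illustration of this layout, the sketch below builds a toy array in the same format and recovers the valid pixels and the bin count. The array is synthetic; with a real output file, `patches` would instead come from `np.load(...)`. The constant is the value of `healpy.UNSEEN`, written out so the example does not need `healpy`.

```python
import numpy as np

UNSEEN = -1.6375e30  # value of healpy.UNSEEN

nside = 4
npix = 12 * nside**2  # number of HEALPix pixels at this nside

# Toy patches array: everything masked except the first 10 pixels
patches = np.full(npix, UNSEEN, dtype=np.float64)
patches[:10] = np.arange(10) % 3  # toy 0-based bin indices 0, 1, 2

valid = patches != UNSEEN                  # boolean mask of unmasked pixels
bin_ids = patches[valid].astype(np.int64)  # indices are stored as floats
n_bins = np.unique(bin_ids).size
```

Casting to an integer type only after masking avoids converting the `UNSEEN` sentinel, which has no meaningful integer value.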

## Visual Example

The following figure illustrates the binning process on a GAL060 mask, starting from 200 $\beta_d$, 100 $T_d$, and 100 $\beta_s$ clusters and binning each parameter down to 10 bins.

**Original K-means patches vs binned:**

![Original vs binned patches](../images/binning_original_vs_binned.png)

## Arguments

| Flag | Type | Default | Description |
|---|---|---|---|
| `-o`, `--output-dir` | `str` | *required* | Output directory for `.npy` patch files |
| `--bin-bd` | `int` | none | Number of equal-width bins for $\beta_\mathrm{dust}$ |
| `--bin-td` | `int` | none | Number of equal-width bins for $T_\mathrm{dust}$ |
| `--bin-bs` | `int` | none | Number of equal-width bins for $\beta_\mathrm{synch}$ |
| `--noise-selection` | `str` | `min-value` | Noise realization selection: `min-value`, `min-nll`, or integer index |

At least one `--bin-*` argument is required. Parameters without a `--bin-*` flag are preserved with their original (renumbered) cluster indices.

Plus all [common arguments](index.md#common-arguments) (`-n`, `-r`, `-ird`, `--sky`, `-mi`, `-s`, etc.).
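One plausible reading of "renumbered" cluster indices is a relabeling of the surviving cluster IDs to a contiguous 0-based range. The sketch below shows that idea with `np.unique`; how the tool actually renumbers is not stated here, so treat this as an assumption.

```python
import numpy as np

# Toy cluster IDs for a parameter that was not given a --bin-* flag
original = np.array([7, 7, 42, 3, 42, 3, 7])

# `labels` holds the sorted distinct IDs; `renumbered` maps every pixel
# to the position of its ID in `labels`, giving contiguous 0-based indices
labels, renumbered = np.unique(original, return_inverse=True)
# labels -> [3, 7, 42], renumbered -> [1, 1, 2, 0, 2, 0, 1]
```

Pixels that shared a cluster before still share one afterwards; only the labels change.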

## Workflow: Bin and Re-Run

The typical workflow is to first run a high-resolution component separation, then bin the resulting parameters and re-run with the binned patches:

```bash
# 1. Initial high-resolution run
kmeans-model -n 64 -pc 10000 500 500 -m GAL020 -tag c1d1s1

# 2. Bin the parameters
r_analysis bin \
-n 64 \
-r "kmeans_BD10000_TD500_BS500_GAL020" \
-ird results/ \
-o binned_patches/ \
--bin-bd 50 --bin-td 20 --bin-bs 30

# 3. Re-run with binned patches
kmeans-model -n 64 \
-c binned_patches/patches_beta_dust.npy \
binned_patches/patches_temp_dust.npy \
binned_patches/patches_beta_pl.npy \
-m GAL020 -tag c1d1s1
```

## API Reference

The core binning function is exposed at the package level:

```python
from furax_cs import bin_parameter_map

patch_indices, bin_centers, bin_edges = bin_parameter_map(pixel_values, nbins=50)
```

See the [API documentation](../api/index.md#binning) for the full docstring with a reconstruction example.
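For orientation, a binned map can be reconstructed from the outputs by indexing the bin centers with the patch indices. The sketch below uses stand-in arrays rather than calling `bin_parameter_map`, and it assumes that `patch_indices` holds 0-based indices into `bin_centers`.

```python
import numpy as np

# Stand-ins for the outputs of bin_parameter_map (values are made up)
patch_indices = np.array([0, 2, 1, 2])
bin_centers = np.array([1.45, 1.55, 1.65])

# Each pixel takes the center value of the bin it was assigned to
reconstructed = bin_centers[patch_indices]
# reconstructed -> [1.45, 1.65, 1.55, 1.65]
```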
8 changes: 7 additions & 1 deletion docs/r_analysis/index.md
@@ -11,6 +11,7 @@ r_analysis <subcommand> [arguments]
| Subcommand | Purpose |
|---|---|
| [`snap`](snap.md) | Compute statistics from result folders and save to `.parquet` files |
| [`bin`](bin.md) | Bin spectral parameters and produce `.npy` patch files for re-running |
| [`plot`](plot.md) | Generate plots from `.parquet` snapshot files |
| [`validate`](validate.md) | Run NLL perturbation analysis on results |
| [`estimate`](estimate.md) | Standalone $r$ estimation from a spectrum or map file |
@@ -24,6 +25,10 @@ kmeans-model -n 64 -pc 10000 500 500 -m GAL020 -tag c1d1s1
# 2. Compute statistics and save snapshots
r_analysis snap -n 64 -r "kmeans_BD10000" -ird results/ -o snapshots/kmeans_BD10000.parquet

# 2b. (Optional) Bin parameters and re-run with binned patches
r_analysis bin -n 64 -r "kmeans_BD10000" -ird results/ -o binned/ --bin-bd 50 --bin-td 20 --bin-bs 30
kmeans-model -n 64 -c binned/patches_beta_dust.npy binned/patches_temp_dust.npy binned/patches_beta_pl.npy -m GAL020 -tag c1d1s1

# 3. Plot from the snapshots
r_analysis plot --parquet-dir snapshots/ -arc -as -ar

@@ -33,7 +38,7 @@ r_analysis validate -n 64 -r "kmeans_BD10000" -ird results/ --steps 5

## Common Arguments

The `snap` and `validate` subcommands share these arguments:
The `snap`, `bin`, and `validate` subcommands share these arguments:

| Flag | Type | Default | Description |
|---|---|---|---|
@@ -50,6 +55,7 @@ The `snap` and `validate` subcommands share these arguments:
:maxdepth: 2

snap
bin
plot
validate
estimate
23 changes: 3 additions & 20 deletions docs/r_analysis/snap.md
@@ -89,23 +89,9 @@ r_analysis snap -r "kmeans_BD4000" "ptep_BD64" \
by default, the display name is the matched pattern (e.g., `BD4000_TD500_BS500`).
For combined runs, an explicit name is recommended so that it is easier to match when plotting with [plot](plot.md).

## Post-Clustering Binning

Re-bin spectral parameters after clustering to reduce the number of patches:

```bash
r_analysis snap -r "kmeans_BD10000" -ird results/ \
--bin-bd 500 --bin-td 100 --bin-bs 200 \
-o out.parquet
```

| Flag | Description |
|---|---|
| `--bin-bd` | Number of bins for $\beta_\mathrm{dust}$ |
| `--bin-td` | Number of bins for $T_\mathrm{dust}$ |
| `--bin-bs` | Number of bins for $\beta_\mathrm{synch}$ |

This will recache the systematics using the binned parameter maps and re-compute all statistics.
::::{seealso}
For reducing the number of clusters via post-clustering parameter binning, see [`bin`](bin.md).
::::

## All Arguments

@@ -118,9 +104,6 @@ This will recache the systematics using the binned parameter maps and re-compute
| `--combine` | flag | `False` | Merge all matched dirs into one entry |
| `--name` | `str` (list) | auto | Display names for run groups |
| `--max-size` | `int` | unlimited | Max entries per parquet file (splits into numbered files) |
| `--bin-bd` | `int` | none | Bins for $\beta_\mathrm{dust}$ post-clustering |
| `--bin-td` | `int` | none | Bins for $T_\mathrm{dust}$ post-clustering |
| `--bin-bs` | `int` | none | Bins for $\beta_\mathrm{synch}$ post-clustering |

Plus all [common arguments](index.md#common-arguments) (`-n`, `-r`, `-ird`, etc.).
