Add tile-cut stitching follow-up to calculate_tiling_qc#1170

Open
timtreis wants to merge 7 commits into feature/tiling-qc-v2 from feature/tiling-stitch


timtreis commented May 8, 2026

Summary

Builds on #1157. Adds a follow-up pass that recovers cells that tiled segmentation split into 2-4 pieces: it detects facing cut edges across tile boundaries and assigns each candidate pair a transparent geometric score. The worst case, a cell split across a 4-tile corner, is handled.

Two public functions:

  • squidpy.experimental.tl.stitch_tile_cuts(sdata, labels_key, ...) -- reads is_outlier=True cells from the QC table, extracts cut-edge candidates via bbox-edge alignment, scores each pair with a transparent geometric composite (see below), and assembles confident pairs into 2-4-piece groups via union-find with corner-junction validation. Writes 4 .obs columns to the existing QC table -- stitch_group_id, is_stitched, n_pieces, stitch_confidence -- plus a .uns["tiling_stitch"] audit trail (params, score formula, run summary). The labels element is never mutated.
  • squidpy.experimental.im.make_stitched_labels(sdata, labels_key, ..., merge_strategy="sum", inplace=True) -- opt-in materialisation of a stitched labels element via a lazy dask LUT, plus a collapsed AnnData with one row per unique stitch_group_id (unstitched cells pass through unchanged, stitched groups collapse). Numeric .obs columns and .X aggregate via merge_strategy (sum/min/max/mean/median/first or callable); group-invariant + non-numeric columns always take "first". Preserves .uns, .var, and any user-added obs columns.
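The grouping step described above can be sketched in isolation. This is a minimal union-find illustration, not the code in this PR: `group_pairs` and `max_pieces` are hypothetical names, and the size cap is only a stand-in for the corner-junction validation, whose actual rules are not spelled out here.

```python
from collections import defaultdict


def group_pairs(pairs, max_pieces=4):
    """Union confident (label_a, label_b) pairs into stitch groups.

    Hypothetical helper: groups with more than `max_pieces` members are
    dropped as a crude stand-in for corner-junction validation.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for label in list(parent):
        groups[find(label)].add(label)

    return [sorted(g) for g in groups.values() if 2 <= len(g) <= max_pieces]


# Three pieces chain into one group, two form a simple pair, and a
# five-piece chain is rejected because no cell can span more than 4 tiles.
pairs = [(1, 2), (2, 3), (10, 11), (20, 21), (21, 22), (22, 23), (23, 24)]
groups = group_pairs(pairs)
```

Because union-find is transitive, one wrong pair can chain unrelated cells together, which is presumably why a validation step follows the grouping.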

calculate_tiling_qc re-runs now warn and drop stale stitch columns when the QC table is overwritten.

How stitch_confidence is computed

For each candidate pair the four geometric / shape-quality features below are averaged into a single score in [0, 1]. No coefficients are fitted or shipped -- the formula is the entire model and is recorded in .uns["tiling_stitch"]["score_formula"].

| feature | what it captures | range |
|---|---|---|
| iou | 1-D intersection-over-union of the two cut-edge extents along the boundary | [0, 1] |
| endpoint_match | how closely the chord endpoints coincide -- true cuts share endpoints, unrelated cells don't | [0, 1] |
| merge_compactness | 4*pi*A / P^2 of the union mask after morphologically closing the seam gap. Real cells are reasonably compact; false merges produce weird perimeters | [0, 1] |
| merge_solidity | union mask area / convex hull area. Real cells are convex-ish; false merges have concave joins | [0, 1] |

stitch_confidence = (iou + endpoint_match + merge_compactness + merge_solidity) / 4
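
As a worked example of the formula, the sketch below computes a 1-D IoU for two cut-edge extents and averages it with three other feature values. The helper names and the numeric values for `endpoint_match`, `merge_compactness` and `merge_solidity` are made up for illustration; only the averaging formula itself comes from this PR.

```python
def interval_iou(a, b):
    """1-D IoU of two (start, end) extents measured along the tile boundary."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def stitch_confidence(iou, endpoint_match, merge_compactness, merge_solidity):
    """Unweighted mean of the four features, each assumed to lie in [0, 1]."""
    return (iou + endpoint_match + merge_compactness + merge_solidity) / 4


# Two facing cut edges spanning pixels 10-30 and 12-32 along the seam:
# they overlap over 18 px and together cover 22 px, so IoU = 18 / 22.
iou = interval_iou((10, 30), (12, 32))

# Illustrative values for the remaining three features.
score = stitch_confidence(iou, 0.9, 0.8, 0.95)
accepted = score >= 0.7  # compare against the default min_confidence
```

With these inputs the mean is roughly 0.87, so the pair would pass the default threshold. Since every feature carries equal weight, one weak feature lowers the score by at most 0.25.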

A gap_score is also computed but used only as a hard filter: every candidate pair already lies within max_gap by construction, so it does not enter the score. The two merge_* features are computed by materialising a tight crop around the union of the candidate pieces, closing the gap with a disk(3) structuring element, and running regionprops on the largest connected component.
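
To give a feel for the ranges of the two merge_* features, the sketch below evaluates them on ideal shapes using plain arithmetic. It does not touch masks or regionprops; the function names are hypothetical, and on a rasterised mask the perimeter is only an estimate, so real values will differ somewhat.

```python
import math


def compactness(area, perimeter):
    """Isoperimetric quotient 4*pi*A / P^2; equals 1 for an ideal circle."""
    return 4 * math.pi * area / perimeter**2


def solidity(area, hull_area):
    """Fraction of the convex hull that the shape actually fills."""
    return area / hull_area


# Ideal circle of radius 10: A = pi * r^2, P = 2 * pi * r.
circle = compactness(math.pi * 10**2, 2 * math.pi * 10)

# Square of side 10: A = 100, P = 40, giving pi / 4 (about 0.785).
square = compactness(100, 40)

# Elongated 10 x 40 strip: A = 400, P = 100, giving about 0.50.
strip = compactness(400, 100)

# L-shape: a 20 x 20 square with a 10 x 10 corner removed has area 300.
# Its convex hull cuts the notch diagonally, so the hull area is 350.
l_shape = solidity(300, 350)
```

Elongated or notched shapes, which is what a false merge of two neighbouring cells tends to look like, score lower on both measures than a round, convex cell.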

min_confidence is a threshold on this mean; 0.7 is the default starting point. Tune for your data -- the score is heuristic, dataset-independent, and not a calibrated probability. Review false positives / negatives via make_stitched_labels and the visual test fixture, then adjust.

timtreis and others added 6 commits May 8, 2026 19:20
Adds two public functions building on the tile-boundary QC outliers:

- squidpy.experimental.tl.stitch_tile_cuts: pairs facing cut edges across
  tile boundaries (bbox-edge alignment + IoU + endpoint match), scores
  each pair via a frozen L2 logistic regression on geometric +
  shape-quality features (merge_solidity, merge_compactness), and
  assembles confident pairs into 2-4-piece groups via union-find with
  corner-junction validation. Writes 4 .obs columns to the existing QC
  table (stitch_group_id, is_stitched, n_pieces, stitch_confidence) plus
  a .uns['tiling_stitch'] audit trail. Labels element is never modified.

- squidpy.experimental.im.make_stitched_labels: opt-in materialisation of
  a stitched labels element via a lazy dask LUT, plus a collapsed
  AnnData with one row per unique stitch_group_id. Numeric .obs columns
  and .X aggregate via merge_strategy (sum/min/max/mean/median/first or
  callable, default sum); group-invariant columns and non-numerics use
  first. Preserves the QC table's .uns and any user-added .obs columns.
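
The row-collapsing behaviour can be sketched with NumPy alone. This is an illustration under assumptions, not the function shipped here: `collapse_rows` is a hypothetical name, it handles a dense `X` only, and a callable strategy is applied per column with `np.apply_along_axis`.

```python
import numpy as np

REDUCERS = {
    "sum": np.sum,
    "min": np.min,
    "max": np.max,
    "mean": np.mean,
    "median": np.median,
}


def collapse_rows(X, group_ids, merge_strategy="sum"):
    """Collapse rows of X that share a group id into one row per group."""
    group_ids = np.asarray(group_ids)
    unique_ids = np.unique(group_ids)  # sorted unique group ids

    if callable(merge_strategy):
        # User-supplied reducer, applied to each column of the group block.
        reduce = lambda block: np.apply_along_axis(merge_strategy, 0, block)
    elif merge_strategy == "first":
        reduce = lambda block: block[0]
    else:
        fn = REDUCERS[merge_strategy]
        reduce = lambda block: fn(block, axis=0)

    out = np.stack([reduce(X[group_ids == g]) for g in unique_ids])
    return unique_ids, out


# Rows 0 and 1 belong to the same stitched cell; row 2 is an unstitched cell.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ids = ["g0", "g0", "c3"]

group_order, summed = collapse_rows(X, ids, "sum")
_, spans = collapse_rows(X, ids, lambda col: col.max() - col.min())
```

A single-row group passes through unchanged under every built-in strategy, which matches the pass-through behaviour described for unstitched cells.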

Also:
- calculate_tiling_qc now warns and drops stale stitch columns when the
  QC table is overwritten on re-run.
- Frozen logistic-regression coefficients trained on 2197 synthetic
  pairs across 50 scenarios; 5-fold CV Brier 0.025; cross-scenario
  precision 0.93+ at threshold 0.9 on held-out dense data.

Tests: 24 new unit tests covering cut-edge contracts, .obs/.uns/X
preservation, merge strategies (str + callable), corner-junction
validation, group-invariant column handling, idempotency, error paths,
and end-to-end QC->stitch->remap flow on the existing fixture.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n, multi-scale

- Extract resolve_labels_array as a shared helper in _tiling_qc.py; both
  stitch_tile_cuts and make_stitched_labels import it instead of carrying
  near-duplicate inline copies.
- Add return type annotations on stitch_tile_cuts (-> AnnData | None) and
  make_stitched_labels (-> dict | None).
- Drop the plans/ reference in stitch_tile_cuts comment; replace with a
  self-contained explanation of the three .obs states.
- Hoist group_sizes definition out of its conditional so the later
  reference is unconditionally defined (was relying on short-circuit).
- Convert n_pieces_distribution dict keys to str so .uns round-trips
  cleanly through zarr.
- Vectorise _aggregate_X for built-in strategies (sum/min/max/mean/median/
  first) using axis=0 numpy reductions; callable strategies still go
  through the per-column pd.Series fallback.
- Validate label_id / stitch_group_id fit in the labels' integer dtype in
  _build_lookup; raise ValueError instead of silent truncation.
- Document group-invariant column handling in make_stitched_labels'
  public docstring.
- Warn when QC-flagged outlier label_ids are missing from the labels
  element (previously silent skip).
- Add inplace=False to make_stitched_labels: returns
  {"labels": ..., "table": ...} without mutating sdata.
- TODO note in _compute_outlier_bboxes for the
  pre-mask-with-isin optimisation when outliers are sparse.
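
The lookup-table remap and the dtype check can be illustrated as follows. This is an eager NumPy sketch rather than the lazy dask version used in the PR, and `build_lookup` here is a hypothetical stand-in for `_build_lookup`, whose real signature is not shown in this thread.

```python
import numpy as np


def build_lookup(mapping, dtype, max_label):
    """Dense table old_id -> new_id; ids missing from `mapping` map to themselves."""
    info = np.iinfo(dtype)
    ids = [max_label, *mapping.keys(), *mapping.values()]
    if any(i < info.min or i > info.max for i in ids):
        # Fail loudly instead of letting the integer wrap around.
        raise ValueError(f"label ids do not fit in {np.dtype(dtype).name}")

    lut = np.arange(max_label + 1, dtype=dtype)  # identity mapping
    for old, new in mapping.items():
        lut[old] = new
    return lut


labels = np.array([[0, 1, 1], [0, 2, 3]], dtype=np.uint8)

# Pieces 1 and 2 belong to the same cell, so 2 is remapped onto 1.
lut = build_lookup({2: 1}, labels.dtype, int(labels.max()))
stitched = lut[labels]  # integer-array indexing applies the table per pixel
```

Indexing the table with the label image keeps the output in the labels' own dtype, which is why every id has to fit in that dtype in the first place.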

Tests: add multi-scale unit + end-to-end coverage for resolve_labels_array
and the QC -> stitch -> make_stitched_labels chain via Labels2DModel.parse
with scale_factors=[2]. Add inplace=False tests for make_stitched_labels.
76 tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Side-by-side panels of a hardcoded 100x100 px crop centred on the first
horizontal tile seam (y=200) of the existing tile-boundary fixture.
Cells share a stable random-colour palette across panels, so split cells
appear as two different colours in "Before" and unify into one colour in
"After". Dashed white line marks the seam.

The baseline lives at tests/_images/StitchVisual_seam_before_after.png
and is downloaded from CI artifacts (per project convention). The test
fails locally without it; passes once the baseline is in place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous version baked in logistic-regression weights fit on synthetic
disks. Wrong contract: those weights claim a calibration they can't
honour on real data, and we don't want squidpy to ship a model that
silently encodes a synthetic distribution.

Replace with an explicit formula:

    stitch_confidence = mean(iou, endpoint_match, merge_compactness, merge_solidity)

All four features are dataset-independent geometry / shape signals in
[0, 1].  No fitting, no shipped weights.  Default min_confidence drops
from 0.9 to 0.7 to match the new score's distribution; users tune for
their data.

.uns["tiling_stitch"] now records score_features + score_formula
instead of model_version / model_coefficients / model_intercept.

Drop plans/prototype_tiling_stitch.py (untracked scratch script that
trained the now-removed coefficients).

Tests updated; 76 pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ater)

Locally rendered placeholder for TestStitchVisual::test_plot_seam_before_after.
The repo convention is platform-correct baselines downloaded from CI
visual_test_results artifacts; this branch can't get one until either
#1157 merges to main or test.yaml grows a workflow_dispatch trigger.
Once CI runs against this branch, overwrite this PNG with the artifact
version.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…onents

By default make_stitched_labels remaps both pieces of a stitched cell to
the same ID, leaving the cut stripe between them at 0 (background) -- so
the result is a single label across multiple disconnected components.
Some downstream tools (naive contour walks, polygon exporters) expect
one-label-one-component and miscount.

Add join_labels=False (default) for the existing behaviour, join_labels=True
to fill the gap.  When True, single-pass regionprops finds each stitched
group's bbox; binary_closing(disk(close_radius)) on the group mask;
newly-closed pixels are written back only when they were 0 (background)
so other cells are never overwritten.  Forces materialisation of the
labels array; cost is bounded by stitched-group bbox count.
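
The write-back rule can be sketched as below. The assumptions are that `scipy.ndimage` stands in for whatever morphology backend the PR uses, that the disk is built by hand, and that closing runs on the whole array rather than per group bbox; `join_group` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def join_group(labels, group_id, close_radius=3):
    """Return a copy in which the seam gap of one stitched group is filled."""
    mask = labels == group_id

    # Disk-shaped structuring element of the given radius.
    yy, xx = np.ogrid[-close_radius:close_radius + 1, -close_radius:close_radius + 1]
    disk = (yy**2 + xx**2) <= close_radius**2

    closed = ndimage.binary_closing(mask, structure=disk)

    # Only background pixels may be claimed; other cells are never overwritten.
    fill = closed & (labels == 0)
    out = labels.copy()
    out[fill] = group_id
    return out


labels = np.zeros((11, 11), dtype=np.int32)
labels[1:5, 3:8] = 5   # upper piece of stitched cell 5
labels[6:10, 3:8] = 5  # lower piece; row 5 is the cut stripe
labels[5, 5] = 7       # a pixel of another cell sitting inside the stripe

joined = join_group(labels, group_id=5)
n_before = ndimage.label(labels == 5)[1]
n_after = ndimage.label(joined == 5)[1]
```

In this toy case the group goes from two connected components to one, while the foreign pixel inside the stripe keeps its own label.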

Tests: connected-component count is >1 for some group when join=False
and exactly 1 for every group when join=True; non-stitched cells'
pixels are byte-identical before/after joining.

Visual: TestStitchVisual::test_plot_seam_join_labels -- side-by-side
zoom showing the seam stripe filled when join_labels=True.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
timtreis requested a review from selmanozleyen May 12, 2026 10:07