Add tile-cut stitching follow-up to calculate_tiling_qc #1170
Open
timtreis wants to merge 7 commits into
Adds two public functions building on the tile-boundary QC outliers:
- squidpy.experimental.tl.stitch_tile_cuts: pairs facing cut edges across
tile boundaries (bbox-edge alignment + IoU + endpoint match), scores each
pair via a frozen L2 logistic regression on geometric + shape-quality
features (merge_solidity, merge_compactness), and assembles confident
pairs into 2-4-piece groups via union-find with corner-junction
validation. Writes 4 .obs columns to the existing QC table
(stitch_group_id, is_stitched, n_pieces, stitch_confidence) plus a
.uns['tiling_stitch'] audit trail. Labels element is never modified.
- squidpy.experimental.im.make_stitched_labels: opt-in materialisation of
a stitched labels element via a lazy dask LUT, plus a collapsed AnnData
with one row per unique stitch_group_id. Numeric .obs columns and .X
aggregate via merge_strategy (sum/min/max/mean/median/first or callable,
default sum); group-invariant columns and non-numerics use first.
Preserves the QC table's .uns and any user-added .obs columns.
Also:
- calculate_tiling_qc now warns and drops stale stitch columns when the
QC table is overwritten on re-run.
- Frozen logistic-regression coefficients trained on 2197 synthetic pairs
across 50 scenarios; 5-fold CV Brier 0.025; cross-scenario precision
0.93+ at threshold 0.9 on held-out dense data.
Tests: 24 new unit tests covering cut-edge contracts, .obs/.uns/X
preservation, merge strategies (str + callable), corner-junction
validation, group-invariant column handling, idempotency, error paths,
and end-to-end QC->stitch->remap flow on the existing fixture.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
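For reviewers unfamiliar with the grouping step, here is a rough sketch of how confident pairs could be assembled into groups with union-find. It is not the code in this PR: `group_pairs`, the `(label_a, label_b, confidence)` tuple layout, and the plain size cap (used here as a stand-in for corner-junction validation) are all assumptions made for illustration.

```python
def group_pairs(pairs, min_confidence, max_pieces=4):
    """Union confident pairs into groups of at most `max_pieces` labels.

    Hypothetical helper. `pairs` is an iterable of
    (label_a, label_b, confidence) tuples. The size cap only approximates
    the PR's corner-junction validation, which is not reproduced here.
    """
    parent = {}
    size = {}

    def find(x):
        parent.setdefault(x, x)
        size.setdefault(x, 1)
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Most confident pairs first, so they win when a merge would
    # push a group past the size cap.
    for a, b, conf in sorted(pairs, key=lambda p: -p[2]):
        if conf < min_confidence:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue
        if size[root_a] + size[root_b] > max_pieces:
            continue  # would create a group of more than `max_pieces`
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        size[root_a] += size[root_b]

    groups = {}
    for label in list(parent):
        groups.setdefault(find(label), []).append(label)
    # Labels that never joined a confident pair are not reported.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

In this sketch a pair below `min_confidence` never enters the structure, so its labels stay unstitched, and a merge that would exceed four pieces is skipped rather than split.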
…n, multi-scale
- Extract resolve_labels_array as a shared helper in _tiling_qc.py; both
stitch_tile_cuts and make_stitched_labels import it instead of carrying
near-duplicate inline copies.
- Add return type annotations on stitch_tile_cuts (-> AnnData | None) and
make_stitched_labels (-> dict | None).
- Drop the plans/ reference in stitch_tile_cuts comment; replace with a
self-contained explanation of the three .obs states.
- Hoist group_sizes definition out of its conditional so the later
reference is unconditionally defined (was relying on short-circuit).
- Convert n_pieces_distribution dict keys to str so .uns round-trips
cleanly through zarr.
- Vectorise _aggregate_X for built-in strategies (sum/min/max/mean/median/
first) using axis=0 numpy reductions; callable strategies still go
through the per-column pd.Series fallback.
- Validate label_id / stitch_group_id fit in the labels' integer dtype in
_build_lookup; raise ValueError instead of silent truncation.
- Document group-invariant column handling in make_stitched_labels'
public docstring.
- Warn when QC-flagged outlier label_ids are missing from the labels
element (previously silent skip).
- Add inplace=False to make_stitched_labels: returns
{"labels": ..., "table": ...} without mutating sdata.
- TODO note in _compute_outlier_bboxes for the
pre-mask-with-isin optimisation when outliers are sparse.
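The vectorised aggregation mentioned in the list above might look roughly like the following. This is a sketch under assumptions, not the body of `_aggregate_X`: the name `aggregate_rows`, the dense NumPy input, and the per-column loop for callables (the PR uses a `pd.Series` fallback instead) are illustrative only.

```python
import numpy as np

# Built-in strategies map directly onto axis=0 NumPy reductions.
_REDUCERS = {
    "sum": np.sum,
    "min": np.min,
    "max": np.max,
    "mean": np.mean,
    "median": np.median,
}


def aggregate_rows(X, group_ids, strategy="sum"):
    """Collapse rows of X that share a group ID (hypothetical helper)."""
    X = np.asarray(X)
    group_ids = np.asarray(group_ids)
    unique_ids = np.unique(group_ids)
    out = np.empty((len(unique_ids), X.shape[1]), dtype=float)
    for i, gid in enumerate(unique_ids):
        rows = X[group_ids == gid]
        if callable(strategy):
            # Fallback: apply the callable to each column separately.
            out[i] = [strategy(col) for col in rows.T]
        elif strategy == "first":
            out[i] = rows[0]
        else:
            # One reduction over all columns at once.
            out[i] = _REDUCERS[strategy](rows, axis=0)
    return unique_ids, out
```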
Tests: add multi-scale unit + end-to-end coverage for resolve_labels_array
and the QC -> stitch -> make_stitched_labels chain via Labels2DModel.parse
with scale_factors=[2]. Add inplace=False tests for make_stitched_labels.
76 tests pass; ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
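To illustrate the dtype check and the lookup-table remap described in this commit, here is a small NumPy-only sketch. It assumes a hypothetical `build_lookup` signature and an explicit `max_label` argument; the real `_build_lookup` may be structured differently, and the PR applies the LUT lazily through dask rather than eagerly as shown here.

```python
import numpy as np


def build_lookup(label_ids, group_ids, labels_dtype, max_label):
    """Return a LUT mapping label value -> stitch group ID (hypothetical)."""
    labels_dtype = np.dtype(labels_dtype)
    if not np.issubdtype(labels_dtype, np.integer):
        raise ValueError(f"labels dtype must be integer, got {labels_dtype}")
    label_ids = np.asarray(label_ids, dtype=np.int64)
    group_ids = np.asarray(group_ids, dtype=np.int64)
    if label_ids.shape != group_ids.shape:
        raise ValueError("label_ids and group_ids must have the same shape")

    dtype_max = np.iinfo(labels_dtype).max
    if max_label > dtype_max:
        raise ValueError(f"max_label={max_label} does not fit in {labels_dtype}")
    for name, ids in (("label_id", label_ids), ("stitch_group_id", group_ids)):
        if ids.size == 0:
            continue
        lo, hi = int(ids.min()), int(ids.max())
        if lo < 0 or hi > dtype_max:
            # Fail loudly instead of letting astype() wrap the value.
            raise ValueError(
                f"{name} range [{lo}, {hi}] does not fit in {labels_dtype}"
            )
    if label_ids.size and int(label_ids.max()) > max_label:
        raise ValueError("max_label is smaller than the largest label_id")

    # Identity mapping first, so unstitched labels pass through unchanged.
    lut = np.arange(max_label + 1, dtype=np.int64)
    lut[label_ids] = group_ids
    return lut.astype(labels_dtype)
```

Remapping is then plain fancy indexing, `lut[labels]`, which could equally be applied chunk by chunk on a dask array.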
Side-by-side panels of a hardcoded 100x100 px crop centred on the first horizontal tile seam (y=200) of the existing tile-boundary fixture. Cells share a stable random-colour palette across panels, so split cells appear as two different colours in "Before" and unify into one colour in "After". Dashed white line marks the seam.
The baseline lives at tests/_images/StitchVisual_seam_before_after.png and is downloaded from CI artifacts (per project convention). The test fails locally without it; passes once the baseline is in place.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous version baked in logistic-regression weights fit on synthetic
disks. Wrong contract: those weights claim a calibration they can't
honour on real data, and we don't want squidpy to ship a model that
silently encodes a synthetic distribution.
Replace with an explicit formula:
stitch_confidence = mean(iou, endpoint_match, merge_compactness, merge_solidity)
All four features are dataset-independent geometry / shape signals in
[0, 1]. No fitting, no shipped weights. Default min_confidence drops
from 0.9 to 0.7 to match the new score's distribution; users tune for
their data.
.uns["tiling_stitch"] now records score_features + score_formula
instead of model_version / model_coefficients / model_intercept.
Drop plans/prototype_tiling_stitch.py (untracked scratch script that
trained the now-removed coefficients).
Tests updated; 76 pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
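The explicit formula is small enough to write out in full. The sketch below is illustrative only: the function name `stitch_confidence` and the dict-based input are assumptions, and the range check is an addition for the example rather than something the commit describes.

```python
from statistics import fmean

# The four features named in the commit message.
SCORE_FEATURES = ("iou", "endpoint_match", "merge_compactness", "merge_solidity")


def stitch_confidence(features):
    """Unweighted mean of the four features (hypothetical helper)."""
    values = [features[name] for name in SCORE_FEATURES]
    for name, value in zip(SCORE_FEATURES, values):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    return fmean(values)


def is_confident(features, min_confidence=0.7):
    """Compare the score against the threshold (0.7 is the stated default)."""
    return stitch_confidence(features) >= min_confidence
```

Because the score is an unweighted mean, one weak feature can be offset by three strong ones, which is part of why the threshold is expected to need tuning per dataset.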
…ater)
Locally rendered placeholder for TestStitchVisual::test_plot_seam_before_after. The repo convention is platform-correct baselines downloaded from CI visual_test_results artifacts; this branch can't get one until either #1157 merges to main or test.yaml grows a workflow_dispatch trigger. Once CI runs against this branch, overwrite this PNG with the artifact version.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…onents
By default make_stitched_labels remaps both pieces of a stitched cell to the same ID, leaving the cut stripe between them at 0 (background) -- so the result is a single label across multiple disconnected components. Some downstream tools (naive contour walks, polygon exporters) expect one-label-one-component and miscount.
Add join_labels=False (default) for the existing behaviour, join_labels=True to fill the gap. When True, single-pass regionprops finds each stitched group's bbox; binary_closing(disk(close_radius)) on the group mask; newly-closed pixels are written back only when they were 0 (background) so other cells are never overwritten. Forces materialisation of the labels array; cost is bounded by stitched-group bbox count.
Tests: connected-component count is >1 for some group when join=False and exactly 1 for every group when join=True; non-stitched cells' pixels are byte-identical before/after joining.
Visual: TestStitchVisual::test_plot_seam_join_labels -- side-by-side zoom showing the seam stripe filled when join_labels=True.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
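The write-back rule (fill only where the pixel was background) can be shown with a NumPy-only sketch. Several things here are assumptions for illustration: `join_group` is a hypothetical name, the closing uses a fixed 3x3 cross instead of `disk(close_radius)`, and it runs on the whole array rather than on a per-group bbox crop found with regionprops.

```python
import numpy as np


def _shifted(mask):
    """Return the mask plus its four edge-neighbour shifts (zero padded)."""
    p = np.pad(mask, 1, constant_values=False)
    return (p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:])


def _dilate(mask):
    c, up, down, left, right = _shifted(mask)
    return c | up | down | left | right


def _erode(mask):
    c, up, down, left, right = _shifted(mask)
    return c & up & down & left & right


def join_group(labels, group_id):
    """Fill the seam gap of one stitched group (hypothetical helper)."""
    mask = labels == group_id
    closed = _erode(_dilate(mask))  # binary closing with a 3x3 cross
    # Only pixels that are currently background may be claimed, so
    # neighbouring cells are never overwritten.
    fill = closed & (labels == 0)
    out = labels.copy()
    out[fill] = group_id
    return out
```

Existing pixels are never removed, so any erosion at the array border is harmless in this sketch: the output always contains the original group mask.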
Summary
Builds on #1157. Adds a follow-up pass that recovers cells which segmentation tiling broke into 2-4 pieces by detecting facing cut edges across tile boundaries and assigning each candidate pair a transparent geometric score. Worst case (4-tile corner) is handled.
Two public functions:
- `squidpy.experimental.tl.stitch_tile_cuts(sdata, labels_key, ...)` -- reads `is_outlier=True` cells from the QC table, extracts cut-edge candidates via bbox-edge alignment, scores each pair with a transparent geometric composite (see below), and assembles confident pairs into 2-4-piece groups via union-find with corner-junction validation. Writes 4 `.obs` columns to the existing QC table -- `stitch_group_id`, `is_stitched`, `n_pieces`, `stitch_confidence` -- plus a `.uns["tiling_stitch"]` audit trail (params, score formula, run summary). The labels element is never mutated.
- `squidpy.experimental.im.make_stitched_labels(sdata, labels_key, ..., merge_strategy="sum", inplace=True)` -- opt-in materialisation of a stitched labels element via a lazy dask LUT, plus a collapsed AnnData with one row per unique `stitch_group_id` (unstitched cells pass through unchanged, stitched groups collapse). Numeric `.obs` columns and `.X` aggregate via `merge_strategy` (sum/min/max/mean/median/first or callable); group-invariant + non-numeric columns always take "first". Preserves `.uns`, `.var`, and any user-added obs columns.

`calculate_tiling_qc` re-runs now warn and drop stale stitch columns when the QC table is overwritten.

How `stitch_confidence` is computed

For each candidate pair the four geometric / shape-quality features below are averaged into a single score in `[0, 1]`. No coefficients are fitted or shipped -- the formula is the entire model and is recorded in `.uns["tiling_stitch"]["score_formula"]`.

- `iou`
- `endpoint_match`
- `merge_compactness`: `4*pi*A / P^2` of the union mask after morphologically closing the seam gap. Real cells are reasonably compact; false merges produce weird perimeters
- `merge_solidity`: A

`gap_score` is also computed but only used as a hard filter (already inside `max_gap` by construction); it does not enter the score. The two `merge_*` features are computed by materialising a tight crop around the union of the candidate pieces, closing the gap with a `disk(3)` structuring element, and running `regionprops` on the largest connected component.

`min_confidence` is a threshold on this mean; `0.7` is the default starting point. Tune for your data -- the score is heuristic, dataset-independent, and not a calibrated probability. Review false positives / negatives via `make_stitched_labels` and the visual test fixture, then adjust.
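As a quick intuition check for `merge_compactness`, the `4*pi*A / P^2` ratio can be evaluated on idealised shapes. The helper below is a hypothetical sketch, not the PR's code: it takes area and perimeter as plain numbers instead of reading them from `regionprops`, and the clip to 1.0 is an assumption added so that pixelated perimeter estimates cannot push the value out of `[0, 1]`.

```python
import math


def compactness(area, perimeter):
    """Isoperimetric ratio 4*pi*A / P^2, clipped to at most 1.0."""
    if area < 0 or perimeter <= 0:
        raise ValueError("area must be >= 0 and perimeter must be > 0")
    return min(1.0, 4.0 * math.pi * area / perimeter**2)


# A disk attains the maximum; elongated shapes score much lower.
circle = compactness(math.pi * 10**2, 2 * math.pi * 10)
square = compactness(10 * 10, 4 * 10)
sliver = compactness(1 * 100, 2 * (1 + 100))
```

A roughly round merged cell lands near the top of the range, while a false merge that produces a long or ragged outline is pulled towards zero, which in turn lowers the averaged score.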