v0.7.0

@katosh katosh released this 13 Apr 22:41
· 5 commits to main since this release
5827021

[0.7.0] - 2026-04-13

Breaking changes

  • Drop Python 3.9 support: kompot now requires Python ≥ 3.10 (driven by mellon ≥ 1.7.0 dependency).

New simplified API

  • kompot.de(), kompot.da(), and kompot.smooth_expression() now use Settings dataclasses (GPSettings, FDRSettings, FilterSettings, StorageSettings, OutputSettings) so the common case stays simple while advanced options remain discoverable. The old compute_differential_* and compute_smoothed_expression() functions still work but emit a deprecation warning.
  • dry_run=True on de() prints a resource plan (memory, disk, field overwrites) without running the analysis. Replaces the standalone dry_run_differential_expression().
  • ModelSettings lets you inject pre-fitted predictors into de(), da(), and smooth_expression() to skip fitting or reuse models across runs.

New features

  • Null distribution inspection: return_full_results=True now includes a "null" key in the result dict exposing all null gene data: Mahalanobis distances, smoothed expression, fold changes, z-scores, and standard deviations. A lightweight alternative (OutputSettings(return_null_data=True)) returns only the summary table and metadata (gene indices, names, seed, provenance) without the full expression matrices.
  • External null distributions for FDR: supply your own null distribution instead of relying on column-shuffled null genes.
    • FDRSettings(null_mahalanobis=...): pre-computed null Mahalanobis distances (e.g., from a control-vs-control run).
    • FDRSettings(null_expression=(expr1, expr2)): raw null expression matrices fitted through the same GP model.
    • FDRSettings(combine_with_internal=True): concatenate external and internal null distributions.
  • kompot.compute_fdr(real_mahal, null_mahal): standalone FDR computation from Mahalanobis distances (no AnnData needed). Returns a DataFrame with mahalanobis, pvalue, local_fdr, tail_fdr, is_de.
  • kompot.extract_null_distribution(adata): extract Mahalanobis distances from a DE run for reuse as a null distribution elsewhere.
  • kompot.recompute_fdr(adata, null_mahalanobis): recompute FDR on existing DE results with a new null distribution, updating adata.var in place.
  • DifferentialExpression.compute_fdr(null_mahal): sklearn-like method to compute FDR after predict(compute_mahalanobis=True).
  • Empirical variance (GPSettings(use_empirical_variance=True)): estimates per-gene heteroscedastic noise from GP residuals and adjusts Mahalanobis distances accordingly. Works with or without biological replicates.
  • CenteredLinear kernel for better extrapolation at cell-state boundaries (opt-in via cov_func; default remains Matern52).
  • More accurate uncertainty: density estimators now use mellon 1.7.1's default Laplacian optimizer instead of ADVI.
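
To illustrate how an external null distribution can be combined with an internal one, here is a minimal standard-library sketch that concatenates two sets of null Mahalanobis distances and derives empirical upper-tail p-values from them. The function name `empirical_pvalues`, the add-one estimator, and the toy numbers are assumptions for illustration only. This is not kompot's implementation and does not call the kompot API.

```python
from bisect import bisect_left


def empirical_pvalues(real, null):
    """Empirical upper-tail p-values of `real` distances against `null`.

    Uses the add-one estimator p = (1 + #{null >= d}) / (1 + n_null),
    so that no p-value is exactly zero. (Assumed estimator, for illustration.)
    """
    null_sorted = sorted(null)
    n_null = len(null_sorted)
    pvals = []
    for d in real:
        # Count null values >= d: everything at or after the first index
        # whose value is >= d in the sorted null.
        n_ge = n_null - bisect_left(null_sorted, d)
        pvals.append((1 + n_ge) / (1 + n_null))
    return pvals


# Hypothetical toy data: an internal null (e.g. from shuffled genes) and an
# external null (e.g. from a control-vs-control run).
internal_null = [0.5, 1.0, 1.5, 2.0]
external_null = [0.8, 1.2, 2.5]

# Concatenating the two nulls mirrors the idea behind combine_with_internal=True.
combined_null = internal_null + external_null

real = [0.1, 1.2, 3.0]
pvals = empirical_pvalues(real, combined_null)
print(pvals)
```

A larger combined null gives finer p-value resolution, since the smallest attainable value with this estimator is 1 / (1 + n_null).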

Run history and reproducibility

  • Run parameters are now stored grouped by Settings dataclass, making them directly reconstructible.
  • RunInfo.call_args() returns a kwargs dict that reproduces the run — edit it and pass to de()/da() to re-run with tweaked parameters.
  • RunInfo.to_settings() returns the Settings objects from a previous run for inspection.
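
The idea of storing parameters grouped by settings object and rebuilding them later can be sketched with plain dataclasses. The classes `ToyGPSettings` and `ToyFDRSettings` and the helpers `record_run` and `restore_settings` are hypothetical stand-ins, not kompot's classes or its storage format. The field names are borrowed from the examples in these notes, and the default values are made up.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class ToyGPSettings:  # hypothetical stand-in, not kompot.GPSettings
    sigma: float = 1.0
    ls_factor: float = 10.0


@dataclass(frozen=True)
class ToyFDRSettings:  # hypothetical stand-in, not kompot.FDRSettings
    threshold: float = 0.05


def record_run(gp, fdr):
    """Store parameters as plain dicts, grouped by settings object."""
    return {"gp": asdict(gp), "fdr": asdict(fdr)}


def restore_settings(record):
    """Rebuild the settings objects from a stored record."""
    return ToyGPSettings(**record["gp"]), ToyFDRSettings(**record["fdr"])


# Record a run, restore its settings, then tweak one field for a re-run.
record = record_run(ToyGPSettings(ls_factor=5.0), ToyFDRSettings())
gp, fdr = restore_settings(record)
tweaked = replace(gp, ls_factor=10.0)
print(record)
print(tweaked)
```

Because each group maps one-to-one onto a dataclass, the stored record can be turned back into constructor arguments without any key renaming.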

Improvements

  • Input validation at construction time: all Settings dataclasses now validate fields in __post_init__. Invalid values like GPSettings(sigma=-1) or FDRSettings(threshold=1.5) raise immediately with a clear message instead of failing deep inside mellon or JAX. The public API functions (de(), da(), smooth_expression()) also validate AnnData inputs upfront (obsm key shape, condition existence, condition1 != condition2, gene names, landmarks dimensions).
  • Plotting functions return Optional[plt.Figure] (controlled by return_fig) instead of (fig, ax) tuples, and no longer call plt.show().
  • Consistent parameter naming across plot functions: background_color_key → color, de_column → direction_column, embedding_key → basis.
  • RunInfo HTML display now shows parameters hierarchically by Settings group (gp.sigma, fdr.threshold, …) instead of a flat list.
  • RunComparison shows individual changed fields (e.g. gp.ls_factor: 10.0 → 5.0) instead of opaque dict diffs.
  • kompot smooth CLI command for single-condition GP smoothing from the command line, matching the full Python API (condition selection, gene subsetting, empirical variance, sample variance).
  • --no-progress flag added to the DA CLI; progress bars can now be fully suppressed in both DA and DE.
  • DA CLI now exposes --store-arrays-on-disk, --disk-storage-dir, and --max-memory-ratio, matching the DE CLI's StorageSettings coverage.
  • FDR is disabled by default when sample_col is provided (not yet calibrated for sample variance). Override with FDRSettings(null_genes=...).
  • Remove statsmodels dependency.
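
Construction-time validation of the kind described above can be sketched with `__post_init__` on a dataclass. The classes `ExampleGPSettings` and `ExampleFDRSettings` are hypothetical, and the exact bounds and error messages are assumptions. The notes only state that `sigma=-1` and `threshold=1.5` are rejected.

```python
from dataclasses import dataclass


@dataclass
class ExampleGPSettings:  # hypothetical stand-in, not kompot.GPSettings
    sigma: float = 1.0

    def __post_init__(self):
        # Assumed rule: sigma must be strictly positive.
        if self.sigma <= 0:
            raise ValueError(f"sigma must be positive, got {self.sigma}")


@dataclass
class ExampleFDRSettings:  # hypothetical stand-in, not kompot.FDRSettings
    threshold: float = 0.05

    def __post_init__(self):
        # Assumed rule: threshold must lie strictly between 0 and 1.
        if not 0 < self.threshold < 1:
            raise ValueError(
                f"threshold must be in (0, 1), got {self.threshold}"
            )


def fails_validation(factory):
    """Return True if constructing the settings object raises ValueError."""
    try:
        factory()
    except ValueError:
        return True
    return False


print(fails_validation(lambda: ExampleGPSettings(sigma=-1)))
print(fails_validation(lambda: ExampleFDRSettings(threshold=1.5)))
print(fails_validation(lambda: ExampleGPSettings(sigma=2.0)))
```

Failing in the constructor keeps the error next to the line that supplied the bad value, rather than surfacing later inside a numerical backend.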

Bug fixes

  • Restore shared-landmark precomputation in DE (requires mellon ≥ 1.7.1). Mellon's compute_landmarks had a silent string-vs-enum bug where gp_type="fixed" did not match GaussianProcessType.FIXED, causing the function to return None instead of the documented fall-through. Kompot's shared-landmark precomputation in DifferentialExpression.fit() and the per-condition fallback in ExpressionModel.fit() both routed through this code path, so on every DE call kompot was silently dropping the cross-condition shared landmark grid (each condition ended up with an independent full GP) and ignoring the user-supplied random_state for landmark selection (mellon's internal _compute_landmarks fell back to the hardcoded DEFAULT_RANDOM_SEED=42). Pinning mellon>=1.7.1 enables the fix transparently — no kompot code changes were required.
  • Shared landmarks across conditions in DA. DifferentialAbundance.fit() now passes gp_type="fixed" to compute_landmarks and forwards gp_type="fixed" to the per-condition DensityEstimators. Previously, when either condition had fewer cells than n_landmarks, mellon's auto-selection fell back to gp_type=FULL for that estimator, silently discarding the shared-landmark grid that DA had just computed on the combined data — the two density predictors then used independent full GPs, breaking the symmetry assumption behind the Mahalanobis-style abundance comparison. This brings DA into structural parity with DE.
  • Fix local FDR numerical instability (Grenander estimator replaces statsmodels Poisson GLM).
  • Fix tail FDR: replace Benjamini-Hochberg on empirical p-values (which breaks when n_null << n_genes) with fdrtool-style survival function ratio Fdr(d) = S_null(d) / S_mix(d).
  • Fix cell_filter docs: the parameter includes matching cells rather than excluding them.
  • Fix missing field_mapping in DA run history: append_to_run_history was called before field_mapping was computed, so DA history entries never recorded which fields were written.
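
The survival function ratio Fdr(d) = S_null(d) / S_mix(d) from the tail FDR fix can be sketched with the standard library alone. Here S_null is the empirical survival function of the null distances and S_mix that of the observed distances. The names `survival` and `tail_fdr`, the use of "greater than or equal" for ties, the clipping at 1, and the toy data are assumptions. kompot's actual implementation may differ, for example in tie handling or monotonicity constraints.

```python
from bisect import bisect_left


def survival(sorted_values, d):
    """Empirical survival function: fraction of values >= d."""
    n = len(sorted_values)
    return (n - bisect_left(sorted_values, d)) / n


def tail_fdr(real, null):
    """Tail FDR per observed distance: min(1, S_null(d) / S_mix(d))."""
    null_sorted = sorted(null)
    mix_sorted = sorted(real)
    out = []
    for d in real:
        # S_mix(d) > 0 always holds here because d itself is one of `real`.
        s_mix = survival(mix_sorted, d)
        s_null = survival(null_sorted, d)
        out.append(min(1.0, s_null / s_mix))
    return out


# Hypothetical toy data: half of the observed distances look like the null,
# the other half are far beyond it.
null = [1.0, 2.0, 3.0, 4.0]
real = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 14.0, 16.0]

fdr = tail_fdr(real, null)
print(fdr)
```

Unlike a Benjamini-Hochberg correction on empirical p-values, this ratio does not depend on the null sample being comparable in size to the number of genes. Both survival functions are normalized by their own sample sizes.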