Releases: cellgeni/nf-scautoqc

26-052

21 Feb 14:59

This release is mainly a doublet-gating + QC-threshold reporting update, with workflow cleanup and operational improvements.

✅ sctk-aligned workflow updates

  • The doublet-gating sentinel logic was removed from QC/subset flow, and workflow wiring was updated accordingly.
  • find_doublets can now run consistently after the upstream sctk update, so conditional gate-file checks are no longer needed.
  • This same sctk-driven update is also why the Singularity image was bumped to scautoqc-v0.8.0.sif.

📝 QC threshold reporting changes

  • run_qc now writes per-sample tidy threshold summaries as <sample>_qc_thresholds.csv.
  • The threshold output now includes per-metric bounds plus pass statistics (n_pass, n_total, pass_rate), and an all_metrics summary row.
  • pool_all now gathers *_qc_thresholds.csv files and produces a consolidated qc_thresholds.csv.
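The per-sample summary and the pooling step can be sketched with Python's standard csv module. This assumes per-cell metric values in a dict of lists and a (lower, upper) bound per metric; the helper names and the exact column layout are hypothetical, and the real run_qc and pool_all outputs may differ. The all_metrics row here counts cells that pass every metric at once.

```python
import csv
import glob
import os
from typing import Dict, List, Tuple

# Hypothetical column layout for the tidy threshold table (not the pipeline's schema).
FIELDS = ["sample", "metric", "lower", "upper", "n_pass", "n_total", "pass_rate"]


def _rate(n_pass: int, n_total: int) -> float:
    return n_pass / n_total if n_total else 0.0


def threshold_rows(metrics: Dict[str, List[float]],
                   bounds: Dict[str, Tuple[float, float]]) -> List[dict]:
    """One row per metric with its bounds and pass statistics, plus an all_metrics row."""
    n_total = len(metrics[next(iter(bounds))])
    passes_all = [True] * n_total
    rows = []
    for name, (lower, upper) in bounds.items():
        # A cell passes a metric when its value lies within the inclusive bounds.
        passed = [lower <= value <= upper for value in metrics[name]]
        passes_all = [a and p for a, p in zip(passes_all, passed)]
        rows.append({"metric": name, "lower": lower, "upper": upper,
                     "n_pass": sum(passed), "n_total": n_total,
                     "pass_rate": _rate(sum(passed), n_total)})
    # Summary row: cells that pass every metric simultaneously.
    n_all = sum(passes_all)
    rows.append({"metric": "all_metrics", "lower": "", "upper": "",
                 "n_pass": n_all, "n_total": n_total,
                 "pass_rate": _rate(n_all, n_total)})
    return rows


def write_sample_thresholds(sample: str, rows: List[dict], outdir: str) -> str:
    """Write <sample>_qc_thresholds.csv and return its path."""
    path = os.path.join(outdir, f"{sample}_qc_thresholds.csv")
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow({"sample": sample, **row})
    return path


def pool_thresholds(indir: str, out_path: str) -> int:
    """Concatenate all *_qc_thresholds.csv files into one table; return the file count."""
    paths = sorted(glob.glob(os.path.join(indir, "*_qc_thresholds.csv")))
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as fh:
                writer.writerows(csv.DictReader(fh))
    return len(paths)
```

Keeping one long-format file per sample means pooling reduces to concatenating rows, with the sample column telling them apart.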

📊 QC plotting improvement

  • In multires mode, QC UMAP plotting now includes pass_default, qc_cluster, and consensus_passed_qc overlays for clearer interpretation.

Resume / environment updates

  • RESUME helper scripts now export LSB_DEFAULT_USERGROUP=YourGroup (replace YourGroup with your LSF user group) for smoother cluster execution.

26-033

02 Feb 15:32

This update is mainly a robustness + parameter-wiring release, with small but important fixes in input handling and pooling.

✅ Pipeline behaviour changes / fixes

  • --cell_or_nuclei is now properly propagated through the workflow

    • main.nf now passes --cell_or_nuclei into gather_matrices.py.
    • The workflow now uses params.cell_or_nuclei instead of a hard-coded default ('cell'), so the CLI/config value is actually honoured.
    • nextflow.config explicitly includes cell_or_nuclei = "cell" in params (default remains the same, but now consistent and visible).
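A minimal sketch of the receiving side, assuming gather_matrices.py parses its options with argparse. The default mirrors cell_or_nuclei = "cell" in nextflow.config; restricting the value to "cell" or "nuclei" is an assumption made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the gather_matrices.py CLI")
    # Default mirrors params.cell_or_nuclei = "cell" in nextflow.config.
    # The accepted values ("cell", "nuclei") are an assumption, not taken from the script.
    parser.add_argument("--cell_or_nuclei", default="cell", choices=["cell", "nuclei"],
                        help="Whether the samples are single-cell or single-nucleus")
    return parser
```

The Nextflow process would interpolate params.cell_or_nuclei into the command line, and the script would read it with `build_parser().parse_args()`, so a value set on the CLI or in the config reaches the Python side instead of a hard-coded 'cell'.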

🧩 Input handling improvements

  • More robust CellBender gather mode support in gather_matrices.py

    • When gather_mode == 'cellbender', the script now checks whether raw_feature_bc_matrix.h5 exists and reads it if present.
    • If the .h5 file isn’t available, it falls back to reading raw_feature_bc_matrix/ in Matrix Market format (read_10x_mtx).
    • This makes the pipeline tolerate different Cell Ranger output layouts without manual intervention.
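The fallback order can be sketched with pathlib, assuming the sample directory is the one that holds the Cell Ranger raw matrix outputs. scanpy's read_10x_h5 and read_10x_mtx are used by default; making the readers injectable is a choice made here only so the path logic can be exercised without scanpy, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional


def read_raw_matrix(sample_dir: str,
                    read_h5: Optional[Callable] = None,
                    read_mtx: Optional[Callable] = None):
    """Prefer raw_feature_bc_matrix.h5; fall back to the raw_feature_bc_matrix/ MTX folder."""
    base = Path(sample_dir)
    h5_path = base / "raw_feature_bc_matrix.h5"
    mtx_dir = base / "raw_feature_bc_matrix"
    if read_h5 is None or read_mtx is None:
        import scanpy as sc  # imported lazily; only needed when no readers are injected
        read_h5 = read_h5 or sc.read_10x_h5
        read_mtx = read_mtx or sc.read_10x_mtx
    if h5_path.exists():
        return read_h5(str(h5_path))
    if mtx_dir.is_dir():
        # Matrix Market layout: matrix.mtx(.gz), barcodes.tsv(.gz), features.tsv(.gz)
        return read_mtx(str(mtx_dir))
    raise FileNotFoundError(f"No raw_feature_bc_matrix (.h5 or folder) in {base}")
```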

🧬 Pooling / feature naming fix

  • Improved handling of gene/feature names in pool_all.py

    • If feature names contain underscores, the script now normalises them (by splitting and keeping the first components) and then calls var_names_make_unique().
    • This reduces collisions / downstream issues caused by prefixed or compound feature IDs.
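The renaming can be approximated in plain Python. Two assumptions apply: "keeping the first components" is modelled as keeping the first n_keep underscore-separated parts (default 1), and make_unique only roughly imitates AnnData's var_names_make_unique(), which appends -1, -2, ... to repeated names and leaves the first occurrence unchanged. The pipeline itself calls the AnnData method; both helpers here are hypothetical.

```python
from typing import Dict, List


def make_unique(names: List[str], join: str = "-") -> List[str]:
    """Rename repeats as name-1, name-2, ...; first occurrences stay unchanged."""
    taken = set(names)  # never generate a name that already exists in the input
    last_suffix: Dict[str, int] = {}
    seen = set()
    out = []
    for name in names:
        if name not in seen:
            seen.add(name)
            out.append(name)
            continue
        # Duplicate: try the next free suffix for this base name.
        i = last_suffix.get(name, 0)
        while True:
            i += 1
            candidate = f"{name}{join}{i}"
            if candidate not in taken:
                break
        last_suffix[name] = i
        taken.add(candidate)
        out.append(candidate)
    return out


def normalise_feature_names(names: List[str], n_keep: int = 1, sep: str = "_") -> List[str]:
    """If any name contains sep, keep its first n_keep parts (assumed rule), then de-duplicate."""
    if not any(sep in name for name in names):
        return list(names)
    trimmed = [sep.join(name.split(sep)[:n_keep]) for name in names]
    return make_unique(trimmed)
```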

📝 Examples and resume scripts updated

  • Updated example run commands and RESUME stubs.

Full Changelog: v0.7.2...v0.7.3

25-314

10 Nov 23:16

This release introduces significant changes to the pipeline.
⚠️ Note: h5ad outputs from this version might not be backward‑compatible with previous scAutoQC releases.

  • Added new input method

    • CSV file with sample IDs and absolute paths (see example_input.csv)
    • The CSV may include a cell_or_nuclei column; its value is propagated to the AnnData (ad.uns['cell_or_nuclei']) and is used to select default QC thresholds. This is an explicit annotation from the metadata, not automatic detection.
  • Many improvements in all steps:

    • gather_matrices:
      • Introduced gather modes:
        • starsolo: uses barcodes from STARsolo + CellBender (original behaviour).
        • cellbender: uses CellBender barcodes only (new default).
      • STARsolo default folder is now "GeneFull" (was "Gene").
      • Velocyto layers are now included whenever available, regardless of using Gene vs GeneFull (previously GeneFull runs could miss these layers).
      • Reads the cell_or_nuclei annotation from the new CSV input and stores it in the output object for downstream QC thresholding; otherwise, all samples are treated as cells.
      • Removed the local read_cellbender shim; the pipeline now relies on the updated sctk implementation which supports current CellBender outputs.
      • Increased default memory allocation (8 GB → 16 GB) and removed unused flags.
    • run_qc:
      • Added QC modes (select via --qc_mode):
        • original: auto‑QC with mitochondrial thresholds (used in Pan‑GI Atlas).
        • multires: auto‑QC with multi‑resolution clustering and consensus selection.
        • combined: runs the multi‑resolution consensus inside the mitochondrial‑threshold loop.
      • Separate default thresholds for single‑cell vs single‑nuc based on the cell_or_nuclei annotation.
      • New option to choose GMM cutoff strategy: gmm_cutoff = inner|outer.
      • Support for custom thresholds via --metrics_csv.
      • Integrated CellTypist models (use --celltypist_model <tissue> or --celltypist_model /path/to/model.pkl).
      • Refactoring with clearer logging and explicit keys stored in AnnData.obs/uns.
    • subset_object:
      • More robust input discovery
      • The output h5ad name now includes the sample ID.
    • pool_all:
      • Now outputs a consolidated qc_thresholds.csv.
      • The pooled h5ad object is kept internal and passed directly to the next step.
    • add_metadata → finalize_qc:
      • Step renamed for clarity.
      • QC filtering applied according to the selected QC mode.
      • Output changes:
        • scautoqc_pooled.h5ad: unfiltered object with all metadata (moved from pool_all).
        • scautoqc_pooled_filtered.h5ad: renamed from scautoqc_pooled_doubletflagged_metaadded.h5ad; contains the filtered dataset (same criteria as before).
    • add_metadata_basic → finalize_qc_basic:
      • Output changes:
        • scautoqc_pooled_basic.h5ad: unfiltered object with all metadata and doublet scores.
    • integrate:
      • New parameter for the number of top genes used in integration (n_top_genes).
      • New from_scautoqc parameter to allow integrating arbitrary input objects (when false, it does not remove cell‑cycle genes or stringent doublets).
  • Updated Dockerfile and Singularity image for improved stability and compatibility.
  • Smarter memory requests.
  • Added tag support so Nextflow now shows which sample each process runs.
  • Improved output file/folder names.
  • Added timestamps to timeline, report and trace files.
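Reading the CSV input with the cell_or_nuclei fallback can be sketched with the standard csv module. The column names sample_id and path are assumptions (check example_input.csv for the real header), and the mapping of default thresholds is supplied by the caller, since the pipeline's actual default values are not listed in these notes.

```python
import csv
from typing import Dict, List, TextIO


def read_samples(handle: TextIO) -> List[Dict[str, str]]:
    """Parse the sample sheet; a missing or blank cell_or_nuclei value falls back to 'cell'."""
    samples = []
    for row in csv.DictReader(handle):
        # row.get() returns None when the column is absent and "" when the field is empty.
        kind = (row.get("cell_or_nuclei") or "").strip() or "cell"
        # Column names sample_id and path are assumed for illustration.
        samples.append({"sample_id": row["sample_id"],
                        "path": row["path"],
                        "cell_or_nuclei": kind})
    return samples


def default_thresholds(sample: Dict[str, str], defaults: Dict[str, dict]) -> dict:
    """Look up the caller-supplied default QC thresholds for the sample's annotation."""
    return defaults[sample["cell_or_nuclei"]]
```

In the pipeline the resolved value is stored in ad.uns['cell_or_nuclei']; here it is simply a field in each sample dict.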

25-101

11 Apr 21:47

This release includes important bug fixes and enhancements to ensure smoother pipeline execution:

  • qc.py adjustments: Now correctly applies QC metrics based on the presence of spliced/unspliced layers.
  • Robust metadata handling: The add_metadata step no longer fails when metadata input is absent.
  • Improved resumability: The add_metadata_basic process now fully supports resume functionality for interrupted runs.
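The layer-dependent QC behaviour can be illustrated as building the metric list from the layers that are present. The metric names and the rule that a splicing-based metric needs both spliced and unspliced layers are assumptions for illustration, not copied from qc.py.

```python
from typing import Iterable, List

# Hypothetical metric names used only for this sketch.
BASE_METRICS = ["log1p_n_counts", "log1p_n_genes", "percent_mito"]


def select_qc_metrics(layer_names: Iterable[str]) -> List[str]:
    """Return the QC metrics to apply, given the layer names available in the object."""
    layers = set(layer_names)
    metrics = list(BASE_METRICS)  # copy so the module-level list is never mutated
    # Only add a splicing-based metric when both velocyto-style layers exist.
    if {"spliced", "unspliced"} <= layers:
        metrics.append("percent_spliced")
    return metrics
```

With an AnnData object, the argument would be adata.layers.keys().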

25-091

01 Apr 08:50

This update introduces a new workflow and multiple enhancements based on user feedback:

  • Added a new workflow:
    • subset workflow: Enables running the pipeline by subsetting objects using predefined cutoffs instead of the automatic QC workflow (steps 2a-3-4-5a). More details are available in the README.
  • Added support for Cell Ranger outputs in addition to STARsolo.
  • Improvements in nextflow pipeline:
    • Updated the Singularity image for better compatibility.
    • Renamed certain output files for clarity.
    • Optimised the RESUME functionality to improve reliability.
    • Introduced smart memory allocation for the pool_all and add_metadata steps based on input size.
    • Optimised resource allocation for other processes.
    • Enabled the pipeline to work seamlessly with symbolic links in the input.
  • Optimisations in scripts:
    • Removed unused lines, characters, and packages for cleaner code.
    • Fixed hardcoded paths to improve flexibility.
    • Optimised memory usage in the pool_all process.
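Symlink-tolerant input handling usually means resolving each input path to its real target before it is used. A minimal sketch with pathlib, assuming inputs are given as path strings; the helper name is hypothetical and this is not the pipeline's own code.

```python
from pathlib import Path
from typing import List


def resolve_inputs(paths: List[str]) -> List[Path]:
    """Resolve symbolic links to their real targets (hypothetical helper)."""
    resolved = []
    for p in paths:
        # strict=True raises FileNotFoundError when a link points at nothing.
        resolved.append(Path(p).resolve(strict=True))
    return resolved
```

Using strict=True makes a dangling link fail immediately, rather than surfacing later inside a process.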

Full Changelog: v0.5.0...v0.6.0

24-143

22 May 10:57

  • New workflow: only_qc
    • It is now easier to run the pipeline up to the pooling step.
    • This can be used to process different sets of samples at different times; all outputs from this workflow can then be used together in after_qc mode.
  • Improvements and changes in scripts:
    • It is now possible to run the pipeline without CellBender output; in this case, only STARsolo output is used.
    • The removal of metadata columns in integration.py was reverted.
    • New RESUME scripts have been added.
    • The after_qc workflow can now use only the samples in the sample list rather than all objects in the folder.
    • The reports folder is now named similarly to the results folder.
    • Plot output in the run_qc step now works as expected.
  • Updates in README
    • The workflow diagram has been recreated.
    • The workflow modes have been described with a new figure.
    • Outputs from each step have been described in detail.
    • The text of some steps was revised.
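Restricting after_qc to the listed samples amounts to filtering the folder contents by sample ID. This sketch assumes one <sample>.h5ad object per sample and a plain-text list with one ID per line; the file-naming pattern and the helper are hypothetical.

```python
from pathlib import Path
from typing import List


def select_listed_objects(folder: str, sample_list: str) -> List[Path]:
    """Return <sample>.h5ad files (assumed naming) whose stem appears in the sample list."""
    with open(sample_list) as fh:
        wanted = {line.strip() for line in fh if line.strip()}
    # Objects in the folder that are not listed are ignored.
    selected = sorted(p for p in Path(folder).glob("*.h5ad") if p.stem in wanted)
    missing = wanted - {p.stem for p in selected}
    if missing:
        raise FileNotFoundError("No object found for: " + ", ".join(sorted(missing)))
    return selected
```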

24-088

28 Mar 16:39

  • Added support for single-nuc samples
  • Improvements in scripts
    • integration.py now removes the columns which were created in previous steps
    • RESUME scripts have been reorganised

Full Changelog: v0.3.0...v0.4.0

24-064

04 Mar 18:49

  • New workflow: after_qc
    • It is now easier to work with samples that have already been processed by the scAutoQC pipeline.
  • Improvements in nextflow pipeline and python scripts
    • Created a new RESUME script for the after_qc workflow.
    • integration.py should now work more efficiently.
    • The ss_matrix and covar_keys parameters have been removed (they will be considered for future releases).

Full Changelog: v0.2.2...v0.3.0

24-060

28 Feb 17:57

  • Bug fixes

24-059

27 Feb 11:14

  • Fix typos in README and qc.py