Releases · cellgeni/nf-scautoqc

21 Feb 14:59

cakirb

v0.8.0

0d1968c

26-052

This release is mainly a doublet-gating + QC-threshold reporting update, with workflow cleanup and operational improvements.

✅ sctk-aligned workflow updates

The doublet-gating sentinel logic was removed from QC/subset flow, and workflow wiring was updated accordingly.
find_doublets can now run consistently after the upstream sctk update, so conditional gate-file checks are no longer needed.
This same sctk-driven update is also why the Singularity image was bumped to scautoqc-v0.8.0.sif.

📝 QC threshold reporting changes

run_qc now writes per-sample tidy threshold summaries as <sample>_qc_thresholds.csv.
The threshold output now includes per-metric bounds plus pass statistics (n_pass, n_total, pass_rate), and an all_metrics summary row.
pool_all now gathers *_qc_thresholds.csv files and produces a consolidated qc_thresholds.csv.
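
To make the shape of the threshold summary concrete, here is a minimal sketch of how per-metric bounds could be turned into rows with n_pass, n_total, pass_rate and a final all_metrics row. It is not the code in run_qc: the column names `metric`, `low` and `high`, the helper names, the example metrics and the inclusive bounds are all assumptions made for illustration.

```python
import csv
import io

# Column order of the sketch; "metric", "low" and "high" are assumed names.
FIELDS = ["metric", "low", "high", "n_pass", "n_total", "pass_rate"]


def _row(metric, low, high, n_pass, n_total):
    """Build one summary row; pass_rate is 0.0 for an empty sample."""
    rate = n_pass / n_total if n_total else 0.0
    return {"metric": metric, "low": low, "high": high,
            "n_pass": n_pass, "n_total": n_total, "pass_rate": rate}


def summarise_thresholds(cells, bounds):
    """cells: list of {metric: value}; bounds: {metric: (low, high)}.

    Bounds are treated as inclusive here, which is an assumption.
    """
    n_total = len(cells)
    passed_all = [True] * n_total
    rows = []
    for metric, (low, high) in bounds.items():
        passed = [low <= cell[metric] <= high for cell in cells]
        # A cell counts towards all_metrics only if it passes every metric.
        passed_all = [a and p for a, p in zip(passed_all, passed)]
        rows.append(_row(metric, low, high, sum(passed), n_total))
    rows.append(_row("all_metrics", "", "", sum(passed_all), n_total))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical example data: four cells, two metrics.
example_cells = [
    {"n_counts": 500, "percent_mito": 5},
    {"n_counts": 50, "percent_mito": 5},
    {"n_counts": 800, "percent_mito": 40},
    {"n_counts": 1000, "percent_mito": 10},
]
example_bounds = {"n_counts": (100, 5000), "percent_mito": (0, 20)}
example_rows = summarise_thresholds(example_cells, example_bounds)
print(to_csv(example_rows))
```

A pooled table along the lines of qc_thresholds.csv could then be built by prefixing each row with its sample ID and concatenating the per-sample rows.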

📊 QC plotting improvement

In multires mode, QC UMAP plotting now includes pass_default, qc_cluster, and consensus_passed_qc overlays for clearer interpretation.

Resume / environment updates

RESUME helper scripts now export LSB_DEFAULT_USERGROUP=YourGroup for smoother cluster execution.


02 Feb 15:32

cakirb

v0.7.3

3b02ffa

26-033

This update is mainly a robustness + parameter-wiring release, with small but important fixes in input handling and pooling.

✅ Pipeline behaviour changes / fixes

--cell_or_nuclei is now properly propagated through the workflow
- main.nf now passes --cell_or_nuclei into gather_matrices.py.
- The workflow now uses params.cell_or_nuclei instead of a hard-coded default ('cell'), so the CLI/config value is actually honoured.
- nextflow.config explicitly includes cell_or_nuclei = "cell" in params (default remains the same, but now consistent and visible).

🧩 Input handling improvements

More robust CellBender gather mode support in gather_matrices.py
- When gather_mode == 'cellbender', the script now checks whether raw_feature_bc_matrix.h5 exists and reads it if present.
- If the .h5 file isn’t available, it falls back to reading raw_feature_bc_matrix/ in Matrix Market format (read_10x_mtx).
- This makes the pipeline tolerate different Cell Ranger output layouts without manual intervention.
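
The fallback order described above can be sketched as follows. This is an illustration, not the code in gather_matrices.py: the function name `pick_raw_matrix` and its return convention are hypothetical, and the sketch only decides which input to read. In a real script the two branches would presumably hand the path to scanpy's `read_10x_h5` and `read_10x_mtx` respectively.

```python
from pathlib import Path


def pick_raw_matrix(sample_dir):
    """Return ("h5", path) if the HDF5 matrix exists, else ("mtx", path).

    Hypothetical helper: prefers raw_feature_bc_matrix.h5 and falls back to
    the raw_feature_bc_matrix/ directory in Matrix Market format.
    """
    sample_dir = Path(sample_dir)
    h5_path = sample_dir / "raw_feature_bc_matrix.h5"
    mtx_dir = sample_dir / "raw_feature_bc_matrix"
    if h5_path.is_file():
        return "h5", h5_path
    if mtx_dir.is_dir():
        return "mtx", mtx_dir
    # Neither layout is present, so fail loudly instead of guessing.
    raise FileNotFoundError(f"no raw_feature_bc_matrix(.h5) under {sample_dir}")
```

Raising when neither layout is found keeps a missing input from surfacing later as an empty or partial object.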

🧬 Pooling / feature naming fix

Improved handling of gene/feature names in pool_all.py
- If feature names contain underscores, the script now normalises them (by splitting and keeping the first components) and then calls var_names_make_unique().
- This reduces collisions / downstream issues caused by prefixed or compound feature IDs.
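
As a rough illustration of the normalise-then-deduplicate idea, the sketch below splits names on underscores, keeps the leading component, and then makes the result unique. It is not the code in pool_all.py: the number of components kept (`keep=1`), the helper names and the example feature names are assumptions, and `make_unique` is only a simplified stand-in for AnnData's `var_names_make_unique()`.

```python
def normalise_feature_names(names, keep=1, sep="_"):
    """Keep the first `keep` underscore-separated components of each name."""
    return [sep.join(name.split(sep)[:keep]) for name in names]


def make_unique(names, join="-"):
    """Append -1, -2, ... to repeated names, leaving the first one unchanged.

    Simplified: it does not check whether a generated name such as "ACTB-1"
    already exists in the input.
    """
    counts = {}
    unique = []
    for name in names:
        if name in counts:
            counts[name] += 1
            unique.append(f"{name}{join}{counts[name]}")
        else:
            counts[name] = 0
            unique.append(name)
    return unique


# Hypothetical compound feature IDs.
raw_names = ["ACTB_id1", "GAPDH_id2", "ACTB_id3"]
clean_names = make_unique(normalise_feature_names(raw_names))
print(clean_names)
```

The deduplication step matters because stripping suffixes can map two distinct compound IDs onto the same short name.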

📝 Examples and resume scripts updated

Updated example run commands and RESUME stubs.

Full Changelog: v0.7.2...v0.7.3


10 Nov 23:16

cakirb

v0.7.2

d0a2e5b

25-314

This release introduces significant changes to the pipeline.
⚠️ Note: h5ad outputs from this version might not be backward‑compatible with previous scAutoQC releases.

Added new input method
- CSV file with sample IDs and absolute paths (see example_input.csv)
- The CSV may include a cell_or_nuclei column; its value is propagated to the AnnData (ad.uns['cell_or_nuclei']) and is used to select default QC thresholds. This is an explicit annotation from the metadata, not automatic detection.
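
A minimal sketch of reading such a sample sheet is shown below. The column names `sample_id` and `path`, the value `nuclei` and the helper name are assumptions for illustration (see example_input.csv for the real layout); only the `cell_or_nuclei` column name and the `cell` default come from these notes.

```python
import csv
import io


def read_sample_sheet(handle, default="cell"):
    """Parse a sample sheet, defaulting cell_or_nuclei when absent or empty."""
    samples = []
    for row in csv.DictReader(handle):
        # The column is optional; a missing or blank value means "cell".
        kind = (row.get("cell_or_nuclei") or "").strip() or default
        samples.append({
            "sample_id": row["sample_id"].strip(),
            "path": row["path"].strip(),
            "cell_or_nuclei": kind,
        })
    return samples


# Hypothetical sheet: the second sample leaves the annotation blank.
example_sheet = io.StringIO(
    "sample_id,path,cell_or_nuclei\n"
    "S1,/data/S1,nuclei\n"
    "S2,/data/S2,\n"
)
samples = read_sample_sheet(example_sheet)
print(samples)
```

Because the annotation is taken verbatim from the sheet, a wrong value in the CSV will silently select the wrong default thresholds, so it is worth checking the column before a large run.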
Many improvements in all steps:
- gather_matrices:
  - Introduced gather modes:
    - starsolo: uses barcodes from STARsolo + CellBender (original behavior).
    - cellbender: uses CellBender barcodes only (new default).
  - STARsolo default folder is now "GeneFull" (was "Gene").
  - Velocyto layers are now included whenever available, regardless of using Gene vs GeneFull (previously GeneFull runs could miss these layers).
  - Reads the cell_or_nuclei annotation from the new CSV input and stores it in the output object for downstream QC thresholding; if the annotation is absent, all samples are treated as cell.
  - Removed the local read_cellbender shim; the pipeline now relies on the updated sctk implementation which supports current CellBender outputs.
  - Increased default memory allocation (8 GB → 16 GB) and removed unused flags.
- run_qc:
  - Added QC modes (select via --qc_mode):
    - original: auto‑QC with mitochondrial thresholds (used in Pan‑GI Atlas).
    - multires: auto‑QC with multi‑resolution clustering and consensus selection.
    - combined: runs the multi‑resolution consensus inside the mitochondrial‑threshold loop.
  - Separate default thresholds for single‑cell vs single‑nuc based on the cell_or_nuclei annotation.
  - New option to choose GMM cutoff strategy: gmm_cutoff = inner|outer.
  - Support for custom thresholds via --metrics_csv.
  - Integrated CellTypist models (use --celltypist_model <tissue> or --celltypist_model /path/to/model.pkl).
  - Refactoring with clearer logging and explicit keys stored in AnnData.obs/uns.
- subset_object:
  - More robust input discovery
  - The output h5ad name now includes the sample ID.
- pool_all:
  - Now outputs a consolidated qc_thresholds.csv.
  - The pooled h5ad object is kept internal and passed directly to the next step.
- add_metadata → finalize_qc:
  - Step renamed for clarity.
  - QC filtering applied according to the selected QC mode.
  - Output changes:
    - scautoqc_pooled.h5ad: unfiltered object with all metadata (moved from pool_all).
    - scautoqc_pooled_filtered.h5ad: renamed from scautoqc_pooled_doubletflagged_metaadded.h5ad; contains the filtered dataset (same criteria as before).
- add_metadata_basic → finalize_qc_basic:
  - Output changes:
    - scautoqc_pooled_basic.h5ad: unfiltered object with all metadata and doublet scores.
- integrate:
  - New parameter for the number of top genes used in integration (n_top_genes).
  - New from_scautoqc parameter to allow integrating arbitrary input objects (when false, it does not remove cell‑cycle genes or stringent doublets).

Updated Dockerfile and Singularity image for improved stability and compatibility.
Smarter memory requests.
Added tag support so Nextflow now shows which sample each process runs.
Improved output file/folder names.
Added timestamps to timeline, report and trace files.


11 Apr 21:47

cakirb

v0.6.1

bae014a

25-101

This release includes important bug fixes and enhancements to ensure smoother pipeline execution:

qc.py adjustments: Now correctly applies QC metrics based on the presence of spliced/unspliced layers.
Robust metadata handling: The add_metadata step no longer fails when metadata input is absent.
Improved resumability: The add_metadata_basic process now fully supports resume functionality for interrupted runs.


01 Apr 08:50

cakirb

v0.6.0

e335192

25-091

This update introduces a new workflow and multiple enhancements based on user feedback:

Added a new workflow:
- subset workflow: Enables running the pipeline by subsetting objects using predefined cutoffs instead of the automatic QC workflow (steps 2a-3-4-5a). More details are available in the README.
Added support for Cell Ranger outputs in addition to STARsolo.
Improvements in nextflow pipeline:
- Updated the Singularity image for better compatibility.
- Renamed certain output files for clarity.
- Optimised the RESUME functionality to improve reliability.
- Introduced smart memory allocation for the pool_all and add_metadata steps based on input size.
- Optimised resource allocation for other processes.
- Enabled the pipeline to work seamlessly with symbolic links in the input.
Optimisations in scripts:
- Removed unused lines, characters, and packages for cleaner code.
- Fixed hardcoded paths to improve flexibility.
- Optimised memory usage in the pool_all process.

Full Changelog: v0.5.0...v0.6.0


22 May 10:57

cakirb

v0.5.0

b1b83d7

24-143

New workflow: only_qc
- It is now easier to run the pipeline until the pooling step.
- This can be used to process different sets of samples at different times; the outputs from this workflow can then be used together in after_qc mode.
Improvements and changes in scripts:
- The pipeline can now run without CellBender output; in that case only the STARsolo output is used.
- The removal of metadata columns in integration.py has been reverted.
- New RESUME scripts have been added
- after_qc workflow can now use the samples in the sample list only rather than all the objects in the folder.
- Reports folder is now named similar to the results folder.
- Outputting plots in run_qc step now works as expected.
Updates in README
- The workflow diagram has been recreated.
- The workflow modes have been described with a new figure.
- Outputs from each step have been described in detail.
- Text in some steps was revised.


28 Mar 16:39

cakirb

v0.4.0

6a45e72

24-088

Added support for single-nuc samples
Improvements in scripts
- integration.py now removes the columns which were created in previous steps
- RESUME scripts have been reorganised

Full Changelog: v0.3.0...v0.4.0


04 Mar 18:49

cakirb

v0.3.0

17a5445

24-064

New workflow: after_qc
- It is now easier to work with samples that have already been processed with the scAutoQC pipeline.
Improvements in nextflow pipeline and python scripts
- Created a new RESUME script for the after_qc workflow.
- integration.py should now work more efficiently.
- ss_matrix and covar_keys parameters have been removed (they will be considered for the future releases).

Full Changelog: v0.2.2...v0.3.0


28 Feb 17:57

cakirb

v0.2.2

36e068c

24-060

Bug fixes


27 Feb 11:14

cakirb

v0.2.1

888df74

24-059

Fix typos in README and qc.py
