Releases: cellgeni/nf-scautoqc
26-052
This release is mainly a doublet-gating + QC-threshold reporting update, with workflow cleanup and operational improvements.
✅ sctk-aligned workflow updates
- The doublet-gating sentinel logic was removed from the QC/subset flow, and workflow wiring was updated accordingly.
- `find_doublets` can now run consistently after the upstream sctk update, so conditional gate-file checks are no longer needed.
- This same sctk-driven update is also why the Singularity image was bumped to `scautoqc-v0.8.0.sif`.
📝 QC threshold reporting changes
- `run_qc` now writes per-sample tidy threshold summaries as `<sample>_qc_thresholds.csv`.
- The threshold output now includes per-metric bounds plus pass statistics (`n_pass`, `n_total`, `pass_rate`), and an `all_metrics` summary row.
- `pool_all` now gathers `*_qc_thresholds.csv` files and produces a consolidated `qc_thresholds.csv`.
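As a rough illustration of the reporting described above, the sketch below builds a tidy per-sample table with `n_pass`, `n_total`, `pass_rate` and an `all_metrics` row, then concatenates the per-sample files into one `qc_thresholds.csv`. Only those column and file names come from these notes. The `sample`, `metric`, `low` and `high` columns, the helper names, and the reading of `all_metrics` as "cells passing every metric" are assumptions, and the real `run_qc` / `pool_all` output may differ.

```python
import csv
from pathlib import Path

# Column layout is hypothetical apart from n_pass, n_total and pass_rate.
FIELDS = ["sample", "metric", "low", "high", "n_pass", "n_total", "pass_rate"]


def summarise_thresholds(sample, cells, bounds):
    """Return tidy rows: one per metric plus an 'all_metrics' summary row.

    cells  : list of dicts mapping metric name -> value (one dict per cell)
    bounds : dict mapping metric name -> (low, high), both inclusive
    """
    n_total = len(cells)
    passed_all = [True] * n_total
    rows = []
    for metric, (low, high) in bounds.items():
        passed = [low <= cell[metric] <= high for cell in cells]
        # Track cells that pass every metric seen so far.
        passed_all = [a and p for a, p in zip(passed_all, passed)]
        n_pass = sum(passed)
        rows.append({
            "sample": sample, "metric": metric, "low": low, "high": high,
            "n_pass": n_pass, "n_total": n_total,
            "pass_rate": n_pass / n_total if n_total else 0.0,
        })
    n_all = sum(passed_all)
    rows.append({
        "sample": sample, "metric": "all_metrics", "low": "", "high": "",
        "n_pass": n_all, "n_total": n_total,
        "pass_rate": n_all / n_total if n_total else 0.0,
    })
    return rows


def write_thresholds(rows, path):
    """Write threshold rows to a CSV file with a fixed header."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def pool_thresholds(in_dir, out_path):
    """Concatenate every *_qc_thresholds.csv in in_dir into a single CSV."""
    pooled = []
    for path in sorted(Path(in_dir).glob("*_qc_thresholds.csv")):
        with open(path, newline="") as handle:
            pooled.extend(csv.DictReader(handle))
    write_thresholds(pooled, out_path)
    return pooled
```

Keeping one row per sample and metric means the pooled file is a plain concatenation, so it can be filtered by sample or by metric without reshaping.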
📊 QC plotting improvement
- In multires mode, QC UMAP plotting now includes `pass_default`, `qc_cluster`, and `consensus_passed_qc` overlays for clearer interpretation.
Resume / environment updates
- RESUME helper scripts now export `LSB_DEFAULT_USERGROUP=YourGroup` for smoother cluster execution.
26-033
This update is mainly a robustness + parameter-wiring release, with small but important fixes in input handling and pooling.
✅ Pipeline behaviour changes / fixes
- `--cell_or_nuclei` is now properly propagated through the workflow
  - `main.nf` now passes `--cell_or_nuclei` into `gather_matrices.py`.
  - The workflow now uses `params.cell_or_nuclei` instead of a hard-coded default (`'cell'`), so the CLI/config value is actually honoured.
  - `nextflow.config` explicitly includes `cell_or_nuclei = "cell"` in `params` (default remains the same, but now consistent and visible).
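To make the script side of this concrete, here is a minimal sketch of a parser that accepts `--cell_or_nuclei` and falls back to `cell` only when the flag is absent. The function names and help text are hypothetical, and the actual argument handling in `gather_matrices.py` may look different.

```python
import argparse


def build_parser():
    """Build an illustrative parser exposing only the --cell_or_nuclei flag."""
    parser = argparse.ArgumentParser(
        description="Illustrative gather_matrices-style command line"
    )
    parser.add_argument(
        "--cell_or_nuclei",
        default="cell",  # same default as in nextflow.config
        help="sample type annotation used later to pick default QC thresholds",
    )
    return parser


def resolve_cell_or_nuclei(argv=None):
    """Return the value given on the command line, or the default."""
    args = build_parser().parse_args(argv)
    return args.cell_or_nuclei
```

With this shape, `resolve_cell_or_nuclei([])` gives `"cell"`, while a value passed by the workflow (for example `["--cell_or_nuclei", "nuclei"]`, where `nuclei` is an assumed value inferred from the parameter name) is returned unchanged.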
🧩 Input handling improvements
- More robust CellBender gather mode support in `gather_matrices.py`
  - When `gather_mode == 'cellbender'`, the script now checks whether `raw_feature_bc_matrix.h5` exists and reads it if present.
  - If the `.h5` file isn’t available, it falls back to reading `raw_feature_bc_matrix/` in Matrix Market format (`read_10x_mtx`).
  - This makes the pipeline tolerate different Cell Ranger output layouts without manual intervention.
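The fallback described above can be sketched as a small path-selection helper. This is an assumption-laden illustration, not the code in `gather_matrices.py`: `locate_raw_matrix` is a hypothetical name, and the actual reading step is left out so the sketch needs only the standard library.

```python
from pathlib import Path


def locate_raw_matrix(sample_dir):
    """Pick the raw matrix layout available in a Cell Ranger output folder.

    Returns ("h5", path) when raw_feature_bc_matrix.h5 exists, otherwise
    ("mtx", path) when the raw_feature_bc_matrix/ directory exists.
    Raises FileNotFoundError when neither layout is present.
    """
    sample_dir = Path(sample_dir)
    h5_path = sample_dir / "raw_feature_bc_matrix.h5"
    mtx_dir = sample_dir / "raw_feature_bc_matrix"
    if h5_path.is_file():
        # Preferred layout: a single HDF5 file.
        return "h5", h5_path
    if mtx_dir.is_dir():
        # Fallback layout: Matrix Market directory (read with read_10x_mtx).
        return "mtx", mtx_dir
    raise FileNotFoundError(f"no raw_feature_bc_matrix found in {sample_dir}")
```

A caller would then dispatch on the returned kind, for instance to scanpy's `read_10x_h5` for `"h5"` (an assumption about which reader is used) and `read_10x_mtx` for `"mtx"`.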
🧬 Pooling / feature naming fix
- Improved handling of gene/feature names in `pool_all.py`
  - If feature names contain underscores, the script now normalises them (by splitting and keeping the first components) and then calls `var_names_make_unique()`.
  - This reduces collisions / downstream issues caused by prefixed or compound feature IDs.
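A minimal sketch of this normalise-then-deduplicate idea follows. How many leading components `pool_all.py` keeps is not stated here, so `keep` is a hypothetical parameter, and `make_unique` is a standard-library stand-in that only approximates what AnnData's `var_names_make_unique()` does (first occurrence unchanged, later duplicates given a numeric suffix).

```python
def normalise_feature_names(names, keep=1, sep="_"):
    """Keep the first `keep` separator-delimited components of each name."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return [sep.join(name.split(sep)[:keep]) for name in names]


def make_unique(names, join="-"):
    """Suffix repeated names with -1, -2, ... so that every name is distinct."""
    counts = {}
    taken = set(names)
    unique = []
    for name in names:
        if name not in counts:
            # First occurrence is kept as-is.
            counts[name] = 0
            unique.append(name)
            continue
        # Advance the counter until the suffixed name is not already in use.
        while True:
            counts[name] += 1
            candidate = f"{name}{join}{counts[name]}"
            if candidate not in taken:
                break
        taken.add(candidate)
        unique.append(candidate)
    return unique
```

For example, names such as `GeneA_x` and `GeneA_y` both normalise to `GeneA`, which is why the de-duplication step has to run afterwards.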
📝 Examples and resume scripts updated
- Updated example run commands and RESUME stubs.
Full Changelog: v0.7.2...v0.7.3
25-314
This release introduces significant changes to the pipeline.
- Added new input method
  - CSV file with sample IDs and absolute paths (see `example_input.csv`)
  - The CSV may include a `cell_or_nuclei` column; its value is propagated to the AnnData (`ad.uns['cell_or_nuclei']`) and is used to select default QC thresholds. This is an explicit annotation from the metadata, not automatic detection.
- Many improvements in all steps:
  - gather_matrices:
    - Introduced gather modes:
      - `starsolo`: uses barcodes from STARsolo + CellBender (original behaviour).
      - `cellbender`: uses CellBender barcodes only (new default).
    - STARsolo default folder is now `"GeneFull"` (was `"Gene"`).
    - Velocyto layers are now included whenever available, regardless of using Gene vs GeneFull (previously GeneFull runs could miss these layers).
    - Reads the `cell_or_nuclei` annotation from the new CSV input and stores it in the output object for downstream QC thresholding; otherwise it treats all samples as `cell`.
    - Removed the local `read_cellbender` shim; the pipeline now relies on the updated `sctk` implementation which supports current CellBender outputs.
    - Increased default memory allocation (8 GB → 16 GB) and removed unused flags.
  - run_qc:
    - Added QC modes (select via `--qc_mode`):
      - `original`: auto‑QC with mitochondrial thresholds (used in Pan‑GI Atlas).
      - `multires`: auto‑QC with multi‑resolution clustering and consensus selection.
      - `combined`: runs the multi‑resolution consensus inside the mitochondrial‑threshold loop.
    - Separate default thresholds for single‑cell vs single‑nuc based on the `cell_or_nuclei` annotation.
    - New option to choose GMM cutoff strategy: `gmm_cutoff = inner|outer`.
    - Support for custom thresholds via `--metrics_csv`.
    - Integrated CellTypist models (use `--celltypist_model <tissue>` or `--celltypist_model /path/to/model.pkl`).
    - Refactoring with clearer logging and explicit keys stored in `AnnData.obs/uns`.
  - subset_object:
    - More robust input discovery.
    - The output h5ad name now includes the sample ID.
  - pool_all:
    - Now outputs a consolidated `qc_thresholds.csv`.
    - The pooled h5ad object is kept internal and passed directly to the next step.
  - add_metadata → finalize_qc:
    - Step renamed for clarity.
    - QC filtering applied according to the selected QC mode.
    - Output changes:
      - `scautoqc_pooled.h5ad`: unfiltered object with all metadata (moved from `pool_all`).
      - `scautoqc_pooled_filtered.h5ad`: renamed from `scautoqc_pooled_doubletflagged_metaadded.h5ad`; contains the filtered dataset (same criteria as before).
  - add_metadata_basic → finalize_qc_basic:
    - Output changes:
      - `scautoqc_pooled_basic.h5ad`: unfiltered object with all metadata and doublet scores.
  - integrate:
    - New parameter for the number of top genes used in integration (`n_top_genes`).
    - New `from_scautoqc` parameter to allow integrating arbitrary input objects (when false, it does not remove cell‑cycle genes or stringent doublets).
- Updated Dockerfile and Singularity image for improved stability and compatibility.
- Smarter memory requests.
- Added tag support so Nextflow now shows which sample each process runs.
- Improved output file/folder names.
- Added timestamps to timeline, report and trace files.
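The CSV input method introduced in this release can be pictured with the sketch below, which reads a sample sheet and fills in `cell` when the optional `cell_or_nuclei` column is missing or empty. The `sample_id` and `path` column names and the `read_sample_sheet` helper are hypothetical; `example_input.csv` in the repository is the authoritative reference for the real layout.

```python
import csv
import io


def read_sample_sheet(handle, default="cell"):
    """Parse a sample sheet into a list of dicts, one per sample.

    The cell_or_nuclei column is optional; missing or blank values fall
    back to `default`, mirroring the behaviour described in these notes.
    """
    samples = []
    for row in csv.DictReader(handle):
        annotation = (row.get("cell_or_nuclei") or "").strip() or default
        samples.append({
            "sample_id": row["sample_id"],  # hypothetical column name
            "path": row["path"],            # hypothetical column name
            "cell_or_nuclei": annotation,
        })
    return samples


if __name__ == "__main__":
    sheet = io.StringIO(
        "sample_id,path,cell_or_nuclei\n"
        "S1,/data/S1,nuclei\n"
        "S2,/data/S2,\n"
    )
    for sample in read_sample_sheet(sheet):
        print(sample)
```

Because the annotation is read from the sheet rather than inferred from the data, a sample is only treated as nuclei when the sheet says so.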
25-101
This release includes important bug fixes and enhancements to ensure smoother pipeline execution:
- qc.py adjustments: Now correctly applies QC metrics based on the presence of spliced/unspliced layers.
- Robust metadata handling: The add_metadata step no longer fails when metadata input is absent.
- Improved resumability: The add_metadata_basic process now fully supports resume functionality for interrupted runs.
25-091
This update introduces a new workflow and multiple enhancements based on user feedback:
- Added a new workflow:
  - `subset` workflow: Enables running the pipeline by subsetting objects using predefined cutoffs instead of the automatic QC workflow (steps 2a-3-4-5a). More details are available in the README.
- Added support for Cell Ranger outputs in addition to STARsolo.
- Improvements in Nextflow pipeline:
  - Updated the Singularity image for better compatibility.
  - Renamed certain output files for clarity.
  - Optimised the RESUME functionality to improve reliability.
  - Introduced smart memory allocation for the `pool_all` and `add_metadata` steps based on input size.
  - Optimised resource allocation for other processes.
  - Enabled the pipeline to work seamlessly with symbolic links in the input.
- Optimisations in scripts:
  - Removed unused lines, characters, and packages for cleaner code.
  - Fixed hardcoded paths to improve flexibility.
  - Optimised memory usage in the `pool_all` process.
Full Changelog: v0.5.0...v0.6.0
24-143
- New workflow: `only_qc`
  - It is now easier to run the pipeline until the pooling step.
  - This can be used to process different sets of samples at different times; all the outputs from this workflow can then be used together with `after_qc` mode.
- Improvements and changes in scripts:
  - It is now possible to run the pipeline without CellBender output; only STARsolo output is used in this case.
  - The removal of metadata columns in integration.py was reverted.
  - New RESUME scripts have been added.
  - `after_qc` workflow can now use only the samples in the sample list rather than all the objects in the folder.
  - Reports folder is now named similarly to the results folder.
  - Outputting plots in the `run_qc` step now works as expected.
- Updates in README
  - The workflow diagram has been recreated.
  - The workflow modes have been described with a new figure.
  - Outputs from each step have been described in detail.
  - Text in some steps was revised.
24-088
- Added support for single-nuc samples
- Improvements in scripts
  - integration.py now removes the columns which were created in previous steps
  - RESUME scripts have been reorganised
Full Changelog: v0.3.0...v0.4.0
24-064
- New workflow: after_qc
  - It is now easier to work with samples which have been processed with the scAutoQC pipeline before.
- Improvements in Nextflow pipeline and Python scripts
  - Created a new RESUME script for the after_qc workflow.
  - integration.py should now work more efficiently.
  - ss_matrix and covar_keys parameters have been removed (they will be considered for future releases).
Full Changelog: v0.3.0...v0.2.2