CHANGELOG.md
+18 -1 (18 additions & 1 deletion)
@@ -1,13 +1,30 @@
1
1
## Changelog
2
2
3
+
### 25-091
4
+
This update introduces a new workflow and multiple enhancements based on user feedback:
5
+
6
+
* Added a new workflow:
7
+
* `subset` workflow: runs the pipeline by subsetting objects with predefined cutoffs instead of using the automatic QC workflow (steps 2a-3-4-5a). More details are available in the [README](README.md#Different-steps-for-`subset`-mode).
8
+
* Added support for Cell Ranger outputs in addition to STARsolo.
9
+
* Improvements in the Nextflow pipeline:
10
+
* Updated the Singularity image for better compatibility.
11
+
* Renamed certain output files for clarity.
12
+
* Optimised the RESUME functionality to improve reliability.
13
+
* Introduced smart memory allocation for the `pool_all` and `add_metadata` steps based on input size.
14
+
* Optimised resource allocation for other processes.
15
+
* Enabled the pipeline to work seamlessly with symbolic links in the input.
16
+
* Optimisations in scripts:
17
+
* Removed unused lines, characters, and packages for cleaner code.
18
+
* Fixed hardcoded paths to improve flexibility.
19
+
* Optimised memory usage in the `pool_all` process.
20
+
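The size-based memory allocation mentioned for `pool_all` and `add_metadata` could be sketched roughly as follows. This is only an illustration, not the pipeline's actual logic: the function name `estimate_memory_mb`, the scaling factor, and the floor and ceiling values are all hypothetical.

```python
import math
import os


def estimate_memory_mb(paths, factor=4.0, floor_mb=2000, ceiling_mb=500000):
    """Estimate a memory request (in MB) from the total size of the input files.

    All defaults are hypothetical placeholders, not values used by the pipeline.
    """
    # os.path.getsize follows symbolic links, so linked inputs are measured
    # by the size of the file they point to.
    total_bytes = sum(os.path.getsize(p) for p in paths)
    total_mb = total_bytes / (1024 * 1024)
    estimate = math.ceil(total_mb * factor)
    # Clamp the estimate so tiny inputs still get a sensible minimum and
    # huge inputs do not exceed what the scheduler allows.
    return max(floor_mb, min(ceiling_mb, estimate))
```

A request sized this way avoids repeated retries with increasing memory, at the cost of having to tune the scaling factor to the data.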
3
21
### 25-064
4
22
* Added two new workflows:
5
23
* `until_integrate` workflow makes it easier to run the steps before integration (1-2-3-4-5)
6
24
* `only_integrate` workflow makes it easier to run only the integration step (6)
7
25
* Improvements and changes in scripts:
8
26
* Folder names in the outputs were renamed.
9
27
10
-
11
28
### 24-143
12
29
* <ins>**New workflow:**</ins> `only_qc`
13
30
* It is now easier to run the pipeline until the pooling step.
README.md
+19 -23 (19 additions & 23 deletions)
@@ -8,23 +8,30 @@ The recommended way to use nextflow is to run it in a screen session. These step
8
8
9
9
1. Start a screen session: `screen -S nf_run1`
10
10
2. Start a small interactive job for nextflow: `bsub -G cellgeni -n1 -R"span[hosts=1]" -Is -q long -R"select[mem>2000] rusage[mem=2000]" -M2000 bash`
11
-
3. Modify one of RESUME scripts (pre-made Nextflow run scripts)
11
+
3. Modify one of the RESUME scripts in the `examples` folder (pre-made Nextflow run scripts)
12
12
4. Run the RESUME script you modified: `./RESUME-scautoqc-all`
13
13
5. You can leave your screen session and let it run in the background: `Ctrl+A, D`
14
14
15
15
## Files:
16
16
17
17
* `main.nf` - the Nextflow pipeline that executes the scAutoQC pipeline.
18
18
* `nextflow.config` - the configuration script that allows the processes to be submitted to IBM LSF on Sanger's HPC and ensures the correct environment is set via a Singularity container (this is an absolute path). Global default parameters are also set in this file, and some contain absolute paths.
19
-
*`RESUME-scautoqc-all` - an example run script that executes the whole pipeline.
20
-
*`RESUME-scautoqc-afterqc` - an example run script that executes the pipeline after run_qc and find_doublets steps.
21
-
*`RESUME-scautoqc-onlyqc` - an example run script that executes the pipeline after until pooling step.
22
-
*`bin/gather_matrices.py` - a Python script that gathers matrices from STARsolo, Velocyto and Cellbender outputs (used in step 1).
23
-
*`bin/qc.py` - a Python script that runs automatic QC workflow (used in step 2).
24
-
*`bin/flag_doublet.py` - a Python script that runs scrublet to find doublets (used in step 3).
25
-
*`bin/pool_all.py` - a Python script that combines all of the output objects after QC step (used in step 4).
26
-
*`bin/add_scrublet_meta.py` - a Python script that adds scrublet scores (and metadata if available) (used in step 5).
27
-
*`bin/integration.py` - a Python script that runs scVI integration (used in step 6).
19
+
* `examples/` - a folder that includes pre-made Nextflow run scripts for each workflow:
20
+
* `RESUME-scautoqc-all`
21
+
* `RESUME-scautoqc-onlyqc`
22
+
* `RESUME-scautoqc-afterqc`
23
+
* `RESUME-scautoqc-untilintegrate`
24
+
* `RESUME-scautoqc-onlyintegrate`
25
+
* `RESUME-scautoqc-subset`
26
+
* `bin/` - a folder that includes Python scripts used in the pipeline:
27
+
* `gather_matrices.py` - gathers matrices from STARsolo, Velocyto and Cellbender outputs (used in step 1).
28
+
* `qc.py` - runs the automatic QC workflow (used in step 2).
29
+
* `subset.py` - subsets the input object (used in step 2a).
30
+
* `flag_doublet.py` - runs scrublet to find doublets (used in step 3).
31
+
* `pool_all.py` - combines all of the output objects after the QC step (used in step 4).
32
+
* `add_scrublet_meta.py` - adds scrublet scores (and metadata if available) (used in step 5).
33
+
* `add_scrublet_meta_basic.py` - adds scrublet scores but doesn't remove any cells or samples (used in step 5a).
34
+
* `integration.py` - runs scVI integration (used in step 6).
28
35
* `genes_list/` - a folder that includes cell cycle, immunoglobulin and T cell receptor genes.
29
36
* `Dockerfile` - a Dockerfile to reproduce the environment used to run the pipeline.
30
37
@@ -192,6 +199,8 @@ This step requires three inputs:
192
199
193
200
The `gather_matrices` step combines the matrices from three inputs into one h5ad object with multiple layers for each sample: raw, spliced, unspliced, ambiguous (only the raw layer is used if "GeneFull" mode is specified). The main expression matrix and the cell and gene metadata are retrieved from the Cellbender output. The raw matrix is retrieved from the expression matrix in the STARsolo output folder named Gene. The spliced, unspliced and ambiguous matrices are all retrieved from the expression matrices in the STARsolo output folder named Velocyto.
194
201
202
+
This step can also use Cell Ranger inputs if the `cr_prefix` option is provided instead of the `ss_prefix` option; in that case, Velocyto outputs are not included.
203
+
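The layer rules described above can be summarised in a small sketch. The helper `expected_layers` and its `genefull` argument are hypothetical names used only for illustration, and two assumptions go beyond the text: that exactly one of `ss_prefix` or `cr_prefix` is given, and that a Cell Ranger run still yields a raw layer.

```python
def expected_layers(ss_prefix=None, cr_prefix=None, genefull=False):
    """Return the layers the combined h5ad object would be expected to hold.

    Hypothetical helper; it mirrors the description, not the real script.
    """
    # Assumption: the two prefix options are mutually exclusive.
    if (ss_prefix is None) == (cr_prefix is None):
        raise ValueError("provide exactly one of ss_prefix or cr_prefix")
    # Cell Ranger inputs carry no Velocyto outputs, and "GeneFull" mode
    # uses only the raw layer, so both cases reduce to a single layer.
    if cr_prefix is not None or genefull:
        return ["raw"]
    return ["raw", "spliced", "unspliced", "ambiguous"]
```

With STARsolo inputs and the default mode, all four layers are expected; either Cell Ranger inputs or "GeneFull" mode leaves only the raw layer.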
195
204
This step produces:
196
205
* ***[output 1]:*** h5ad object with different layers
197
206
@@ -282,19 +291,6 @@ This step requires the h5ad output from the `pool_all` step and the scrublet CSV
282
291
283
292
The `add_metadata_basic` step is also exclusive to the `subset` mode and replaces the `add_metadata` step from the main pipeline. The key differences are that it does not perform QC scoring per sample and does not remove any cells or samples.
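The behaviour described for `add_metadata_basic` (attach scrublet scores, remove nothing) might look like the minimal sketch below. It uses plain dictionaries instead of an h5ad object to stay self-contained, and the column names `barcode` and `scrublet_score` are assumptions, not the actual CSV schema.

```python
import csv
import io


def add_scrublet_scores(cells, scrublet_csv_text):
    """Attach scrublet scores to every cell without dropping any of them.

    `cells` is a list of dicts with a "barcode" key; the column names
    are hypothetical and chosen only for this illustration.
    """
    reader = csv.DictReader(io.StringIO(scrublet_csv_text))
    scores = {row["barcode"]: float(row["scrublet_score"]) for row in reader}
    # Cells missing from the CSV are kept and get None as their score,
    # so the number of cells is identical before and after.
    return [{**cell, "scrublet_score": scores.get(cell["barcode"])} for cell in cells]
```

Keeping every cell and marking missing scores explicitly leaves any filtering decision to the user, which matches the intent of the `subset` mode.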
284
293
285
-
286
-
## Future plans
287
-
288
-
### Add run_cellbender process
289
-
290
-
* Current version of pipeline assumes that the Cellbender outputs exist.
291
-
* This addition will allow the pipeline to run Cellbender if the inputs do not exist.
292
-
293
-
### Smart memory allocation
294
-
295
-
* This addition will estimate the average memory needed for pool_all step, so it won't need to try multiple times until it runs well.