openpipelines-bio
diff --git a/‎.lintr‎
Lines changed: 3 additions & 0 deletions b/‎.lintr‎
diff --git a/‎CHANGELOG.md‎
Lines changed: 19 additions & 1 deletion b/‎CHANGELOG.md‎
diff --git a/‎README.md‎
Lines changed: 253 additions & 8 deletions b/‎README.md‎
@@ -0,0 +1,3 @@
+exclusions: list(
+    "README.qmd"
+    )
@@ -2,7 +2,13 @@
 
 ## BREAKING CHANGES
 
-* `download_file` has been deprecated and will be removed in a future release (PR #1015).
+* Removed `split_h5mu_train_test` component (PR #1020).
+
+* `compress_h5mu`: rename `compression` argument to `output_compression` (PR #1017, PR #1018).
+
+* `delimit_fraction`: remove unused `layer` argument (PR #1018).
+
+* `download_file` has been deprecated and will be removed in openpipeline 3.0 (PR #1015).
 
 ## MAJOR CHANGES
 
@@ -12,6 +18,18 @@
 
 * Remove `workflows` directory (PR #993). The workflows which were at one point in this directory were all deprecated and moved to `src/workflows`.
 
+* Move output file compression argument for AnnData and MuData files to a base config file (`src/base/h5_compression_argument.yaml`) (PR #1017).
+
+* Add missing descriptions to components and arguments (PR #1018).
+
+## BUG FIXES
+
+* Bump viash to 0.9.4. This adds support for nextflow versions starting from version 25.01 and fixes an issue where passing an integer to an argument with `type: double` resulted in an error (PR #1016).
+
+* Fix running `neigbors_leiden_umap` workflow with `-stub` enabled (PR #1026).
+
+* Add missing CUDA enabled `jaxlib` to components that use `scvi-tools` (`scanvi`, `scarches`, `scvi` and `totalvi`) (PR #1028).
+
 # openpipelines 2.1.0
 
 ## BREAKING CHANGES
 
@@ -1,13 +1,258 @@
-OpenPipeline
-================
 
-<!-- README.md is generated by running 'quarto render README.qmd' -->
+
+# OpenPipeline
 
 Extensible single cell analysis pipelines for reproducible and
 large-scale single cell processing using Viash and Nextflow.
 
-The provided pipelines are built using the [Viash
-framework](http://www.viash.io) on top of the nextflow workflow system.
-For more information on Nextflow please visit the [Nextflow github
-page](https://github.com/nextflow-io/nextflow) and the [Nextflow read
-the docs page](https://www.nextflow.io/docs/latest/index.html).
+[![ViashHub](https://img.shields.io/badge/ViashHub-openpipeline-7a4baa.svg)](https://www.viash-hub.com/packages/openpipeline)
+[![GitHub](https://img.shields.io/badge/GitHub-viash--hub%2Fopenpipeline-blue.svg)](https://github.com/openpipelines-bio/openpipeline)
+[![GitHub
+License](https://img.shields.io/github/license/openpipelines-bio/openpipeline.svg)](https://github.com/openpipelines-bio/openpipeline/blob/main/LICENSE)
+[![GitHub
+Issues](https://img.shields.io/github/issues/openpipelines-bio/openpipeline.svg)](https://github.com/openpipelines-bio/openpipeline/issues)
+[![Viash
+version](https://img.shields.io/badge/Viash-v0.9.3-blue.svg)](https://viash.io)
+
+## Documentation
+
+Please find more in-depth documentation on [the
+website](https://openpipelines.bio/).
+
+## Functionality Overview
+
+Openpipelines execute a list of predefined tasks. These discrete steps
+are also provided as standalone components that can be executed
+individually, with a standardized interface. This is especially useful
+when a particular step wraps a tool that you do not necessarily always
+need to execute in a workflow context.
+
+In terms of workflows, the following functionality is provided:
+
+- Demultiplexing: conversion of raw sequencing data to FASTQ objects.
+- [Ingestion](https://openpipelines.bio/fundamentals/architecture.html#sec-ingestion):
+  Read mapping and generating a count matrix.
+- [Single sample
+  processing](https://openpipelines.bio/fundamentals/architecture.html#sec-single-sample):
+  cell filtering and doublet detection.
+- [Multisample
+  processing](https://openpipelines.bio/fundamentals/architecture.html#sec-multisample-processing):
+  Count transformation, normalization, QC metric calculations.
+- [Integration](https://openpipelines.bio/fundamentals/architecture.html#sec-intergration):
+  Clustering, integration and batch correction using single and
+  multimodal methods.
+- Downstream analysis workflows
+
+``` mermaid lang="mermaid"
+flowchart LR
+  demultiplexing["Step 1: Demultiplexing"]
+  ingestion["Step 2: Ingestion"]
+  process_samples["Step 3: Process Samples"]
+  integration["Step 4: Integration"]
+  downstream["Step 5: Downstream"]
+  demultiplexing-->ingestion-->process_samples-->integration-->downstream
+```
+
+## Guided execution using Viash Hub (CLI and Seqera cloud)
+
+Openpipelines is now available on [Viash
+Hub](https://www.viash-hub.com/packages/openpipeline/latest). Viash Hub
+provides a list of components and workflows, together with a graphical
+interface that guides you through the steps of running a workflow or
+standalone component. Instructions are provided for using a local viash
+or nextflow executable (requires using a linux based OS), but connecting
+to a Seqera cloud instance is also supported.
+
+## Execution using the nextflow executable
+
+Executing a workflow is a bit more involved and requires familiarity
+with the command line interface (CLI).
+
+### Setup
+
+In order to use the workflows in this package on your local computer,
+you’ll need to do the following:
+
+- Install [nextflow](https://www.nextflow.io/docs/latest/install.html)
+- Install a nextflow compatible executor. This workflow provides a
+  profile for [docker](https://docs.docker.com/get-started/).
+
+### Location of the workflow scripts
+
+Nextflow workflow scripts, schemas and configuration files can be found
+in the `target/nextflow` folder. On the `main` branch however, only the
+source code that needs to be built into the functioning workflows and
+components can be found. Instead, please refer to the `main_build`
+branch or any of the tags to find the `target` folders. Components and
+workflows are organized into namespaces, which can be nested. Workflows
+are located at `target/nextflow/workflows`, while components that
+execute individual workflow steps are
+
+A reference of workflows and modules is also provided in the
+[documentation](https://openpipelines.bio/components/).
+
+### Retrieving a list of workflow parameters
+
+A list of workflow arguments can be consulted in multiple ways:
+
+- On [Viash Hub](https://www.viash-hub.com/packages/openpipeline/latest)
+- In the [reference
+  documentation](https://openpipelines.bio/components/)
+- The config YAML file lists the arguments for each workflow and
+  component
+- In the `target/nextflow` folder, a nextflow schema JSON file
+  (`nextflow_schema.json`) is provided next to each workflow `.nf` file.
+- Using nextflow on the CLI:
+
+``` bash
+nextflow run openpipelines-bio/openpipeline \
+    -r 2.1.1 \
+    -main-script target/nextflow/workflows/ingestion/demux/main.nf \
+    --help
+```
+
+### Resource usage tuning
+
+Nextflow’s labels can be used to specify the amount of resources a
+process can use. This workflow uses the following labels for CPU, memory
+and disk:
+
+- `verylowmem`, `lowmem`, `midmem`, `highmem`, `veryhighmem`
+- `verylowcpu`, `lowcpu`, `midcpu`, `highcpu`, `veryhighcpu`
+- `lowdisk`, `middisk`, `highdisk`, `veryhighdisk`
+
+The defaults for these labels can be found at
+`src/workflows/utils/labels.config`. Nextflow checks that the specified
+resources for a process do not exceed what is available on the machine
+and will not start if they do. Create your own config file to tune the
+labels to your needs, for example:
+
+    // Resource labels (these selectors belong inside a process { } block)
+    withLabel: verylowcpu { cpus = 2 }
+    withLabel: lowcpu { cpus = 8 }
+    withLabel: midcpu { cpus = 16 }
+    withLabel: highcpu { cpus = 16 }
+
+    withLabel: verylowmem { memory = 4.GB }
+    withLabel: lowmem { memory = 8.GB }
+    withLabel: midmem { memory = 16.GB }
+    withLabel: highmem { memory = 32.GB }
+
+When starting nextflow using the CLI, you can use `-c` to provide the
+file to nextflow and overwrite the defaults.
+
+### Demultiplexing example
+
+Here, generating FASTQ files from raw sequencing data is demonstrated,
+based on data generated using 10x Genomics protocols. However, BD
+genomics data is also supported by Openpipeline. If you wish to try it
+out yourself, test data is available at
+`s3://openpipelines-data/cellranger_tiny_bcl/bcl`.
+
+``` bash
+nextflow run openpipelines-bio/openpipeline \
+    -r 2.1.1 \
+    -main-script target/nextflow/workflows/ingestion/demux/main.nf \
+    -c "<path to resource config file>" \
+    -profile docker \
+    --publish_dir "<path to output directory>" \
+    --id "cellranger_tiny_bcl" \
+    --input "s3://openpipelines-data/cellranger_tiny_bcl/bcl" \
+    --sample_sheet "s3://openpipelines-data/cellranger_tiny_bcl/bcl/sample_sheet.csv" \
+    --demultiplexer "mkfastq"
+```
+
+### Mapping and read counting
+
+FASTQ files can be mapped to a reference genome and the resulting mapped
+reads can be counted in order to generate a count matrix. Both
+`BD Rhapsody` and `Cell Ranger` are supported. Here, we demonstrate
+using Cell Ranger multi on test data available at
+`s3://openpipelines-data/10x_5k_anticmv`.
+
+In order to facilitate passing multiple argument values, the parameters
+can be specified using a YAML file.
+
+``` yaml
+input:
+    - "s3://openpipelines-data/10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_GEX_*.fastq.gz"
+    - "s3://openpipelines-data/10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_AB_*.fastq.gz"
+    - "s3://openpipelines-data/10x_5k_anticmv/raw/5k_human_antiCMV_T_TBNK_connect_VDJ_*.fastq.gz"
+gex_reference: "s3://openpipelines-data/reference_gencodev41_chr1/reference_cellranger.tar.gz"
+vdj_reference: "s3://openpipelines-data/10x_5k_anticmv/raw/refdata-cellranger-vdj-GRCh38-alts-ensembl-7.0.0.tar.gz"
+feature_reference: "s3://openpipelines-data/10x_5k_anticmv/raw/feature_reference.csv"
+library_id:
+    - "5k_human_antiCMV_T_TBNK_connect_GEX_1_subset"
+    - "5k_human_antiCMV_T_TBNK_connect_AB_subset"
+    - "5k_human_antiCMV_T_TBNK_connect_VDJ_subset"
+library_type:
+    - "Gene Expression"
+    - "Antibody Capture"
+    - "VDJ"
+```
+
+You can pass this file to nextflow using `-params-file`:
+
+``` bash
+nextflow run openpipelines-bio/openpipeline \
+    -r 2.1.1 \
+    -main-script target/nextflow/workflows/ingestion/cellranger_multi/main.nf \
+    -c "<path to resource config file>" \
+    -profile docker \
+    -params-file "<path to your parameter YAML file>" \
+    --publish_dir "<path to output directory>"
+```
+
+### Filtering, normalization, clustering, dimensionality reduction and QC calculations (w/o integration)
+
+Once you have a MuData object for each of your samples, you can process
+them into a multisample file that is ready for integration and other
+downstream analyses. This can be done using the `process_samples`
+workflow. Here is an example, but please keep in mind that the exact
+parameters that need to be provided differ depending on your data. A lot
+of functionality for this pipeline can be customized, including the name
+of the output slots where data is being stored.
+
+``` yaml
+param_list:
+    - id: "sample_1"
+      input: "s3://openpipelines-data/concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu"
+      rna_min_counts: 2
+    - id: "sample_2"
+      input: "s3://openpipelines-data/concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu"
+      rna_min_counts: 1
+rna_max_counts: 1000000
+rna_min_genes_per_cell: 1
+rna_max_genes_per_cell: 1000000
+rna_min_cells_per_gene: 1
+rna_min_fraction_mito: 0.0
+rna_max_fraction_mito: 1.0
+```
+
+In order to provide multiple samples to the pipeline, `param_list` is
+used. Using `param_list` it is possible to specify arguments per sample.
+However, it is still possible to define arguments for all samples
+together by listing those outside the `param_list` block.
+
+``` bash
+nextflow run openpipelines-bio/openpipeline \
+    -r 2.1.1 \
+    -main-script target/nextflow/workflows/multiomics/process_samples/main.nf \
+    -c "<path to resource config file>" \
+    -profile docker \
+    -params-file "<path to your parameter YAML file>" \
+    --publish_dir "<path to output directory>"
+```
+
+## Executing standalone components using the Viash executable
+
+Another option to execute individual modules on the CLI is to use
+`viash run`. All you need to do is download viash, clone the
+Openpipeline repository and point viash to a config file. However, keep
+in mind that using `viash run` for workflows is currently not supported.
+Please see `viash run --help` for more information on how to use the
+command, but here is an example:
+
+``` bash
+viash run --engine docker src/mapping/cellranger_multi/config.vsh.yaml --help
+```