feat: update github actions workflow #9
Changes from all commits
4d7b3a2
1dfa7ad
facf377
0c2c8d1
cf59f11
776b97e
2348055
7a3a40e
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ resources/** | |
logs/** | ||
.snakemake | ||
.snakemake/** | ||
.test/results/* |
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,27 @@ | ||||||||||||||||||||||||||||||
samplesheet: "config/samples.tsv" | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
get_genome: | ||||||||||||||||||||||||||||||
database: "ncbi" | ||||||||||||||||||||||||||||||
assembly: "GCF_000006785.2" | ||||||||||||||||||||||||||||||
fasta: Null | ||||||||||||||||||||||||||||||
gff: Null | ||||||||||||||||||||||||||||||
gff_source_type: | ||||||||||||||||||||||||||||||
[ | ||||||||||||||||||||||||||||||
"RefSeq": "gene", | ||||||||||||||||||||||||||||||
"RefSeq": "pseudogene", | ||||||||||||||||||||||||||||||
"RefSeq": "CDS", | ||||||||||||||||||||||||||||||
"Protein Homology": "CDS", | ||||||||||||||||||||||||||||||
] | ||||||||||||||||||||||||||||||
Comment on lines +8 to +14

Fix duplicate keys in gff_source_type configuration.
The current structure has duplicate "RefSeq" keys, which will cause only the last entry to be used. This means the "gene" and "pseudogene" entries will be lost. Restructure the configuration to preserve all entries:

  gff_source_type:
    [
-     "RefSeq": "gene",
-     "RefSeq": "pseudogene",
-     "RefSeq": "CDS",
+     {"source": "RefSeq", "type": "gene"},
+     {"source": "RefSeq", "type": "pseudogene"},
+     {"source": "RefSeq", "type": "CDS"},
      "Protein Homology": "CDS",
    ]
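As a sketch of the restructuring suggested above, the list could be written in block style, with every entry (including "Protein Homology") using the same explicit keys. The `source` and `type` key names are taken from the suggestion. Whether the workflow's rules read these keys is an assumption, so any code consuming `gff_source_type` would need to be adapted accordingly. Note also that, per the YAML specification, a `key: value` entry inside a flow sequence `[...]` is read as its own single-pair mapping, so it is worth checking how the original list actually parses before changing it.

```yaml
# Sketch only: the key names `source` and `type` follow the review suggestion
# and are an assumption about what the workflow code expects.
get_genome:
  gff_source_type:
    - source: "RefSeq"
      type: "gene"
    - source: "RefSeq"
      type: "pseudogene"
    - source: "RefSeq"
      type: "CDS"
    - source: "Protein Homology"
      type: "CDS"
```

Writing all four entries in the same shape avoids mixing two different entry types in one list, which would otherwise force the consuming code to handle both.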
||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
simulate_reads: | ||||||||||||||||||||||||||||||
read_length: 100 | ||||||||||||||||||||||||||||||
read_number: 100000 | ||||||||||||||||||||||||||||||
random_freq: 0.01 | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
cutadapt: | ||||||||||||||||||||||||||||||
threep_adapter: "-a ATCGTAGATCGG" | ||||||||||||||||||||||||||||||
fivep_adapter: "-A GATGGCGATAGG" | ||||||||||||||||||||||||||||||
default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"] | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
multiqc: | ||||||||||||||||||||||||||||||
config: "config/multiqc_config.yml" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
remove_sections: | ||
- samtools-stats |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
sample condition replicate read1 read2 | ||
sample1 wild_type 1 sample1.bwa.read1.fastq.gz sample1.bwa.read2.fastq.gz | ||
sample2 wild_type 2 sample2.bwa.read1.fastq.gz sample2.bwa.read2.fastq.gz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,21 +1,122 @@ | ||
# Snakemake workflow: `<name>` | ||
|
||
[](https://snakemake.github.io) | ||
[](https://github.com/<owner>/<repo>/actions?query=branch%3Amain+workflow%3ATests) | ||
|
||
[](https://snakemake.github.io) | ||
[](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml) | ||
[](https://docs.conda.io/en/latest/) | ||
[](https://sylabs.io/docs/) | ||
[](https://snakemake.github.io/snakemake-workflow-catalog) | ||
|
||
A Snakemake workflow for `<description>` | ||
|
||
- [Snakemake workflow: `<name>`](#snakemake-workflow-name) | ||
- [Usage](#usage) | ||
- [Workflow overview](#workflow-overview) | ||
- [Running the workflow](#running-the-workflow) | ||
- [Input data](#input-data) | ||
- [Execution](#execution) | ||
- [Parameters](#parameters) | ||
- [Authors](#authors) | ||
- [References](#references) | ||
- [TODO](#todo) | ||
|
||
## Usage | ||
|
||
The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=<owner>%2F<repo>). | ||
|
||
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) <repo>sitory and its DOI (see above). | ||
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository or its DOI. | ||
|
||
## Workflow overview | ||
|
||
This workflow is a best-practice workflow for `<detailed description>`. | ||
The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps: | ||
|
||
1. Parse sample sheet containing sample metadata (`python`) | ||
2. Simulate short read sequencing data on the fly (`dwgsim`) | ||
3. Check quality of input read data (`FastQC`) | ||
4. Trim adapters from input data (`cutadapt`) | ||
5. Collect statistics from tool output (`MultiQC`) | ||
|
||
## Running the workflow | ||
|
||
### Input data | ||
|
||
This template workflow does not contain actual input data. Instead, it simulates sequencing reads in `*.fastq.gz` format. The simulated input files are created based on a mandatory sample sheet linked in the `config.yml` file (default: `.test/samples.tsv`). The sample sheet has the following layout: | ||
|
||
| sample | condition | replicate | read1 | read2 | | ||
| ------- | --------- | --------- | -------------------------- | -------------------------- | | ||
| sample1 | wild_type | 1 | sample1.bwa.read1.fastq.gz | sample1.bwa.read2.fastq.gz | | ||
| sample2 | wild_type | 2 | sample2.bwa.read1.fastq.gz | sample2.bwa.read2.fastq.gz | | ||
|
||
|
||
### Execution | ||
|
||
To run the workflow from the command line, change into the workflow directory. | ||
|
||
```bash | ||
cd path/to/snakemake-workflow-name | ||
``` | ||
|
||
Adjust options in the default config file `config/config.yml`. | ||
Before running the entire workflow, you can perform a dry run using: | ||
|
||
```bash | ||
snakemake --dry-run | ||
``` | ||
|
||
To run the complete workflow with test files using **conda**, execute the following command. Specifying the number of compute cores is mandatory. | ||
|
||
```bash | ||
snakemake --cores 3 --sdm conda --directory .test | ||
``` | ||
|
||
To run the workflow with **singularity** / **apptainer**, add a link to a container registry in the `Snakefile`, for example: | ||
`container: "oras://ghcr.io/<user>/<repository>:<version>"` for GitHub's container registry. Run the workflow with: | ||
|
||
```bash | ||
snakemake --cores 3 --sdm conda apptainer --directory .test | ||
``` | ||
|
||
### Parameters | ||
|
||
This table lists all parameters that can be used to run the workflow. | ||
|
||
| parameter | type | details | default | | ||
| ------------------ | ---- | --------------------------------------- | --------------------------------------------- | | ||
| **samplesheet** | | | | | ||
| path | str | path to samplesheet, mandatory | "config/samples.tsv" | | ||
| **get_genome** | | | | | ||
| database | str | one of `manual`, `ncbi` | `ncbi` | | ||
| assembly | str | RefSeq ID | `GCF_000006785.2` | | ||
| fasta | str | optional path to fasta file | Null | | ||
| gff | str | optional path to gff file | Null | | ||
| gff_source_type | list | list of name/value pairs for GFF source | see config file | | ||
| **simulate_reads** | | | | | ||
| read_length | num | length of target reads in bp | 100 | | ||
| read_number | num | number of total reads to be simulated | 100000 | | ||
| random_freq | num | frequency of random read sequences | 0.01 | | ||
| **cutadapt** | | | | | ||
| threep_adapter | str | sequence of the 3' adapter | `-a ATCGTAGATCGG` | | ||
| fivep_adapter | str | sequence of the 5' adapter | `-A GATGGCGATAGG` | | ||
| default | list | additional options passed to `cutadapt` | [`-q 10 `, `-m 25 `, `-M 100`, `--overlap=5`] | | ||
| **multiqc** | | | | | ||
| config | str | path to multiQC config | `config/multiqc_config.yml` | | ||
|
||
## Authors | ||
|
||
- Firstname Lastname | ||
- Affiliation | ||
- ORCID profile | ||
- home page | ||
|
||
## References | ||
|
||
> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. *Sustainable data analysis with Snakemake*. F1000Research, 10:33, **2021**. https://doi.org/10.12688/f1000research.29032.2. | ||
|
||
# TODO | ||
## TODO | ||
|
||
* Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization. | ||
* Replace `<name>` with the workflow name (can be the same as `<repo>`). | ||
* Replace `<description>` with a description of what the workflow does. | ||
* Update the workflow description, parameters, running options, authors and references in the `README.md`. | ||
* Update the `README.md` badges. Add or remove badges for `conda`/`singularity`/`apptainer` usage depending on the workflow's capability. | ||
* The workflow will appear in the snakemake-workflow-catalog once it has been made public. The link under "Usage" will then point to the usage instructions, provided `<owner>` and `<repo>` were set correctly. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,82 @@ | ||
Describe how to configure the workflow (using config.yaml and maybe additional files). | ||
All of them need to be present with example entries inside of the config folder. | ||
## Workflow overview | ||
|
||
This workflow is a best-practice workflow for `<detailed description>`. | ||
The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps: | ||
|
||
1. Parse sample sheet containing sample metadata (`python`) | ||
2. Simulate short read sequencing data on the fly (`dwgsim`) | ||
3. Check quality of input read data (`FastQC`) | ||
4. Trim adapters from input data (`cutadapt`) | ||
5. Collect statistics from tool output (`MultiQC`) | ||
|
||
## Running the workflow | ||
|
||
### Input data | ||
|
||
This template workflow does not contain actual input data. Instead, it simulates sequencing reads in `*.fastq.gz` format. The simulated input files are created based on a mandatory sample sheet linked in the `config.yml` file (default: `.test/samples.tsv`). The sample sheet has the following layout: | ||
|
||
| sample | condition | replicate | read1 | read2 | | ||
| ------- | --------- | --------- | -------------------------- | -------------------------- | | ||
| sample1 | wild_type | 1 | sample1.bwa.read1.fastq.gz | sample1.bwa.read2.fastq.gz | | ||
| sample2 | wild_type | 2 | sample2.bwa.read1.fastq.gz | sample2.bwa.read2.fastq.gz | | ||
|
||
|
||
### Execution | ||
|
||
To run the workflow from the command line, change into the workflow directory. | ||
|
||
```bash | ||
cd path/to/snakemake-workflow-name | ||
``` | ||
|
||
Adjust options in the default config file `config/config.yml`. | ||
Before running the entire workflow, you can perform a dry run using: | ||
|
||
```bash | ||
snakemake --dry-run | ||
``` | ||
|
||
To run the complete workflow with test files using **conda**, execute the following command. Specifying the number of compute cores is mandatory. | ||
|
||
```bash | ||
snakemake --cores 3 --sdm conda --directory .test | ||
``` | ||
|
||
To run the workflow with **singularity** / **apptainer**, add a link to a container registry in the `Snakefile`, for example: | ||
`container: "oras://ghcr.io/<user>/<repository>:<version>"` for GitHub's container registry. Run the workflow with: | ||
|
||
```bash | ||
snakemake --cores 3 --sdm conda apptainer --directory .test | ||
``` | ||
|
||
### Parameters | ||
|
||
This table lists all parameters that can be used to run the workflow. | ||
|
||
| parameter | type | details | default | | ||
| ------------------ | ---- | --------------------------------------- | --------------------------------------------- | | ||
| **samplesheet** | | | | | ||
| path | str | path to samplesheet, mandatory | "config/samples.tsv" | | ||
| **get_genome** | | | | | ||
| database | str | one of `manual`, `ncbi` | `ncbi` | | ||
| assembly | str | RefSeq ID | `GCF_000006785.2` | | ||
| fasta | str | optional path to fasta file | Null | | ||
| gff | str | optional path to gff file | Null | | ||
| gff_source_type | list | list of name/value pairs for GFF source | see config file | | ||
| **simulate_reads** | | | | | ||
| read_length | num | length of target reads in bp | 100 | | ||
| read_number | num | number of total reads to be simulated | 100000 | | ||
| random_freq | num | frequency of random read sequences | 0.01 | | ||
| **cutadapt** | | | | | ||
| threep_adapter | str | sequence of the 3' adapter | `-a ATCGTAGATCGG` | | ||
| fivep_adapter | str | sequence of the 5' adapter | `-A GATGGCGATAGG` | | ||
| default | list | additional options passed to `cutadapt` | [`-q 10 `, `-m 25 `, `-M 100`, `--overlap=5`] | | ||
| **multiqc** | | | | | ||
| config | str | path to multiQC config | `config/multiqc_config.yml` | | ||
|
||
## TODO | ||
|
||
* Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization. | ||
* Replace `<name>` with the workflow name (can be the same as `<repo>`). | ||
* Replace `<description>` with a description of what the workflow does. | ||
* Update the workflow parameters and running options |
Update the checkout action version.
The runner of "actions/checkout@v2" action is outdated. Update to v4 for consistency with other jobs.
Tools: actionlint (1.7.4)
29-29: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. Update the action's version to fix this issue. (action)
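A minimal sketch of what the updated step might look like in the workflow file is shown here. The job name `formatting`, the runner label and the step name are hypothetical placeholders, since the affected job is not shown in this review; only the `actions/checkout@v4` reference comes from the comment above.

```yaml
# Hypothetical excerpt of a GitHub Actions workflow file; the job and step
# names are illustrative and not taken from the repository.
jobs:
  formatting:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        # v2 is flagged by actionlint as too old; v4 matches the other jobs
        uses: actions/checkout@v4
```

Pinning every job to the same major version of the action keeps the workflow consistent and avoids a repeat of this warning in a single job.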