Commit a0a20b1

Merge pull request #46 from matrulda/DEVELOP-1437_seqreports_automatic_tests
DEVELOP-1437: Test data and automatic tests
2 parents aa60649 + 58d7a4b commit a0a20b1

49 files changed, with 7351 additions and 86 deletions.

.github/workflows/run_tests.yml

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+name: Run tests
+on: [push]
+jobs:
+  run-tests:
+    runs-on: ubuntu-20.04
+    env:
+      NXF_VER: 21.04.1
+      NXF_ANSI_LOG: false
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v2
+
+      - name: Cache singularity images
+        uses: actions/cache@v2
+        with:
+          path: work/singularity
+          key: singularity-${{ hashFiles('config/nextflow_config/singularity.config') }}
+          restore-keys: singularity-
+
+      - name: Install Singularity
+        uses: eWaterCycle/setup-singularity@v7
+        with:
+          singularity-version: 3.8.3
+
+      - name: Install Nextflow
+        env:
+          CAPSULE_LOG: none
+        run: |
+          curl -s https://get.nextflow.io | bash
+          sudo mv nextflow /usr/local/bin/
+
+      - name: Make Nextflow binary executable
+        run: chmod +x /usr/local/bin/nextflow
+
+      - name: Set up python
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.9
+          architecture: x64
+
+      - name: Install test requirements
+        run: pip install -r requirements-dev.txt
+
+      - name: Run tests
+        run: pytest tests
+
+      - name: Run Black code formatting check
+        run: black --check .

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ resources
 *.simg
 *.img
 FastQ_Screen_Genomes
+venv
+__pycache__

README.md

Lines changed: 31 additions & 1 deletion
@@ -29,6 +29,7 @@ These are the primary config profiles:
 - `irma`: Uppmax slurm profile for use on the cluster `irma` (note: The parameter `params.project` must be supplied).
 - `snpseq`: Run locally with greater memory available than `dev`.
 - `singularity`: Enables singularity and provides container URLs.
+- `test`: Run the pipeline using test data
 
 Additional profiles:
 - `debug`: prints out the `env` properties before executing processes.
@@ -52,6 +53,35 @@ There are two primary branches of this project:
 - `master`: The stable release branch
 - `dev`: The development and test branch, to which pull requests should be made.
 
-### Known issues:
+Tests are run through GitHub Actions when pushing code to the repo. See instructions below on how to reproduce it locally.
+
+To keep the python parts of the project nice and tidy, we enforce that code should be formatted according to [black](https://github.com/psf/black).
+To re-format your code with black, simply run:
+```
+black .
+```
+
+### Running tests locally
+
+Assuming you have installed all pre-requisites (except the fastq screen database: test data comes with a minimal version of it), you can run tests locally by following these steps:
+
+```
+# create virtual environment
+virtualenv -p python3.9 venv/
+
+# activate venv
+source venv/bin/activate
+
+# install dependencies
+pip install -r requirements-dev.txt
+
+# run tests
+pytest tests/
+
+# perform black formatter check
+black --check .
+```
+
+## Known issues:
 
 - Unable to download genome indicies using `fastq_screen --get_genomes` as wget within the container does not resolve the address correctly. Fastq Screen must be installed separately (e.g. with conda) and the genomes downloaded prior to running the workflow. The path to the databases must then be given using the `params.fastqscreen_databases` parameter.
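
The new `test` profile and the "Running tests locally" steps point to an end-to-end check that runs Nextflow on the bundled test data. The contents of the repository's `tests/` directory are not shown in this view, so the following is only a minimal sketch of what such a test could look like. The function names and the `shutil.which` guard are illustrative assumptions; the command line itself is taken from the usage comment in `test.config`.

```python
import shutil
import subprocess


def build_nextflow_command(profiles=("dev", "test", "singularity"), script="main.nf"):
    """Return the argv list for running the pipeline with the given profiles."""
    return ["nextflow", "run", script, "-profile", ",".join(profiles)]


def run_pipeline_with_test_profile():
    """Run the pipeline on the test data.

    Returns the process exit code, or None when Nextflow is not installed
    (hypothetical guard so the sketch can be imported anywhere).
    """
    if shutil.which("nextflow") is None:
        return None
    completed = subprocess.run(build_nextflow_command(), check=False)
    return completed.returncode


def test_command_uses_test_profile():
    # pytest collects functions named test_*; this one only checks the argv.
    cmd = build_nextflow_command()
    assert cmd == ["nextflow", "run", "main.nf", "-profile", "dev,test,singularity"]
```

Keeping the command construction separate from the subprocess call lets the cheap part be checked without Singularity or Nextflow installed.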

bin/get_metadata.py

Lines changed: 37 additions & 25 deletions
@@ -8,21 +8,24 @@
 import json
 
 
-class RunfolderInfo():
-
+class RunfolderInfo:
     def __init__(self, runfolder, bcl2fastq_outdir):
         self.runfolder = runfolder
         self.run_parameters = self.read_run_parameters()
         self.stats_json = self.read_stats_json(bcl2fastq_outdir)
         self.description_and_identifier = OrderedDict()
-        self.run_parameters_tags = \
-            {'RunId': 'Run ID', 'RunID': 'Run ID',
-             'ApplicationName': 'Control software', 'Application': 'Control software',
-             'ApplicationVersion': 'Control software version',
-             'Flowcell': 'Flowcell type', 'FlowCellMode': 'Flowcell type',
-             'ReagentKitVersion': 'Reagent kit version',
-             'RTAVersion': 'RTA Version', 'RtaVersion': 'RTA Version',
-             }
+        self.run_parameters_tags = {
+            "RunId": "Run ID",
+            "RunID": "Run ID",
+            "ApplicationName": "Control software",
+            "Application": "Control software",
+            "ApplicationVersion": "Control software version",
+            "Flowcell": "Flowcell type",
+            "FlowCellMode": "Flowcell type",
+            "ReagentKitVersion": "Reagent kit version",
+            "RTAVersion": "RTA Version",
+            "RtaVersion": "RTA Version",
+        }
 
     def find(self, d, tag):
         if tag in d:
@@ -45,7 +48,8 @@ def read_run_parameters(self):
 
     def read_stats_json(self, bcl2fastq_outdir):
         stats_json_path = os.path.join(
-            self.runfolder, bcl2fastq_outdir, "Stats/Stats.json")
+            self.runfolder, bcl2fastq_outdir, "Stats/Stats.json"
+        )
         if os.path.exists(stats_json_path):
             with open(stats_json_path) as f:
                 return json.load(f)
@@ -72,10 +76,14 @@ def get_read_cycles(self):
         try:
             for read_info in self.stats_json["ReadInfosForLanes"][0]["ReadInfos"]:
                 if read_info["IsIndexedRead"]:
-                    read_and_cycles[f"Index {index_counter} (bp)"] = read_info["NumCycles"]
+                    read_and_cycles[f"Index {index_counter} (bp)"] = read_info[
+                        "NumCycles"
+                    ]
                     index_counter += 1
                 else:
-                    read_and_cycles[f"Read {read_counter} (bp)"] = read_info["NumCycles"]
+                    read_and_cycles[f"Read {read_counter} (bp)"] = read_info[
+                        "NumCycles"
+                    ]
                     read_counter += 1
             return read_and_cycles
         except TypeError:
@@ -85,19 +93,21 @@ def get_info(self):
         results = self.get_read_cycles()
         results.update(self.get_run_parameters())
         if os.path.exists(os.path.join(self.runfolder, "bcl2fastq_version")):
-            results['bcl2fastq version'] = self.get_bcl2fastq_version(
-                self.runfolder)
+            results["bcl2fastq version"] = self.get_bcl2fastq_version(self.runfolder)
         return results
 
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(
-        description='Dumps a metadata yaml for MultiQC')
-    parser.add_argument('--runfolder', type=str,
-                        required=True, help='Path to runfolder')
-    parser.add_argument('--bcl2fastq-outdir', type=str,
-                        default='Data/Intensities/BaseCalls',
-                        help='Path to bcl2fastq output folder relative to the runfolder')
+    parser = argparse.ArgumentParser(description="Dumps a metadata yaml for MultiQC")
+    parser.add_argument(
+        "--runfolder", type=str, required=True, help="Path to runfolder"
+    )
+    parser.add_argument(
+        "--bcl2fastq-outdir",
+        type=str,
+        default="Data/Intensities/BaseCalls",
+        help="Path to bcl2fastq output folder relative to the runfolder",
+    )
 
     args = parser.parse_args()
     runfolder = args.runfolder
@@ -106,14 +116,16 @@ def get_info(self):
     runfolder_info = RunfolderInfo(runfolder, bcl2fastq_outdir)
     results = runfolder_info.get_info()
 
-    print ('''
+    print(
+        """
 id: 'sequencing_metadata'
 section_name: 'Sequencing Metadata'
 plot_type: 'html'
 description: 'regarding the sequencing run'
 data: |
     <dl class="dl-horizontal">
-''')
+"""
+    )
     for k, v in results.items():
         print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v))
-    print (" </dl>")
+    print(" </dl>")
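
The `run_parameters_tags` dictionary maps several spellings of the same run-parameter tag (for example `RunId` and `RunID`) onto one display label. The body of `find` is cut off in the hunks above, so the sketch below is an assumed reconstruction of how such a lookup might work, not the file's actual code. The recursive generator, the standalone `get_run_parameters` function, and the sample input values are all hypothetical.

```python
from collections import OrderedDict

# Tag-to-label mapping copied from the diff above.
RUN_PARAMETERS_TAGS = {
    "RunId": "Run ID",
    "RunID": "Run ID",
    "ApplicationName": "Control software",
    "Application": "Control software",
    "ApplicationVersion": "Control software version",
    "Flowcell": "Flowcell type",
    "FlowCellMode": "Flowcell type",
    "ReagentKitVersion": "Reagent kit version",
    "RTAVersion": "RTA Version",
    "RtaVersion": "RTA Version",
}


def find(d, tag):
    """Yield every value stored under `tag` anywhere in nested dicts/lists."""
    if isinstance(d, dict):
        if tag in d:
            yield d[tag]
        for value in d.values():
            yield from find(value, tag)
    elif isinstance(d, list):
        for item in d:
            yield from find(item, tag)


def get_run_parameters(run_parameters):
    """Collect the first value found for each tag, keyed by display label."""
    results = OrderedDict()
    for tag, label in RUN_PARAMETERS_TAGS.items():
        for value in find(run_parameters, tag):
            results.setdefault(label, value)  # first spelling found wins
            break
    return results


# Hypothetical parsed run parameters; the values are placeholders.
example = {
    "RunParameters": {
        "RunID": "210510_M03910_0104_000000000-JHGJL",
        "Setup": {"ApplicationName": "example control software"},
        "RTAVersion": "x.y.z",
    }
}
labels = get_run_parameters(example)
```

Because alternative spellings share a label, instruments that write `RunId` and instruments that write `RunID` end up under the same "Run ID" entry in the report.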

bin/get_qc_config.py

Lines changed: 52 additions & 26 deletions
@@ -13,20 +13,30 @@ def __init__(self, handler_name, multiqc_mapping, compare_direction):
         self.compare_direction = compare_direction
 
 
-class HandlerMapper():
+class HandlerMapper:
     def __init__(self):
-        self._mapper_list = [ValueHandlerMapper(handler_name = 'ClusterPFHandler',
-                                                multiqc_mapping = 'total',
-                                                compare_direction = 'lt'),
-                             ValueHandlerMapper(handler_name = 'ErrorRateHandler',
-                                                multiqc_mapping = 'Error',
-                                                compare_direction = 'gt'),
-                             ValueHandlerMapper(handler_name = 'Q30Handler',
-                                                multiqc_mapping = 'percent_Q30',
-                                                compare_direction = 'lt'),
-                             ValueHandlerMapper(handler_name = 'ReadsPerSampleHandler',
-                                                multiqc_mapping = 'mqc-generalstats-bcl2fastq-total',
-                                                compare_direction = 'lt')]
+        self._mapper_list = [
+            ValueHandlerMapper(
+                handler_name="ClusterPFHandler",
+                multiqc_mapping="total",
+                compare_direction="lt",
+            ),
+            ValueHandlerMapper(
+                handler_name="ErrorRateHandler",
+                multiqc_mapping="Error",
+                compare_direction="gt",
+            ),
+            ValueHandlerMapper(
+                handler_name="Q30Handler",
+                multiqc_mapping="percent_Q30",
+                compare_direction="lt",
+            ),
+            ValueHandlerMapper(
+                handler_name="ReadsPerSampleHandler",
+                multiqc_mapping="mqc-generalstats-bcl2fastq-total",
+                compare_direction="lt",
+            ),
+        ]
 
         self.mapping = self._convert_to_mappings(self._mapper_list)
 
@@ -36,33 +46,45 @@ def _convert_to_mappings(self, mapper_list):
             mapper_dict[mapper.handler_name] = mapper
         return mapper_dict
 
+
 def convert_to_multiqc_config(checkqc_config_dict):
     multiqc_config_format = {}
     handler_mapper = HandlerMapper()
     for mapper_name, mapper in handler_mapper.mapping.items():
         qc_criteria = checkqc_config_dict.get(mapper.handler_name)
         multiqc_config_value = {mapper.multiqc_mapping: {}}
-        if not qc_criteria['warning'] == 'unknown':
-            multiqc_config_value[mapper.multiqc_mapping]['warn'] = [{mapper.compare_direction: qc_criteria['warning']}]
-        if not qc_criteria['error'] == 'unknown':
-            multiqc_config_value[mapper.multiqc_mapping]['fail'] = [{mapper.compare_direction: qc_criteria['error']}]
+        if not qc_criteria["warning"] == "unknown":
+            multiqc_config_value[mapper.multiqc_mapping]["warn"] = [
+                {mapper.compare_direction: qc_criteria["warning"]}
+            ]
+        if not qc_criteria["error"] == "unknown":
+            multiqc_config_value[mapper.multiqc_mapping]["fail"] = [
+                {mapper.compare_direction: qc_criteria["error"]}
+            ]
+
+        multiqc_config_format[mapper.multiqc_mapping] = multiqc_config_value[
+            mapper.multiqc_mapping
+        ]
 
-        multiqc_config_format[mapper.multiqc_mapping] = multiqc_config_value[mapper.multiqc_mapping]
+    return {"table_cond_formatting_rules": multiqc_config_format}
 
-    return {'table_cond_formatting_rules': multiqc_config_format}
 
 def convert_to_dict(checkqc_config):
     checkqc_config_dict = {}
     for qc_handler in checkqc_config:
-        checkqc_config_dict[qc_handler['name']] = qc_handler
+        checkqc_config_dict[qc_handler["name"]] = qc_handler
 
     return checkqc_config_dict
 
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description='Converts CheckQC tresholds to MultiQC conditional format')
-    parser.add_argument('--runfolder', type=str, required=True, help='Path to runfolder')
-    parser.add_argument('--config', type=str, help='Path to checkQC config')
+    parser = argparse.ArgumentParser(
+        description="Converts CheckQC tresholds to MultiQC conditional format"
+    )
+    parser.add_argument(
+        "--runfolder", type=str, required=True, help="Path to runfolder"
+    )
+    parser.add_argument("--config", type=str, help="Path to checkQC config")
 
     args = parser.parse_args()
     runfolder = args.runfolder
@@ -71,12 +93,16 @@ def convert_to_dict(checkqc_config):
     run_type_recognizer = RunTypeRecognizer(runfolder)
     config = ConfigFactory.from_config_path(config)
 
-    instrument_and_reagent_version = run_type_recognizer.instrument_and_reagent_version()
+    instrument_and_reagent_version = (
+        run_type_recognizer.instrument_and_reagent_version()
+    )
     both_read_lengths = run_type_recognizer.read_length()
     read_length = int(both_read_lengths.split("-")[0])
-    checkqc_config = config.get_handler_configs(instrument_and_reagent_version, read_length)
+    checkqc_config = config.get_handler_configs(
+        instrument_and_reagent_version, read_length
+    )
     checkqc_config_dict = convert_to_dict(checkqc_config)
     multiqc_config = convert_to_multiqc_config(checkqc_config_dict)
 
-    with open('qc_thresholds.yaml', 'w') as outfile:
+    with open("qc_thresholds.yaml", "w") as outfile:
         yaml.dump(multiqc_config, outfile)
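
To make the conversion in `bin/get_qc_config.py` concrete, here is a condensed, self-contained sketch of the same mapping logic. It keeps the handler names, MultiQC keys and comparison directions from the diff, but replaces the `ValueHandlerMapper` and `HandlerMapper` classes with a plain dictionary. It also skips handlers that are missing from the input, where the code in the diff would instead fail on a `None` lookup. The threshold values in the example are made up for illustration.

```python
# handler name -> (MultiQC column key, comparison direction), as in the diff.
HANDLER_MAPPINGS = {
    "ClusterPFHandler": ("total", "lt"),
    "ErrorRateHandler": ("Error", "gt"),
    "Q30Handler": ("percent_Q30", "lt"),
    "ReadsPerSampleHandler": ("mqc-generalstats-bcl2fastq-total", "lt"),
}


def convert_to_dict(checkqc_config):
    """Index the list of handler configs by handler name."""
    return {qc_handler["name"]: qc_handler for qc_handler in checkqc_config}


def convert_to_multiqc_config(checkqc_config_dict):
    """Translate CheckQC thresholds into MultiQC conditional formatting rules."""
    rules = {}
    for handler_name, (multiqc_key, direction) in HANDLER_MAPPINGS.items():
        qc_criteria = checkqc_config_dict.get(handler_name)
        if qc_criteria is None:
            continue  # assumption of this sketch: ignore unconfigured handlers
        rule = {}
        if qc_criteria["warning"] != "unknown":
            rule["warn"] = [{direction: qc_criteria["warning"]}]
        if qc_criteria["error"] != "unknown":
            rule["fail"] = [{direction: qc_criteria["error"]}]
        rules[multiqc_key] = rule
    return {"table_cond_formatting_rules": rules}


# Hypothetical handler configs; the numbers are illustrative only.
example_config = [
    {"name": "Q30Handler", "warning": 80, "error": 70},
    {"name": "ErrorRateHandler", "warning": 1.5, "error": "unknown"},
]
multiqc_config = convert_to_multiqc_config(convert_to_dict(example_config))
```

A threshold of `"unknown"` produces no rule for that severity, so the error-rate column above gets a warning rule but no failure rule.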
File renamed without changes.
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+singularity {
+    enabled = true
+    autoMounts = true
+}
+
+process {
+    withName: 'FASTQC' {
+        container = 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--hdfd78af_1'
+    }
+    withName: 'FASTQ_SCREEN' {
+        container = 'https://depot.galaxyproject.org/singularity/fastq-screen:0.14.0--pl5262hdfd78af_1'
+    }
+    withName: 'GET_QC_THRESHOLDS' {
+        container = 'https://depot.galaxyproject.org/singularity/checkqc:3.6.6--pyhdfd78af_0'
+    }
+    withName: 'GET_METADATA' {
+        container = 'https://depot.galaxyproject.org/singularity/checkqc:3.6.6--pyhdfd78af_0'
+    }
+    withName: 'INTEROP_SUMMARY' {
+        container = 'https://depot.galaxyproject.org/singularity/illumina-interop:1.1.23--h1b792b2_0'
+    }
+    withName: 'MULTIQC_PER_FLOWCELL' {
+        container = 'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0'
+    }
+    withName: 'MULTIQC_PER_PROJECT' {
+        container = 'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0'
+    }
+}
+

config/nextflow_config/test.config

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+/*
+========================================================================================
+    Nextflow config file for running minimal tests
+========================================================================================
+    Defines input files and everything required to run a fast and simple pipeline test.
+    Use as follows:
+        nextflow run main.nf -profile dev,test,singularity
+
+
+    This config takes inspiration from https://github.com/nf-core/rnaseq
+----------------------------------------------------------------------------------------
+*/
+
+params {
+    run_folder = "$baseDir/test_data/210510_M03910_0104_000000000-JHGJL"
+    fastqscreen_databases = "$baseDir/test_data/Test_FastQ_Screen_Genomes"
+    checkqc_config = "$baseDir/test_data/checkqc_config.yaml"
+    config_dir = "$baseDir/test_data/test_config"
+}
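
Every parameter in `test.config` is anchored at `$baseDir`, which Nextflow resolves to the directory containing the main script. The following is a small sketch of how a pre-flight check might resolve those paths and report any that are missing before the pipeline is launched. The helper names and the regex-based parsing are assumptions made for illustration; Nextflow itself evaluates the config as Groovy rather than with a regular expression.

```python
import re
from pathlib import Path

# The params block from test.config, copied from the diff above.
TEST_CONFIG_PARAMS = '''
params {
    run_folder = "$baseDir/test_data/210510_M03910_0104_000000000-JHGJL"
    fastqscreen_databases = "$baseDir/test_data/Test_FastQ_Screen_Genomes"
    checkqc_config = "$baseDir/test_data/checkqc_config.yaml"
    config_dir = "$baseDir/test_data/test_config"
}
'''

# Matches lines of the form: name = "$baseDir/relative/path"
PARAM_PATTERN = re.compile(r'^\s*(\w+)\s*=\s*"\$baseDir/([^"]+)"', re.MULTILINE)


def resolve_params(config_text, base_dir):
    """Map each parameter name to its path with $baseDir substituted."""
    return {
        name: Path(base_dir) / relative
        for name, relative in PARAM_PATTERN.findall(config_text)
    }


def missing_paths(params):
    """Return the names of parameters whose resolved path does not exist."""
    return [name for name, path in params.items() if not path.exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    resolved = resolve_params(TEST_CONFIG_PARAMS, Path.cwd())
    for name in missing_paths(resolved):
        print(f"missing test data for parameter: {name}")
```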
File renamed without changes.
