Commit f0f59db

update StarAlign.wdl to move chemistry decision logic to the workflow, not the tasks
1 parent 5e1dd23 commit f0f59db

8 files changed: +157 -111 lines changed

WARP_WDL_Style_Guide.md

Lines changed: 27 additions & 0 deletions
@@ -46,6 +46,28 @@
 - Cloud provider support inputs and validation
 - Output block organization
 - Error handling with ErrorWithMessage task
+- **Workflow-level parameter derivation**
+  - Compute derived values (e.g. tool flags, directory paths, numeric parameters) as WDL expressions in the workflow, not as bash conditionals inside task command blocks
+  - Pass the computed values as explicit task inputs rather than raw enum/mode inputs that the task must interpret
+  - This keeps tasks focused on execution and makes the parameter contract visible in the WDL call site
+  - Example (instead of passing `Int chemistry` and branching in bash):
+    ```wdl
+    # Good: compute in workflow, pass to task
+    Int umi_len = if tenx_chemistry_version == 2 then 10 else 12
+    call MyTask { input: umi_len = umi_len }
+    ```
+    ```wdl
+    # Avoid: passing raw mode flag and branching in bash
+    call MyTask { input: chemistry = tenx_chemistry_version }
+    # ...where the task command block contains if/elif/else to derive umi_len
+    ```
+  - When a derived value requires string parsing that WDL 1.0 cannot express (e.g. parsing a read structure like `"8C18C9M1X"`), use a small dedicated utility task rather than embedding the logic in a large execution task
+- **Input validation placement**
+  - Validate all user-facing inputs at the workflow level using `ErrorWithMessage` or a dedicated input-checking task (e.g. `checkOptimusInput`), not inside execution tasks
+  - Consolidate validation for a pipeline into a single task or workflow section so there is one place to check for all constraints
+  - Do not duplicate validation in both the workflow and the task; validate once upstream, then trust the inputs downstream
+- **Derived values with embedded spaces**
+  - When a derived string contains spaces (e.g. `"GeneFull_Ex50pAS Gene"` for STAR's `--soloFeatures`), add a comment at the declaration site noting that the value is intentionally multi-word and must remain unquoted in the command block
 
 ## 6. Task Structure
 
@@ -60,6 +82,11 @@
 - Runtime parameter specification
 - Docker image handling for multi-cloud
 - Memory, CPU, disk sizing
+- **Task purity principle**
+  - Tasks should execute their tool with the parameters they are given, not decide *what* to run based on mode flags
+  - Bash conditionals in command blocks are appropriate for post-execution file handling (moving outputs, renaming files) but not for deriving tool parameters or validating inputs
+  - If a task currently branches on a mode input to select different tool flags, refactor by computing the flag values at the workflow level and passing them as explicit inputs
+  - This makes tasks reusable across workflows without carrying pipeline-specific validation logic
 
 ## 7. Docker Handling
 
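
Note on the embedded-spaces guideline: it comes down to shell word splitting. The following is a minimal sketch, not code from this commit. `count_args` is a hypothetical helper, and a plain shell variable stands in for the WDL `~{solo_features}` placeholder, to show why quoting the value would hand STAR one argument instead of two feature names.

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# Stand-in for the workflow-derived value; intentionally multi-word.
solo_features="GeneFull_Ex50pAS Gene"

# Unquoted: the shell splits on the space, so the tool sees two feature names.
count_args --soloFeatures $solo_features      # prints 3

# Quoted: the whole string arrives as a single argument.
count_args --soloFeatures "$solo_features"    # prints 2
```

This is why the style guide asks for a comment at the declaration site: a later cleanup that "fixes" the missing quotes would silently change what the tool receives.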

pipeline_versions.txt

Lines changed: 2 additions & 2 deletions
@@ -11,13 +11,13 @@ ImputationBeagle 3.0.1 2026-02-23
 JointGenotyping 1.7.3 2025-08-11
 MultiSampleSmartSeq2SingleNucleus 2.2.5 2026-02-24
 Multiome 6.1.5 2026-02-24
-Optimus 8.0.6 2026-02-24
+Optimus 8.0.7 2026-03-31
 PairedTag 2.1.11 2026-02-24
 PeakCalling 1.0.1 2025-08-11
 Pipeline Name Version Date of Last Commit
 RNAWithUMIsPipeline 1.0.20 2026-01-21
 ReblockGVCF 2.4.4 2026-01-29
-SlideSeq 3.6.5 2026-02-24
+SlideSeq 3.6.6 2026-03-31
 SlideTags 1.0.8 2026-02-24
 UltimaGenomicsJointGenotyping 1.2.3 2025-08-11
 UltimaGenomicsWholeGenomeCramOnly 1.1.3 2026-01-21

pipelines/wdl/optimus/Optimus.changelog.md

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,12 @@
+# 8.0.7
+2026-03-31 (Date of Last Commit)
+
+* Moved conditional logic for chemistry, counting mode, strand mode, and solo features out of STARsoloFastq bash command block into workflow-level WDL expressions
+* Added star_strand_mode validation to checkOptimusInput task
+* Replaced chemistry input with explicit umi_len, cb_len, solo_features, and solo_directory task inputs for STARsoloFastq
+* Removed redundant bash input validation from STARsoloFastq task (already validated by checkOptimusInput)
+* No functional changes to pipeline outputs
+
 # 8.0.6
 2026-02-24 (Date of Last Commit)
 

pipelines/wdl/optimus/Optimus.wdl

Lines changed: 14 additions & 2 deletions
@@ -78,7 +78,7 @@ workflow Optimus {
   }
 
   # Version of this pipeline
-  String pipeline_version = "8.0.6"
+  String pipeline_version = "8.0.7"
 
   # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
   Array[Int] indices = range(length(r1_fastq))
@@ -148,6 +148,7 @@ workflow Optimus {
       force_no_check = force_no_check,
       counting_mode = counting_mode,
       count_exons = count_exons,
+      star_strand_mode = star_strand_mode,
       gcp_whitelist_v2 = gcp_whitelist_v2,
       gcp_whitelist_v3 = gcp_whitelist_v3,
       azure_whitelist_v2 = azure_whitelist_v2,
@@ -165,14 +166,25 @@ workflow Optimus {
       ubuntu_docker_path = ubuntu_docker_prefix + ubuntu_docker
   }
 
+  # Compute STAR parameters at workflow level instead of bash conditionals in task
+  Int umi_len = if tenx_chemistry_version == 2 then 10 else 12
+  Int cb_len = 16
+  String solo_features = if counting_mode == "sc_rna" then "Gene"
+                         else if count_exons then "GeneFull_Ex50pAS Gene"
+                         else "GeneFull_Ex50pAS"
+  String solo_directory = if counting_mode == "sc_rna" then "Solo.out/Gene" else "Solo.out/GeneFull_Ex50pAS"
+
   call StarAlign.STARsoloFastq as STARsoloFastq {
     input:
       r1_fastq = r1_fastq,
       r2_fastq = r2_fastq,
       star_strand_mode = star_strand_mode,
       white_list = whitelist,
       tar_star_reference = tar_star_reference,
-      chemistry = tenx_chemistry_version,
+      umi_len = umi_len,
+      cb_len = cb_len,
+      solo_features = solo_features,
+      solo_directory = solo_directory,
       counting_mode = counting_mode,
       count_exons = count_exons,
       input_id = input_id,

pipelines/wdl/slideseq/SlideSeq.changelog.md

Lines changed: 8 additions & 0 deletions
@@ -1,3 +1,11 @@
+# 3.6.6
+2026-03-31 (Date of Last Commit)
+
+* Moved read structure parsing out of STARsoloFastqSlideSeq bash command block into a new ParseReadStructure task called at the workflow level
+* Moved solo features conditional logic out of STARsoloFastqSlideSeq bash command block into a workflow-level WDL expression
+* Replaced read_structure input with explicit umi_len, cb_len, and solo_features task inputs for STARsoloFastqSlideSeq
+* No functional changes to pipeline outputs
+
 # 3.6.5
 2026-02-24 (Date of Last Commit)
 
311

pipelines/wdl/slideseq/SlideSeq.wdl

Lines changed: 18 additions & 2 deletions
@@ -25,7 +25,7 @@ import "../../../tasks/wdl/Utilities.wdl" as utils
 
 workflow SlideSeq {
 
-  String pipeline_version = "3.6.5"
+  String pipeline_version = "3.6.6"
 
   input {
     Array[File] r1_fastq
@@ -56,6 +56,11 @@ workflow SlideSeq {
   String acr_ubuntu_docker_prefix = "dsppipelinedev.azurecr.io/"
   String ubuntu_docker_prefix = if cloud_provider == "gcp" then gcp_ubuntu_docker_prefix else acr_ubuntu_docker_prefix
 
+  String alpine_docker = "alpine-bash@sha256:965a718a07c700a5204c77e391961edee37477634ce2f9cf652a8e4c2db858ff"
+  String gcp_alpine_docker_prefix = "bashell/"
+  String acr_alpine_docker_prefix = "dsppipelinedev.azurecr.io/"
+  String alpine_docker_prefix = if cloud_provider == "gcp" then gcp_alpine_docker_prefix else acr_alpine_docker_prefix
+
   String gcr_docker_prefix = "us.gcr.io/broad-gotc-prod/"
   String acr_docker_prefix = "dsppipelinedev.azurecr.io/"
 
@@ -100,6 +105,15 @@ workflow SlideSeq {
       sample_id = input_id,
       whitelist = bead_locations
   }
+
+  # Parse read structure into UMI and CB lengths, and compute solo_features at workflow level
+  call StarAlign.ParseReadStructure as ParseReadStructure {
+    input:
+      read_structure = read_structure,
+      alpine_docker_path = alpine_docker_prefix + alpine_docker
+  }
+  String slideseq_solo_features = if count_exons then "Gene GeneFull" else "GeneFull"
+
   scatter(idx in range(length(SplitFastq.fastq_R1_output_array))) {
     call StarAlign.STARsoloFastqSlideSeq as STARsoloFastqSlideSeq {
       input:
@@ -108,7 +122,9 @@ workflow SlideSeq {
         whitelist = bead_locations,
         tar_star_reference = tar_star_reference,
         output_bam_basename = output_bam_basename + "_" + idx,
-        read_structure = read_structure,
+        umi_len = ParseReadStructure.umi_len,
+        cb_len = ParseReadStructure.cb_len,
+        solo_features = slideseq_solo_features,
         count_exons = count_exons
     }
   }
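
Note on `ParseReadStructure`: its body lives in StarAlign.wdl, which is not shown in this view. As a rough sketch of the kind of parsing such a utility task could perform, the hypothetical `segment_total` function below sums the lengths of all segments of one type in a read structure. It assumes that `C` marks cell-barcode bases, that `M` marks UMI bases, and that repeated segments of the same type are added together; the actual task may apply different rules.

```shell
# Hypothetical helper, not taken from StarAlign.wdl.
# Usage: segment_total <read_structure> <segment_letter>
segment_total() {
  # Split the structure into "<length><letter>" tokens, one per line,
  # then add up the lengths of the tokens whose letter matches.
  printf '%s\n' "$1" \
    | grep -oE '[0-9]+[A-Z]' \
    | awk -v letter="$2" '
        substr($0, length($0)) == letter { total += substr($0, 1, length($0) - 1) }
        END { print total + 0 }'
}

read_structure="8C18C9M1X"                    # example string from the style guide
cb_len=$(segment_total "$read_structure" C)   # 8 + 18 = 26
umi_len=$(segment_total "$read_structure" M)  # 9
echo "cb_len=${cb_len} umi_len=${umi_len}"    # prints cb_len=26 umi_len=9
```

Keeping this in a small task of its own means the scatter over `STARsoloFastqSlideSeq` receives plain `Int` inputs and the parsing runs once rather than once per shard.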

tasks/wdl/CheckInputs.wdl

Lines changed: 7 additions & 0 deletions
@@ -61,6 +61,7 @@ task checkOptimusInput {
     String counting_mode
     Boolean force_no_check
     Boolean count_exons
+    String star_strand_mode
     Int disk = ceil(size(r1_fastq, "GiB")) + 50
     Int machine_mem_mb = 1000
     Int cpu = 1
@@ -99,6 +100,12 @@ task checkOptimusInput {
       echo "ERROR: Invalid value \"${counting_mode}\" for input \"counting_mode\""
     fi
 
+    if [[ ! ("~{star_strand_mode}" == "Forward" || "~{star_strand_mode}" == "Reverse" || "~{star_strand_mode}" == "Unstranded") ]]
+    then
+      pass="false"
+      echo "ERROR: Invalid value \"~{star_strand_mode}\" for input \"star_strand_mode\". Should be Forward, Reverse, or Unstranded."
+    fi
+
     if [[ ~{force_no_check} == "true" ]]
     then
       echo "force_no_check is set: Ignoring input checks"
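
Note on the `star_strand_mode` check: it accepts exactly three spellings. An equivalent way to write the same test, shown here only as a sketch with a hypothetical function name `is_valid_strand_mode`, is a `case` statement, which avoids repeating the variable for each allowed value. A plain shell variable stands in for the WDL `~{star_strand_mode}` placeholder.

```shell
# Hypothetical helper mirroring the allowed values in the check above.
is_valid_strand_mode() {
  case "$1" in
    Forward|Reverse|Unstranded) return 0 ;;
    *) return 1 ;;
  esac
}

pass="true"
star_strand_mode="forward"   # wrong case on purpose; matching is case-sensitive
if ! is_valid_strand_mode "$star_strand_mode"; then
  pass="false"
  echo "ERROR: Invalid value \"${star_strand_mode}\" for input \"star_strand_mode\". Should be Forward, Reverse, or Unstranded."
fi
```

Either form keeps the accepted values in one place in `checkOptimusInput`, which is what lets the execution task drop its own copy of the check.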
