Merged
28 commits
988c8bf
dummy commit
nikellepetrillo Jan 27, 2026
c44ab80
Merge remote-tracking branch 'origin/develop' into develop
nikellepetrillo Feb 5, 2026
a4515e0
Merge remote-tracking branch 'origin/develop' into develop
nikellepetrillo Feb 10, 2026
8f16317
Merge remote-tracking branch 'origin/develop' into develop
nikellepetrillo Feb 18, 2026
1e87df3
Merge remote-tracking branch 'origin/develop' into develop
nikellepetrillo Feb 20, 2026
2398610
add whitelist metadata to h5ad and as txt files
nikellepetrillo Feb 20, 2026
d2b40f5
python indentation
nikellepetrillo Feb 23, 2026
083d5a3
need strings not files
nikellepetrillo Feb 23, 2026
9468722
need strings not files
nikellepetrillo Feb 24, 2026
966b4f1
add whitelist file to optimus h5ad metadata
nikellepetrillo Feb 24, 2026
a21ea36
syntax
nikellepetrillo Feb 24, 2026
983803b
syntax
nikellepetrillo Feb 24, 2026
b98cd7a
syntax
nikellepetrillo Feb 24, 2026
6d7ac14
syntax
nikellepetrillo Feb 24, 2026
86e0397
need quotes
nikellepetrillo Feb 24, 2026
76678df
Merge branch 'develop' into np_add_whitelist_metadata_to_multiome
nikellepetrillo Feb 24, 2026
213d0b4
add to h5adwithexons task too
nikellepetrillo Feb 24, 2026
7efe106
Merge remote-tracking branch 'origin/np_add_whitelist_metadata_to_mul…
nikellepetrillo Feb 24, 2026
04ca44c
changelogs
nikellepetrillo Feb 24, 2026
8a1dcaa
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 24, 2026
0416229
changelogs
nikellepetrillo Feb 24, 2026
20e7ef0
Merge remote-tracking branch 'origin/np_add_whitelist_metadata_to_mul…
nikellepetrillo Feb 24, 2026
86a1cf5
Updated pipeline_versions.txt with all pipeline version information
actions-user Feb 24, 2026
0e83ee3
changelogs
nikellepetrillo Feb 24, 2026
72decc9
Merge remote-tracking branch 'origin/np_add_whitelist_metadata_to_mul…
nikellepetrillo Feb 24, 2026
408d6f5
changelogs
nikellepetrillo Feb 24, 2026
acaf365
doc changes
nikellepetrillo Feb 25, 2026
1aeb3ca
Merge branch 'develop' into np_add_whitelist_metadata_to_multiome
nikellepetrillo Feb 26, 2026
12 changes: 6 additions & 6 deletions pipeline_versions.txt
@@ -8,16 +8,16 @@ IlluminaGenotypingArray 1.12.27 2026-01-21
Imputation 1.1.23 2025-10-03
ImputationBeagle 3.0.1 2026-02-23
JointGenotyping 1.7.3 2025-08-11
MultiSampleSmartSeq2SingleNucleus 2.2.4 2026-01-21
Multiome 6.1.4 2026-01-22
Optimus 8.0.5 2026-01-22
PairedTag 2.1.10 2026-01-22
MultiSampleSmartSeq2SingleNucleus 2.2.5 2026-02-24
Multiome 6.1.5 2026-02-24
Optimus 8.0.6 2026-02-24
PairedTag 2.1.11 2026-02-24
PeakCalling 1.0.1 2025-08-11
Pipeline Name Version Date of Last Commit
RNAWithUMIsPipeline 1.0.20 2026-01-21
ReblockGVCF 2.4.4 2026-01-29
SlideSeq 3.6.4 2026-01-22
SlideTags 1.0.7 2026-01-26
SlideSeq 3.6.5 2026-02-24
SlideTags 1.0.8 2026-02-24
UltimaGenomicsJointGenotyping 1.2.3 2025-08-11
UltimaGenomicsWholeGenomeCramOnly 1.1.3 2026-01-21
UltimaGenomicsWholeGenomeGermline 1.2.2 2026-01-29
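The `pipeline_versions.txt` hunk above bumps six pipelines. A minimal Python sketch for listing which pipelines changed between two versions of the file follows. It assumes each data row holds exactly three whitespace-separated fields (name, version, date) and that the `Pipeline Name ...` header row, which is sorted in among the data rows, can simply be skipped. Both function names are hypothetical helpers, not part of WARP.

```python
def parse_pipeline_versions(text: str) -> dict[str, tuple[str, str]]:
    """Map each pipeline name to its (version, date of last commit) pair."""
    versions: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        # Skip blank lines and the header row, which is sorted in among the data rows.
        if not line.strip() or line.startswith("Pipeline Name"):
            continue
        # Assumption: every data row has exactly three whitespace-separated fields.
        name, version, date = line.split()
        versions[name] = (version, date)
    return versions


def changed_pipelines(old_text: str, new_text: str) -> list[str]:
    """Return the sorted names whose version or date differs between two file versions."""
    old = parse_pipeline_versions(old_text)
    new = parse_pipeline_versions(new_text)
    return sorted(name for name in new if old.get(name) != new[name])


# Excerpt of rows from the hunk, before and after this change.
before = """\
Multiome 6.1.4 2026-01-22
Optimus 8.0.5 2026-01-22
PeakCalling 1.0.1 2025-08-11
Pipeline Name Version Date of Last Commit
"""
after = """\
Multiome 6.1.5 2026-02-24
Optimus 8.0.6 2026-02-24
PeakCalling 1.0.1 2025-08-11
Pipeline Name Version Date of Last Commit
"""

print(changed_pipelines(before, after))  # ['Multiome', 'Optimus']
```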
6 changes: 6 additions & 0 deletions pipelines/wdl/multiome/Multiome.changelog.md
@@ -1,3 +1,9 @@
# 6.1.5
2026-02-24 (Date of Last Commit)

* Added 2 new outputs to Multiome.wdl: gex_whitelist_used and atac_whitelist_used; these outputs indicate the whitelist used for the gene expression and ATAC pipelines, respectively. This change is provenance-only and introduces no functional changes to pipeline outputs
* Added whitelist provenance tracking to JoinMultiomeBarcodes by storing the GEX and ATAC whitelist paths in the h5ad unstructured metadata (.uns). This change is provenance-only and introduces no functional changes to pipeline outputs

# 6.1.4
2026-01-22 (Date of Last Commit)

11 changes: 9 additions & 2 deletions pipelines/wdl/multiome/Multiome.wdl
@@ -10,7 +10,7 @@ import "../../../tasks/wdl/Utilities.wdl" as utils
workflow Multiome {


String pipeline_version = "6.1.4"
String pipeline_version = "6.1.5"


input {
@@ -76,6 +76,8 @@ workflow Multiome {
# Determine which whitelist files to use based on cloud provider
File gex_whitelist = if cloud_provider == "gcp" then gcp_gex_whitelist else azure_gex_whitelist
File atac_whitelist = if cloud_provider == "gcp" then gcp_atac_whitelist else azure_atac_whitelist
String gex_whitelist_gs_path = gex_whitelist
String atac_whitelist_gs_path = atac_whitelist

# Make sure either 'gcp' or 'azure' is supplied as cloud_provider input. If not, raise an error
if ((cloud_provider != "gcp") && (cloud_provider != "azure")) {
@@ -143,7 +145,9 @@ workflow Multiome {
atac_whitelist = atac_whitelist,
atac_fragment = Atac.fragment_file,
input_gtf = annotations_gtf,
input_bwa_reference = tar_bwa_reference
input_bwa_reference = tar_bwa_reference,
gex_whitelist_gs_path = gex_whitelist,
atac_whitelist_gs_path = atac_whitelist
}

if (run_peak_calling) {
@@ -165,6 +169,9 @@ workflow Multiome {
output {

String multiome_pipeline_version_out = pipeline_version
File gex_whitelist_used = JoinBarcodes.gex_whitelist_name_file
File atac_whitelist_used = JoinBarcodes.atac_whitelist_name_file


# atac outputs
File bam_aligned_output_atac = Atac.bam_aligned_output
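The cloud-provider selection in Multiome.wdl above can be mirrored in Python as below. This is an illustrative sketch, not WARP code: the function name and example URIs are hypothetical. It also validates up front, whereas the WDL first falls back to the Azure whitelists for any non-`gcp` value and then fails the run in a separate check. Because the String provenance inputs are just the selected `File` values coerced to `String`, on Azure they would presumably hold blob URLs rather than `gs://` paths, despite the `_gs_path` naming.

```python
def select_whitelists(
    cloud_provider: str,
    gcp_gex_whitelist: str,
    azure_gex_whitelist: str,
    gcp_atac_whitelist: str,
    azure_atac_whitelist: str,
) -> tuple[str, str]:
    """Return (gex_whitelist, atac_whitelist) for the given cloud provider."""
    # The workflow only accepts 'gcp' or 'azure'; anything else is an error.
    if cloud_provider not in ("gcp", "azure"):
        raise ValueError(f"cloud_provider must be 'gcp' or 'azure', got {cloud_provider!r}")
    if cloud_provider == "gcp":
        return gcp_gex_whitelist, gcp_atac_whitelist
    return azure_gex_whitelist, azure_atac_whitelist


# Hypothetical whitelist locations, for illustration only.
gex, atac = select_whitelists(
    "gcp",
    "gs://example-bucket/gex_whitelist.txt",
    "https://example.blob.core.windows.net/refs/gex_whitelist.txt",
    "gs://example-bucket/atac_whitelist.txt",
    "https://example.blob.core.windows.net/refs/atac_whitelist.txt",
)
# The String provenance inputs simply carry the selected URIs.
gex_whitelist_gs_path, atac_whitelist_gs_path = gex, atac
print(gex_whitelist_gs_path)  # gs://example-bucket/gex_whitelist.txt
```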
6 changes: 6 additions & 0 deletions pipelines/wdl/optimus/Optimus.changelog.md
@@ -1,3 +1,9 @@
# 8.0.6
2026-02-24 (Date of Last Commit)

* Added 1 new output to Optimus.wdl: whitelist_input_used; this output indicates the whitelist used. This change is provenance-only and introduces no functional changes to pipeline outputs
* Added whitelist provenance tracking to OptimusH5adGeneration and SingleNucleusOptimusH5adOutput by storing the whitelist path in the h5ad unstructured metadata (.uns). This change is provenance-only and introduces no functional changes to pipeline outputs

# 8.0.5
2026-01-22 (Date of Last Commit)
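The Optimus h5ad tasks store the path under `.uns["whitelist"]` (key `gex_whitelist_gs_path`), while JoinMultiomeBarcodes stores both paths under `.uns["whitelists"]`, so a Multiome GEX h5ad built from the Optimus output presumably carries both keys. A small helper that accepts either layout is sketched below. It is hypothetical, and the plain dicts stand in for `adata.uns` as loaded with `anndata.read_h5ad`.

```python
def get_whitelist_paths(uns: dict) -> dict[str, str]:
    """Return whitelist provenance from an h5ad's .uns mapping.

    Prefers the Multiome layout (.uns["whitelists"], GEX and ATAC paths) and
    falls back to the Optimus layout (.uns["whitelist"], GEX path only).
    """
    for key in ("whitelists", "whitelist"):
        if key in uns:
            return dict(uns[key])
    raise KeyError("no whitelist provenance found in .uns")


# Stand-ins for adata.uns from an Optimus and a Multiome GEX h5ad (hypothetical paths).
optimus_uns = {"whitelist": {"gex_whitelist_gs_path": "gs://example-bucket/gex_whitelist.txt"}}
multiome_uns = {
    "whitelist": {"gex_whitelist_gs_path": "gs://example-bucket/gex_whitelist.txt"},
    "whitelists": {
        "gex_whitelist_gs_path": "gs://example-bucket/gex_whitelist.txt",
        "atac_whitelist_gs_path": "gs://example-bucket/atac_whitelist.txt",
    },
}

print(get_whitelist_paths(optimus_uns))
print(get_whitelist_paths(multiome_uns)["atac_whitelist_gs_path"])
```

Preferring `whitelists` first means a Multiome file returns both paths rather than only the GEX one.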

10 changes: 7 additions & 3 deletions pipelines/wdl/optimus/Optimus.wdl
@@ -78,7 +78,7 @@ workflow Optimus {
}

# Version of this pipeline
String pipeline_version = "8.0.5"
String pipeline_version = "8.0.6"

# this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
Array[Int] indices = range(length(r1_fastq))
@@ -230,7 +230,8 @@ workflow Optimus {
empty_drops_result = RunEmptyDrops.empty_drops_result,
counting_mode = counting_mode,
pipeline_version = "Optimus_v~{pipeline_version}",
warp_tools_docker_path = docker_prefix + warp_tools_docker
warp_tools_docker_path = docker_prefix + warp_tools_docker,
gex_whitelist_gs_path = whitelist
}
}

@@ -256,7 +257,8 @@ workflow Optimus {
cell_id_exon = STARsoloFastq.row_index,
gene_id_exon = STARsoloFastq.col_index,
pipeline_version = "Optimus_v~{pipeline_version}",
warp_tools_docker_path = docker_prefix + warp_tools_docker
warp_tools_docker_path = docker_prefix + warp_tools_docker,
gex_whitelist_gs_path = whitelist
}
}

@@ -295,12 +297,14 @@ workflow Optimus {
}

File final_h5ad_output = select_first([OptimusH5adGenerationWithExons.h5ad_output, OptimusH5adGeneration.h5ad_output])
File final_whitelist_input = select_first([OptimusH5adGenerationWithExons.whitelist_name_file, OptimusH5adGeneration.whitelist_name_file])
File final_library_metrics = select_first([OptimusH5adGenerationWithExons.library_metrics, OptimusH5adGeneration.library_metrics])

output {
# version of this pipeline
String pipeline_version_out = pipeline_version
File genomic_reference_version = ReferenceCheck.genomic_ref_version
File whitelist_input_used = final_whitelist_input

# Metrics outputs
File cell_metrics = CellMetrics.cell_metrics
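Only one of OptimusH5adGenerationWithExons and OptimusH5adGeneration is expected to run (both calls sit inside conditional blocks), so the new `final_whitelist_input` uses `select_first` to take `whitelist_name_file` from whichever call produced it. WDL's `select_first` returns the first defined value in an array of optionals and fails if none is defined. A Python analogue follows. It treats an un-run call's output as `None` and is a stand-in for illustration, not WDL's implementation.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Return the first value that is not None, like WDL's select_first()."""
    for value in values:
        if value is not None:
            return value
    # WDL fails the workflow when every element is undefined.
    raise ValueError("select_first: all values are undefined")


# Model a run where only OptimusH5adGeneration executed (exon-aware call skipped).
with_exons_whitelist_file = None
standard_whitelist_file = "whitelist_input.txt"

final_whitelist_input = select_first([with_exons_whitelist_file, standard_whitelist_file])
print(final_whitelist_input)  # whitelist_input.txt
```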
6 changes: 6 additions & 0 deletions pipelines/wdl/paired_tag/PairedTag.changelog.md
@@ -1,3 +1,9 @@
# 2.1.11
2026-02-24 (Date of Last Commit)

* Added 2 new outputs to Multiome.wdl: gex_whitelist_used and atac_whitelist_used; these outputs indicate the whitelist used for the gene expression and ATAC pipelines, respectively. This change is provenance-only and introduces no functional changes to pipeline outputs
* Added whitelist provenance tracking to JoinMultiomeBarcodes by storing the GEX and ATAC whitelist paths in the h5ad unstructured metadata (.uns). This change is provenance-only and introduces no functional changes to pipeline outputs

# 2.1.10
2026-01-22 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/wdl/paired_tag/PairedTag.wdl
@@ -9,7 +9,7 @@ import "../../../tasks/wdl/Utilities.wdl" as utils

workflow PairedTag {

String pipeline_version = "2.1.10"
String pipeline_version = "2.1.11"

input {
String input_id
6 changes: 6 additions & 0 deletions pipelines/wdl/slideseq/SlideSeq.changelog.md
@@ -1,3 +1,9 @@
# 3.6.5
2026-02-24 (Date of Last Commit)

* Added 2 new outputs to Multiome.wdl: gex_whitelist_used and atac_whitelist_used; these outputs indicate the whitelist used for the gene expression and ATAC pipelines, respectively. This change is provenance-only and introduces no functional changes to pipeline outputs
* Added whitelist provenance tracking to JoinMultiomeBarcodes by storing the GEX and ATAC whitelist paths in the h5ad unstructured metadata (.uns). This change is provenance-only and introduces no functional changes to pipeline outputs

# 3.6.4
2026-01-22 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/wdl/slideseq/SlideSeq.wdl
@@ -25,7 +25,7 @@ import "../../../tasks/wdl/Utilities.wdl" as utils

workflow SlideSeq {

String pipeline_version = "3.6.4"
String pipeline_version = "3.6.5"

input {
Array[File] r1_fastq
6 changes: 6 additions & 0 deletions pipelines/wdl/slidetags/SlideTags.changelog.md
@@ -1,3 +1,9 @@
# 1.0.8
2026-02-24 (Date of Last Commit)

* Added 2 new outputs to Multiome.wdl: gex_whitelist_used and atac_whitelist_used; these outputs indicate the whitelist used for the gene expression and ATAC pipelines, respectively. This change is provenance-only and introduces no functional changes to pipeline outputs
* Added whitelist provenance tracking to JoinMultiomeBarcodes by storing the GEX and ATAC whitelist paths in the h5ad unstructured metadata (.uns). This change is provenance-only and introduces no functional changes to pipeline outputs

# 1.0.7
2026-01-26 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/wdl/slidetags/SlideTags.wdl
@@ -6,7 +6,7 @@ import "../optimus/Optimus.wdl" as optimus

workflow SlideTags {

String pipeline_version = "1.0.7"
String pipeline_version = "1.0.8"

input {

@@ -1,3 +1,9 @@
# 2.2.5
2026-02-24 (Date of Last Commit)

* Added 2 new outputs to Multiome.wdl: gex_whitelist_used and atac_whitelist_used; these outputs indicate the whitelist used for the gene expression and ATAC pipelines, respectively. This change is provenance-only and introduces no functional changes to pipeline outputs
* Added whitelist provenance tracking to JoinMultiomeBarcodes by storing the GEX and ATAC whitelist paths in the h5ad unstructured metadata (.uns). This change is provenance-only and introduces no functional changes to pipeline outputs

# 2.2.4
2026-01-21 (Date of Last Commit)

@@ -59,7 +59,7 @@ workflow MultiSampleSmartSeq2SingleNucleus {
}

# Version of this pipeline
String pipeline_version = "2.2.4"
String pipeline_version = "2.2.5"

if (false) {
String? none = "None"
59 changes: 56 additions & 3 deletions tasks/wdl/H5adUtils.wdl
@@ -1,9 +1,8 @@
version 1.0



task OptimusH5adGeneration {


input {
#runtime values
String warp_tools_docker_path
@@ -35,7 +34,7 @@ task OptimusH5adGeneration {
#String counting_mode = "sc_rna"
String add_emptydrops_data = "yes"
String gtf_path = annotation_file

String gex_whitelist_gs_path

String pipeline_version

@@ -58,6 +57,9 @@ task OptimusH5adGeneration {

touch empty_drops_result.csv

whitelist_name=$(basename ~{gex_whitelist_gs_path})
echo "$whitelist_name" > whitelist_input.txt

if [ "~{counting_mode}" == "sc_rna" ]; then
python3 /warptools/scripts/create_h5ad_optimus.py \
~{if defined(empty_drops_result) then "--empty_drops_file " + empty_drops_result else "--empty_drops_file empty_drops_result.csv " } \
@@ -104,6 +106,15 @@ task OptimusH5adGeneration {
--counting_mode ~{counting_mode} \
--expected_cells ~{expected_cells}


python3 <<CODE
import anndata as ad
adata = ad.read_h5ad("~{input_id}.h5ad")
adata.uns["whitelist"] = {"gex_whitelist_gs_path": "~{gex_whitelist_gs_path}"}
adata.write("~{input_id}.h5ad")
CODE


mv library_metrics.csv ~{input_id}_~{gex_nhash_id}_library_metrics.csv

>>>
@@ -120,6 +131,7 @@ task OptimusH5adGeneration {
output {
File h5ad_output = "~{input_id}.h5ad"
File library_metrics = "~{input_id}_~{gex_nhash_id}_library_metrics.csv"
File whitelist_name_file = "whitelist_input.txt"
}
}

@@ -162,6 +174,7 @@ task SingleNucleusOptimusH5adOutput {
# Cell calls from starsolo in TSV format
File? cellbarcodes
String gtf_path = annotation_file
String gex_whitelist_gs_path

String pipeline_version

@@ -182,6 +195,9 @@ task SingleNucleusOptimusH5adOutput {
command <<<
set -euo pipefail

whitelist_name=$(basename ~{gex_whitelist_gs_path})
echo "$whitelist_name" > whitelist_input.txt

python3 /warptools/scripts/create_snrna_optimus_exons_h5ad.py \
--annotation_file ~{annotation_file} \
--cell_metrics ~{cell_metrics} \
@@ -211,6 +227,13 @@ task SingleNucleusOptimusH5adOutput {
--counting_mode ~{counting_mode} \
--expected_cells ~{expected_cells}

python3 <<CODE
import anndata as ad
adata = ad.read_h5ad("~{input_id}.h5ad")
adata.uns["whitelist"] = {"gex_whitelist_gs_path": "~{gex_whitelist_gs_path}"}
adata.write("~{input_id}.h5ad")
CODE


mv library_metrics.csv ~{input_id}_~{gex_nhash_id}_library_metrics.csv

@@ -227,6 +250,7 @@ task SingleNucleusOptimusH5adOutput {
output {
File h5ad_output = "~{input_id}.h5ad"
File library_metrics = "~{input_id}_~{gex_nhash_id}_library_metrics.csv"
File whitelist_name_file = "whitelist_input.txt"
}
}

@@ -237,6 +261,8 @@ task JoinMultiomeBarcodes {
File gex_h5ad
File gex_whitelist
File atac_whitelist
String gex_whitelist_gs_path
String atac_whitelist_gs_path
String input_gtf
String input_bwa_reference

@@ -279,6 +305,8 @@ task JoinMultiomeBarcodes {
gex_h5ad = "~{gex_h5ad}"
gex_whitelist = "~{gex_whitelist}"
atac_whitelist = "~{atac_whitelist}"
gex_whitelist_gs = "~{gex_whitelist_gs_path}"
atac_whitelist_gs = "~{atac_whitelist_gs_path}"
input_gtf = "~{input_gtf}"
input_bwa_reference = "~{input_bwa_reference}"

@@ -327,6 +355,29 @@ task JoinMultiomeBarcodes {
print("Setting Optimus obs to new dataframe")
gex_data.obs = df_gex

import os

# Add whitelist provenance metadata
gex_data.uns["whitelists"] = {
    "gex_whitelist_gs_path": gex_whitelist_gs,
    "atac_whitelist_gs_path": atac_whitelist_gs
}

atac_data.uns["whitelists"] = {
    "gex_whitelist_gs_path": gex_whitelist_gs,
    "atac_whitelist_gs_path": atac_whitelist_gs
}

# write out the names of the whitelists in separate text files for provenance tracking
gex_whitelist_name = os.path.basename(gex_whitelist)
atac_whitelist_name = os.path.basename(atac_whitelist)

with open("gex_whitelist_input.txt", "w") as f:
    f.write(gex_whitelist_name)

with open("atac_whitelist_input.txt", "w") as f:
    f.write(atac_whitelist_name)

# write out the files
gex_data.write("~{gex_base_name}.h5ad")
atac_data.write_h5ad("~{atac_base_name}.h5ad")
@@ -361,6 +412,8 @@ task JoinMultiomeBarcodes {
File atac_h5ad_file = "~{atac_base_name}.h5ad"
File atac_fragment_tsv = "~{atac_fragment_base}.sorted.tsv.gz"
File atac_fragment_tsv_index = "~{atac_fragment_base}.sorted.tsv.gz.csi"
File gex_whitelist_name_file = "gex_whitelist_input.txt"
File atac_whitelist_name_file = "atac_whitelist_input.txt"
}
}
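The provenance step added to JoinMultiomeBarcodes comes down to two things: a dict of the two whitelist URIs stored under `.uns["whitelists"]`, and the basenames of the whitelist files written to `gex_whitelist_input.txt` and `atac_whitelist_input.txt`. The stdlib-only sketch below reproduces that logic outside the task so it can be checked without anndata. The function names and example paths are hypothetical. It relies on `os.path.basename` returning the same file name for a localized path and for the `gs://` URI it came from, which is typically the case when localization keeps the original file name.

```python
import os
import tempfile
from pathlib import Path


def whitelist_provenance(gex_whitelist_gs_path: str, atac_whitelist_gs_path: str) -> dict[str, str]:
    """Build the mapping the task stores under .uns["whitelists"]."""
    return {
        "gex_whitelist_gs_path": gex_whitelist_gs_path,
        "atac_whitelist_gs_path": atac_whitelist_gs_path,
    }


def write_whitelist_name_files(gex_whitelist: str, atac_whitelist: str, out_dir: Path) -> tuple[Path, Path]:
    """Write each whitelist's file name (not its contents) to a provenance text file."""
    gex_file = out_dir / "gex_whitelist_input.txt"
    atac_file = out_dir / "atac_whitelist_input.txt"
    gex_file.write_text(os.path.basename(gex_whitelist))
    atac_file.write_text(os.path.basename(atac_whitelist))
    return gex_file, atac_file


# Hypothetical inputs: a localized GEX whitelist and an ATAC whitelist URI.
with tempfile.TemporaryDirectory() as tmp:
    gex_txt, atac_txt = write_whitelist_name_files(
        "/cromwell_root/example-bucket/gex_whitelist.txt",
        "gs://example-bucket/atac_whitelist.txt",
        Path(tmp),
    )
    print(gex_txt.read_text(), atac_txt.read_text())  # gex_whitelist.txt atac_whitelist.txt
```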

12 changes: 12 additions & 0 deletions website/docs/Pipelines/Multiome_Pipeline/README.md
@@ -107,6 +107,8 @@ The Multiome workflow calls two WARP subworkflows, one external subworkflow (opt
| Output variable name | Filename, if applicable | Output format and description |
|--- | --- | --- |
| multiome_pipeline_version_out | N.A. | String describing the version of the Multiome pipeline used. |
| gex_whitelist_used | `gex_whitelist_input.txt` | Text file containing the file name of the barcode whitelist used by the Optimus (GEX) workflow. |
| atac_whitelist_used | `atac_whitelist_input.txt` | Text file containing the file name of the barcode whitelist used by the ATAC workflow. |
| bam_aligned_output_atac | `<input_id>_atac.bam` | BAM file containing aligned reads from ATAC workflow. |
| fragment_file_atac | `<input_id>_atac.fragments.sorted.tsv.gz` | Sorted and bgzipped TSV file containing fragment start and stop coordinates per barcode. The columns are "Chromosome", "Start", "Stop", "ATAC Barcode", "Number of reads", and "GEX Barcode". |
| fragment_file_index | `<input_id>_atac.fragments.sorted.tsv.gz.csi` | Tabix CSI index file for the fragment file. |
@@ -139,6 +141,16 @@ The Multiome workflow calls two WARP subworkflows, one external subworkflow (opt
| cellbybin_h5ad_file | h5ad | Cell by bin matrix produced by SnapATAC2 peak calling. This matrix contains (unmerged) peaks in the MACS3 unstructured metadata (adata.uns['MACS3']). The matrix consists of insertion counts per 500 bp genomic bin and cell barcode. |
| cellbypeak_h5ad_file | h5ad | Cell by peak matrix produced by SnapATAC2 peak calling. This matrix contains insertion counts per (merged) peak coordinates and per cell barcode. |

**Provenance metadata (whitelist):** Starting with v6.1.5, the Multiome pipeline records the paths of the GEX and ATAC barcode whitelists used during processing under the `whitelists` key of the `.uns` metadata in both the GEX and ATAC h5ad outputs. This is a provenance-only update and does not alter any counts, metrics, or downstream results.

You can inspect the recorded whitelist paths in Python as follows:

```python
import anndata

adata = anndata.read_h5ad("<file_name>.h5ad")
print(adata.uns["whitelists"])
```
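To cross-check the `.uns` record against the `gex_whitelist_used` and `atac_whitelist_used` outputs, you could compare file names as in the following sketch. This is an illustrative helper, not part of the pipeline. It assumes `uns_whitelists` is `adata.uns["whitelists"]` from the snippet above and that the two text files have already been read into strings. The dict literal and paths are placeholders.

```python
import os


def whitelist_names_match(uns_whitelists: dict, gex_name: str, atac_name: str) -> bool:
    """Check that the file names recorded in .uns match the *_whitelist_input.txt contents."""
    recorded_gex = os.path.basename(uns_whitelists["gex_whitelist_gs_path"])
    recorded_atac = os.path.basename(uns_whitelists["atac_whitelist_gs_path"])
    # strip() tolerates a trailing newline in the text files.
    return recorded_gex == gex_name.strip() and recorded_atac == atac_name.strip()


# Placeholder values; in practice use adata.uns["whitelists"] and the output files' contents.
uns_whitelists = {
    "gex_whitelist_gs_path": "gs://example-bucket/gex_whitelist.txt",
    "atac_whitelist_gs_path": "gs://example-bucket/atac_whitelist.txt",
}
print(whitelist_names_match(uns_whitelists, "gex_whitelist.txt\n", "atac_whitelist.txt"))  # True
```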


## Versioning and testing