Commit 4411f07

add check for duplicate cram paths
1 parent 252a7ea commit 4411f07

6 files changed: 21 additions, 0 deletions
pipelines/wdl/glimpse/low_pass_imputation/input_qc/LowPassImputationQC.wdl

Lines changed: 10 additions & 0 deletions
@@ -192,6 +192,16 @@ task ValidateCramsAndIndices {
         echo "All CRAM index files have the correct .crai extension."
     fi
 
+    # validate that cram paths are unique
+    unique_crams=$(cat crams_list.txt | sort -u | wc -l)
+    if [ $unique_crams -ne ~{num_crams} ]; then
+        # find duplicate CRAM paths
+        duplicate_crams=$(cat crams_list.txt | sort | uniq -d | paste -sd, | sed 's/,/, /g')
+        echo "Duplicate CRAM paths found: ${duplicate_crams}" >> qc_messages.txt
+    else
+        echo "CRAM paths are unique."
+    fi
+
     # ensure that all CRAM files are less than the maximum file size allowed by the service (currently 10GB)
     # this also serves as an access check, which should already have been performed by the service
     crams_exceeding_max_size=$(cat crams_list.txt | while read cram; do
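
The uniqueness check added in this hunk can be tried outside the workflow. The following is a minimal sketch, not the task's actual command block: the function names `paths_are_unique` and `find_duplicate_paths`, the `gs://example-bucket/...` paths, and the temporary file are all hypothetical, and the expected count is passed as an argument in place of the WDL placeholder `~{num_crams}`. It prints to stdout rather than appending to `qc_messages.txt`.

```shell
#!/usr/bin/env bash
# Sketch of the duplicate-path check, runnable outside WDL.
# All names and paths below are illustrative assumptions.

# Succeeds (exit 0) only if the number of distinct lines in the list
# equals the expected count (stand-in for ~{num_crams}).
paths_are_unique() {
    local list_file="$1" expected_count="$2"
    local unique_count
    unique_count=$(sort -u "$list_file" | wc -l)
    [ "$unique_count" -eq "$expected_count" ]
}

# Prints each repeated line once, joined with ", ".
# uniq -d only detects adjacent repeats, so the input is sorted first.
find_duplicate_paths() {
    local list_file="$1"
    sort "$list_file" | uniq -d | paste -sd, - | sed 's/,/, /g'
}

# Demo list: five entries, two of which are repeated.
demo_list=$(mktemp)
printf '%s\n' \
    "gs://example-bucket/a.cram" \
    "gs://example-bucket/b.cram" \
    "gs://example-bucket/a.cram" \
    "gs://example-bucket/c.cram" \
    "gs://example-bucket/c.cram" > "$demo_list"

if paths_are_unique "$demo_list" 5; then
    echo "CRAM paths are unique."
else
    echo "Duplicate CRAM paths found: $(find_duplicate_paths "$demo_list")"
fi
# prints: Duplicate CRAM paths found: gs://example-bucket/a.cram, gs://example-bucket/c.cram
```

Because the comparison is against the expected count, the check relies on `crams_list.txt` holding exactly one path per line with a trailing newline; a missing final newline would make `wc -l` undercount by one.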
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+{
+  "LowPassImputationQC.reference_panel_prefix": "gs://broad-dsde-methods-bge-resources-public/GlimpseImputation/ReferencePanels/1000G_HGDP_with_trio_information_old_chrX",
+  "LowPassImputationQC.contigs": ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chr20","chr21","chr22"],
+  "LowPassImputationQC.fasta": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
+  "LowPassImputationQC.fasta_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
+  "LowPassImputationQC.output_basename": "plumbing_test",
+  "LowPassImputationQC.ref_dict": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict",
+  "LowPassImputationQC.crams": ["gs://fc-cddd72b5-323c-495c-9557-5057fff0275a/morgan_test/fakeCram0.cram", "gs://fc-cddd72b5-323c-495c-9557-5057fff0275a/morgan_test/fakeCram0.cram"],
+  "LowPassImputationQC.cram_indices": ["gs://fc-cddd72b5-323c-495c-9557-5057fff0275a/morgan_test/fakeCram0.cram.crai", "gs://fc-cddd72b5-323c-495c-9557-5057fff0275a/morgan_test/fakeCram0.cram.crai"],
+  "LowPassImputationQC.sample_ids": ["sample0", "sample1"]
+}

pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_cram_with_duplicate_sample_ids.json renamed to pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_cram_duplicate_sample_ids.json

File renamed without changes.

pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_cram_without_crai.json renamed to pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_cram_no_crai.json

File renamed without changes.

pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_cram_without_sample_ids.json renamed to pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_cram_no_sample_ids.json

File renamed without changes.

pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_manifest_with_duplicate_sample_ids.json renamed to pipelines/wdl/glimpse/low_pass_imputation/input_qc/test_inputs/Plumbing/fail_manifest_duplicate_sample_ids.json

File renamed without changes.
