The input `xxxx.g.vcf.gz` file was generated using the BAM-to-VCF Cromwell pipeline: https://github.com/broadinstitute/wdl-runner

When I ran `vcf_to_bq` without `--run_annotation_pipeline`, it ran fine and the BigQuery tables were created.

When I added the `--run_annotation_pipeline true` flag, 8570 output files were generated, but none had the `**_vep_output.vcf` extension. The output file structure was `annotation/shards/LONG_UUID`, with a single file named `count_20000` in each shard directory.
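For reference, this is roughly what listing the annotation directory shows (a sketch; the bucket name matches `ANNOTATION_LOCATION` in the script below, and the UUIDs are placeholders):

```bash
# Illustrative listing of the layout described above; UUIDs are placeholders.
gsutil ls -r "gs://my_output_bucket/annotation/shards/" | head
# gs://my_output_bucket/annotation/shards/<LONG_UUID>/count_20000
# gs://my_output_bucket/annotation/shards/<LONG_UUID>/count_20000
# ...
```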
The command I ran was:
```bash
#!/bin/bash
# Parameters to replace:
GOOGLE_CLOUD_PROJECT=my_project
GOOGLE_CLOUD_REGION=my_region
TEMP_LOCATION=gs://my_output_bucket/temp
ANNOTATION_LOCATION=gs://my_output_bucket/annotation
INPUT_PATTERN=gs://my_input_bucket/gatk/gatk4-genome-processing-pipeline/output/NA12878.g.vcf.gz
OUTPUT_TABLE=my_project:vcf_to_bq.test_run

COMMAND="vcf_to_bq \
  --input_pattern ${INPUT_PATTERN} \
  --output_table ${OUTPUT_TABLE} \
  --job_name vcf-to-bigquery-09-08-64 \
  --run_annotation_pipeline true \
  --use_allele_num true \
  --max_num_workers 1000 \
  --worker_machine_type n1-standard-64 \
  --annotation_output_dir ${ANNOTATION_LOCATION} \
  --runner DataflowRunner"

docker run -v ~/.config:/root/.config \
  gcr.io/cloud-lifesciences/gcp-variant-transforms \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --region "${GOOGLE_CLOUD_REGION}" \
  --temp_location "${TEMP_LOCATION}" \
  "${COMMAND}"
```
The output error was:
```
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 643, in <module>
    raise e
IOError: No files found based on the file pattern gs://my_output_bucket/annotation/**_vep_output.vcf
```
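As a quick sanity check (a sketch, assuming `gsutil` and the bucket layout above), running the same glob the error message reports confirms there is nothing for the pipeline to find:

```bash
# Reproduce the failing glob outside the pipeline; gsutil's ** wildcard
# matches across directory boundaries, like the pattern in the error above.
gsutil ls "gs://my_output_bucket/annotation/**_vep_output.vcf" \
  || echo "no *_vep_output.vcf files under the annotation directory"
```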