
vcf_to_bq running with --run_annotation_pipeline fails to find **_vep_output.vcf files #655

@TomGardner

The input xxxx.g.vcf.gz file was generated using the BAM to VCF Cromwell pipeline: https://github.com/broadinstitute/wdl-runner

When I ran vcf_to_bq without --run_annotation_pipeline, it completed successfully and the BigQuery tables were created.

When I added the '--run_annotation_pipeline true' flag, 8570 output files were generated, but none had the **_vep_output.vcf suffix. The files were laid out as 'annotation/shards/LONG_UUID', with each LONG_UUID directory containing a single file named 'count_20000'.

The command I ran was:

#!/bin/bash
# Parameters to replace:
GOOGLE_CLOUD_PROJECT=my_project
GOOGLE_CLOUD_REGION=my_region
TEMP_LOCATION=gs://my_output_bucket/temp
ANNOTATION_LOCATION=gs://my_output_bucket/annotation
INPUT_PATTERN=gs://my_input_bucket/gatk/gatk4-genome-processing-pipeline/output/NA12878.g.vcf.gz
OUTPUT_TABLE=my_project:vcf_to_bq.test_run

COMMAND="vcf_to_bq \
  --input_pattern ${INPUT_PATTERN} \
  --output_table ${OUTPUT_TABLE} \
  --job_name vcf-to-bigquery-09-08-64 \
  --run_annotation_pipeline true \
  --use_allele_num true \
  --max_num_workers 1000 \
  --worker_machine_type n1-standard-64 \
  --annotation_output_dir ${ANNOTATION_LOCATION} \
  --runner DataflowRunner"

docker run -v ~/.config:/root/.config \
  gcr.io/cloud-lifesciences/gcp-variant-transforms \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --region "${GOOGLE_CLOUD_REGION}" \
  --temp_location "${TEMP_LOCATION}" \
  "${COMMAND}"

The job failed with the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 643, in <module>
    raise e
IOError: No files found based on the file pattern gs://my_output_bucket/annotation/**_vep_output.vcf
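To double-check what the failing pattern can see, the annotation directory can be listed and the object names classified. Below is a minimal sketch. `summarize_annotation_output` is a hypothetical helper, not part of Variant Transforms. It counts objects ending in `_vep_output.vcf`, the suffix the `**_vep_output.vcf` pattern in the error looks for, and compares them with the `count_*` shard files described earlier.

```shell
#!/bin/bash
# Hypothetical helper (not part of gcp-variant-transforms).
# Reads object paths, one per line, on stdin and counts:
#   - paths ending in "_vep_output.vcf" (what the pipeline's glob expects)
#   - paths whose last component starts with "count_" (the shard files seen)
#   - everything else
summarize_annotation_output() {
  local vep=0 count_files=0 other=0 path
  while IFS= read -r path; do
    [[ -z "$path" ]] && continue   # skip blank lines
    case "$path" in
      *_vep_output.vcf) vep=$((vep + 1)) ;;
      */count_*)        count_files=$((count_files + 1)) ;;
      *)                other=$((other + 1)) ;;
    esac
  done
  echo "vep_output=${vep} count_files=${count_files} other=${other}"
}
```

For example, `gsutil ls "${ANNOTATION_LOCATION}/**" | summarize_annotation_output` (the quotes stop the local shell from expanding `**`). If the run behaved as described above, `vep_output` would be 0 and the shard files would all be counted under `count_files`.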
