vcf_to_bq running with --run_annotation_pipeline fails to find **_vep_output.vcf files

The input `xxxx.g.vcf.gz` file was generated using the BAM to VCF Cromwell pipeline: <https://github.com/broadinstitute/wdl-runner>

When I ran vcf_to_bq without `--run_annotation_pipeline`, it ran fine and the BigQuery tables were created.

When I added the `--run_annotation_pipeline true` parameter, 8570 output files were generated, but none of them matched the `**_vep_output.vcf` pattern. The output was laid out as `annotation/shards/LONG_UUID`, with a single file named `count_20000` in each directory.

The command I ran was:

```
#!/bin/bash
# Parameters to replace:
GOOGLE_CLOUD_PROJECT=my_project
GOOGLE_CLOUD_REGION=my_region
TEMP_LOCATION=gs://my_output_bucket/temp
ANNOTATION_LOCATION=gs://my_output_bucket/annotation
INPUT_PATTERN=gs://my_input_bucket/gatk/gatk4-genome-processing-pipeline/output/NA12878.g.vcf.gz
OUTPUT_TABLE=my_project:vcf_to_bq.test_run

COMMAND="vcf_to_bq \
  --input_pattern ${INPUT_PATTERN} \
  --output_table ${OUTPUT_TABLE} \
  --job_name vcf-to-bigquery-09-08-64 \
  --run_annotation_pipeline true \
  --use_allele_num true \
  --max_num_workers 1000 \
  --worker_machine_type n1-standard-64 \
  --annotation_output_dir ${ANNOTATION_LOCATION} \
  --runner DataflowRunner"

docker run -v ~/.config:/root/.config \
  gcr.io/cloud-lifesciences/gcp-variant-transforms \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --region "${GOOGLE_CLOUD_REGION}" \
  --temp_location "${TEMP_LOCATION}" \
  "${COMMAND}"
```

The output error was:
```
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 643, in <module>
    raise e
IOError: No files found based on the file pattern gs://my_output_bucket/annotation/**_vep_output.vcf
```
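
For anyone trying to reproduce this, the mismatch between the pattern in the error and the files actually written can be checked with a small filter like the one below. This is only a sketch: `count_vep_outputs` is a hypothetical helper name, the suffix is copied from the error message, and the sample path is a made-up object name in the shape described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Variant Transforms): reads object paths
# on stdin and prints how many end in the suffix from the IOError above.
count_vep_outputs() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c '_vep_output\.vcf$' || true
}

# Made-up path in the shape of the annotation output directory.
printf '%s\n' \
  'gs://my_output_bucket/annotation/shards/LONG_UUID/count_20000' \
  | count_vep_outputs
```

Against a real bucket, the listing could come from `gsutil ls "${ANNOTATION_LOCATION}/**"` piped into the same function; a count of 0 would be consistent with the `IOError` above.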
