
regions has no effect when streaming BCF #248

@cmdoret

Hello and thanks for providing this excellent implementation of htsget!
We are using it together with MinIO S3 storage, and so far it has worked well. However, we noticed that when requesting a specific region, the /variants endpoint returns all records regardless. Based on the server logs, I believe this is a bug in htsget-rs, but I may have misinterpreted the logs or misused htsget-rs. Do you have any advice or suggestions on where the issue might be?

Environment:

Steps to reproduce:

  1. Download example file https://github.com/vcflib/vcflib/blob/master/samples/sample.vcf
  2. Process the file into BCF and index it:
     bgzip sample.vcf
     bcftools index sample.vcf.gz
     bcftools convert -Ob -o abc.bcf sample.vcf.gz
     bcftools index abc.bcf
     bcftools index -s abc.bcf.csi
     19      .       2
     20      .       6
     X       .       1
  3. Upload bcf + csi index into s3 bucket
  4. Send a GET request to htsget for contig "19"
  5. Send a GET request to htsget for contig "20"
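The two GET requests can be sketched in Python as follows. This is a minimal sketch, not the exact client code used here: `BASE_URL` and `FILE_ID` are taken from the server logs in this report, the helper names are made up for illustration, and the ticket layout (`htsget.urls[].url` with an optional `headers.Range`) is assumed to follow the htsget protocol. Comparing the byte ranges in the tickets for contig 19 and contig 20 shows whether the server selected different parts of the file for each query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed from the server logs in this report; adjust for your deployment.
BASE_URL = "http://htsget:8080"
FILE_ID = "ex/abc"


def variants_url(reference_name, start=None, end=None, fmt="BCF"):
    """Build a /variants request URL with the htsget query parameters."""
    params = {"referenceName": reference_name, "format": fmt}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return f"{BASE_URL}/variants/{FILE_ID}?{urlencode(params)}"


def ticket_ranges(ticket):
    """Return (url, Range header) pairs from a decoded htsget JSON ticket."""
    return [
        (entry["url"], entry.get("headers", {}).get("Range"))
        for entry in ticket["htsget"]["urls"]
    ]


def fetch_ticket(url):
    """Fetch and decode the JSON ticket (needs a reachable htsget server)."""
    with urlopen(url) as response:
        return json.load(response)


# Usage against a running server:
#   for contig in ("19", "20"):
#       ticket = fetch_ticket(variants_url(contig, start=1000, end=5000))
#       print(contig, ticket_ranges(ticket))
```

If both contigs yield identical `Range` values, the server is pointing the client at the same bytes for both queries.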

Observed behaviour: Both queries returned all variant records from all contigs.
Expected behaviour: Only variants of the requested chromosome are returned.

Observations:

Log from the contig 19 request shows that the query was parsed properly, and that segments (10,16) were requested.

2024-05-29T11:42:02.765895Z  INFO HTTP request{http.method=GET http.route=/variants/{id:.+} http.flavor=1.0 http.scheme=http http.host=htsget:8080 http.client_ip=172.23.0.5 http.user_agent=python-requests/2.31.0 http.target=/variants/ex/abc?referenceName=19&start=1000&end=5000&format=BCF otel.name=HTTP GET /variants/{id:.+} otel.kind="server" request_id=a9d73300-be43-4965-b972-ebfbef74041c}:variants{request=Query({"end": "5000", "referenceName": "19", "format": "BCF", "start": "1000"}) path=Path("ex/abc") http_request=
HttpRequest HTTP/1.0 GET:/variants/ex/abc
  query: ?"referenceName=19&start=1000&end=5000&format=BCF"
  params: Path { path: Url { uri: /variants/ex/abc?referenceName=19&start=1000&end=5000&format=BCF, path: None }, skip: 16, segments: [("id", Segment(10, 16))] }
  headers:
    "host": "htsget:8080"
    "connection": "close"
    "accept": "*/*"
    "accept-encoding": "gzip, deflate, br"
    "user-agent": "python-requests/2.31.0"
}: htsget_actix::handlers::get: variants endpoint GET request request=Request { path: "ex/abc", query: {"end": "5000", "referenceName": "19", "format": "BCF", "start": "1000"}, headers: {"host": "htsget:8080", "connection": "close", "accept": "*/*", "accept-encoding": "gzip, deflate, br", "user-agent": "python-requests/2.31.0"} }

The log from the contig 20 request shows that the same segments (10,16) were requested, although the query is different. I am not sure whether I am interpreting this correctly.

2024-05-29T11:56:32.734937Z  INFO HTTP request{http.method=GET http.route=/variants/{id:.+} http.flavor=1.0 http.scheme=http http.host=htsget:8080 http.client_ip=172.23.0.5 http.user_agent=python-requests/2.31.0 http.target=/variants/ex/abc?referenceName=20&start=1000&end=5000&format=BCF otel.name=HTTP GET /variants/{id:.+} otel.kind="server" request_id=baa2d588-3098-4ff5-acc7-decb9490403b}:variants{request=Query({"referenceName": "20", "start": "1000", "format": "BCF", "end": "5000"}) path=Path("ex/abc") http_request=
HttpRequest HTTP/1.0 GET:/variants/ex/abc
  query: ?"referenceName=20&start=1000&end=5000&format=BCF"
  params: Path { path: Url { uri: /variants/ex/abc?referenceName=20&start=1000&end=5000&format=BCF, path: None }, skip: 16, segments: [("id", Segment(10, 16))] }
  headers:
    "host": "htsget:8080"
    "user-agent": "python-requests/2.31.0"
    "accept": "*/*"
    "accept-encoding": "gzip, deflate, br"
    "connection": "close"
}: htsget_actix::handlers::get: variants endpoint GET request request=Request { path: "ex/abc", query: {"referenceName": "20", "start": "1000", "format": "BCF", "end": "5000"}, headers: {"host": "htsget:8080", "user-agent": "python-requests/2.31.0", "accept": "*/*", "accept-encoding": "gzip, deflate, br", "connection": "close"} }
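On how to read `Segment(10, 16)` in the logs above: one possible interpretation, offered as an assumption rather than a confirmed reading of the htsget-rs internals, is that the pair holds start and end offsets into the request path for the matched `id` route parameter, not byte ranges in the BCF file. The short check below slices the path from the logs with those offsets; `path_segment` is a hypothetical helper written for this illustration.

```python
def path_segment(path, start, end):
    """Return the part of the URL path covered by byte offsets [start, end)."""
    return path.encode("utf-8")[start:end].decode("utf-8")


# Path taken from the logs above; the query string is not part of it.
for contig in ("19", "20"):
    print(contig, path_segment("/variants/ex/abc", 10, 16))
```

Both iterations print `ex/abc`, which matches the `id` value and the `path: "ex/abc"` field in the logs. Under this reading, identical segments for the two queries would be expected, since the two requests differ only in their query strings.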
