
[BUG] cuVS ANN benchmarks failing due to HTTP 403 Forbidden error fetching data #724

Open
@jakirkham

Description


Describe the bug

It looks like the nightly Docker image builds have started failing due to an error when fetching data for the cuVS benchmarks. This appears to happen in all Docker images.

I have taken the snippet below from this GHA log; similar errors can be seen in the others.

 > [cuvs-bench-datasets 3/3] RUN /home/rapids/cuvs-bench/get_datasets.sh:
0.214     return self._call_chain(*args)
0.214            ^^^^^^^^^^^^^^^^^^^^^^^
0.214   File "/opt/conda/lib/python3.12/urllib/request.py", line 492, in _call_chain
0.214     result = func(*args)
0.214              ^^^^^^^^^^^
0.214   File "/opt/conda/lib/python3.12/urllib/request.py", line 639, in http_error_default
0.214     raise HTTPError(req.full_url, code, msg, hdrs, fp)
0.214 urllib.error.HTTPError: HTTP Error 403: Forbidden
0.214 downloading http://ann-benchmarks.com/deep-image-96-angular.hdf5 -> /home/rapids/preloaded_datasets/deep-image-96-angular.hdf5...
0.214 Cannot download http://ann-benchmarks.com/deep-image-96-angular.hdf5

Steps/Code to reproduce bug

Run the script cuvs-bench/get_datasets.sh. It appears to fail on the first dataset (see the command below), though the later ones may have the same issue.

python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize --dataset-path /home/rapids/preloaded_datasets
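To check whether the later datasets hit the same 403 without running the whole script, something like the minimal sketch below could be used. It assumes only the Python standard library. The `probe` helper is hypothetical (not part of cuvs_bench), and it sends a HEAD request, which a server may answer differently from the GET that `get_dataset` issues.

```python
# Minimal sketch: report the HTTP status of a dataset URL.
# `probe` is a hypothetical helper, not part of cuvs_bench.
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Redirects are followed automatically, so this is the final status.
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, as seen in the CI log.
        return err.code
```

Calling `probe` with the URL from the log (and with the other dataset URLs the script uses) would show whether the 403 reproduces outside of CI. Network failures such as DNS errors are not caught and surface as `urllib.error.URLError`.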

Expected behavior

The benchmark datasets are retrieved.

Environment details (please complete the following information):

  • Environment location: Docker (on CI)
  • Method of cuDF install: Conda in Docker build (reproducible with any image or just the script above)
    • If method of install is [Docker], provide docker pull & docker run commands used
  • Please run and attach the output of the cudf/print_env.sh script to gather relevant environment details

I'm not seeing where cudf/print_env.sh is run on CI. Where should we be looking? Or should we add it to our CI scripts?

In any event, the log contains plenty of diagnostic information, though I suspect the cause is as simple as the URL having changed or us needing additional authentication to fetch the data.
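To help tell those two suspected causes apart, a hypothetical helper like the following could map an HTTP status code (for example, `err.code` from the `urllib.error.HTTPError` in the log) to a likely cause. The mapping is a rough heuristic based on general HTTP semantics, not something cuvs_bench does.

```python
# Hypothetical helper: map an HTTP status code to a likely cause.
# This is a heuristic sketch, not part of cuvs_bench.
def diagnose(status: int) -> str:
    """Return a short, hedged guess at what a status code implies."""
    if 200 <= status < 300:
        return "reachable"
    if 300 <= status < 400:
        return "redirected: the URL may have changed"
    if status in (404, 410):
        return "not found: the URL may have changed or the file was removed"
    if status in (401, 403):
        # 401/403 mean the server refused the request; this could be missing
        # credentials or the server blocking this particular client.
        return "denied: the server may require authentication or be blocking the client"
    if 500 <= status < 600:
        return "server error: possibly transient"
    return "unexpected status"
```

With the 403 from the log, this points toward an access problem rather than a moved URL, which would more typically show up as a 404 or a redirect.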

Additional context

None that I can think of.


Metadata

Assignees: No one assigned
Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
