Commit 639d4bb

Author: Yuanyuan Tian (from Dev Box)
Message: move benchmark datasets to separate repo (YuanyuanTian-hh/diskann-benchmark-data)
1 parent 5b0e7f7 commit 639d4bb

2 files changed

Lines changed: 8 additions & 4 deletions

.github/workflows/benchmarks-aa.yml

Lines changed: 4 additions & 2 deletions
@@ -76,12 +76,13 @@ jobs:
           sudo apt-get install -y openssl libssl-dev pkg-config
 
       # Download pre-packaged Wikipedia-100K dataset from GitHub Release
+      # Source: https://github.com/harsha-simhadri/big-ann-benchmarks
       - name: Download wikipedia-100K dataset
         env:
           GH_TOKEN: ${{ github.token }}
         run: |
           mkdir -p diskann_rust/target/tmp baseline/target/tmp
-          gh release download benchmark-data-v1 --repo ${{ github.repository }} --pattern 'wikipedia-100K.tar.gz' --dir .
+          gh release download v1 --repo YuanyuanTian-hh/diskann-benchmark-data --pattern 'wikipedia-100K.tar.gz' --dir .
           tar xzf wikipedia-100K.tar.gz -C diskann_rust/target/tmp/
           cp -r diskann_rust/target/tmp/wikipedia_cohere baseline/target/tmp/
 
@@ -181,12 +182,13 @@ jobs:
           sudo apt-get install -y openssl libssl-dev pkg-config
 
       # Download pre-packaged OpenAI ArXiv 100K dataset from GitHub Release
+      # Source: https://github.com/harsha-simhadri/big-ann-benchmarks
       - name: Download openai-100K dataset
         env:
           GH_TOKEN: ${{ github.token }}
         run: |
           mkdir -p diskann_rust/target/tmp baseline/target/tmp
-          gh release download benchmark-data-v1 --repo ${{ github.repository }} --pattern 'openai-100K.tar.gz' --dir .
+          gh release download v1 --repo YuanyuanTian-hh/diskann-benchmark-data --pattern 'openai-100K.tar.gz' --dir .
           tar xzf openai-100K.tar.gz -C diskann_rust/target/tmp/
           cp -r diskann_rust/target/tmp/OpenAIArXiv baseline/target/tmp/
 

.github/workflows/benchmarks.yml

Lines changed: 4 additions & 2 deletions
@@ -93,12 +93,13 @@ jobs:
 
       # Download pre-packaged Wikipedia-100K dataset from GitHub Release
       # Dataset: 100K Cohere Wikipedia embeddings (768-dim, float32, cosine distance)
+      # Source: https://github.com/harsha-simhadri/big-ann-benchmarks
       - name: Download wikipedia-100K dataset
         env:
           GH_TOKEN: ${{ github.token }}
         run: |
           mkdir -p diskann_rust/target/tmp baseline/target/tmp
-          gh release download benchmark-data-v1 --repo ${{ github.repository }} --pattern 'wikipedia-100K.tar.gz' --dir .
+          gh release download v1 --repo YuanyuanTian-hh/diskann-benchmark-data --pattern 'wikipedia-100K.tar.gz' --dir .
           tar xzf wikipedia-100K.tar.gz -C diskann_rust/target/tmp/
           cp -r diskann_rust/target/tmp/wikipedia_cohere baseline/target/tmp/
 
@@ -203,12 +204,13 @@ jobs:
 
       # Download pre-packaged OpenAI ArXiv 100K dataset from GitHub Release
       # Dataset: 100K OpenAI embeddings of ArXiv papers (1536-dim, float32, euclidean distance)
+      # Source: https://github.com/harsha-simhadri/big-ann-benchmarks
       - name: Download openai-100K dataset
         env:
           GH_TOKEN: ${{ github.token }}
         run: |
           mkdir -p diskann_rust/target/tmp baseline/target/tmp
-          gh release download benchmark-data-v1 --repo ${{ github.repository }} --pattern 'openai-100K.tar.gz' --dir .
+          gh release download v1 --repo YuanyuanTian-hh/diskann-benchmark-data --pattern 'openai-100K.tar.gz' --dir .
           tar xzf openai-100K.tar.gz -C diskann_rust/target/tmp/
           cp -r diskann_rust/target/tmp/OpenAIArXiv baseline/target/tmp/
 
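
All four download steps in this commit repeat the same sequence: fetch one archive, extract it under diskann_rust/target/tmp, then copy the extracted directory to baseline/target/tmp. The sketch below shows how the extract-and-copy part could be factored into one shell function. The name `stage_dataset` is hypothetical and does not appear in the commit, and the directory check is an added assumption about how a mismatched archive should be handled. The download itself is left to the existing `gh release download` call.

```shell
# Hypothetical helper (not part of the commit): unpack an already-downloaded
# dataset archive and mirror the extracted directory into the baseline tree.
#   $1: archive file, e.g. wikipedia-100K.tar.gz
#   $2: top-level directory the archive is expected to contain,
#       e.g. wikipedia_cohere or OpenAIArXiv
stage_dataset() {
  local archive="$1" top_dir="$2"
  local dest="diskann_rust/target/tmp"
  local mirror="baseline/target/tmp"

  mkdir -p "$dest" "$mirror" || return 1
  tar xzf "$archive" -C "$dest/" || return 1

  # Fail early if the archive layout is not what the later steps expect.
  if [ ! -d "$dest/$top_dir" ]; then
    echo "expected directory '$top_dir' not found in $archive" >&2
    return 1
  fi

  cp -r "$dest/$top_dir" "$mirror/"
}

# Usage inside a workflow step, after the download from the diff above:
#   gh release download v1 --repo YuanyuanTian-hh/diskann-benchmark-data \
#     --pattern 'wikipedia-100K.tar.gz' --dir .
#   stage_dataset wikipedia-100K.tar.gz wikipedia_cohere
```

The explicit check turns a renamed or restructured archive into an immediate, readable failure instead of a later `cp` error.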
