Merged
28 changes: 14 additions & 14 deletions README.md
@@ -333,20 +333,20 @@ The MS MARCO V2.1 corpora (documents and segmented documents) were derived from
Instructions for downloading the corpus can be found [here](https://trec-rag.github.io/annoucements/2024-corpus-finalization/).
The experiments below capture topics and _passage-level_ qrels for the V2.1 segmented documents corpus.

| | RAG 24 UMBRELA | RAG 24 NIST |
|-------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| baselines | [🔑](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.md) | [🔑](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.md) |
| SPLADE-v3 | [🫙](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.splade-v3.cached.md) [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.splade-v3.onnx.md) | [🫙](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.splade-v3.cached.md) [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.splade-v3.onnx.md) |
| Arctic-embed-l (`shard00`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md) |
| Arctic-embed-l (`shard01`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md) |
| Arctic-embed-l (`shard02`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md) |
| Arctic-embed-l (`shard03`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md) |
| Arctic-embed-l (`shard04`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md) |
| Arctic-embed-l (`shard05`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md) |
| Arctic-embed-l (`shard06`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md) |
| Arctic-embed-l (`shard07`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md) |
| Arctic-embed-l (`shard08`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md) |
| Arctic-embed-l (`shard09`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md) |
| | RAG 24 UMBRELA | RAG 24 NIST | RAG 25 UMBRELA2.0 | RAG 25 NIST |
|-------------------------------------------------|:---:|:---:|:---:|:---:|
| baselines | [🔑](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.md) | [🔑](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.md) | [🔑](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.md) | [🔑](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.md) |
| SPLADE-v3 | [🫙](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.splade-v3.cached.md) [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.splade-v3.onnx.md) | [🫙](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.splade-v3.cached.md) [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.splade-v3.onnx.md) | | |
| Arctic-embed-l (`shard00`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard00.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard00.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx.md) |
| Arctic-embed-l (`shard01`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard01.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard01.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx.md) |
| Arctic-embed-l (`shard02`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard02.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard02.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx.md) |
| Arctic-embed-l (`shard03`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard03.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard03.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx.md) |
| Arctic-embed-l (`shard04`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard04.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard04.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx.md) |
| Arctic-embed-l (`shard05`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard05.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard05.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx.md) |
| Arctic-embed-l (`shard06`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard06.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard06.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx.md) |
| Arctic-embed-l (`shard07`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard07.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard07.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx.md) |
| Arctic-embed-l (`shard08`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard08.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx.md) |
| Arctic-embed-l (`shard09`, flat vector indexes) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard09.flat.onnx.md) | [🅾️](docs/reproduce/from-document-collection/rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx.md) |

Key:

@@ -10,6 +10,17 @@ io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard08.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag24-doc-segmented-test-umbrela.arctic-embed-l.parquet.shard09.flat.onnx

io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard00.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard01.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard02.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard03.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard04.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard05.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard06.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard07.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard08.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag25-doc-segmented-test-umbrela2.arctic-embed-l.parquet.shard09.flat.onnx
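
The ten per-shard lines above differ only in the two-digit shard suffix, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the PR: the helper names `shard_configs` and `command_lines` are hypothetical, and the flag set is copied from the lines above rather than taken from any documented interface.

```python
# Hypothetical helpers (not part of Anserini) that regenerate the
# per-shard command lines listed above.

MAIN_CLASS = "io.anserini.reproduce.ReproduceFromDocumentCollection"


def shard_configs(prefix: str, num_shards: int = 10) -> list[str]:
    """Return config names such as '<prefix>.arctic-embed-l.parquet.shard00.flat.onnx'."""
    return [
        f"{prefix}.arctic-embed-l.parquet.shard{i:02d}.flat.onnx"
        for i in range(num_shards)
    ]


def command_lines(prefix: str, flags: tuple[str, ...] = ("--index", "--verify", "--search")) -> list[str]:
    """Return one command line per shard, using the flags seen in the list above."""
    flag_str = " ".join(flags)
    return [f"{MAIN_CLASS} {flag_str} --config {cfg}" for cfg in shard_configs(prefix)]


if __name__ == "__main__":
    for line in command_lines("rag25-doc-segmented-test-umbrela2"):
        print(line)
```

Generating the list this way makes it harder to skip or duplicate a shard when a new qrels variant is added.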

# MS MARCO V2.1, full indexing - build both doc and doc segmented conditions
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag24-doc-segmented-test-umbrela
io.anserini.reproduce.ReproduceFromDocumentCollection --index --verify --search --config rag24-doc-raggy-dev
@@ -76,6 +87,17 @@ io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag24-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx

io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard00.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard01.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard02.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard03.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard04.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard05.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard06.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard07.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard08.flat.onnx
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag25-doc-segmented-test-nist.arctic-embed-l.parquet.shard09.flat.onnx

# MS MARCO V2.1 doc segment:
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag24-doc-segmented-test-nist
io.anserini.reproduce.ReproduceFromDocumentCollection --verify --search --config rag24-doc-segmented-test-umbrela.splade-v3.cached
@@ -0,0 +1,53 @@
---
corpus: msmarco-v2.1-doc-segmented-shard00.arctic-embed-l
corpus_path: collections/msmarco/msmarco_v2.1_doc_segmented.arctic-embed-l/shard00

index_path: indexes/lucene-flat.msmarco-v2.1-doc-segmented-shard00.arctic-embed-l
index_type: flat
collection_class: ParquetDenseVectorCollection
generator_class: DenseVectorDocumentGenerator
index_threads: 6
index_options: -docidField doc_id -vectorField embedding -normalizeVectors

metrics:
  - metric: nDCG@30
    command: bin/trec_eval
    params: -c -m ndcg_cut.30
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false
  - metric: nDCG@100
    command: bin/trec_eval
    params: -c -m ndcg_cut.100
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false
  - metric: R@100
    command: bin/trec_eval
    params: -c -m recall.100
    separator: "\t"
    parse_index: 2
    metric_precision: 4
    can_combine: false

topic_reader: JsonString
topics:
  - name: "RAG 25: Test queries"
    id: rag25.test
    path: topics.rag25.test.jsonl
    qrel: qrels.rag25.test.txt

models:
  - name: arctic-embed-l-flat-onnx
    display: ArcticEmbedL
    type: flat
    params: -topics rag25.test -topicReader JsonString -topicField title -encoder ArcticEmbedLEncoder
    results:
      nDCG@30:
        - 0.2702
      nDCG@100:
        - 0.1803
      R@100:
        - 0.0567
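
The `separator`, `parse_index`, and `metric_precision` fields in each metric entry suggest how a score is pulled out of a `trec_eval` output line: split on tabs, take the third field, and round to four decimal places. The sketch below illustrates that reading. It is an assumption about how the fields are used, not Anserini's actual implementation, and both the function name `parse_metric` and the sample line are hypothetical.

```python
# Illustrative only: one plausible reading of the `separator`,
# `parse_index`, and `metric_precision` fields in the config above.
# This is not Anserini's code.

def parse_metric(line: str, separator: str = "\t", parse_index: int = 2,
                 precision: int = 4) -> float:
    """Extract a score from a single trec_eval output line.

    trec_eval prints tab-separated lines of the form
    '<measure>\t<query id or "all">\t<value>', so index 2 is the value.
    """
    fields = line.strip().split(separator)
    return round(float(fields[parse_index]), precision)


if __name__ == "__main__":
    # Hypothetical output line; the value mirrors the expected nDCG@30 above.
    sample = "ndcg_cut_30           \tall\t0.2702"
    score = parse_metric(sample)
    print(score)
    # A check against the expected result listed under `results`.
    assert score == 0.2702
```

Under this reading, a reproduction passes when the parsed score matches the value listed under `results` for the same metric.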