Hello,
I'm trying to reproduce results from the OpenNIR paper using the Vanilla BERT and CEDR-KNRM models on the ANTIQUE dataset.
Taking my cues from the wsdm2020_demo.sh script, I trained my models as follows:
- First, I fine-tuned and tested a Vanilla BERT model:

```sh
BERT_MODEL_PARAMS="trainer.grad_acc_batch=1 valid_pred.batch_size=4 test_pred.batch_size=4"
python -m onir.bin.pipeline config/antique config/vanilla_bert $BERT_MODEL_PARAMS
python -m onir.bin.pipeline config/antique config/vanilla_bert $BERT_MODEL_PARAMS pipeline.test=true
```
This produced the following results:

```
test epoch=60 judged@10=0.6110 map_rel-3=0.2540 [mrr_rel-3=0.7288] p_rel-3@1=0.6450 p_rel-3@3=0.4917
```
However, the published results for Vanilla BERT are as follows:
- MAP: 0.2801
- MRR: 0.7101
- P@1: 0.5950
- P@3: 0.4967
- I then initialized a CEDR-KNRM model with the weights of the fine-tuned Vanilla BERT model, and trained and tested it:

```sh
MODEL_PATH=[PATH_TO_FINE_TUNED_BERT]/60.p
BERT_MODEL_PARAMS="trainer.grad_acc_batch=1 valid_pred.batch_size=4 test_pred.batch_size=4"
python -m onir.bin.extract_bert_weights config/antique config/vanilla_bert $BERT_MODEL_PARAMS pipeline.bert_weights=$MODEL_PATH pipeline.overwrite=True
python -m onir.bin.pipeline config/antique config/cedr/knrm $BERT_MODEL_PARAMS vocab.bert_weights=$MODEL_PATH pipeline.overwrite=True
python -m onir.bin.pipeline config/antique config/cedr/knrm $BERT_MODEL_PARAMS vocab.bert_weights=$MODEL_PATH pipeline.test=true
```
This produced the following results:

```
test epoch=30 judged@10=0.6030 map_rel-3=0.2563 [mrr_rel-3=0.7302] p_rel-3@1=0.6400 p_rel-3@3=0.5083
```
However, the published results for CEDR-KNRM are as follows:
- MAP: 0.2861
- MRR: 0.7238
- P@1: 0.6300
- P@3: 0.4933
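To make the comparison concrete, here is a small script that tabulates the per-metric deltas between my runs and the published numbers (all values copied from above; the dictionary keys are just my own labels):

```python
# Obtained vs. published metrics, copied from the runs and tables in this issue.
published = {
    "vanilla_bert": {"map": 0.2801, "mrr": 0.7101, "p@1": 0.5950, "p@3": 0.4967},
    "cedr_knrm":    {"map": 0.2861, "mrr": 0.7238, "p@1": 0.6300, "p@3": 0.4933},
}
obtained = {
    "vanilla_bert": {"map": 0.2540, "mrr": 0.7288, "p@1": 0.6450, "p@3": 0.4917},
    "cedr_knrm":    {"map": 0.2563, "mrr": 0.7302, "p@1": 0.6400, "p@3": 0.5083},
}

for model, pub in published.items():
    for metric, ref in pub.items():
        got = obtained[model][metric]
        # Positive delta: my run beats the published number; negative: it falls short.
        print(f"{model:12s} {metric:4s} obtained={got:.4f} "
              f"published={ref:.4f} delta={got - ref:+.4f}")
```

The output shows that only MAP is clearly lower in my runs (about 0.026–0.030 absolute), while MRR, P@1, and P@3 are comparable or slightly higher.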
According to the logs, inference is deterministic (`[trainer:pairwise][DEBUG] using GPU (deterministic)`).
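That said, deterministic inference does not by itself make training deterministic (data shuffling, dropout, and weight initialization all consume random state). I have not checked whether OpenNIR exposes a seed option in its config, but as a sketch, one could seed all the usual PyTorch randomness sources before training to rule out run-to-run variance:

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness in a PyTorch training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for reproducible cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```

Even with identical seeds, some GPU ops are non-deterministic, so small metric differences across training runs would not be surprising.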
Could anyone let me know what I am doing wrong? Where do the differences come from (especially w.r.t. MAP)?