Hello,
I'm trying to reproduce results from the OpenNIR paper using the Vanilla BERT and CEDR-KNRM models on the ANTIQUE dataset.
Taking my cues from the wsdm2020_demo.sh script, I trained my models as follows:
- First, I fine-tuned and tested a Vanilla BERT model:

```sh
BERT_MODEL_PARAMS="trainer.grad_acc_batch=1 valid_pred.batch_size=4 test_pred.batch_size=4"
python -m onir.bin.pipeline config/antique config/vanilla_bert $BERT_MODEL_PARAMS
python -m onir.bin.pipeline config/antique config/vanilla_bert $BERT_MODEL_PARAMS pipeline.test=true
```
This produced the following results:

```
test epoch=60 judged@10=0.6110 map_rel-3=0.2540 [mrr_rel-3=0.7288] p_rel-3@1=0.6450 p_rel-3@3=0.4917
```
However, the published results for Vanilla BERT are as follows:
- MAP: 0.2801
- MRR: 0.7101
- P@1: 0.5950
- P@3: 0.4967
- I then initialized a CEDR-KNRM model with the weights of the fine-tuned Vanilla BERT model, and trained and tested it:

```sh
MODEL_PATH=[PATH_TO_FINE_TUNED_BERT]/60.p
BERT_MODEL_PARAMS="trainer.grad_acc_batch=1 valid_pred.batch_size=4 test_pred.batch_size=4"
python -m onir.bin.extract_bert_weights config/antique config/vanilla_bert $BERT_MODEL_PARAMS pipeline.bert_weights=$MODEL_PATH pipeline.overwrite=True
python -m onir.bin.pipeline config/antique config/cedr/knrm $BERT_MODEL_PARAMS vocab.bert_weights=$MODEL_PATH pipeline.overwrite=True
python -m onir.bin.pipeline config/antique config/cedr/knrm $BERT_MODEL_PARAMS vocab.bert_weights=$MODEL_PATH pipeline.test=true
```
This produced the following results:

```
test epoch=30 judged@10=0.6030 map_rel-3=0.2563 [mrr_rel-3=0.7302] p_rel-3@1=0.6400 p_rel-3@3=0.5083
```
However, the published results for CEDR-KNRM are as follows:
- MAP: 0.2861
- MRR: 0.7238
- P@1: 0.6300
- P@3: 0.4933
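To make the comparison concrete, here is a small script that tabulates the per-metric deltas between my runs and the published numbers (all values copied from above; the dictionary keys are just my own labels):

```python
# Obtained vs. published metrics, copied from the runs and tables in this issue.
published = {
    "vanilla_bert": {"map": 0.2801, "mrr": 0.7101, "p@1": 0.5950, "p@3": 0.4967},
    "cedr_knrm":    {"map": 0.2861, "mrr": 0.7238, "p@1": 0.6300, "p@3": 0.4933},
}
obtained = {
    "vanilla_bert": {"map": 0.2540, "mrr": 0.7288, "p@1": 0.6450, "p@3": 0.4917},
    "cedr_knrm":    {"map": 0.2563, "mrr": 0.7302, "p@1": 0.6400, "p@3": 0.5083},
}

for model, pub in published.items():
    for metric, ref in pub.items():
        got = obtained[model][metric]
        # Positive delta: my run beats the published number; negative: it falls short.
        print(f"{model:12s} {metric:4s} obtained={got:.4f} "
              f"published={ref:.4f} delta={got - ref:+.4f}")
```

The output shows that only MAP is clearly lower in my runs (about 0.026–0.030 absolute), while MRR, P@1, and P@3 are comparable or slightly higher.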
According to the logs, inference is deterministic (`[trainer:pairwise][DEBUG] using GPU (deterministic)`).
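That said, deterministic inference does not by itself make training deterministic (data shuffling, dropout, and weight initialization all consume random state). I have not checked whether OpenNIR exposes a seed option in its config, but as a sketch, one could seed all the usual PyTorch randomness sources before training to rule out run-to-run variance:

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness in a PyTorch training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for reproducible cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```

Even with identical seeds, some GPU ops are non-deterministic, so small metric differences across training runs would not be surprising.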
Could anyone let me know what I am doing wrong? Where do the differences come from (especially w.r.t. MAP)?