Repository for End term submission for Information Retrieval course (CS60092) offered in Spring semester 2023, Department of CSE, IIT Kharagpur.
Research for research papers
This project attempts to implement and improve on the work of Sheshera Mysore, Tim O'Gorman, Andrew McCallum, and Hamed Zamani, titled "CSFCube - A Test Collection of Computer Science Papers for Faceted Query by Example".
The dataset can be found here
The paper describing the dataset can be accessed here
Demo video:
Team members:
- Ashwani Kumar Kamal - 20CS10011
- Hardik Pravin Soni - 20CS30023
- Shiladitya De - 20CS30061
- Sourabh Soumyakanta Das - 20CS30051
A quick introduction to the minimal setup needed to get the application running:
pip install -r requirements.txt
streamlit run deploy.py
- Any .ipynb files that need to be run must be placed in the root directory, which also contains the /data and /Results directories.
- The data directory contains the CSFCube dataset:
.
├── abstracts-csfcube-preds.json
├── abstracts-csfcube-preds.jsonl
├── abstracts-csfcube-preds-no-unicode.jsonl
├── evaluation_splits.json
├── test-pid2anns-csfcube-background.json
├── test-pid2anns-csfcube-method.json
├── test-pid2anns-csfcube-result.json
└── test-pid2pool-csfcube.json
- The Results directory contains the embeddings generated from the models used:
.
├── alberta
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-alberta-background-ranked.json
│   ├── test-pid2pool-csfcube-alberta-method-ranked.json
│   └── test-pid2pool-csfcube-alberta-result-ranked.json
├── allenai_specter
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-allenai_specter-background-ranked.json
│   ├── test-pid2pool-csfcube-allenai_specter-method-ranked.json
│   └── test-pid2pool-csfcube-allenai_specter-result-ranked.json
├── all_mpnet_base_v2
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-background-ranked.json
│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-method-ranked.json
│   └── test-pid2pool-csfcube-all_mpnet_base_v2-result-ranked.json
├── bert_nli
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-bert_nli-background-ranked.json
│   ├── test-pid2pool-csfcube-bert_nli-method-ranked.json
│   └── test-pid2pool-csfcube-bert_nli-result-ranked.json
├── bert_pp
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-bert_pp-background-ranked.json
│   ├── test-pid2pool-csfcube-bert_pp-method-ranked.json
│   └── test-pid2pool-csfcube-bert_pp-result-ranked.json
├── distilbert_nli
│   ├── all.json
│   ├── background.json
│   ├── method.json
│   ├── result.json
│   ├── test-pid2pool-csfcube-distilbert_nli-background-ranked.json
│   ├── test-pid2pool-csfcube-distilbert_nli-method-ranked.json
│   └── test-pid2pool-csfcube-distilbert_nli-result-ranked.json
└── ensemble
    ├── test-pid2pool-csfcube-ensemble-background-ranked.json
    ├── test-pid2pool-csfcube-ensemble-method-ranked.json
    └── test-pid2pool-csfcube-ensemble-result-ranked.json

This notebook contains the code for generating embeddings from the base models. Avoid running it, as it takes a long time; the embeddings are already provided in the IR Submission Files Google Drive folder.
This notebook fine-tunes the DistilBERT model. Its results are already saved in the notebook. Avoid running it, as it takes a long time.
Run each cell of this Jupyter notebook in order. In the second-to-last cell, change the queries as desired, then run that cell and the one after it to get the results.
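As a rough illustration of how embeddings such as those stored in Results can be turned into a ranking for a query, the sketch below scores each candidate by cosine similarity and sorts the pool. It is only an assumption that cosine similarity is the measure used; the notebooks may score candidates differently. The function names, paper ids, and toy vectors are hypothetical and are not taken from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pool(query_vec, pool_vecs):
    """Sort candidate paper ids by descending similarity to the query vector."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in pool_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors with made-up ids (assumption: real embeddings
# would be loaded from the JSON files and have many more dimensions).
pool = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [1.0, 1.0, 0.0],
}
ranked = rank_pool([1.0, 0.1, 0.0], pool)
print([pid for pid, _ in ranked])  # ['p1', 'p3', 'p2']
```

For a faceted query, the same function could be applied to the vectors of a single facet (background, method, or result) rather than the whole abstract.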
Apart from all this, we are also submitting a zip archive containing local copies of the .ipynb files, which can be run locally, and their reports. [Note] Please change the file directory strings in the notebooks appropriately to avoid any errors.
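One way to keep the directory strings in a single place is sketched below, so that only one line needs changing when the notebooks are run from a different location. The file-name pattern is taken from the Results listing above. The constant and function names are hypothetical, and the sketch assumes the notebook is run from the repository root.

```python
from pathlib import Path

# Assumption: the working directory is the repository root; change ROOT otherwise.
ROOT = Path(".")
DATA_DIR = ROOT / "data"
RESULTS_DIR = ROOT / "Results"

# The three facets that appear in the file names of the listing.
FACETS = ("background", "method", "result")


def ranked_pool_path(model, facet, results_dir=RESULTS_DIR):
    """Build the path of a ranked-pool file for a given model and facet."""
    if facet not in FACETS:
        raise ValueError(f"unknown facet: {facet!r}")
    name = f"test-pid2pool-csfcube-{model}-{facet}-ranked.json"
    return results_dir / model / name


print(ranked_pool_path("bert_nli", "method").as_posix())
# Results/bert_nli/test-pid2pool-csfcube-bert_nli-method-ranked.json
```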