SciPrompt

SciPrompt is a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks, including few-shot and zero-shot settings.

News

(2025.03.16) The Emerging NLP dataset is available on 🤗 Hugging Face!
(2025.03.16) Our datasets can be downloaded here.
(2024.12.01) Download the fine-tuned filtering models:
- Bi-Encoder Model
- Cross-Encoder Model

This project is built on the OpenPrompt framework.

Overall Framework

Installation

To install the necessary Python packages, clone this repo, then create and activate a conda environment and install the requirements:

conda create -n sciprompt python=3.8.12
conda activate sciprompt
pip install -r requirements.txt

Prepare the Required Files and Directories

Replace the placeholder paths in each script with the actual paths to your data and configuration files:
- --data_dir should point to your data directory
- --verbalizer_path should point to your arXiv_knowledgable_verbalizer.txt
- --semantic_score_path should point to your arXiv_knowledgable_verbalizer_semantic_search_scores.txt
- --doc_id_path should point to your doc_id.txt
- --config_path should point to config/arxiv_label_mappings.json
Then prepare your class label dictionary in the same format as the .json files in the label_mappings folder.
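
Before launching a run, it can help to confirm that every path argument points at something that exists. The following is a minimal sketch, not part of the repository: the helper `find_missing_paths` and the directory locations in `required` are hypothetical, and only the argument names and file names come from the list above.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the argument names whose path does not exist on disk."""
    return [name for name, value in paths.items() if not Path(value).exists()]


# Hypothetical locations (assumption): replace them with your own paths.
required = {
    "--data_dir": "path/to/data",
    "--verbalizer_path": "path/to/arXiv_knowledgable_verbalizer.txt",
    "--semantic_score_path": "path/to/arXiv_knowledgable_verbalizer_semantic_search_scores.txt",
    "--doc_id_path": "path/to/doc_id.txt",
    "--config_path": "config/arxiv_label_mappings.json",
}

missing = find_missing_paths(required)
if missing:
    print("Missing paths for:", ", ".join(missing))
else:
    print("All required paths exist.")
```

Any argument reported as missing still carries a placeholder value and should be updated in the script before running it.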

Knowledge Retrieval and Filtering

Run on our datasets:
- Step 1: Change the paths in run_retrieval.sh and run bash run_retrieval.sh
- Step 2: Set the paths to the filtering model, the retrieved data (from Step 1), and the output files in the run_knowledge_filtering.sh script
- Step 3: Run the filtering script:
```
bash run_knowledge_filtering.sh
```
Run using your own dataset:
- Steps 1 and 2 are the same as above
- Step 3: Set your dataset name to custom and add the corresponding configs to the dataset_configs dictionary in knowledge_filtering.py (line 206)
- Step 4: Run bash run_knowledge_filtering_customized.sh
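
One way to approach the custom entry is to mirror the keys of an entry that already exists in dataset_configs. The sketch below only illustrates that idea under stated assumptions: the key names `label_mapping_path` and `retrieved_path`, all paths, and the helper `with_custom_entry` are hypothetical. The real keys are whatever knowledge_filtering.py defines, so check that file before copying anything.

```python
from typing import Dict


def with_custom_entry(dataset_configs: Dict[str, dict], custom: dict) -> Dict[str, dict]:
    """Return a copy of dataset_configs with a "custom" entry that has
    at least the same keys as an existing entry."""
    if not dataset_configs:
        raise ValueError("dataset_configs has no existing entry to use as a template")
    template = next(iter(dataset_configs.values()))
    missing = sorted(set(template) - set(custom))
    if missing:
        raise KeyError(f"custom config is missing keys: {missing}")
    updated = dict(dataset_configs)
    updated["custom"] = dict(custom)
    return updated


# Placeholder dictionary (assumption): the real keys live in knowledge_filtering.py.
existing = {
    "arxiv": {
        "label_mapping_path": "path/to/arxiv_label_mappings.json",
        "retrieved_path": "path/to/arxiv_retrieved_terms.txt",
    },
}
custom = {
    "label_mapping_path": "path/to/custom_label_mappings.json",
    "retrieved_path": "path/to/custom_retrieved_terms.txt",
}

dataset_configs = with_custom_entry(existing, custom)
print(sorted(dataset_configs))
```

Checking the custom entry against an existing one catches a forgotten key before the filtering script fails partway through.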

Run the main script:

Execute the script for each dataset:

bash run_arxiv.sh
bash run_s2orc.sh
bash run_sdpra.sh

To run on your own data, prepare two input files, one containing only the data and one containing only the labels, as is done for arXiv:

bash run_custom_script.sh

Note: Please modify the required data file paths inside each script before running.
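
If your own data starts out as (text, label) pairs, the two input files can be produced with a short script. This is a minimal sketch under assumptions not confirmed by the repository: it assumes one example per line, with line i of the label file matching line i of the data file, and the file names, the helper `write_parallel_files`, and the sample pairs are all made up. Compare the output with the arXiv files in the data folder and adjust the format if they differ.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def write_parallel_files(examples: Iterable[Tuple[str, str]],
                         data_path: Path, label_path: Path) -> int:
    """Write texts and labels to two files, one example per line, in the same order."""
    count = 0
    with open(data_path, "w", encoding="utf-8") as data_f, \
            open(label_path, "w", encoding="utf-8") as label_f:
        for text, label in examples:
            # Collapse whitespace (including newlines) so each example stays on one line.
            data_f.write(" ".join(text.split()) + "\n")
            label_f.write(str(label).strip() + "\n")
            count += 1
    return count


# Made-up sample pairs (assumption): replace with your own texts and labels.
examples = [
    ("Prompt-based learning for\nscientific topic classification.", "cs.CL"),
    ("A survey of graph neural networks.", "cs.LG"),
]

out_dir = Path(tempfile.mkdtemp())
n = write_parallel_files(examples, out_dir / "custom_data.txt", out_dir / "custom_labels.txt")
print(f"Wrote {n} examples to {out_dir}")
```

Writing both files in a single pass keeps the two line orders aligned, which matters when data and labels are stored separately.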

Citation Information

If you use SciPrompt or the Emerging NLP benchmark, please cite:

@inproceedings{you-etal-2024-sciprompt,
    title = "{S}ci{P}rompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics",
    author = "You, Zhiwen  and
      Han, Kanyao  and
      Zhu, Haotian  and
      Ludaescher, Bertram  and
      Diesner, Jana",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.350",
    pages = "6087--6104",
}

Contact Information

If you have any questions, please email [email protected].

