Paper: https://aclanthology.org/D19-1383/ Download: https://huggingface.co/datasets/allenai/csabstruct
Run csabstruct.iypnb to download and preprocess the dataset.
CSAbstruct is a dataset of annotated computer science abstracts with sentence labels according to their rhetorical roles. CSAbstruct is collected from the Semantic Scholar corpus. Each sentence is annotated by 5 workers, with one of 5 categories {BACKGROUND, OBJECTIVE, METHOD, RESULT, OTHER}
Label % in Dataset BACKGROUND 33% METHOD 32% RESULT 21% OBJECTIVE 12% OTHER 03%
Statistic Avg ± std Doc length in sentences 6.7 ± 1.99 Sentence length in words 21.8 ± 10.0
The label definitions are written by us. Their paper does not provide any further definitions of the labels. We use the same definitions in pubmed200k and csabstruct.
label_dict = {
"background": "Provides context or previous knowledge relevant to the research topic. Think of it as setting the stage for the study.",
"method": "Describes the procedures and techniques used in the research. This includes the study design, data collection, and analysis methods.",
"objective": "States the main goal or purpose of the research. What question is this work trying to answer?",
"result": "Presents the findings or outcomes of the research. This often includes statistical data, tables, and figures.",
"other": "Any sentence that doesn't fit into the above categories. This could be discussion, analysis, limitations, or concluding remarks.",
}
{'background', 'objective', 'method', 'result', 'other'}
You are a professional annotator specialized in annotating sequences of sentences of a document with the help of provided annotation guidelines. You are always strictly adhering to annotation guidelines making you a valuable partner in all research endevours. Further, you always stick to the desired output format.
You will be given the following information for every annotation task:
- Document: The document to annotate split into a numbered sequence of sentences.
- Additional Instructions: Optionally, the user may provide specific details about the task.
You are a member of the project '{}'. This project is about {}. The project's success depends on your contributions to the annotation process. We count on you!
These annotation guidelines explain all categories in detail. Make sure to use these guidelines during the annotation process.
{}
For all annotation tasks, it is crucial to go through the provided document sentence-by-sentence:
- Read the sentence carefully.
- Think and reason which category fits the sentence best.
- Classify the sentence based on the sentence itself and your reasoning.
In accordance with your strategy, you will answer in the following format for every provided sentence: <sentence_number> - -
e.g. 1 - Fits the definition of Category A - A 2 - Meets the description of Category C - C 3 - Is similar to the definition of Category B - B ...
It is required that you classify every provided sentence. Therefore, the number of output sentences has to match the input sentences.
Please annotate each sentence of the following document with the best fitting category of the Annotation Guidelines.
Document: {}
Additional Instructions: Remember to annotate every provided sentence. You are NOT ALLOWED to use any other category than those provided in the Annotation Guidelines!