SCIDOCS coread coview dataset questions #64

Hello,
I have some questions about the SCIDOCS subsets of the SciRepEval benchmark.

First, when I look up the titles and abstracts of queries and candidates from scidocs_view and scidocs_read in the Hugging Face scidocs_view_cite_read data, some of them do not seem to exist in scidocs_view_cite_read_data, which contains each document's id, title, and abstract. Since the scidocs_view and scidocs_read splits do not include paper details, I take the document ids from those splits and match them against the ids in scidocs_view_cite_read_data. Am I using the dataset the right way, or was the dataset originally constructed like this? scidocs_cite and scidocs_cocite match perfectly; only scidocs_view and scidocs_read seem to be affected.
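The id matching described above can be sketched roughly as follows. This is only an illustration on toy data: the field names `query_id`, `candidate_ids`, and `doc_id`, and the helper `missing_ids`, are assumptions for the example and may not match the actual schema of the Hugging Face splits.

```python
from typing import Iterable, Set


def missing_ids(split_ids: Iterable[str], metadata_ids: Iterable[str]) -> Set[str]:
    """Return ids referenced by a split that have no row in the metadata table."""
    return set(split_ids) - set(metadata_ids)


# Toy stand-in for a split such as scidocs_view (hypothetical field names).
view_split = [
    {"query_id": "q1", "candidate_ids": ["c1", "c2"]},
    {"query_id": "q2", "candidate_ids": ["c3"]},
]

# Toy stand-in for the id/title/abstract table (hypothetical field names).
metadata = [
    {"doc_id": "q1", "title": "Query paper 1", "abstract": "Some abstract."},
    {"doc_id": "c1", "title": "Candidate 1", "abstract": "Some abstract."},
    {"doc_id": "c2", "title": "Candidate 2", "abstract": "Some abstract."},
]

# Collect every id the split refers to, queries and candidates alike.
referenced = {row["query_id"] for row in view_split}
referenced |= {cid for row in view_split for cid in row["candidate_ids"]}

unmatched = missing_ids(referenced, (doc["doc_id"] for doc in metadata))
coverage = 1 - len(unmatched) / len(referenced)
print(f"{len(unmatched)} of {len(referenced)} ids have no metadata "
      f"(coverage {coverage:.0%})")
```

Running the same check per split would show whether the gaps are limited to scidocs_view and scidocs_read.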

Second, several candidates have no abstract, only a title. I would like to confirm that this is expected.
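A quick way to count such title-only records is sketched below. The `doc_id`, `title`, and `abstract` field names are assumptions, and the sketch treats both a missing value and an empty or whitespace-only string as "no abstract", which may or may not match how the dataset encodes it.

```python
from typing import Iterable, List, Mapping, Optional


def has_abstract(doc: Mapping[str, Optional[str]]) -> bool:
    """True if the record carries a non-empty abstract string."""
    abstract = doc.get("abstract")
    return isinstance(abstract, str) and bool(abstract.strip())


def title_only_ids(docs: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Return the ids of records that have a title but no usable abstract."""
    return [doc["doc_id"] for doc in docs if not has_abstract(doc)]


# Toy records with hypothetical field names.
candidates = [
    {"doc_id": "c1", "title": "Candidate 1", "abstract": "Some abstract."},
    {"doc_id": "c2", "title": "Candidate 2", "abstract": None},
    {"doc_id": "c3", "title": "Candidate 3", "abstract": "   "},
    {"doc_id": "c4", "title": "Candidate 4"},
]

no_abstract = title_only_ids(candidates)
print(f"{len(no_abstract)} of {len(candidates)} candidates are title-only")
```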

Thank you.