SCIDOCS coread coview dataset questions #64

@psw0021

Hello,
I have some questions about the SCIDOCS subsets in the SciRepEval benchmark.

First, when I tried to look up the titles and abstracts of queries and candidates from scidocs_view and scidocs_read in the Hugging Face scidocs_view_cite_read data, some of them do not seem to exist in scidocs_view_cite_read_data, which contains each document's id, title, and abstract. Since the scidocs_view and scidocs_read splits do not include paper details, I took each document id from those splits and matched it against the ids in scidocs_view_cite_read_data. Am I using the dataset correctly, or was the dataset originally constructed this way? scidocs_cite and scidocs_cocite seem to match perfectly; only scidocs_view and scidocs_read seem to be problematic.
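A minimal sketch of this kind of id-matching check, so the mismatch can be reproduced and counted. The row layout (a `query` plus a list of `candidates`, each carrying a `doc_id`) and the names `collect_split_ids` and `coverage_report` are assumptions for illustration, not the confirmed schema of the dataset; the toy rows stand in for the real splits.

```python
from collections.abc import Mapping


def collect_split_ids(rows):
    """Gather every document id a split refers to (queries and candidates).

    Assumes each row has a "query" and an optional "candidates" list, where
    each entry is either a bare id or a mapping with a "doc_id" key.
    """
    ids = set()
    for row in rows:
        for item in [row["query"], *row.get("candidates", [])]:
            ids.add(item["doc_id"] if isinstance(item, Mapping) else item)
    return ids


def coverage_report(split_rows, metadata_rows):
    """Report which ids referenced by a split have no metadata record."""
    split_ids = collect_split_ids(split_rows)
    metadata_ids = {doc["doc_id"] for doc in metadata_rows}
    missing = sorted(split_ids - metadata_ids)
    return {"referenced": len(split_ids), "missing": missing}


# Toy rows with made-up ids, standing in for a split and the metadata table.
toy_split = [
    {"query": {"doc_id": "q1"}, "candidates": [{"doc_id": "c1"}, {"doc_id": "c2"}]},
]
toy_metadata = [
    {"doc_id": "q1", "title": "Query paper", "abstract": "Some abstract."},
    {"doc_id": "c1", "title": "Candidate paper", "abstract": None},
]

report = coverage_report(toy_split, toy_metadata)
print(report)
```

Running the same report on each split would show whether the missing ids are confined to scidocs_view and scidocs_read, as observed above.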

Second, several candidates have no abstract, only a title. I would like to confirm that this is expected.
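For illustration, one way to handle title-only candidates when building model input is to fall back to the title alone. This is only a sketch: the `build_text` helper, the `title` and `abstract` keys, and the `" [SEP] "` separator are assumptions, and whether title-only input matches the intended benchmark setup is exactly what this question asks.

```python
def build_text(title, abstract, sep=" [SEP] "):
    """Join title and abstract, falling back to the title when no abstract exists.

    The separator is an assumed placeholder; use whatever the encoder expects.
    """
    title = (title or "").strip()
    abstract = (abstract or "").strip()
    return f"{title}{sep}{abstract}" if abstract else title


def count_title_only(docs):
    """Count documents whose abstract is missing or empty."""
    return sum(1 for doc in docs if not (doc.get("abstract") or "").strip())


# Toy documents with made-up content.
toy_docs = [
    {"title": "Paper A", "abstract": "An abstract."},
    {"title": "Paper B", "abstract": None},
    {"title": "Paper C", "abstract": "   "},
]

texts = [build_text(doc["title"], doc["abstract"]) for doc in toy_docs]
title_only = count_title_only(toy_docs)
```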

Thank you.
