benchmark_collection

This directory contains the scripts that create the innovation benchmark dataset. Use them in three steps:

  • Create a list of paper titles or keywords in the papers_to_search file.
  • Run python 0_crawl_paper.py to collect the papers and their metadata.
  • Run python 1_create_inno_graph.py to build the final innovation dataset.
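
The two script steps above can also be chained from a small driver, so that the graph is never built from a failed crawl. This is only a minimal sketch: the script names come from the steps above, but `build_commands`, `run_pipeline`, and the assumption that both scripts run without arguments from this directory are illustrative and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the steps above. The helper names and the
# assumption that the scripts take no arguments are illustrative only.
PIPELINE_SCRIPTS = ["0_crawl_paper.py", "1_create_inno_graph.py"]


def build_commands(python_exe=sys.executable, scripts=PIPELINE_SCRIPTS):
    """Return one command (as an argument list) per pipeline script, in order."""
    return [[python_exe, script] for script in scripts]


def run_pipeline(workdir=".", runner=subprocess.run):
    """Run each script in order inside workdir, stopping at the first failure."""
    workdir = Path(workdir)
    for cmd in build_commands():
        script = workdir / cmd[1]
        if not script.is_file():
            raise FileNotFoundError(f"expected script not found: {script}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the second step never runs after a failed crawl.
        runner(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline(".")` from inside benchmark_collection would then run both steps in sequence, after the papers_to_search file has been filled in by hand.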

Here are some important notes:

  • If you want to collect papers related to some keywords, change exact_match to False in 0_crawl_paper.py.
  • You need to set your OpenAI key globally, or edit utils/openai_utils.py.
  • The final results are stored in innovation_graph/innovation_graph_final.json.
  • The innovation dataset used in our evaluation is merged_papers_with_fields.json. It has undergone additional manual revision.
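
As a quick sanity check around the notes above, the sketch below tests whether an OpenAI key is visible to the process and summarizes an output file. It assumes that "globally" means the `OPENAI_API_KEY` environment variable (the variable the official OpenAI Python client reads by default); whether utils/openai_utils.py relies on it is not confirmed here. The helper names are hypothetical, and since the JSON schema is not documented in this README, only the top-level type and size are reported.

```python
import json
import os
from pathlib import Path

# Output path taken from the notes above, relative to this directory.
FINAL_GRAPH = Path("innovation_graph") / "innovation_graph_final.json"


def has_openai_key(env=os.environ):
    """True if OPENAI_API_KEY is set and non-empty (assumed meaning of 'globally')."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


def summarize_json(path):
    """Load a JSON file and return (top-level type name, number of entries)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The schema is not documented, so only lists and dicts are counted;
    # any other top-level value is treated as a single entry.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size
```

For example, `summarize_json(FINAL_GRAPH)` can be called after the second script finishes, and the same function works on merged_papers_with_fields.json for comparison.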