This repository contains notebooks and data from an exploratory study of the use of generative AI in research data processing.
We explore three research projects that involve complex data processing tasks:
- Seedlists: We extract plant species names from historical seedlists (catalogues of seeds) published by botanical gardens. This is an information extraction task.
- Health Technology Assessment (HTA) documents: We extract certain data points (name of drug, name of health indication, relative effectiveness, cost effectiveness, etc.) from documents published by HTA organisations in the EU. This is a natural language understanding task.
- Kickstarter: We assign industry codes to projects on the crowdfunding website Kickstarter. This is a text classification task.
.
├── .gitignore
├── LICENSE
├── README.md
├── seedlists
│   ├── data
│   └── notebooks
├── hta
│   ├── data
│   └── notebooks
└── kickstarter
    ├── data
    └── notebooks
This project is licensed under the terms of the MIT License.