A platform to create crowd-sourced gene function gold standards with Amazon Mechanical Turk
- Make sure you have all requirements: Python 2, pipenv, and Java (tested on OpenJDK 1.8; Java is needed for NobleCoder).
- Download the repository
- Change into it and `pipenv install` python dependencies
- Launch NobleCoder from `tools/NobleCoder-1.0.jar` and import the Gene Ontology (download from here) under the name `go`. The `process.py` script will run NobleCoder on your abstracts and tell it to use the Ontology "go", so if you choose a different name you will have to adapt the script.
- Put the Pubmed IDs of the abstracts you're interested in into `data/pmid_list.txt`
- Run `pipenv run python process.py`
- Output is in `data/abstracts` and `data/brat-input`. Put all files from these folders together in the same folder of your brat installation. In that same folder you will also need a file `annotation.conf` that could look like this (more information here):

      [entities]
      Gene
      Function

      [relations]
      Does Arg1:Gene, Arg2:Function
      Does Arg1:Function, Arg2:Gene
      DoesNot Arg1:Function, Arg2:Gene
      DoesNot Arg1:Gene, Arg2:Function

      [attributes]

      [events]

  There will also be a file `data/statistics.cvs` containing the number of words, genes, and functions for each abstract.
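
The last step (gathering the output files in one brat folder and adding `annotation.conf`) could be scripted roughly as follows. This is only a sketch and not part of the repository: the function name `collect_for_brat` and the target folder are hypothetical, and it assumes that every regular file in the two output folders should be copied as-is.

```python
import os
import shutil

# Example annotation.conf contents, taken from the README above.
ANNOTATION_CONF = """[entities]
Gene
Function

[relations]
Does Arg1:Gene, Arg2:Function
Does Arg1:Function, Arg2:Gene
DoesNot Arg1:Function, Arg2:Gene
DoesNot Arg1:Gene, Arg2:Function

[attributes]

[events]
"""


def collect_for_brat(source_dirs, target_dir):
    """Copy all regular files from source_dirs into target_dir and
    write annotation.conf there. Returns the copied file names."""
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    copied = []
    for source_dir in source_dirs:
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            if os.path.isfile(path):  # skip subfolders
                shutil.copy(path, os.path.join(target_dir, name))
                copied.append(name)
    conf_path = os.path.join(target_dir, "annotation.conf")
    with open(conf_path, "w") as handle:
        handle.write(ANNOTATION_CONF)
    return copied

# Hypothetical usage; replace the target with a folder of your brat installation:
# collect_for_brat(["data/abstracts", "data/brat-input"], "/path/to/brat/data/gene-function")
```

Files with the same name in both folders would overwrite each other in the target, so check the returned list if you are unsure.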