Licenses for datasets
Ricardo Usbeck edited this page on Mar 23, 2023 · 1 revision
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | LDC | en |
- https://catalog.ldc.upenn.edu/LDC2005T09
- Available at: https://cogcomp.cs.illinois.edu/page/resource_view/4
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | CoNLL Licence | en |
- https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter for the AIDA/CoNLL dataset expects the following file:
  - `gerbil_data/datasets/aida/AIDA-YAGO2-dataset-update.tsv`
The adapter also works with the original AIDA-YAGO2-dataset.tsv file. The difference between the original and the updated file appears to be that YAGO URL paths were replaced with IDs; however, our adapter does not use these values.
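The page says the adapter also works with the original file. Assuming this means the original file can be stored under the expected name, the small helper below prepares that layout. The function name `prepare_aida` is hypothetical and is not part of GERBIL. The file names and folder come from the installation note above.

```python
import shutil
from pathlib import Path

# Path the AIDA/CoNLL adapter expects, relative to the GERBIL working directory.
EXPECTED = Path("gerbil_data/datasets/aida/AIDA-YAGO2-dataset-update.tsv")
ORIGINAL_NAME = "AIDA-YAGO2-dataset.tsv"


def prepare_aida(root: Path = Path(".")) -> Path:
    """Return the path of the AIDA/CoNLL file the adapter will read.

    Hypothetical helper. If only the original AIDA-YAGO2-dataset.tsv sits in the
    expected folder, copy it to the expected file name. This relies on the
    assumption that the adapter ignores the YAGO URL/ID column that differs
    between the two versions.
    """
    expected = root / EXPECTED
    if expected.is_file():
        return expected
    original = expected.with_name(ORIGINAL_NAME)
    if original.is_file():
        # Copy rather than rename so the original download stays untouched.
        shutil.copyfile(original, expected)
        return expected
    raise FileNotFoundError(f"neither {expected} nor {original} exists")
```

Run it from the directory that contains `gerbil_data/`. It raises `FileNotFoundError` if neither version of the file has been downloaded.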
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | LDC User Agreement for Non-Members | en |
- https://catalog.ldc.upenn.edu/LDC2002T31
- Graff, D. 2002. The AQUAINT corpus of English news text. Technical report, Linguistic Data Consortium, Philadelphia, PA, USA.
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter for the AQUAINT dataset expects the following folders:
  - `gerbil_data/datasets/AQUAINT/RawTexts`
  - `gerbil_data/datasets/AQUAINT/Problems`
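You can check that the two AQUAINT folders are in place before starting GERBIL. The script below is a minimal sketch of such a check. The function `check_aquaint` is hypothetical. Flagging an empty folder is an extra assumption, based on the idea that an empty folder most likely means the corpus was not copied in.

```python
from pathlib import Path

# Folders the AQUAINT adapter expects, relative to the GERBIL working directory.
AQUAINT_DIRS = [
    "gerbil_data/datasets/AQUAINT/RawTexts",
    "gerbil_data/datasets/AQUAINT/Problems",
]


def check_aquaint(root: Path) -> list[str]:
    """Return a human-readable problem for each expected folder that is missing or empty."""
    problems = []
    for rel in AQUAINT_DIRS:
        folder = root / rel
        if not folder.is_dir():
            problems.append(f"missing folder: {rel}")
        elif not any(folder.iterdir()):
            problems.append(f"empty folder: {rel}")
    return problems


if __name__ == "__main__":
    for problem in check_aquaint(Path(".")):
        print(problem)
```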
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | CC BY 4.0 | en |
- http://www.yovisto.com/labs/ner-benchmarks/
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | microposts | CC BY 4.0 | en |
- http://www.derczynski.com/sheffield/resources/ipm_nel.tar.gz
- This dataset is not yet included in gerbil_data.zip and still needs to be added.
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | mixed | Public Domain | en |
- http://www.cse.iitb.ac.in/~soumen/doc/CSAW/Annot/
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | CC BY 4.0 | en |
- http://www.yovisto.com/labs/ner-benchmarks/
- J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and G. Weikum, "KORE: Keyphrase Overlap Relatedness for Entity Disambiguation," in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), Hawaii, USA, 2012.
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| RT2KB | microposts | CC BY-NC-SA 3.0 | en |
- http://oak.dcs.shef.ac.uk/msm2013/ie_challenge/MSM2013-CEChallengeFinal.zip
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter expects the following files:
  - `gerbil_data/datasets/microposts2013/goldStandard.tsv`
  - `gerbil_data/datasets/microposts2013/testSet.tsv`
  - `gerbil_data/datasets/microposts2013/TweetsTrainingSetCH.tsv`
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | microposts | Twitter license | en |
- http://www.scc.lancs.ac.uk/microposts2014/challenge/dataset/microposts2014-neel_challenge_gs.zip
- This dataset is not included in gerbil_data.zip. Installation: the implemented adapter expects the following files:
  - `gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTestSet.csv`
  - `gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTrainingSet.csv`
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | microposts | CC BY 4.0 | en |
- This dataset is not yet included in gerbil_data.zip and still needs to be added. Installation: the implemented adapter expects the following files:
  - `gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-gold_v3.tsv`
  - `gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-tweets.tsv`
  - `gerbil_data/datasets/microposts2015/test/NEEL2015-test-gold_v2.tsv`
  - `gerbil_data/datasets/microposts2015/test/NEEL2015-test-tweets.tsv`
  - `gerbil_data/datasets/microposts2015/training/NEEL2015-training-gold_v4.ts`
  - `gerbil_data/datasets/microposts2015/training/NEEL2015-training-tweets_v2.tsv`
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | microposts | CC BY 4.0 | en |
- This dataset is not yet included in gerbil_data.zip and still needs to be added. Installation: the implemented adapter expects the following files:
  - `gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev.tsv`
  - `gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev_neel.gs`
  - `gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test.tsv`
  - `gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test_neel.gs`
  - `gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training.tsv`
  - `gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training_neel.gs`
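The Microposts adapters each expect a fixed set of files. A single manifest makes it easy to see which downloads are still missing. The sketch below copies the 2013, 2014 and 2016 file lists from this page into such a manifest. The dictionary `MICROPOSTS_FILES` and the function `missing_files` are hypothetical names and are not part of GERBIL. The 2015 files can be added to the dictionary in the same way.

```python
from pathlib import Path

# Expected files per Microposts dataset folder, copied from the lists on this page.
# Note that the 2016 folder names contain spaces; pathlib handles them without quoting.
MICROPOSTS_FILES = {
    "microposts2013": [
        "goldStandard.tsv",
        "testSet.tsv",
        "TweetsTrainingSetCH.tsv",
    ],
    "microposts2014": [
        "Microposts2014-NEEL_challenge_TweetsTestSet.csv",
        "Microposts2014-NEEL_challenge_TweetsTrainingSet.csv",
    ],
    "microposts2016": [
        "Dev Set/NEEL2016-dev.tsv",
        "Dev Set/NEEL2016-dev_neel.gs",
        "Test Set/NEEL2016-test.tsv",
        "Test Set/NEEL2016-test_neel.gs",
        "Training Set/NEEL2016-training.tsv",
        "Training Set/NEEL2016-training_neel.gs",
    ],
}


def missing_files(datasets_dir: Path) -> dict[str, list[str]]:
    """Map each dataset folder to the expected files that are not present under datasets_dir."""
    report = {}
    for dataset, files in MICROPOSTS_FILES.items():
        absent = [f for f in files if not (datasets_dir / dataset / f).is_file()]
        if absent:
            report[dataset] = absent
    return report


if __name__ == "__main__":
    for dataset, absent in missing_files(Path("gerbil_data/datasets")).items():
        print(f"{dataset}: missing {', '.join(absent)}")
```

If the script prints nothing, all of the listed files were found.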
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | - | en |
- http://cogcomp.cs.illinois.edu/page/resource_view/4
- http://research.microsoft.com/en-us/um/people/silviu/WebAssistant/TestData/
- S. Cucerzan. Large-scale named entity disambiguation based on Wikipedia data. In Proc. EMNLP-CoNLL, pages 708–716, 2007.
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | news | CC BY-NC-SA 4.0 International License | en |
- https://github.com/AKSW/n3-collection
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | RSS feeds | CC BY-NC-SA 4.0 International License | en |
- https://github.com/AKSW/n3-collection
- This dataset is already included in gerbil_data.zip
| Task | Type | License | Language |
|---|---|---|---|
| RT2KB | news | GNU GPL v3 | en |
- https://github.com/aritter/twitter_nlp/blob/master/data/annotated/ner.txt
- This dataset is not yet included in gerbil_data.zip and still needs to be added.
| Task | Type | License | Language |
|---|---|---|---|
| ERec | mixed | Public Domain | en |
- http://www.hipposmond.com/senseval2/Results/guidelines.htm#rawdata
- Note: although the data is listed as public domain, the corpora and corpus samples may be subject to copyright restrictions depending on their source.
| Task | Type | License | Language |
|---|---|---|---|
| ERec | mixed | Public Domain | en |
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | microposts (Twitter) | CC BY (unconfirmed) | en |
- Locke, B. and Martin, J. (2009). Named entity recognition: Adapting to microblogging. Senior Thesis, University of Colorado.
| Task | Type | License | Language |
|---|---|---|---|
| A2KB | microposts (Twitter) | CC BY (unconfirmed) | en |
- Habib, M. B. and van Keulen, M. (2012). Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In Proceedings of the Workshop on Semantic Web and Information Extraction (SWAIE 2012), pages 1–10.
| Task | Type | License | Language |
|---|---|---|---|
| RT2KB | news | BSD 2-Clause | en |
| Task | Type | License | Language |
|---|---|---|---|
| C2KB | microposts | - | en |