+
+
+
+
+
+
+
Abaza treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Akkadian treebanks
+
+
+
+
+ See
here for comparative statistics of Akkadian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Amharic treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Ancient Greek treebanks
+
+
+
+
+ See
here for comparative statistics of Ancient Greek treebanks.
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Arabic treebanks
+
+
+
+
+ See
here for comparative statistics of Arabic treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Bambara treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Basque treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Cantonese treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Chinese treebanks
+
+
+
+
+ See
here for comparative statistics of Chinese treebanks.
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Czech treebanks
+
+
+
+
+ See
here for comparative statistics of Czech treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Danish treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Dutch treebanks
+
+
+
+
+ See
here for comparative statistics of Dutch treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
English treebanks
+
+
+
+
+ See
here for comparative statistics of English treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Estonian treebanks
+
+
+
+
+ See
here for comparative statistics of Estonian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Faroese treebanks
+
+
+
+
+ See
here for comparative statistics of Faroese treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Finnish treebanks
+
+
+
+
+ See
here for comparative statistics of Finnish treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
French treebanks
+
+
+
+
+ See
here for comparative statistics of French treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Galician treebanks
+
+
+
+
+ See
here for comparative statistics of Galician treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
German treebanks
+
+
+
+
+ See
here for comparative statistics of German treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Gothic treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Hindi treebanks
+
+
+
+
+ See
here for comparative statistics of Hindi treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Hindi English treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Hittite treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Hungarian treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Icelandic treebanks
+
+
+
+
+ See
here for comparative statistics of Icelandic treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Indonesian treebanks
+
+
+
+
+ See
here for comparative statistics of Indonesian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Italian treebanks
+
+
+
+
+ See
here for comparative statistics of Italian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Japanese treebanks
+
+
+
+
+ See
here for comparative statistics of Japanese treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Kazakh treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Komi Zyrian treebanks
+
+
+
+
+ See
here for comparative statistics of Komi Zyrian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Korean treebanks
+
+
+
+
+ See
here for comparative statistics of Korean treebanks.
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Kurmanji treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Latin treebanks
+
+
+
+
+ See
here for comparative statistics of Latin treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Laz treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Lithuanian treebanks
+
+
+
+
+ See
here for comparative statistics of Lithuanian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Magahi treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Marathi treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Mbya Guarani treebanks
+
+
+
+
+ See
here for comparative statistics of Mbya Guarani treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Middle Irish treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Norwegian treebanks
+
+
+
+
+ See
here for comparative statistics of Norwegian treebanks.
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Old Church Slavonic treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Old East Slavic treebanks
+
+
+
+
+ See
here for comparative statistics of Old East Slavic treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Persian treebanks
+
+
+
+
+ See
here for comparative statistics of Persian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Polish treebanks
+
+
+
+
+ See
here for comparative statistics of Polish treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Portuguese treebanks
+
+
+
+
+ See
here for comparative statistics of Portuguese treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Romanian treebanks
+
+
+
+
+ See
here for comparative statistics of Romanian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Russian treebanks
+
+
+
+
+ See
here for comparative statistics of Russian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Sanskrit treebanks
+
+
+
+
+ See
here for comparative statistics of Sanskrit treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Sindhi treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Slovenian treebanks
+
+
+
+
+ See
here for comparative statistics of Slovenian treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Spanish treebanks
+
+
+
+
+ See
here for comparative statistics of Spanish treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Swedish treebanks
+
+
+
+
+ See
here for comparative statistics of Swedish treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Swedish Sign Language treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Tagalog treebanks
+
+
+
+
+ See
here for comparative statistics of Tagalog treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Tamil treebanks
+
+
+
+
+ See
here for comparative statistics of Tamil treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
Telugu treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Thai treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Turkish treebanks
+
+
+
+
+ See
here for comparative statistics of Turkish treebanks.
+
+
+
Language documentation
+
+
+ See the
language documentation page.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Ukrainian treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Uyghur treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
Vietnamese treebanks
+
+
+
+
+
+
Language documentation
+
+
+ The language hub documentation has not yet been created or ported from the UDv1 documentation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/setup.py b/setup.py
index 4e325514a..ef4b23d23 100644
--- a/setup.py
+++ b/setup.py
@@ -9,7 +9,7 @@ def parse_requirements(filename, session=None):
setuptools.setup(
name="nlpcube",
- version="0.1.0.8",
+ version="0.3.1.0",
author="Multiple authors",
author_email="tiberiu44@gmail.com",
description="Natural Language Processing Toolkit with support for tokenization, sentence splitting, lemmatization, tagging and parsing for more than 60 languages",
diff --git a/test.py b/test.py
new file mode 100644
index 000000000..0590afa45
--- /dev/null
+++ b/test.py
@@ -0,0 +1,76 @@
+import sys
+
+sys.path.append('')
+import torch
+from cube.networks.tokenizer import Tokenizer
+from cube.networks.tagger import Tagger
+from cube.io_utils.config import TokenizerConfig, TaggerConfig, ParserConfig
+from cube.io_utils.encodings import Encodings
+from cube.networks.utils_tokenizer import TokenCollateFTLanguasito, TokenCollateHF
+from cube.networks.lm import LMHelperFT, LMHelperHF
+from cube.networks.utils import MorphoCollate
+from cube.networks.parser import Parser
+
+# # tokenizer
+# enc = Encodings()
+# enc.load('data/tokenizer-ro-transformer.encodings')
+#
+# config = TokenizerConfig()
+# tokenizer = Tokenizer(config, enc, language_codes=['ro_nonstandard', 'ro_rrt'], ext_word_emb=[768 for _ in range(13)])
+#
+# model = torch.load('data/tokenizer-ro-transformer.ro_rrt.sent', map_location='cpu')
+#
+# tokenizer.load_state_dict(model['state_dict'])
+#
+# collate = TokenCollateHF(enc, lm_model='xlm-roberta-base', lm_device='cpu')
+# text = open('corpus/ud-treebanks-v2.5/UD_Romanian-RRT/ro_rrt-ud-test.txt').read()
+# # text = 'Și eu am mere. Ana are mere, dar nu are pere. Acesta este un test.'
+# d = tokenizer.process(text, collate, lang_id=1, batch_size=4)
+# tokenizer
+enc = Encodings()
+enc.load('data/tokenizer-ro-fasttext.encodings')
+
+config = TokenizerConfig()
+tokenizer = Tokenizer(config, enc, language_codes=['ro_rrt'], ext_word_emb=[300])
+
+model = torch.load('data/tokenizer-ro-fasttext.ro_rrt.sent', map_location='cpu')
+
+tokenizer.load_state_dict(model['state_dict'])
+
+collate = TokenCollateFTLanguasito(enc, lm_model='fasttext:ro', lm_device='cpu')
+text = open('corpus/ud-treebanks-v2.5/UD_Romanian-RRT/ro_rrt-ud-test.txt').read()
+# text = 'Și eu am mere. Ana are mere, dar nu are pere. Acesta este un test.'
+d = tokenizer.process(text, collate, lang_id=0, batch_size=4)
+for ii in range(len(d.sentences)):
+ d.sentences[ii].lang_id = 1
+
+# helper = LMHelperFT(model='ro')
+# helper.apply(d)
+
+# # tagger
+# enc = Encodings()
+# enc.load('data/tagger-ro-fasttext.encodings')
+# model = torch.load('data/tagger-ro-fasttext.ro_rrt.upos', map_location='cpu')
+# config = TaggerConfig()
+# config.load('data/tagger-ro-fasttext.config')
+# tagger = Tagger(config, enc, ext_word_emb=helper.get_embedding_size(), language_codes=['ro_nonstandard', 'ro_rrt'])
+# tagger.load_state_dict(model['state_dict'])
+# collate = MorphoCollate(enc)
+# d = tagger.process(d, collate)
+
+# parser
+# del helper
+helper = LMHelperHF(model='xlm-roberta-base')
+helper.apply(d)
+enc = Encodings()
+enc.load('data/parser-ro-transformer.encodings')
+collate = MorphoCollate(enc)
+model = torch.load('data/parser-ro-transformer.ro_rrt.uas', map_location='cpu')
+config = ParserConfig()
+config.load('data/parser-ro-transformer.config')
+parser = Parser(config, enc, ext_word_emb=helper.get_embedding_size(), language_codes=['ro_nonstandard', 'ro_rrt'])
+parser.load_state_dict(model['state_dict'])
+d = parser.process(d, collate)
+
+print(d)
+print("")
diff --git a/tests/.gitignore b/tests/.gitignore
deleted file mode 100644
index a9f2fb3bd..000000000
--- a/tests/.gitignore
+++ /dev/null
@@ -1,4 +0,0 @@
-scratch/
-scratch/*
-my_model-1.0/
-*.zip
\ No newline at end of file
diff --git a/tests/README.md b/tests/README.md
deleted file mode 100644
index 4d3a97c8f..000000000
--- a/tests/README.md
+++ /dev/null
@@ -1,21 +0,0 @@
-# NLP-Cube Tests
-
-To perform automatic testing, simply run (from the main folder):
-
-
-```
-python3 tests/main_test.py
-```
-
-
-and
-
-
-```
-python3 tests/api_test.py
-```
-
-
-Please run them in this sequence as ``main_test.py`` creates a local model that ``api_test.py`` expects to find.
-
-
diff --git a/tests/api_tests.py b/tests/api_tests.py
deleted file mode 100644
index bd476f307..000000000
--- a/tests/api_tests.py
+++ /dev/null
@@ -1,282 +0,0 @@
-"""
-This class should test:
-
-0. Init the model_store object and list online models
-1. Download and run an online model
-2. Run a local model (should be created with main_tests.py before in tests/my_model-1.0)
-3.1. Package a local model *without* embeddings link set in metadata.json
-3.2. Import it into NLP-Cube
-3.3. Run the local model with manual embeddings
-
-4.1. Package a local model *with* embeddings link set in metadata.json
-4.2. Import it into NLP-Cube
-4.3. Run the local model without manual embeddings
-
-"""
-import os, sys, subprocess
-import unittest
-
-class Api_Tests(unittest.TestCase):
-
- def setUp(self):
- # get current directory
- self.root_path = os.path.dirname(os.path.realpath(__file__))
- self.root_path = os.path.abspath(os.path.join(self.root_path, os.pardir))
- self.main_file_path = os.path.join(self.root_path, "cube", "main.py")
- self.scripts_path = os.path.join(self.root_path, "scripts")
- self.corpus_path = os.path.join(self.root_path, "tests", "test_corpus")
- self.model_path = os.path.join(self.root_path, "tests", "my_model-1.0")
- self.local_model_repo = os.path.join(self.root_path, "tests")
- self.scratch_path = os.path.join(self.root_path, "tests", "scratch")
- self.input_file_path = os.path.join(self.corpus_path, "en_ewt-ud-test.txt")
- self.output_file_path = os.path.join(self.scratch_path, "en_ewt-ud-test-output.conllu")
-
- if not os.path.exists(self.model_path):
- os.makedirs(self.model_path)
- if not os.path.exists(self.scratch_path):
- os.makedirs(self.scratch_path)
-
- #import root_path
- sys.path.append(self.root_path)
-
- #print("[setUp] Absolute path of NLP-Cube: "+self.root_path)
- #print()
-
-
- def test_0_init_model_store_and_list_online_models(self):
- print("\n\33[33m{}\33[0m".format("0. Loading the model store and querying the online database ..."))
- from cube.io_utils.model_store import ModelMetadata, ModelStore
- model_store_object = ModelStore()
- online_models = model_store_object.list_online_models()
- print("Found "+str(len(online_models))+ " models online.")
- self.assertTrue(len(online_models)>0)
-
- def test_1_1_download_and_run_an_online_model_latest_version(self):
- print("\n\33[33m{}\33[0m".format("1.1 Loading an online model (latest_version) ..."))
- from cube.api import Cube
- cube = Cube(verbose=True)
- #cube.load('en_small', tokenization=True, compound_word_expanding=False, tagging=True, lemmatization=True, parsing=True)
- cube.load('bxr', tokenization=True, compound_word_expanding=False, tagging=True, lemmatization=True, parsing=True)
- cube.metadata.info()
- text = "I'm a success today because I had a friend who believed in me and I didn't have the heart to let him down. This is a quote by Abraham Lincoln."
- sentences = cube(text)
- self.assertTrue(len(sentences)>0)
- self.assertTrue(len(sentences[0])>0)
-
- def test_1_2_download_and_run_an_online_model_specific_version(self):
- print("\n\33[33m{}\33[0m".format("1.2. Loading an online model (sme, 1.0) ..."))
- from cube.api import Cube
- cube = Cube(verbose=True)
- cube.load('sme', version='1.0', tokenization=True, compound_word_expanding=False, tagging=False, lemmatization=False, parsing=False)
- cube.metadata.info()
- text = "I'm a success today because I had a friend who believed in me and I didn't have the heart to let him down. This is a quote by Abraham Lincoln."
- sentences = cube(text)
- self.assertTrue(len(sentences)>0)
- self.assertTrue(len(sentences[0])>0)
-
- # This test needs my_model-1.0 to be locally created with main_tests.py
- def test_2_run_a_local_model(self):
- print("\n\33[33m{}\33[0m".format("2. Run a local model that does not have embeddings or metadata (running with dummy.vec embeddings) ..."))
- embeddings = os.path.join(self.root_path, "examples","wiki.dummy.vec")
- from cube.api import Cube
- cube = Cube(verbose=True)
- cube.load('my_model', tokenization=True, compound_word_expanding=False, tagging=True, lemmatization=True, parsing=True, local_models_repository=self.local_model_repo, local_embeddings_file=embeddings)
- text = "I'm a success today because I had a friend who believed in me and I didn't have the heart to let him down. This is a quote by Abraham Lincoln."
- sentences = cube(text)
- self.assertTrue(len(sentences)>0)
- self.assertTrue(len(sentences[0])>0)
-
-
- def test_3_1_package_a_local_model_without_embeddings_link_in_metadata(self):
- print("\n\33[33m{}\33[0m".format("3.1. Package a local model without an embeddings file ..."))
-
- # create metadata file
- with open(os.path.join(self.model_path,"metadata.json"),"w",encoding="utf-8") as f:
- f.write("{\n")
- f.write('"embeddings_file_name": "wiki.dummy.vec",\n')
- f.write('"embeddings_remote_link": "",\n')
- f.write('"language": "UD_English",\n')
- f.write('"language_code": "my_model",\n')
- f.write('"model_build_date": "2020-01-01",\n')
- f.write('"model_build_source": "UD_English-ParTuT",\n')
- f.write('"model_version": 1.0,\n')
- f.write('"notes": "Source: ud-treebanks-v2.2, dummy model",\n')
- f.write('"token_delimiter": " "\n')
- f.write("}\n")
-
- #python3 /work/NLP-Cube/scripts/export_model.py /work/my_model-1.0 --tokenizer --tagger
- command = "python3 " + os.path.join(self.scripts_path, "export_model.py") + " " + self.model_path
- command+= " --tokenizer --tagger --parser --lemmatizer"
- print("\n\t\t\33[32m{}\n{}\33[0m".format("Export command:",command))
- ''' popen = subprocess.Popen(command.split(" "), stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- '''
- os.system(command)
-
- test = os.path.exists(os.path.join(self.local_model_repo,"my_model-1.0.zip"))
- self.assertTrue(test)
-
- def test_3_2_import_model_in_store(self):
- print("\n\33[33m{}\33[0m".format("3.2. Import locally created model in store ..."))
- command = "python3 " + os.path.join(self.scripts_path, "import_model.py") + " " + os.path.join(self.local_model_repo,"my_model-1.0.zip")
- print("\n\t\t\33[32m{}\n{}\33[0m".format("Import command:",command))
- '''popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- '''
- os.system(command)
-
- # check it is in store
- from cube.io_utils.model_store import ModelMetadata, ModelStore
- model_store_object = ModelStore()
- local_models = model_store_object.list_local_models()
- test = False
- for model, version in local_models:
- if model == "my_model":
- test = True
- self.assertTrue(test)
-
-
- def test_3_3_run_model_with_manual_embeddings(self):
- print("\n\33[33m{}\33[0m".format("3.3. Run a local model with manual embeddings ..."))
- embeddings = os.path.join(self.root_path, "examples","wiki.dummy.vec")
- print("\t\tPath to local manual embeddings file: "+embeddings)
- from cube.api import Cube
- cube = Cube(verbose=True)
- cube.load('my_model', tokenization=True, compound_word_expanding=False, tagging=True, lemmatization=True, parsing=True, local_embeddings_file=embeddings)
- text = "I'm a success today because I had a friend who believed in me and I didn't have the heart to let him down. This is a quote by Abraham Lincoln."
- sentences = cube(text)
- self.assertTrue(len(sentences)>0)
- self.assertTrue(len(sentences[0])>0)
-
- def test_4_1_package_a_local_model_with_embeddings_link_in_metadata(self):
- print("\n\33[33m{}\33[0m".format("4.1. Package a local model with an external embeddings file link..."))
-
- # create metadata file
- with open(os.path.join(self.model_path,"metadata.json"),"w",encoding="utf-8") as f:
- f.write("{\n")
- f.write('"embeddings_file_name": "wiki.got.vec",\n')
- f.write('"embeddings_remote_link": "https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.got.vec",\n')
- f.write('"language": "UD_English",\n')
- f.write('"language_code": "my_model",\n')
- f.write('"model_build_date": "2020-01-01",\n')
- f.write('"model_build_source": "UD_English-ParTuT",\n')
- f.write('"model_version": 1.0,\n')
- f.write('"notes": "Source: ud-treebanks-v2.2, dummy model, -got- embeddings because they are small",\n')
- f.write('"token_delimiter": " "\n')
- f.write("}\n")
-
- # first cleanup if my_model-1.0.zip already exists
- if os.path.exists(os.path.join(self.local_model_repo,"my_model-1.0.zip")):
- os.remove(os.path.join(self.local_model_repo,"my_model-1.0.zip"))
-
- command = "python3 " + os.path.join(self.scripts_path, "export_model.py") + " " + self.model_path
- command+= " --tokenizer --tagger --parser --lemmatizer"
- print("\n\t\t\33[32m{}\n{}\33[0m".format("Export command:",command))
- '''popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- '''
- os.system(command)
- test = os.path.exists(os.path.join(self.local_model_repo,"my_model-1.0.zip"))
- self.assertTrue(test)
-
- def test_4_2_import_model_in_store(self):
- print("\n\33[33m{}\33[0m".format("4.2. Import locally created model in store (with prior cleanup)..."))
-
- # first check local models
- from cube.io_utils.model_store import ModelMetadata, ModelStore
- model_store_object = ModelStore()
- local_models = model_store_object.list_local_models()
- print("\tFound local models:"+str(local_models))
- self.assertTrue(len(local_models)>0)
-
- # search for my_model
- for model, version in local_models:
- if model == "my_model":
- # delete local model
- print("\tDeleting 'my_model-1.0'...")
- model_store_object.delete_model("my_model","1.0")
- local_models_new = model_store_object.list_local_models()
- print("\tFound local models:"+str(local_models_new))
- self.assertTrue(len(local_models)>len(local_models_new))
-
- # import new model
- command = "python3 " + os.path.join(self.scripts_path, "import_model.py") + " " + os.path.join(self.local_model_repo,"my_model-1.0.zip")
- print("\n\t\t\33[32m{}\n{}\33[0m".format("Import command:",command))
- '''popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- '''
- os.system(command)
-
- test = os.path.exists(os.path.join(self.local_model_repo,"my_model-1.0.zip"))
- self.assertTrue(test)
-
- # check it is in store
- local_models = model_store_object.list_local_models()
- test = False
- for model, version in local_models:
- if model == "my_model":
- test = True
- self.assertTrue(test)
-
- def test_4_3_run_model_with_default_external_embeddings(self):
- print("\n\33[33m{}\33[0m".format("4.3. Run a local model with default external embeddings ..."))
- from cube.api import Cube
- cube = Cube(verbose=True)
- cube.load('my_model', tokenization=True, compound_word_expanding=False, tagging=True, lemmatization=True, parsing=True)
- text = "I'm a success today because I had a friend who believed in me and I didn't have the heart to let him down. This is a quote by Abraham Lincoln."
- sentences = cube(text)
- self.assertTrue(len(sentences)>0)
- self.assertTrue(len(sentences[0])>0)
-
- def test_5_cleanup(self):
- print("\n\33[33m{}\33[0m".format("5. Cleanup after myself ..."))
-
- # delete my_model from the store, if it exists
- from cube.io_utils.model_store import ModelMetadata, ModelStore
- model_store_object = ModelStore()
- local_models = model_store_object.list_local_models()
- print("\tFound local models:"+str(local_models))
- self.assertTrue(len(local_models)>0)
-
- for model, version in local_models:
- if model == "my_model":
- # delete local model
- print("\tDeleting 'my_model-1.0'...")
- model_store_object.delete_model("my_model","1.0")
- local_models_new = model_store_object.list_local_models()
- print("\tFound local models:"+str(local_models_new))
- self.assertTrue(len(local_models)>len(local_models_new))
- break
-
- # delete my_model.zip, if it exists
- if os.path.exists(os.path.join(self.local_model_repo,"my_model-1.0.zip")):
- os.remove(os.path.join(self.local_model_repo,"my_model-1.0.zip"))
- self.assertFalse(os.path.exists(os.path.join(self.local_model_repo,"my_model-1.0.zip")))
-
-
-if __name__ == '__main__':
- unittest.main()
\ No newline at end of file
diff --git a/tests/main_tests.py b/tests/main_tests.py
deleted file mode 100644
index 021338775..000000000
--- a/tests/main_tests.py
+++ /dev/null
@@ -1,132 +0,0 @@
-"""
-This class should test:
-
-1. Train a very small model with tokenizer -> parser with several options (train)
-2. Test the model using the main functions (run)
-
-"""
-import os, sys, subprocess
-import unittest
-
-class Main_Tests(unittest.TestCase):
-
- def setUp(self):
- # get current directory
- self.root_path = os.path.dirname(os.path.realpath(__file__))
- self.root_path = os.path.abspath(os.path.join(self.root_path, os.pardir))
- self.main_file_path = os.path.join(self.root_path, "cube", "main.py")
- self.corpus_path = os.path.join(self.root_path, "tests", "test_corpus")
- self.model_path = os.path.join(self.root_path, "tests", "my_model-1.0")
- self.scratch_path = os.path.join(self.root_path, "tests", "scratch")
- self.input_file_path = os.path.join(self.corpus_path, "en_ewt-ud-test.txt")
- self.output_file_path = os.path.join(self.scratch_path, "en_ewt-ud-test-output.conllu")
-
- if not os.path.exists(self.model_path):
- os.makedirs(self.model_path)
- if not os.path.exists(self.scratch_path):
- os.makedirs(self.scratch_path)
-
- #print("\n\n"+"_"*72)
- #print("[setUp] Absolute path of NLP-Cube: "+self.root_path)
- #print()
-
- def test_1_tokenizer_training(self):
- command = "python3 " + self.main_file_path + " --train tokenizer"
- command+= " --train-file "+os.path.join(self.corpus_path,"en_ewt-ud-train.conllu") + " --raw-train-file " + os.path.join(self.corpus_path,"en_ewt-ud-train.txt")
- command+= " --dev-file "+os.path.join(self.corpus_path,"en_ewt-ud-dev.conllu") + " --raw-dev-file " + os.path.join(self.corpus_path,"en_ewt-ud-dev.txt")
- command+= " --embeddings "+os.path.join(self.root_path, "examples", "wiki.dummy.vec")
- command+= " --store " + os.path.join(self.model_path, "tokenizer")
- command+= " --autobatch --batch-size 1000 --set-mem 1000 --random-seed 42 --patience 1"
- print("\n\33[33m{}\n{}\33[0m".format("Tokenizer command:",command))
- popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- test = "Training is done with " in output[-1]
- self.assertTrue(test)
-
- def test_2_tagger_training(self):
- command = "python3 " + self.main_file_path + " --train tagger"
- command+= " --train-file "+os.path.join(self.corpus_path,"en_ewt-ud-train.conllu")
- command+= " --dev-file "+os.path.join(self.corpus_path,"en_ewt-ud-dev.conllu")
- command+= " --embeddings "+os.path.join(self.root_path, "examples", "wiki.dummy.vec")
- command+= " --store " + os.path.join(self.model_path, "tagger")
- command+= " --batch-size 500 --set-mem 1000 --patience 1"
- print("\n\33[33m{}\n{}\33[0m".format("Tagger command:",command))
- popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- test = "Training is done with " in output[-1]
- self.assertTrue(test)
-
-
-
- def test_3_lemmatizer_training(self):
- command = "python3 " + self.main_file_path + " --train lemmatizer"
- command+= " --train-file "+os.path.join(self.corpus_path,"en_ewt-ud-train.conllu")
- command+= " --dev-file "+os.path.join(self.corpus_path,"en_ewt-ud-dev.conllu")
- command+= " --embeddings "+os.path.join(self.root_path, "examples", "wiki.dummy.vec")
- command+= " --store " + os.path.join(self.model_path, "lemmatizer")
- command+= " --batch-size 750 --patience 1"
- print("\n\33[33m{}\n{}\33[0m".format("Lemmatizer command:",command))
- popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- test = "Training is done with " in output[-1]
- self.assertTrue(test)
-
- def test_4_parser_training(self):
- command = "python3 " + self.main_file_path + " --train parser"
- command+= " --train-file "+os.path.join(self.corpus_path,"en_ewt-ud-train.conllu")
- command+= " --dev-file "+os.path.join(self.corpus_path,"en_ewt-ud-dev.conllu")
- command+= " --embeddings "+os.path.join(self.root_path, "examples", "wiki.dummy.vec")
- command+= " --store " + os.path.join(self.model_path, "parser")
- command+= " --batch-size 1000 --set-mem 950 --patience 1"
- print("\n\33[33m{}\n{}\33[0m".format("Parser command:",command))
- popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- output = []
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- if stdout_line.strip()!= "":
- output.append(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- test = "Training is done with " in output[-7]
- self.assertTrue(test)
-
- def test_5_run_model(self):
- command = "python3 " + self.main_file_path + " --run tokenizer,parser,tagger,lemmatizer"
- command+= " --models " + self.model_path
- command+= " --embeddings " + os.path.join(self.root_path, "examples", "wiki.dummy.vec")
- command+= " --input-file " + self.input_file_path
- command+= " --output-file " + self.output_file_path
- print("\n\33[33m{}\n{}\33[0m".format("Model run command:",command))
- popen = subprocess.Popen(command.split(" ") , stdout=subprocess.PIPE, universal_newlines=True)
- for stdout_line in iter(popen.stdout.readline, ""):
- print(stdout_line[:-1])
- popen.stdout.close()
- return_code = popen.wait()
- self.assertTrue(return_code == 0)
-
- lines = []
- with open(self.output_file_path,"r",encoding="utf8") as f:
- lines = [line for line in f.readlines() if line.strip() != ""]
- test = "treaty" in lines[-2]
- self.assertTrue(test)
-
-if __name__ == '__main__':
- unittest.main()
\ No newline at end of file
diff --git a/tests/scratch/en_ewt-ud-test-output.conllu b/tests/scratch/en_ewt-ud-test-output.conllu
deleted file mode 100644
index e160ced9f..000000000
--- a/tests/scratch/en_ewt-ud-test-output.conllu
+++ /dev/null
@@ -1,512 +0,0 @@
-1 What wwwwwwww NOUN NNP _ 13 det _ _
-2 if iiii NOUN IN _ 13 det _ _
-3 Google gggggg NOUN NNP _ 13 det _ _
-4 Morphed mmmmm NOUN NNP _ 13 det _ _
-5 Into iiiiii NOUN NNP _ 13 det _ _
-6 GoogleOS? ggggggggssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss NOUN NNP _ 13 det _ _
-7 What wwwwwwww NOUN NNP _ 13 det _ _
-8 if iiii NOUN IN _ 13 det _ _
-9 Google ggggg NOUN JJ _ 13 det _ _
-10 expanded eee NOUN NNS _ 13 det _ _
-11 on oooo NOUN IN _ 13 det _ _
-12 its iiii NOUN DT _ 13 det _ _
-13 search- sssss NOUN JJ _ 0 det _ SpaceAfter=No
-14 engine eeee NOUN JJ _ 13 det _ _
-15 (and ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( NOUN NNP _ 13 det _ _
-16 now nnnn NOUN RB _ 13 det _ _
-17 e- eeee NOUN CD _ 13 det _ SpaceAfter=No
-18 mail) mmmm NOUN , _ 13 det _ _
-19 wares ww NOUN NNS _ 13 det _ _
-20 into iii NOUN IN _ 13 det _ _
-21 a aaaaaa NOUN DT _ 13 det _ _
-22 full- fffffffff------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ NOUN NNP _ 13 det _ SpaceAfter=No
-23 fledged ffff NOUN JJ _ 13 det _ _
-24 operating oooo NOUN JJ _ 13 det _ _
-25 system? ssssss NOUN NNP _ 13 det _ _
-26 [via [[[[[[[[[ NOUN NNP _ 13 det _ _
-27 Microsoft mmmmm NOUN NNP _ 13 det _ _
-28 Watch wwwwwww NOUN NNP _ 13 det _ _
-29 from ffff NOUN IN _ 13 det _ _
-30 Mary mmmmmmm NOUN NNP _ 13 det _ _
-31 Jo jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj NOUN NNP _ 13 det _ _
-32 Foley fffffff NOUN NNP _ 13 det _ _
-33 ] ]]]]]]] NOUN , _ 13 det _ _
-
-1 ( (((((( NOUN HYPH _ 24 det _ SpaceAfter=No
-2 And aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa NOUN NNP _ 24 det _ SpaceAfter=No
-3 , ,,,,,,,, NOUN , _ 24 det _ _
-4 by bbbb NOUN IN _ 24 det _ _
-5 the tttt NOUN DT _ 24 det _ _
-6 way, wwww NOUN , _ 24 det _ _
-7 is iiii NOUN VBZ _ 24 det _ _
-8 anybody aaa NOUN IN _ 24 det _ _
-9 else eeeeeee NOUN NN _ 24 det _ _
-10 just jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj NOUN NN _ 24 det _ SpaceAfter=No
-11 a aaaaaa NOUN DT _ 24 det _ _
-12 little llll NOUN JJ _ 24 det _ _
-13 nostalgic nnnnn NOUN NN _ 24 det _ _
-14 for ffff NOUN IN _ 24 det _ _
-15 the tttt NOUN DT _ 24 det _ _
-16 days dddddddd NOUN NN _ 24 det _ _
-17 when wwwwwww NOUN NN _ 24 det _ _
-18 that ttt NOUN IN _ 24 det _ _
-19 was wwww NOUN VBZ _ 24 det _ _
-20 a aaaaaa NOUN DT _ 24 det _ _
-21 good gggggggg NOUN NN _ 24 det _ _
-22 thing? tttt NOUN RB _ 24 det _ SpaceAfter=No
-23 ) ))))))) NOUN , _ 24 det _ _
-24 This ttttt NOUN DT _ 0 det _ _
-25 BuzzMachine bbbbbb NOUN NNP _ 24 det _ _
-26 post pppppppp NOUN NN _ 24 det _ _
-27 argues aa NOUN NNS _ 24 det _ _
-28 that ttt NOUN IN _ 24 det _ _
-29 Google's ggggg NOUN JJ _ 24 det _ _
-30 rush rrrrr NOUN JJ _ 24 det _ _
-31 toward ttttt NOUN JJ _ 24 det _ _
-32 ubiquity uuuuu NOUN JJ _ 24 det _ _
-33 might mmmmm NOUN JJ _ 24 det _ _
-34 backfire bbbbbb NOUN NN _ 24 det _ _
-35 - ------- NOUN , _ 24 det _ SpaceAfter=No
-36 - ------- NOUN , _ 24 det _ _
-37 which wwwww NOUN NNP _ 24 det _ _
-38 we've ww NOUN VBD _ 24 det _ _
-39 all aaaaaaa NOUN NNP _ 24 det _ _
-40 heard hhh NOUN RB _ 24 det _ _
-41 before, bbb NOUN CD _ 24 det _ _
-42 but bbbb NOUN IN _ 24 det _ _
-43 it's iiii NOUN PRP _ 24 det _ _
-44 particularly pppp NOUN JJ _ 24 det _ _
-45 well www NOUN NNS _ 24 det _ SpaceAfter=No
-46 - ------- NOUN , _ 24 det _ SpaceAfter=No
-47 put ppp NOUN NNS _ 24 det _ _
-48 in iiii NOUN IN _ 24 det _ _
-49 this tttt NOUN DT _ 24 det _ _
-50 post pppppppp NOUN NN _ 24 det _ SpaceAfter=No
-51 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 24 det _ _
-
-1 Google ggggggg NOUN NN _ 3 det _ _
-2 is iiii NOUN VBZ _ 3 det _ _
-3 a aaaaaa NOUN DT _ 0 det _ _
-4 nice nnnn NOUN JJ _ 3 det _ _
-5 search ssss NOUN JJ _ 3 det _ _
-6 engine eeeee NOUN NN _ 3 det _ SpaceAfter=No
-7 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 3 det _ _
-
-1 Does ddddddd NOUN NNP _ 5 det _ _
-2 anybody aaa NOUN IN _ 5 det _ _
-3 use uuu NOUN VBD _ 5 det _ _
-4 it iiii NOUN VBZ _ 5 det _ _
-5 for ffff NOUN IN _ 0 det _ _
-6 anything aaa NOUN RB _ 5 det _ _
-7 else? eee NOUN CD _ 5 det _ _
-8 They tttt NOUN PRP _ 5 det _ _
-9 own oooo NOUN IN _ 5 det _ _
-10 blogger bbb NOUN NNS _ 5 det _ SpaceAfter=No
-11 , ,,,,,,,, NOUN , _ 5 det _ _
-12 of oooo NOUN IN _ 5 det _ _
-13 course ccc NOUN NNS _ 5 det _ SpaceAfter=No
-14 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 5 det _ _
-
-1 Is iiiii NOUN VBZ _ 5 det _ _
-2 that ttt NOUN IN _ 5 det _ _
-3 a aaaaaa NOUN DT _ 5 det _ _
-4 money mmmmmm NOUN NN _ 5 det _ _
-5 maker? mmmmmm NOUN NNP _ 0 det _ _
-6 I'm iii NOUN VBD _ 5 det _ _
-7 staying ssssss NOUN NN _ 5 det _ _
-8 away aaaaaaa NOUN NN _ 5 det _ _
-9 from ffff NOUN IN _ 5 det _ _
-10 the tttt NOUN DT _ 5 det _ _
-11 stock ssssssss NOUN NN _ 5 det _ SpaceAfter=No
-12 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 5 det _ _
-
-1 I iiiiiiii NOUN VBZ _ 9 det _ _
-2 doubt ddd NOUN IN _ 9 det _ _
-3 the tttt NOUN DT _ 9 det _ _
-4 very vvvvvvv NOUN NN _ 9 det _ _
-5 few ffff NOUN RB _ 10 det _ _
-6 who wwww NOUN TO _ 10 det _ _
-7 actually aaaaa NOUN JJ _ 10 det _ _
-8 read rrr NOUN NNS _ 10 det _ _
-9 my mmmm NOUN IN _ 10 det _ _
-10 blog bbbb NOUN RB _ 0 det _ _
-11 have hhhhhh NOUN VBN _ 10 det _ _
-12 not nnnn NOUN IN _ 10 det _ _
-13 come ccc NOUN VB _ 10 det _ _
-14 across aaaaaa NOUN NN _ 10 det _ _
-15 this tttt NOUN PRP _ 10 det _ _
-16 yet, yyyy NOUN , _ 10 det _ _
-17 but bbbb NOUN IN _ 10 det _ _
-18 I iiiiiiii NOUN DT _ 10 det _ _
-19 figured ffffff NOUN NN _ 10 det _ _
-20 I iiiiiiii NOUN DT _ 10 det _ _
-21 would wwww NOUN RB _ 10 det _ _
-22 put pppp NOUN RB _ 10 det _ _
-23 it iiii NOUN PRP _ 10 det _ _
-24 out oooo NOUN RB _ 10 det _ _
-25 there ttt NOUN VB _ 10 det _ _
-26 anyways aaaaaa NOUN NN _ 10 det _ SpaceAfter=No
-27 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 10 det _ _
-
-1 John jjjjjjjj NOUN NNP _ 7 det _ _
-2 Donovan dddddd NOUN NNP _ 7 det _ _
-3 from ffff NOUN IN _ 7 det _ _
-4 Argghhh! aaaaa NOUN NNP _ 7 det _ _
-5 has hhhh NOUN VBZ _ 7 det _ _
-6 put ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp NOUN NN _ 7 det _ _
-7 out oooo NOUN IN _ 0 det _ SpaceAfter=No
-8 a aaaaaa NOUN DT _ 7 det _ _
-9 excellent eeeee NOUN NN _ 7 det _ _
-10 slide ssssss NOUN NN _ 7 det _ _
-11 show ssss NOUN RB _ 7 det _ _
-12 on oooo NOUN IN _ 7 det _ _
-13 what wwww NOUN RB _ 7 det _ _
-14 was wwww NOUN VBZ _ 7 det _ _
-15 actually aaaaaa NOUN NN _ 7 det _ _
-16 found ffff NOUN RB _ 7 det _ _
-17 and aaa NOUN CC _ 7 det _ _
-18 fought ffff NOUN RB _ 7 det _ _
-19 for ffff NOUN IN _ 7 det _ _
-20 in iiii NOUN IN _ 7 det _ _
-21 Fallujah ffffff NOUN NNP _ 7 det _ SpaceAfter=No
-22 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 7 det _ _
-
-1 Click cccccc NOUN NNP _ 2 det _ _
-2 here hh NOUN VBD _ 0 det _ _
-3 To ttttt NOUN , _ 2 det _ _
-4 view vvvvvv NOUN VBN _ 2 det _ _
-5 it iiii NOUN RB _ 2 det _ SpaceAfter=No
-6 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 2 det _ _
-
-1 He hhhhh NOUN PRP _ 4 det _ _
-2 makes mmmmmmm NOUN NN _ 4 det _ _
-3 some sssssss NOUN VBN _ 4 det _ _
-4 good ggggggg NOUN VBN _ 0 det _ _
-5 observations ooo NOUN NNS _ 4 det _ _
-6 on oooo NOUN IN _ 4 det _ _
-7 a aaaaaa NOUN DT _ 4 det _ _
-8 few fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff NOUN NN _ 4 det _ _
-9 of oooo NOUN IN _ 4 det _ _
-10 the tttt NOUN DT _ 4 det _ _
-11 pic's ppppppp NOUN NN _ 4 det _ SpaceAfter=No
-12 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 4 det _ _
-
-1 One ooo NOUN NNS _ 5 det _ _
-2 of oooo NOUN IN _ 5 det _ _
-3 the tttt NOUN DT _ 5 det _ _
-4 pictures ppppp NOUN NN _ 5 det _ _
-5 shows ss NOUN VBD _ 0 det _ _
-6 a aaaaaa NOUN DT _ 5 det _ _
-7 flag ffffffff NOUN NN _ 5 det _ _
-8 that ttt NOUN IN _ 5 det _ _
-9 was wwww NOUN VBZ _ 5 det _ _
-10 found ffff NOUN RB _ 5 det _ _
-11 in iiii NOUN IN _ 5 det _ _
-12 Fallujah ffffff NOUN NNP _ 5 det _ SpaceAfter=No
-13 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 5 det _ _
-
-1 On ooooo NOUN IN _ 5 det _ _
-2 the tttt NOUN DT _ 5 det _ _
-3 next nnnnnnn NOUN NN _ 5 det _ _
-4 two tttt NOUN CD _ 5 det _ _
-5 pictures ppp NOUN RB _ 0 det _ _
-6 he hhhh NOUN PRP _ 5 det _ _
-7 took tttt NOUN RB _ 5 det _ _
-8 screenshots sss NOUN NNS _ 5 det _ _
-9 of oooo NOUN IN _ 5 det _ _
-10 two tttt NOUN PRP _ 5 det _ _
-11 beheading bbb NOUN RB _ 5 det _ _
-12 video's vvv NOUN NNS _ 5 det _ SpaceAfter=No
-13 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 5 det _ _
-
-1 Compare ccccc NOUN NNP _ 3 det _ _
-2 the tttt NOUN DT _ 3 det _ _
-3 flags fffffff NOUN NN _ 0 det _ _
-4 to tttt NOUN IN _ 3 det _ _
-5 the tttt NOUN DT _ 3 det _ _
-6 Fallujah ffffff NOUN NNP _ 3 det _ _
-7 one oooooooooeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee NOUN NN _ 3 det _ SpaceAfter=No
-8 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 3 det _ _
-
-1 You yyyyy NOUN RB _ 3 det _ _
-2 have hh NOUN VBD _ 3 det _ _
-3 to ttttt NOUN TO _ 0 det _ _
-4 see sss NOUN VB _ 3 det _ _
-5 these ttt NOUN VB _ 3 det _ _
-6 slides sss NOUN NNS _ 3 det _ SpaceAfter=No
-7 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 3 det _ SpaceAfter=No
-
-1 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 0 det _ SpaceAfter=No
-
-1 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 0 det _ SpaceAfter=No
-
-1 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 0 det _ SpaceAfter=No
-
-1 they tttt NOUN PRP _ 2 det _ _
-2 are aaaaa NOUN VBP _ 0 det _ _
-3 amazing aaaaa NOUN VBN _ 2 det _ SpaceAfter=No
-4 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 2 det _ _
-
-1 This ttttt NOUN DT _ 9 det _ _
-2 Fallujah ffffff NOUN NNP _ 9 det _ _
-3 operation ooooo NOUN NN _ 9 det _ _
-4 my mmmm NOUN IN _ 9 det _ _
-5 turn tttttttt NOUN NN _ 9 det _ _
-6 out oooo NOUN RB _ 9 det _ _
-7 to ttttt NOUN TO _ 9 det _ _
-8 be bbbb NOUN VB _ 9 det _ _
-9 the tttt NOUN DT _ 0 det _ _
-10 most mmmmmmmmmttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt NOUN NN _ 9 det _ _
-11 important iiiii NOUN NN _ 9 det _ _
-12 operation ooooo NOUN NN _ 9 det _ _
-13 done ddd NOUN RB _ 9 det _ _
-14 by bbbb NOUN IN _ 9 det _ _
-15 the tttt NOUN DT _ 9 det _ _
-16 US uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu NOUN NNP _ 9 det _ _
-17 Military mmmmm NOUN NNP _ 9 det _ _
-18 since sss NOUN IN _ 9 det _ _
-19 the tttt NOUN DT _ 9 det _ _
-20 end eeeeeeeeeeddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd NOUN NN _ 9 det _ _
-21 of oooo NOUN IN _ 9 det _ _
-22 the tttt NOUN DT _ 9 det _ _
-23 war wwwwwwwwwwrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr NOUN NN _ 9 det _ SpaceAfter=No
-24 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 9 det _ _
-
-1 Let lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll NOUN NNP _ 9 det _ _
-2 me mmm NOUN VBD _ 9 det _ _
-3 join jjj NOUN IN _ 10 det _ _
-4 the tttt NOUN DT _ 10 det _ _
-5 chorus cccccc NOUN NN _ 10 det _ _
-6 of oooo NOUN IN _ 10 det _ _
-7 annoyance aaaa NOUN JJ _ 10 det _ _
-8 over ooo NOUN RB _ 10 det _ _
-9 Google's ggggg NOUN JJ _ 10 det _ _
-10 new nnnn NOUN RB _ 0 det _ _
-11 toolbar ttt NOUN NNS _ 10 det _ _
-12 , ,,,,,,,, NOUN , _ 10 det _ _
-13 which, wwww NOUN , _ 10 det _ _
-14 as aaaaa NOUN DT _ 10 det _ _
-15 noted nnn NOUN NNS _ 10 det _ _
-16 in iiii NOUN IN _ 10 det _ _
-17 the tttt NOUN DT _ 10 det _ _
-18 linked lllll NOUN NN _ 10 det _ _
-19 article, aaaaa NOUN NNP _ 10 det _ _
-20 commits ccccc NOUN JJ _ 10 det _ _
-21 just jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj NOUN NN _ 10 det _ _
-22 about aaa NOUN IN _ 10 det _ _
-23 every eee NOUN NNS _ 10 det _ _
-24 sin sss NOUN IN _ 10 det _ _
-25 an aaaa NOUN IN _ 10 det _ _
-26 online oooo NOUN JJ _ 10 det _ _
-27 marketer mmmm NOUN JJ _ 10 det _ _
-28 could cccc NOUN RB _ 10 det _ _
-29 commit, ccccc NOUN NNP _ 10 det _ _
-30 and aaa NOUN CC _ 10 det _ _
-31 makes mmm NOUN NNS _ 10 det _ _
-32 up uuuu NOUN IN _ 10 det _ _
-33 a aaaaaa NOUN DT _ 10 det _ _
-34 few fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff NOUN NN _ 10 det _ _
-35 new nnnn NOUN RB _ 10 det _ _
-36 ones oooo NOUN RB _ 10 det _ _
-37 besides bbb NOUN NNS _ 10 det _ SpaceAfter=No
-38 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 10 det _ _
-
-1 I'm iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii NOUN NNP _ 9 det _ _
-2 not nnnn NOUN RB _ 9 det _ _
-3 fond ffff NOUN RB _ 9 det _ _
-4 of oooo NOUN IN _ 9 det _ _
-5 the tttt NOUN DT _ 9 det _ _
-6 Google- gggggg NOUN NNP _ 9 det _ SpaceAfter=No
-7 hates- hhhhh NOUN NNP _ 9 det _ SpaceAfter=No
-8 privacy pppp NOUN JJ _ 9 det _ _
-9 argument aaaaa NOUN NN _ 0 det _ _
-10 ( ((((((( NOUN , _ 9 det _ SpaceAfter=No
-11 You yyyyy NOUN RB _ 9 det _ _
-12 don't dddd NOUN RB _ 9 det _ _
-13 need nnn NOUN RB _ 9 det _ _
-14 to ttttt NOUN TO _ 9 det _ _
-15 use uuuu NOUN VB _ 9 det _ _
-16 their ttt NOUN VB _ 9 det _ _
-17 site, ssss NOUN , _ 9 det _ _
-18 you yyyy NOUN IN _ 9 det _ _
-19 can ccccc NOUN JJ _ 9 det _ _
-20 opt- ooo NOUN HYPH _ 9 det _ SpaceAfter=No
-21 out oooo NOUN RB _ 9 det _ _
-22 of oooo NOUN IN _ 9 det _ _
-23 sharing sss NOUN RB _ 9 det _ _
-24 your yyyy NOUN RB _ 9 det _ _
-25 information iii NOUN RB _ 9 det _ SpaceAfter=No
-26 , ,,,,,,,, NOUN , _ 9 det _ _
-27 you yyyy NOUN RB _ 9 det _ _
-28 don't dddd NOUN RB _ 9 det _ _
-29 need nnn NOUN RB _ 9 det _ _
-30 to ttttt NOUN TO _ 9 det _ _
-31 send ssss NOUN RB _ 9 det _ _
-32 stuff ssss NOUN IN _ 9 det _ _
-33 to ttttt NOUN TO _ 9 det _ _
-34 anyone aaa NOUN RB _ 9 det _ _
-35 with www NOUN IN _ 9 det _ _
-36 a aaaaaa NOUN DT _ 9 det _ _
-37 Gmail gggggg NOUN NNP _ 9 det _ _
-38 account aaaaa NOUN NNP _ 9 det _ SpaceAfter=No
-39 , ,,,,,,,, NOUN , _ 9 det _ _
-40 and aaa NOUN CC _ 9 det _ _
-41 if iiii NOUN IN _ 9 det _ _
-42 - ------ NOUN HYPH _ 9 det _ SpaceAfter=No
-43 - ------ NOUN HYPH _ 9 det _ _
-44 wonder www NOUN NNS _ 9 det _ _
-45 of oooo NOUN IN _ 9 det _ _
-46 wonders www NOUN NNS _ 9 det _ _
-47 - ------- NOUN , _ 9 det _ SpaceAfter=No
-48 - ------- NOUN , _ 9 det _ _
-49 you're yyyy NOUN RB _ 9 det _ _
-50 worried ww NOUN NNS _ 9 det _ _
-51 that ttt NOUN IN _ 9 det _ _
-52 you yyyy NOUN PRP _ 9 det _ _
-53 might mmm NOUN RB _ 9 det _ _
-54 send ssss NOUN RB _ 9 det _ _
-55 something sss NOUN NNS _ 9 det _ _
-56 to ttttt NOUN TO _ 9 det _ _
-57 someone sss NOUN RB _ 9 det _ _
-58 who wwww NOUN VB _ 9 det _ _
-59 would wwww NOUN RB _ 9 det _ _
-60 forward ffff NOUN RB _ 9 det _ _
-61 an aaaa NOUN IN _ 9 det _ _
-62 excerpt eeeeee NOUN NN _ 9 det _ _
-63 to ttttt NOUN TO _ 9 det _ _
-64 someone sss NOUN RB _ 9 det _ _
-65 who wwww NOUN VB _ 9 det _ _
-66 would wwww NOUN VB _ 9 det _ _
-67 then ttt NOUN VB _ 9 det _ _
-68 store sss NOUN RB _ 9 det _ _
-69 it iiii NOUN RB _ 9 det _ _
-70 on oooo NOUN IN _ 9 det _ _
-71 a aaaaaa NOUN DT _ 9 det _ _
-72 Gmail gggggg NOUN NNP _ 9 det _ _
-73 account aaaaaa NOUN NN _ 9 det _ SpaceAfter=No
-74 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 9 det _ SpaceAfter=No
-
-1 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 0 det _ SpaceAfter=No
-
-1 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 0 det _ _
-
-1 you yyyy NOUN RB _ 5 det _ _
-2 have hh NOUN VBD _ 5 det _ _
-3 far fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff NOUN NN _ 5 det _ SpaceAfter=No
-4 , ,,,,,,,, NOUN , _ 5 det _ _
-5 far fffffff NOUN NNP _ 0 det _ _
-6 too tttt NOUN , _ 5 det _ _
-7 much mmmm NOUN RB _ 5 det _ _
-8 time ttt NOUN NNS _ 5 det _ _
-9 on oooo NOUN IN _ 5 det _ _
-10 your yyyy NOUN RB _ 5 det _ _
-11 hands hhh NOUN RB _ 5 det _ SpaceAfter=No
-12 ) ))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) NOUN . _ 5 det _ SpaceAfter=No
-13 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 5 det _ _
-
-1 However hhhhh NOUN NNP _ 3 det _ SpaceAfter=No
-2 , ,,,,,,,, NOUN , _ 3 det _ _
-3 this tttt NOUN VBZ _ 0 det _ _
-4 toolbar tttttt NOUN NN _ 3 det _ _
-5 is iiii NOUN VBZ _ 3 det _ _
-6 really rrrr NOUN JJ _ 3 det _ _
-7 bad bbbbbb NOUN JJ _ 3 det _ _
-8 news nnn NOUN NNS _ 3 det _ SpaceAfter=No
-9 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 3 det _ _
-
-1 On ooooo NOUN IN _ 4 det _ _
-2 the tttt NOUN DT _ 4 det _ _
-3 other ooooo NOUN NNP _ 4 det _ _
-4 hand hhhhh NOUN NNP _ 0 det _ SpaceAfter=No
-5 , ,,,,,,,, NOUN , _ 4 det _ _
-6 it iiii NOUN PRP _ 4 det _ _
-7 looks lllll NOUN JJ _ 4 det _ _
-8 pretty pppp NOUN JJ _ 4 det _ _
-9 cool ccc NOUN NNS _ 4 det _ SpaceAfter=No
-10 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 4 det _ _
-
-1 Iran iiiiiii NOUN NN _ 5 det _ _
-2 says ssssssss NOUN NN _ 5 det _ _
-3 it iiii NOUN IN _ 5 det _ _
-4 is iiii NOUN VBZ _ 5 det _ _
-5 creating cccc NOUN JJ _ 0 det _ _
-6 nuclear nnnnn NOUN JJ _ 5 det _ _
-7 energy eeeee NOUN NN _ 5 det _ _
-8 without www NOUN IN _ 5 det _ _
-9 wanting www NOUN VB _ 5 det _ _
-10 nuclear nnnnn NOUN JJ _ 5 det _ _
-11 weapons www NOUN NNS _ 5 det _ SpaceAfter=No
-12 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 5 det _ _
-
-1 The ttttt NOUN DT _ 3 det _ _
-2 United uuuuuu NOUN NNP _ 3 det _ _
-3 States ssssss NOUN NNP _ 0 det _ _
-4 doesn't dddddd NOUN NN _ 3 det _ _
-5 believe bbb NOUN IN _ 3 det _ _
-6 the tttt NOUN DT _ 3 det _ _
-7 Iranian iiii NOUN JJ _ 3 det _ _
-8 Government gggggg NOUN NN _ 3 det _ SpaceAfter=No
-9 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 3 det _ _
-
-1 One oooo NOUN CD _ 3 det _ _
-2 can cccccccccnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn NOUN NN _ 3 det _ _
-3 suspect sss NOUN IN _ 0 det _ _
-4 the tttt NOUN DT _ 3 det _ _
-5 Iranian iiiii NOUN NNP _ 3 det _ _
-6 Government gggggg NOUN NN _ 3 det _ SpaceAfter=No
-7 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 3 det _ _
-
-1 But bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb NOUN NNP _ 2 det _ _
-2 there tt NOUN VBD _ 0 det _ _
-3 is iiii NOUN VBZ _ 2 det _ _
-4 no nnnn NOUN RB _ 2 det _ _
-5 proof ppp NOUN RB _ 2 det _ _
-6 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 2 det _ _
-
-1 I iiiiiiii NOUN DT _ 7 det _ _
-2 read rrrrrr NOUN NN _ 7 det _ _
-3 an aaaa NOUN IN _ 7 det _ _
-4 Article aaaaa NOUN NNP _ 7 det _ _
-5 in iiii NOUN IN _ 7 det _ _
-6 Time tttt NOUN DT _ 7 det _ _
-7 magazine mmmmm NOUN NN _ 0 det _ _
-8 accusing aaa NOUN IN _ 7 det _ _
-9 the tttt NOUN DT _ 7 det _ _
-10 Iranian iiii NOUN JJ _ 7 det _ _
-11 Government gggggg NOUN NN _ 7 det _ _
-12 of oooo NOUN IN _ 7 det _ _
-13 being bbb NOUN RB _ 7 det _ _
-14 willing ww NOUN NNS _ 7 det _ _
-15 to ttttt NOUN TO _ 7 det _ _
-16 start ssss NOUN RB _ 7 det _ _
-17 a aaaaaa NOUN DT _ 7 det _ _
-18 nuclear nnnnn NOUN NNP _ 7 det _ _
-19 war wwwwwwwwwwrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr NOUN NN _ 8 det _ _
-20 and aaa NOUN CC _ 8 det _ _
-21 I iiiiiiii NOUN DT _ 8 det _ _
-22 sympathise ssssss NOUN NN _ 8 det _ _
-23 with www NOUN IN _ 8 det _ _
-24 the tttt NOUN DT _ 8 det _ _
-25 Article aaaaa NOUN NNP _ 8 det _ SpaceAfter=No
-26 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 8 det _ _
-
-1 They tttt NOUN PRP _ 7 det _ _
-2 are aa NOUN VBD _ 7 det _ _
-3 certainly cccc NOUN JJ _ 7 det _ _
-4 being bbbbbb NOUN VBN _ 7 det _ _
-5 nasty nnnnnnn NOUN NN _ 7 det _ _
-6 to tttt NOUN IN _ 7 det _ _
-7 the tttt NOUN DT _ 0 det _ _
-8 United uuuuuu NOUN NNP _ 7 det _ _
-9 Nations nnnnnn NOUN NNP _ 7 det _ _
-10 Security ssssss NOUN NNP _ 7 det _ _
-11 Council cccccc NOUN NNP _ 7 det _ _
-12 in iiii NOUN IN _ 7 det _ _
-13 connection ccccc NOUN NN _ 7 det _ _
-14 with www NOUN IN _ 7 det _ _
-15 the tttt NOUN DT _ 7 det _ _
-16 anti- aaaaaa NOUN NNP _ 7 det _ SpaceAfter=No
-17 proliferation pppp NOUN JJ _ 7 det _ _
-18 treaty tttttt NOUN NN _ 7 det _ SpaceAfter=No
-19 . ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... NOUN . _ 7 det _ _
-
diff --git a/tests/test_corpus/LICENSE.txt b/tests/test_corpus/LICENSE.txt
deleted file mode 100644
index 547fb999d..000000000
--- a/tests/test_corpus/LICENSE.txt
+++ /dev/null
@@ -1,426 +0,0 @@
-Attribution-ShareAlike 4.0 International
-
-=======================================================================
-
-Creative Commons Corporation ("Creative Commons") is not a law firm and
-does not provide legal services or legal advice. Distribution of
-Creative Commons public licenses does not create a lawyer-client or
-other relationship. Creative Commons makes its licenses and related
-information available on an "as-is" basis. Creative Commons gives no
-warranties regarding its licenses, any material licensed under their
-terms and conditions, or any related information. Creative Commons
-disclaims all liability for damages resulting from their use to the
-fullest extent possible.
-
-Using Creative Commons Public Licenses
-
-Creative Commons public licenses provide a standard set of terms and
-conditions that creators and other rights holders may use to share
-original works of authorship and other material subject to copyright
-and certain other rights specified in the public license below. The
-following considerations are for informational purposes only, are not
-exhaustive, and do not form part of our licenses.
-
- Considerations for licensors: Our public licenses are
- intended for use by those authorized to give the public
- permission to use material in ways otherwise restricted by
- copyright and certain other rights. Our licenses are
- irrevocable. Licensors should read and understand the terms
- and conditions of the license they choose before applying it.
- Licensors should also secure all rights necessary before
- applying our licenses so that the public can reuse the
- material as expected. Licensors should clearly mark any
- material not subject to the license. This includes other CC-
- licensed material, or material used under an exception or
- limitation to copyright. More considerations for licensors:
- wiki.creativecommons.org/Considerations_for_licensors
-
- Considerations for the public: By using one of our public
- licenses, a licensor grants the public permission to use the
- licensed material under specified terms and conditions. If
- the licensor's permission is not necessary for any reason--for
- example, because of any applicable exception or limitation to
- copyright--then that use is not regulated by the license. Our
- licenses grant only permissions under copyright and certain
- other rights that a licensor has authority to grant. Use of
- the licensed material may still be restricted for other
- reasons, including because others have copyright or other
- rights in the material. A licensor may make special requests,
- such as asking that all changes be marked or described.
- Although not required by our licenses, you are encouraged to
- respect those requests where reasonable. More_considerations
- for the public:
- wiki.creativecommons.org/Considerations_for_licensees
-
-=======================================================================
-
-Creative Commons Attribution-ShareAlike 4.0 International Public
-License
-
-By exercising the Licensed Rights (defined below), You accept and agree
-to be bound by the terms and conditions of this Creative Commons
-Attribution-ShareAlike 4.0 International Public License ("Public
-License"). To the extent this Public License may be interpreted as a
-contract, You are granted the Licensed Rights in consideration of Your
-acceptance of these terms and conditions, and the Licensor grants You
-such rights in consideration of benefits the Licensor receives from
-making the Licensed Material available under these terms and
-conditions.
-
-
-Section 1 -- Definitions.
-
- a. Adapted Material means material subject to Copyright and Similar
- Rights that is derived from or based upon the Licensed Material
- and in which the Licensed Material is translated, altered,
- arranged, transformed, or otherwise modified in a manner requiring
- permission under the Copyright and Similar Rights held by the
- Licensor. For purposes of this Public License, where the Licensed
- Material is a musical work, performance, or sound recording,
- Adapted Material is always produced where the Licensed Material is
- synched in timed relation with a moving image.
-
- b. Adapter's License means the license You apply to Your Copyright
- and Similar Rights in Your contributions to Adapted Material in
- accordance with the terms and conditions of this Public License.
-
- c. BY-SA Compatible License means a license listed at
- creativecommons.org/compatiblelicenses, approved by Creative
- Commons as essentially the equivalent of this Public License.
-
- d. Copyright and Similar Rights means copyright and/or similar rights
- closely related to copyright including, without limitation,
- performance, broadcast, sound recording, and Sui Generis Database
- Rights, without regard to how the rights are labeled or
- categorized. For purposes of this Public License, the rights
- specified in Section 2(b)(1)-(2) are not Copyright and Similar
- Rights.
-
- e. Effective Technological Measures means those measures that, in the
- absence of proper authority, may not be circumvented under laws
- fulfilling obligations under Article 11 of the WIPO Copyright
- Treaty adopted on December 20, 1996, and/or similar international
- agreements.
-
- f. Exceptions and Limitations means fair use, fair dealing, and/or
- any other exception or limitation to Copyright and Similar Rights
- that applies to Your use of the Licensed Material.
-
- g. License Elements means the license attributes listed in the name
- of a Creative Commons Public License. The License Elements of this
- Public License are Attribution and ShareAlike.
-
- h. Licensed Material means the artistic or literary work, database,
- or other material to which the Licensor applied this Public
- License.
-
- i. Licensed Rights means the rights granted to You subject to the
- terms and conditions of this Public License, which are limited to
- all Copyright and Similar Rights that apply to Your use of the
- Licensed Material and that the Licensor has authority to license.
-
- j. Licensor means the individual(s) or entity(ies) granting rights
- under this Public License.
-
- k. Share means to provide material to the public by any means or
- process that requires permission under the Licensed Rights, such
- as reproduction, public display, public performance, distribution,
- dissemination, communication, or importation, and to make material
- available to the public including in ways that members of the
- public may access the material from a place and at a time
- individually chosen by them.
-
- l. Sui Generis Database Rights means rights other than copyright
- resulting from Directive 96/9/EC of the European Parliament and of
- the Council of 11 March 1996 on the legal protection of databases,
- as amended and/or succeeded, as well as other essentially
- equivalent rights anywhere in the world.
-
- m. You means the individual or entity exercising the Licensed Rights
- under this Public License. Your has a corresponding meaning.
-
-
-Section 2 -- Scope.
-
- a. License grant.
-
- 1. Subject to the terms and conditions of this Public License,
- the Licensor hereby grants You a worldwide, royalty-free,
- non-sublicensable, non-exclusive, irrevocable license to
- exercise the Licensed Rights in the Licensed Material to:
-
- a. reproduce and Share the Licensed Material, in whole or
- in part; and
-
- b. produce, reproduce, and Share Adapted Material.
-
- 2. Exceptions and Limitations. For the avoidance of doubt, where
- Exceptions and Limitations apply to Your use, this Public
- License does not apply, and You do not need to comply with
- its terms and conditions.
-
- 3. Term. The term of this Public License is specified in Section
- 6(a).
-
- 4. Media and formats; technical modifications allowed. The
- Licensor authorizes You to exercise the Licensed Rights in
- all media and formats whether now known or hereafter created,
- and to make technical modifications necessary to do so. The
- Licensor waives and/or agrees not to assert any right or
- authority to forbid You from making technical modifications
- necessary to exercise the Licensed Rights, including
- technical modifications necessary to circumvent Effective
- Technological Measures. For purposes of this Public License,
- simply making modifications authorized by this Section 2(a)
- (4) never produces Adapted Material.
-
- 5. Downstream recipients.
-
- a. Offer from the Licensor -- Licensed Material. Every
- recipient of the Licensed Material automatically
- receives an offer from the Licensor to exercise the
- Licensed Rights under the terms and conditions of this
- Public License.
-
- b. Additional offer from the Licensor -- Adapted Material.
- Every recipient of Adapted Material from You
- automatically receives an offer from the Licensor to
- exercise the Licensed Rights in the Adapted Material
- under the conditions of the Adapter's License You apply.
-
- c. No downstream restrictions. You may not offer or impose
- any additional or different terms or conditions on, or
- apply any Effective Technological Measures to, the
- Licensed Material if doing so restricts exercise of the
- Licensed Rights by any recipient of the Licensed
- Material.
-
- 6. No endorsement. Nothing in this Public License constitutes or
- may be construed as permission to assert or imply that You
- are, or that Your use of the Licensed Material is, connected
- with, or sponsored, endorsed, or granted official status by,
- the Licensor or others designated to receive attribution as
- provided in Section 3(a)(1)(A)(i).
-
- b. Other rights.
-
- 1. Moral rights, such as the right of integrity, are not
- licensed under this Public License, nor are publicity,
- privacy, and/or other similar personality rights; however, to
- the extent possible, the Licensor waives and/or agrees not to
- assert any such rights held by the Licensor to the limited
- extent necessary to allow You to exercise the Licensed
- Rights, but not otherwise.
-
- 2. Patent and trademark rights are not licensed under this
- Public License.
-
- 3. To the extent possible, the Licensor waives any right to
- collect royalties from You for the exercise of the Licensed
- Rights, whether directly or through a collecting society
- under any voluntary or waivable statutory or compulsory
- licensing scheme. In all other cases the Licensor expressly
- reserves any right to collect such royalties.
-
-
-Section 3 -- License Conditions.
-
-Your exercise of the Licensed Rights is expressly made subject to the
-following conditions.
-
- a. Attribution.
-
- 1. If You Share the Licensed Material (including in modified
- form), You must:
-
- a. retain the following if it is supplied by the Licensor
- with the Licensed Material:
-
- i. identification of the creator(s) of the Licensed
- Material and any others designated to receive
- attribution, in any reasonable manner requested by
- the Licensor (including by pseudonym if
- designated);
-
- ii. a copyright notice;
-
- iii. a notice that refers to this Public License;
-
- iv. a notice that refers to the disclaimer of
- warranties;
-
- v. a URI or hyperlink to the Licensed Material to the
- extent reasonably practicable;
-
- b. indicate if You modified the Licensed Material and
- retain an indication of any previous modifications; and
-
- c. indicate the Licensed Material is licensed under this
- Public License, and include the text of, or the URI or
- hyperlink to, this Public License.
-
- 2. You may satisfy the conditions in Section 3(a)(1) in any
- reasonable manner based on the medium, means, and context in
- which You Share the Licensed Material. For example, it may be
- reasonable to satisfy the conditions by providing a URI or
- hyperlink to a resource that includes the required
- information.
-
- 3. If requested by the Licensor, You must remove any of the
- information required by Section 3(a)(1)(A) to the extent
- reasonably practicable.
-
- b. ShareAlike.
-
- In addition to the conditions in Section 3(a), if You Share
- Adapted Material You produce, the following conditions also apply.
-
- 1. The Adapter's License You apply must be a Creative Commons
- license with the same License Elements, this version or
- later, or a BY-SA Compatible License.
-
- 2. You must include the text of, or the URI or hyperlink to, the
- Adapter's License You apply. You may satisfy this condition
- in any reasonable manner based on the medium, means, and
- context in which You Share Adapted Material.
-
- 3. You may not offer or impose any additional or different terms
- or conditions on, or apply any Effective Technological
- Measures to, Adapted Material that restrict exercise of the
- rights granted under the Adapter's License You apply.
-
-
-Section 4 -- Sui Generis Database Rights.
-
-Where the Licensed Rights include Sui Generis Database Rights that
-apply to Your use of the Licensed Material:
-
- a. for the avoidance of doubt, Section 2(a)(1) grants You the right
- to extract, reuse, reproduce, and Share all or a substantial
- portion of the contents of the database;
-
- b. if You include all or a substantial portion of the database
- contents in a database in which You have Sui Generis Database
- Rights, then the database in which You have Sui Generis Database
- Rights (but not its individual contents) is Adapted Material,
-
- including for purposes of Section 3(b); and
- c. You must comply with the conditions in Section 3(a) if You Share
- all or a substantial portion of the contents of the database.
-
-For the avoidance of doubt, this Section 4 supplements and does not
-replace Your obligations under this Public License where the Licensed
-Rights include other Copyright and Similar Rights.
-
-
-Section 5 -- Disclaimer of Warranties and Limitation of Liability.
-
- a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
- EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
- AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
- ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
- IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
- WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
- PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
- ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
- KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
- ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
-
- b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
- TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
- NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
- INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
- COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
- USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
- ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
- DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
- IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
-
- c. The disclaimer of warranties and limitation of liability provided
- above shall be interpreted in a manner that, to the extent
- possible, most closely approximates an absolute disclaimer and
- waiver of all liability.
-
-
-Section 6 -- Term and Termination.
-
- a. This Public License applies for the term of the Copyright and
- Similar Rights licensed here. However, if You fail to comply with
- this Public License, then Your rights under this Public License
- terminate automatically.
-
- b. Where Your right to use the Licensed Material has terminated under
- Section 6(a), it reinstates:
-
- 1. automatically as of the date the violation is cured, provided
- it is cured within 30 days of Your discovery of the
- violation; or
-
- 2. upon express reinstatement by the Licensor.
-
- For the avoidance of doubt, this Section 6(b) does not affect any
- right the Licensor may have to seek remedies for Your violations
- of this Public License.
-
- c. For the avoidance of doubt, the Licensor may also offer the
- Licensed Material under separate terms or conditions or stop
- distributing the Licensed Material at any time; however, doing so
- will not terminate this Public License.
-
- d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
- License.
-
-
-Section 7 -- Other Terms and Conditions.
-
- a. The Licensor shall not be bound by any additional or different
- terms or conditions communicated by You unless expressly agreed.
-
- b. Any arrangements, understandings, or agreements regarding the
- Licensed Material not stated herein are separate from and
- independent of the terms and conditions of this Public License.
-
-
-Section 8 -- Interpretation.
-
- a. For the avoidance of doubt, this Public License does not, and
- shall not be interpreted to, reduce, limit, restrict, or impose
- conditions on any use of the Licensed Material that could lawfully
- be made without permission under this Public License.
-
- b. To the extent possible, if any provision of this Public License is
- deemed unenforceable, it shall be automatically reformed to the
- minimum extent necessary to make it enforceable. If the provision
- cannot be reformed, it shall be severed from this Public License
- without affecting the enforceability of the remaining terms and
- conditions.
-
- c. No term or condition of this Public License will be waived and no
- failure to comply consented to unless expressly agreed to by the
- Licensor.
-
- d. Nothing in this Public License constitutes or may be interpreted
- as a limitation upon, or waiver of, any privileges and immunities
- that apply to the Licensor or You, including from the legal
- processes of any jurisdiction or authority.
-
-
-=======================================================================
-
-Creative Commons is not a party to its public licenses.
-Notwithstanding, Creative Commons may elect to apply one of its public
-licenses to material it publishes and in those instances will be
-considered the "Licensor." Except for the limited purpose of indicating
-that material is shared under a Creative Commons public license or as
-otherwise permitted by the Creative Commons policies published at
-creativecommons.org/policies, Creative Commons does not authorize the
-use of the trademark "Creative Commons" or any other trademark or logo
-of Creative Commons without its prior written consent including,
-without limitation, in connection with any unauthorized modifications
-to any of its public licenses or any other arrangements,
-understandings, or agreements concerning use of licensed material. For
-the avoidance of doubt, this paragraph does not form part of the public
-licenses.
-
-Creative Commons may be contacted at creativecommons.org.
-
diff --git a/tests/test_corpus/README.md b/tests/test_corpus/README.md
deleted file mode 100644
index cc5f09aed..000000000
--- a/tests/test_corpus/README.md
+++ /dev/null
@@ -1,209 +0,0 @@
-**Note: This is a subset of the full UD_English-EWT corpus, fit for testing purposes. Below is the original readme:**
-
-
-Universal Dependencies - English Dependency Treebank
-Universal Dependencies English Web Treebank v2.2 -- 2018-04-15
-https://github.com/UniversalDependencies/UD_English-EWT
-
-# Summary
-
-A Gold Standard Universal Dependencies Corpus for English,
-built over the source material of the English Web Treebank
-LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
-
-
-# Introduction
-
-The corpus comprises 254,830 words and 16,622 sentences, taken from five genres
-of web media: weblogs, newsgroups, emails, reviews, and Yahoo! answers. See the
-LDC2012T13 documentation for more details on the sources of the sentences. The
-trees were automatically converted into Stanford Dependencies and then
-hand-corrected to Universal Dependencies. All the basic dependency annotations have been single-annotated, a limited portion of them have been double-annotated,
-and subsequent correction has been done to improve consistency. Other aspects
-of the treebank, such as Universal POS, features and enhanced dependencies, has mainly been done
-automatically, with very limited hand-correction.
-
-# License/Copyright
-
-Universal Dependencies English Web Treebank © 2013-2018
-by The Board of Trustees of The Leland Stanford Junior University.
-All Rights Reserved.
-
-The annotations and database rights of the Universal Dependencies
-English Web Treebank are licensed under a
-Creative Commons Attribution-ShareAlike 4.0 International License.
-
-You should have received a copy of the license along with this
-work. If not, see