This repository contains a script for training a custom NER model with spaCy, without having to deal with a more complicated configuration. The only requirement is well-annotated training data.
Clone the repository:

    git clone https://github.com/Harisudhan5/Train-Custom-NER-Model-With-SpaCy.git

Install the requirements file:

    pip install -r requirements.txt

Download a pre-trained model:

    python -m spacy download en_core_web_lg

To run the pretrained NER model with its predefined entities, execute default.py to get the results:

    python default.py

- Prepare the Dataset: Create a dataset and store it in data.py in the following format:
training_data = [
["Python is one of the easiest languages to learn", {'entities': [[0, 6, 'PROGRAMMING_LANGUAGE']]}],
['Support vector machines are powerful, but neural networks are more flexible.', {'entities': [[0, 23, 'ALGORITHM_MODEL'], [42, 57, 'ALGORITHM_MODEL']]}],
['I use Django for web development, and Flask for microservices.', {'entities': [[6, 12, 'FRAMEWORK_LIBRARY'], [38, 43, 'FRAMEWORK_LIBRARY']]}]
]

- Define New Labels: Modify train.py to define the new entity labels in the list, matching the prepared dataset.
- Train the Model: Run the training script:

    python train.py

The trained model will be saved as ner inside the Model Directory.
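Each entity is given as `[start, end, label]`, where `start` and `end` are character offsets into the sample text and `end` is exclusive. Hand-counted offsets are easy to get wrong, so it can help to check them before running train.py. The following is a minimal sketch and not part of this repository: `check_spans` and `collect_labels` are hypothetical helper names, the boundary test is only a heuristic, and the second sample is deliberately misaligned to show what a reported problem looks like.

```python
# Hypothetical helpers for illustration; they are not part of this repository.

def check_spans(training_data):
    """Return (text, start, end, label, problem) tuples for suspicious spans."""
    problems = []
    for text, annotations in training_data:
        for start, end, label in annotations["entities"]:
            # Offsets must lie inside the text and describe a non-empty span.
            if not (0 <= start < end <= len(text)):
                problems.append((text, start, end, label, "out of range"))
                continue
            span = text[start:end]
            if span != span.strip():
                # The span begins or ends on whitespace.
                problems.append((text, start, end, label, "whitespace at span edge"))
            elif (start > 0 and text[start - 1].isalnum()) or (
                end < len(text) and text[end].isalnum()
            ):
                # Heuristic: a letter or digit right next to the span
                # suggests that the span cuts through a word.
                problems.append((text, start, end, label, "cuts through a word"))
    return problems


def collect_labels(training_data):
    """Return the sorted set of labels used in the dataset."""
    return sorted(
        {label for _text, ann in training_data for _s, _e, label in ann["entities"]}
    )


training_data = [
    ["Python is one of the easiest languages to learn",
     {"entities": [[0, 6, "PROGRAMMING_LANGUAGE"]]}],
    # Deliberately wrong offsets: text[1:5] is "ust ", not "Rust".
    ["Rust is fast", {"entities": [[1, 5, "PROGRAMMING_LANGUAGE"]]}],
]

problems = check_spans(training_data)
labels = collect_labels(training_data)

for text, start, end, label, problem in problems:
    print(f"{problem}: {text[start:end]!r} ({label}) in {text!r}")
print("labels:", labels)
```

The list returned by `collect_labels` can also serve as a cross-check for the labels defined in train.py, since every label in the dataset should appear there.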
Once the model is trained, run inference.py to test the results with your data:

    python inference.py

Training a custom model with spaCy may cause it to lose the pretrained entities such as GPE, ORG, etc. To retain that previous knowledge, include those entities in your new dataset and make it diverse by adding multiple entity types to each sample.
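One rough way to judge how diverse a dataset is: count how often each label occurs and how many samples contain more than one entity type. The following is a minimal sketch, assuming the `training_data` format used in data.py. `label_stats` is a hypothetical helper, and the second sample sentence is made up for illustration.

```python
from collections import Counter


def label_stats(training_data):
    """Count label occurrences and samples that mix several entity types."""
    label_counts = Counter()
    multi_type_samples = 0
    for _text, annotations in training_data:
        labels = [label for _start, _end, label in annotations["entities"]]
        label_counts.update(labels)
        # A sample counts as "mixed" if it carries at least two distinct labels.
        if len(set(labels)) > 1:
            multi_type_samples += 1
    return label_counts, multi_type_samples


training_data = [
    ["Python is one of the easiest languages to learn",
     {"entities": [[0, 6, "PROGRAMMING_LANGUAGE"]]}],
    # Made-up sample that combines a pretrained label with a custom one.
    ["Google uses Python heavily",
     {"entities": [[0, 6, "ORG"], [12, 18, "PROGRAMMING_LANGUAGE"]]}],
]

label_counts, multi_type_samples = label_stats(training_data)
print(dict(label_counts))
print("samples with several entity types:", multi_type_samples)
```

A label with very few occurrences, or a dataset in which almost no sample mixes entity types, suggests that more varied samples are needed.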
