Document Similarity framework A framework to compare the documents by extracting and comparing triplets of the sentences in documents.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
What things you need to install the software and how to install them: Required Python modules in addition to basic libraries:
- sys
- os
- nltk (StanfordDependencyParser)
- subprocess
- parse
- pycorenlp
pip install spacy (>=3.0.6)
pip install networkx (>=2.5.1)
pip install flask (>=2.0.1)
pip isntall pycorenlp (>=0.3.0)
Install en_core_web_sm for nlp model:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
We will use two Stanford packages. Download the packages using the link below:
- StanfordCoreNLP : https://stanfordnlp.github.io/CoreNLP/download.html
- Stanford parsor using https://nlp.stanford.edu/software/stanford-parser-4.2.0.zip
Note : Make sure you have java 8 and jdk 8 installed in your system. Then give the respective path name for CLASS_NAME and JAVAHOME environment variables (in simplification.py)
Open the command propmpt and run the following command( path_name : path where coreNLP file is downloaded)
java -mx4g -cp "{path_name}\stanford-corenlp-4.2.2\stanford-corenlp-4.2.2\*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,depparse,natlog,openie" -outputFormat "json" -openie.triple.strict "true" -openie.max_entailments_per_clause 3
Open the terminal or new command prompt and go to 'App' folder:
flask run
- Open the browser and open http://localhost:5000/
- You can see the dsf framework running (It may take a little time initially to import all modules at first)
- Select the text files and upload it(The processing time depends on the processor speed and complexity of documents uploaded)
- Click on the buttons to get the reuired result(i.e, similarity score, keyword search or knowledge graph)