A simple search engine that uses plain Python data structures to store text data and search it.
First, install the dependencies:

```bash
pip install -r requirements.txt
```

The data should be free text placed in the `data` folder. Every file in the `data` folder is indexed as one document.
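As a rough illustration of "every file is a document", here is a minimal sketch of reading the `data` folder into a dictionary keyed by file name. It assumes UTF-8 text files in a flat folder (subdirectories are skipped), and `load_documents` is a hypothetical helper, not necessarily what `src/main.py` does.

```python
from __future__ import annotations

from pathlib import Path


def load_documents(data_dir: str = "data") -> dict[str, str]:
    """Read every regular file in data_dir into a {file name: text} dictionary."""
    documents: dict[str, str] = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file():  # subdirectories are skipped, not walked
            # Assumes UTF-8 text; undecodable bytes are replaced instead of raising.
            documents[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return documents


if __name__ == "__main__":
    docs = load_documents("data")
    print(f"Indexed {len(docs)} documents")
```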
To run the script, first `cd` into the root directory of the project and set `PYTHONPATH` to it:

```bash
export PYTHONPATH=$(pwd)
```

Then start the search engine:

```bash
python src/main.py
```

The search engine prompts you for a query. Type the query and press Enter; it returns the top 5 results.
The search engine uses a few simple steps to search for a query:
- It reads the data from the `data` folder and stores it in a dictionary.
- It preprocesses the data by removing stop words, punctuation and numbers, and by converting the text to lowercase.
- It reads the stop words from the `stopwords.txt` file, preprocesses them and stores them in a list.
- It reads the query from the user and preprocesses it.
- It scores each document against the query and ranks the documents by score in descending order. A document's score is the number of times the query words occur in it.
- It returns the top 5 results.
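The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: the names `preprocess`, `load_stopwords`, `build_index` and `search` are hypothetical; tokens are runs of ASCII letters (so digits and punctuation disappear, and accented letters or apostrophes split words); stop words are kept in a set rather than a list for fast lookup; each distinct query word is counted once; and documents containing none of the query words are left out of the results.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Runs of lowercase ASCII letters; punctuation, digits and whitespace split tokens.
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text: str, stopwords: set[str] | frozenset[str] = frozenset()) -> list[str]:
    """Lowercase the text, drop punctuation and numbers, then drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if token not in stopwords]


def load_stopwords(lines: Iterable[str]) -> set[str]:
    """Run each stop word through the same preprocessing as the documents."""
    stopwords: set[str] = set()
    for line in lines:
        stopwords.update(preprocess(line))
    return stopwords


def build_index(documents: dict[str, str], stopwords: set[str]) -> dict[str, Counter]:
    """Preprocess every document once and count how often each word occurs."""
    return {name: Counter(preprocess(text, stopwords)) for name, text in documents.items()}


def search(index: dict[str, Counter], query: str, stopwords: set[str], k: int = 5) -> list[tuple[str, int]]:
    """Score = total occurrences of the distinct query words; return the top k."""
    query_terms = set(preprocess(query, stopwords))
    results = []
    for name, counts in index.items():
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # skip documents that contain none of the query words
            results.append((name, score))
    # Highest score first; ties are broken by document name so the output is stable.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results[:k]


if __name__ == "__main__":
    stopwords = load_stopwords(["the", "a", "is", "of"])
    docs = {
        "cats.txt": "The cat sat on the mat. Cats are great!",
        "dogs.txt": "A dog is a loyal friend of 2 cats.",
        "birds.txt": "Birds sing in the morning.",
    }
    index = build_index(docs, stopwords)
    print(search(index, "cat cats", stopwords))  # [('cats.txt', 2), ('dogs.txt', 1)]
```

With a real stop-word file, something like `load_stopwords(Path("stopwords.txt").read_text().splitlines())` would fit, assuming one stop word per line. Because there is no stemming, "cat" and "cats" count as different words, and raw counts tend to favour longer documents.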