
Releases: donalshijan/Search-Engine

Fast Accuracy

11 Mar 19:11


In this version we have significantly improved the accuracy of search results by introducing a new technique into the scoring algorithm. The technique goes by several names: Stopword Attenuation, Term Weighting Adjustment, Linguistic Normalization for Relevance Ranking, and Content Word Emphasis.
Certain words (articles, determiners, auxiliary verbs, etc.) are often considered stopwords because they carry little semantic value. This technique assigns those words less significance, so their contribution to the score calculation carries less weight than that of more meaningful words. In other words, their influence on the ranking of results is attenuated.
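The release notes don't include the project's scoring code, so the following is only a minimal Python sketch of the idea, assuming a simple term-frequency score. The stopword list, the `STOPWORD_WEIGHT` value of 0.1, and the `term_weight`/`score` names are hypothetical choices for illustration, not the project's actual values or API.

```python
from collections import Counter

# Hypothetical stopword list and attenuation factor (assumptions, not the
# project's real configuration).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "do", "does"}
STOPWORD_WEIGHT = 0.1  # a stopword counts for 10% of a content word


def term_weight(term: str) -> float:
    """Return the attenuated weight of a query term."""
    return STOPWORD_WEIGHT if term in STOPWORDS else 1.0


def score(query: str, document: str) -> float:
    """Sum the weighted frequencies of each query term in the document."""
    doc_counts = Counter(document.lower().split())
    # Counter returns 0 for terms that don't appear in the document.
    return sum(term_weight(t) * doc_counts[t] for t in query.lower().split())
```

For the query "the cat", a document that repeats "the" five times but never mentions "cat" scores 0.5, while "a cat sat" scores 1.0. Without attenuation the first document would win 5 to 1 purely on a stopword.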
This version also introduces an LRU (least recently used) cache that stores the results of recent queries, so repeated queries are answered much faster than in the previous version. The effect is especially visible in our test, where the same sample queries are issued many times.
For testing, we set up a cache that can hold the results of up to 5 queries. Since the test uses only 10 sample queries, the cache can hold 50% of the distinct sample queries, and its capacity equals 1% of the total number of search requests made during the test (i.e. about 500 requests).
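As a rough illustration of how such a cache behaves, here is a minimal Python sketch built on `collections.OrderedDict`. The `LRUCache` class, the `cached_search` helper, and the idea of passing in a `search` function are hypothetical; only the capacity of 5 comes from the test setup described above.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used query."""

    def __init__(self, capacity: int = 5) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, query: str) -> Optional[List[str]]:
        if query not in self._data:
            return None  # cache miss
        self._data.move_to_end(query)  # mark as most recently used
        return self._data[query]

    def put(self, query: str, results: List[str]) -> None:
        if query in self._data:
            self._data.move_to_end(query)
        self._data[query] = results
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cached_search(cache: LRUCache, query: str,
                  search: Callable[[str], List[str]]) -> List[str]:
    """Serve a query from the cache, running the real search only on a miss."""
    results = cache.get(query)
    if results is None:
        results = search(query)
        cache.put(query, results)
    return results
```

With 10 distinct queries repeated across roughly 500 requests, a capacity-5 cache turns many requests into dictionary lookups, which would explain why the cached timings are so much lower than the raw search times.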
Here are the results from running the performance tests on this version:

single_core_single_thread
Average query processing time: 0.4642217039999998 secs
Average accuracy: 87.20%
multi_core_single_thread
Average query processing time: 0.004051417999999999 secs
Average accuracy: 87.00%
multi_core_multiple_threads_each_thread_searching_against_whole_index
Average query processing time: 0.003243517999999999 secs
Average accuracy: 88.30%
multi_core_multiple_threads_each_thread_in_each_core_searching_against_single_sharded_subset_of_index
Average query processing time: 0.007090070000000002 secs
Average accuracy: 85.96%

Initial Release

01 Jan 07:25


This is the first release of this project. Here are the best test results from running the tests in every mode for this version:

single_core_single_thread
Average query processing time: 0.8534768679999997 secs
Average accuracy: 69.04%
multi_core_single_thread
Average query processing time: 0.07578554000000005 secs
Average accuracy: 68.80%
multi_core_multiple_threads_each_thread_searching_against_whole_index
Average query processing time: 0.05897567800000001 secs
Average accuracy: 68.40%
multi_core_multiple_threads_each_thread_in_each_core_searching_against_single_sharded_subset_of_index
Average query processing time: 0.13842398600000008 secs
Average accuracy: 71.16%

The average processing time for multi_core_multiple_threads_each_thread_in_each_core_searching_against_single_sharded_subset_of_index could be much better on a machine that supports hyper-threading. Unfortunately my Mac doesn't, and I don't see the point of adding support for Windows because I don't have a Windows laptop.