Description
First, thanks for the great plugin! We've been battling Grav's search for a while, and finding this has really helped.
The Problem
Currently, when searching larger data sets, the search can become slow and cumbersome, and the search field ends up lagging behind what the user has typed. One reason for this is that the plugin appears to run a search on every keystroke once the query reaches the initial minimum length. For example, searching for "eating an apple" will find every instance of "an" in your data set and search through them, even if you have `min: 3` set. This is slow and also lowers the quality of the results.
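To make the `min: 3` point concrete, here is a minimal PHP sketch that applies the minimum length to each word of the query rather than to the query string as a whole. The function `tokensMeetingMinimum` is hypothetical and not part of the plugin or TNTSearch; it only illustrates which words would survive such a check. It assumes plain English text (the `\W` split is ASCII-oriented).

```php
<?php
// Hypothetical helper, not part of the TNT Search plugin or TNTSearch:
// apply the minimum length to each word instead of the whole query.
function tokensMeetingMinimum(string $query, int $min): array
{
    // Lowercase, then split on runs of characters that are not
    // letters, digits or underscores (ASCII-oriented sketch).
    $words = preg_split('/\W+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // Keep only words at least $min characters long, reindexing the result.
    return array_values(array_filter(
        $words,
        fn (string $word): bool => strlen($word) >= $min
    ));
}

// Prints an array containing "eating" and "apple"; "an" is dropped.
print_r(tokensMeetingMinimum('eating an apple', 3));
```

A length check alone still lets through common words such as "the", "and" or "with", which is why stop words are the more targeted fix.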
Note: Perhaps a configurable delay on the search input would help here too, so that a search runs once the user has finished typing rather than on every keystroke.
Potential Solution
This is traditionally solved using stop words, which TNTSearch already implements (see teamtnt/tntsearch#83), as can be seen in TNTIndexer.php:
```php
class TNTIndexer
{
    protected $index = null;
    protected $dbh = null;
    protected $primaryKey = null;
    protected $excludePrimaryKey = true;
    public $stemmer = null;
    public $tokenizer = null;
    public $stopWords = [];
    // ...
}
```

A common list of English stop words to use as a starting point would be:

```php
public $stopWords = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for', 'if', 'in', 'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such', 'that', 'the', 'their', 'then', 'there', 'these', 'they', 'this', 'to', 'was', 'will', 'with'];
```

It would be great if there were an option in tntsearch.yaml to pass a list of stop words to ignore. That way the list is easy to discover and manage, and it won't be lost during a plugin update, which is what happens to any change made directly in the current vendor file. Ideally these words would also be excluded from the Highlighter functionality.
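As a rough illustration of the requested behaviour, here is a minimal PHP sketch that strips stop words from a query and then highlights only the remaining terms. It does not use TNTSearch's own classes; `removeStopWords` and `highlightTerms` are hypothetical helpers, and `$stopWords` stands in for a list that would be read from a hypothetical `stop_words` option in tntsearch.yaml (no such option exists today).

```php
<?php
// Stand-in for a list read from a hypothetical `stop_words` option
// in tntsearch.yaml; this is the English starting list from above.
$stopWords = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for',
    'if', 'in', 'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such',
    'that', 'the', 'their', 'then', 'there', 'these', 'they', 'this', 'to',
    'was', 'will', 'with'];

// Hypothetical helper: drop stop words from a query before searching.
function removeStopWords(string $query, array $stopWords): array
{
    // Lowercase and split on non-word characters (ASCII-oriented sketch).
    $words = preg_split('/\W+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // Flip the list so each membership check is a hash lookup.
    $lookup = array_flip($stopWords);

    return array_values(array_filter(
        $words,
        fn (string $word): bool => !isset($lookup[$word])
    ));
}

// Hypothetical helper: wrap only the given terms in <em>, matching whole
// words case-insensitively, so stop words are never highlighted.
function highlightTerms(string $text, array $terms): string
{
    if ($terms === []) {
        return $text;
    }
    $alternation = implode('|', array_map(fn (string $t): string => preg_quote($t, '/'), $terms));

    // preg_replace() returns null on error; fall back to the original text.
    return preg_replace('/\b(' . $alternation . ')\b/i', '<em>$1</em>', $text) ?? $text;
}

$terms = removeStopWords('eating an apple', $stopWords);
// Prints: <em>Eating</em> an <em>apple</em> a day
echo highlightTerms('Eating an apple a day', $terms), PHP_EOL;
```

Keeping the list as plain configuration data, rather than editing `$stopWords` in the vendor file, is what would let it survive plugin updates.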
Grav: 1.7.46
TNT Search: 3.4.0
PHP: 8.2