Implement StopWords from TNTSearch & Add Config Option #136

@benlilley

First, thanks for the great plugin. We've been battling Grav's search for a while, and finding this has really helped.

The Problem

Currently, when searching larger data sets, the search can become slow and cumbersome, and the search field ends up playing catch-up with the user's typing. One reason appears to be that the plugin runs a new search on every letter typed once the initial minimum is reached, and that the minimum doesn't stop short words inside a longer query from being searched. For example, searching for "eating an apple" will find every instance of "an" in your data set and search through them, even with min: 3 set. This is slow and also lowers the quality of the results.
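To illustrate the per-word minimum, here is a rough PHP sketch. termsAboveMinLength is a hypothetical helper, not something in the plugin or in TNTSearch. It applies min: 3 to each word of the query instead of to the query string as a whole:

```php
<?php

/**
 * Hypothetical helper (not part of the plugin or TNTSearch): split a query
 * into terms and drop any term shorter than $minLength, so the minimum is
 * applied per word rather than to the whole query string.
 */
function termsAboveMinLength(string $query, int $minLength): array
{
    // Lowercase (ASCII-only in PHP 8.2) and split on anything that is not a
    // letter or digit, discarding empty pieces.
    $terms = preg_split('/[^\p{L}\p{N}]+/u', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // strlen() counts bytes: fine for ASCII, use mb_strlen() for multibyte text.
    return array_values(array_filter(
        $terms,
        fn (string $term): bool => strlen($term) >= $minLength
    ));
}

print_r(termsAboveMinLength('eating an apple', 3)); // eating, apple
```

With this, "an" never reaches the search step, while the whole query still has to pass the existing minimum first.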

Note: A configurable delay (debounce) on the search input might help here too, so that the search runs once the user has finished typing rather than on every keystroke.

Potential Solution

This is traditionally solved with stop words, which TNTSearch already supports (see teamtnt/tntsearch#83) through the $stopWords property in TNTIndexer.php:

class TNTIndexer
{
    protected $index              = null;
    protected $dbh                = null;
    protected $primaryKey         = null;
    protected $excludePrimaryKey  = true;
    public $stemmer               = null;
    public $tokenizer             = null;
    public $stopWords             = [];
    // ...
}

A common list as a starting point for English would be:

public $stopWords = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for', 'if', 'in', 'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such', 'that', 'the', 'their', 'then', 'there', 'these', 'they', 'this', 'to', 'was', 'will', 'with'];
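As a rough sketch of what filtering against that list does to a query, here is a hypothetical removeStopWords helper. It is not a TNTSearch API, and how TNTSearch applies its own $stopWords internally may differ:

```php
<?php

// The English starting list proposed above.
$stopWords = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for', 'if',
    'in', 'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such', 'that', 'the',
    'their', 'then', 'there', 'these', 'they', 'this', 'to', 'was', 'will', 'with'];

/**
 * Hypothetical helper: tokenise a query and drop any term found in $stopWords.
 */
function removeStopWords(string $query, array $stopWords): array
{
    // array_flip turns the list into array keys, so each check is an isset()
    // lookup instead of a linear in_array() scan.
    $lookup = array_flip(array_map('strtolower', $stopWords));

    // Lowercase and split on anything that is not a letter or digit.
    $terms = preg_split('/[^\p{L}\p{N}]+/u', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    return array_values(array_filter(
        $terms,
        fn (string $term): bool => !isset($lookup[$term])
    ));
}

print_r(removeStopWords('Eating an apple in the park', $stopWords)); // eating, apple, park
```

With a list this short, in_array() would also be fast enough. The flipped lookup only matters if the list grows.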

It would be great if there were an option in tntsearch.yaml to pass a list of stop words to be ignored. That way the list is easy to discover and manage, and it won't be lost during a plugin update, as edits to the current vendor file would be. Ideally, these words would also be excluded from the Highlighter functionality.
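A minimal sketch of how the plugin might wire this up, assuming a hypothetical stop_words key in the parsed tntsearch.yaml (no such option exists today). FakeIndexer is a stand-in for TNTIndexer so the example runs on its own. It mirrors only the public $stopWords property shown above. resolveStopWords and highlightableTerms are likewise hypothetical:

```php
<?php

/** Stand-in for TNTIndexer: mirrors only its public $stopWords property. */
class FakeIndexer
{
    public $stopWords = [];
}

/**
 * Hypothetical: take the stop word list from the plugin config, falling back
 * to a default list. 'stop_words' is an assumed key name.
 */
function resolveStopWords(array $config, array $defaults): array
{
    $words = $config['stop_words'] ?? $defaults;
    if (!is_array($words)) {
        return $defaults;
    }

    // Normalise: trim and lowercase each entry, then drop empties and duplicates.
    $words = array_map(fn ($word): string => strtolower(trim((string) $word)), $words);

    return array_values(array_unique(array_filter($words, fn (string $word): bool => $word !== '')));
}

/** Hypothetical: remove stop words from the terms handed to the highlighter. */
function highlightableTerms(array $terms, array $stopWords): array
{
    $lookup = array_flip($stopWords);

    return array_values(array_filter(
        $terms,
        fn (string $term): bool => !isset($lookup[strtolower($term)])
    ));
}

// Config as it might look after tntsearch.yaml is parsed (assumed shape).
$config = ['stop_words' => ['A', 'an', 'the', ' the ', '']];

$stopWords = resolveStopWords($config, ['a', 'an', 'the']);

$indexer = new FakeIndexer();
$indexer->stopWords = $stopWords; // the same public property TNTIndexer exposes

print_r($indexer->stopWords);                                       // a, an, the
print_r(highlightableTerms(['eating', 'an', 'apple'], $stopWords)); // eating, apple
```

Because the list lives in the user's config rather than in the vendor file, a plugin update would leave it untouched.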

Grav: 1.7.46
TNT Search: 3.4.0
PHP: 8.2