RFC: Tokenized free text search #14135
adevinwild started this conversation in RFC
Replies: 1 comment
-
Agree! Right now the current search is not usable for customer lookup. I have implemented my own solution for that purpose, and the query filter is exactly the one you posted. I have been using it for about two months, very frequently, and I'm very happy with it.

Not a problem for me: in most cases the number of tokens will not exceed 2, and first name plus last name (even with compound names) should yield fairly specific results.
-
Hey team,
Following customer feedback on the Admin UI customer page, I realized that the current search is very restrictive.
That's why I'm writing this RFC to improve it and find a solution to this problem together.
Current logic
The search logic treats the input as a monolithic string.
For example:
You search "John Doe"
The factory takes the entire string "John Doe" and checks whether it exists, as-is, inside any single column
We can imagine that it will return a filter object like this:
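For illustration, here is a sketch of what that object might look like, assuming the searched columns are `first_name`, `last_name`, and `email`; the column names and the `$ilike` operator are assumptions, not the exact output of `retrieveRelationsConstraints`.

```ts
// A possible shape for the filter produced from the whole string "John Doe".
// Column names and the `$ilike` operator are illustrative assumptions.
const monolithicFilter = {
  $or: [
    { first_name: { $ilike: "%John Doe%" } },
    { last_name: { $ilike: "%John Doe%" } },
    { email: { $ilike: "%John Doe%" } },
  ],
}
```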
When applied to the SQL query as is, this will return no results if your database contains "John" as the `first_name` and "Doe" as the `last_name`, because the data lives in two different columns.

Proposed implementation (Tokenized search)
IMO, this is the most straightforward option; I've seen this pattern used in a lot of search engines, and it could be beneficial for us too.
The goal here is to tokenize the input string without rewriting `retrieveRelationsConstraints`. We could split the input on spaces and/or non-alphanumeric characters (the same way Algolia handles tokenization on their side).
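As a rough sketch of that tokenization step (the splitting regex is an assumption; Algolia's actual rules are more involved):

```ts
// Minimal tokenization sketch: split on any run of non-alphanumeric
// characters (Unicode-aware) and drop empty tokens.
const tokenize = (input: string): string[] =>
  input
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0)

tokenize("John Doe") // ["John", "Doe"]
tokenize("jean-pierre dupont") // ["jean", "pierre", "dupont"]
```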
We can now iterate over each token and use `retrieveRelationsConstraints`, storing each token's constraints in its own `$or` group and linking the groups with an `$and` intersection, so the results are much closer to what the end user is looking for. For example:
You search "John Doe"
It is now tokenized to `["John", "Doe"]`. We can imagine that it will return a filter object like this:
Trade-offs
I'm still investigating this, but here's what I've been able to list so far:
- No more exact phrase search (the input is always split, so the full string can no longer be matched as one unit)
- Query performance (each token adds another set of `OR` conditions to the query)
Would love to get a fresh perspective on this, so we can identify more cons to this approach.