RFC: Tokenized free text search #14135
adevinwild started this conversation in RFC
Replies: 1 comment
-
Agree! Right now the current search is not usable for customer lookup. I have implemented my own solution for that purpose, and the query filter is exactly the one you posted. I have been using it for about two months, very frequently, and I'm very happy with it.

Not a problem for me: in most cases the number of tokens will not exceed 2, and first name plus last name (even with compound names) should yield fairly specific results.
-
Hey team,
Following customer feedback on the Admin UI customer page, I realized that the current search is very restrictive.
That's why I'm writing this RFC to improve it and find a solution to this problem together.
Current logic
The search logic treats the input as a monolithic string.
For example:
You search "John Doe"
The factory takes the entire string "John Doe" and checks whether it exists, as-is, inside any single column
We can imagine that it will return a filter object like this:
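For illustration, here is a sketch of what that object might look like, assuming the searched columns are `first_name`, `last_name`, and `email`; the column names and the `$ilike` operator are assumptions, not the exact output of `retrieveRelationsConstraints`.

```ts
// A possible shape for the filter produced from the whole string "John Doe".
// Column names and the `$ilike` operator are illustrative assumptions.
const monolithicFilter = {
  $or: [
    { first_name: { $ilike: "%John Doe%" } },
    { last_name: { $ilike: "%John Doe%" } },
    { email: { $ilike: "%John Doe%" } },
  ],
}
```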
When applied to the SQL query as is, this will return no results if your database contains "John" as the `first_name` and "Doe" as the `last_name`, because the data lives in two different columns.

Proposed implementation (Tokenized search)
IMO, this is the most straightforward option; I've seen this pattern used in a lot of search engines, and it could be beneficial for us too.
The goal here is to tokenize the input string without rewriting `retrieveRelationsConstraints`. We could split the input on spaces and/or non-alphanumeric characters (the same way Algolia handles tokenization on their side).
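As a rough sketch of that tokenization step (the splitting regex is an assumption; Algolia's actual rules are more involved):

```ts
// Minimal tokenization sketch: split on any run of non-alphanumeric
// characters (Unicode-aware) and drop empty tokens.
const tokenize = (input: string): string[] =>
  input
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0)

tokenize("John Doe") // ["John", "Doe"]
tokenize("jean-pierre dupont") // ["jean", "pierre", "dupont"]
```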
We can now iterate over each token and use `retrieveRelationsConstraints`, storing each token's constraints in its own `$or` group and linking the groups with an `$and` intersection, so the results are much closer to what the end user is looking for. For example:
You search "John Doe"
It is now tokenized to `["John", "Doe"]`. We can imagine that it will return a filter object like this:
Trade-offs
I'm still investigating this, but here's what I've been able to list so far:
- No more exact phrase search (the input is always split, so the full string can no longer be matched as one unit)
- Query performance (each token adds another set of `OR` conditions to the query)
Would love to get a fresh perspective on this, so we can identify more cons to this approach.