Feature: Implement N-gram bloom filter index to improve the performance of LIKE queries #17724

Open
@b41sh

Summary

Currently, LIKE queries in Databend often result in full table scans, particularly when the pattern includes a leading wildcard (e.g., LIKE '%keyword') or when a complex regular expression is used. On large datasets, this can lead to unacceptable query latency.

An N-gram bloom filter index addresses this problem by pre-processing the text data and indexing its substrings (N-grams). At query time, the engine checks the N-grams of the search pattern against the index to quickly identify potential matches, drastically reducing the number of rows that need to be scanned.
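
To make the idea concrete, here is a minimal Python sketch of an N-gram bloom filter. It is an illustration only, not Databend's implementation: the class name `NgramBloomFilter`, the gram size of 3, the 1024-bit filter, the SHA-256 based hashing, and the one-filter-per-block granularity are all assumptions chosen for readability. It also assumes the literal part of a pattern such as '%keyword%' has already been extracted.

```python
import hashlib


def ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramBloomFilter:
    """Toy bloom filter over the n-grams of the values in one block of rows."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3, n: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.n = n
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, gram: str):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        """Index every n-gram of value."""
        for gram in ngrams(value, self.n):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def might_contain(self, literal: str) -> bool:
        """False: no indexed value contains literal. True: it might."""
        grams = ngrams(literal, self.n)
        if not grams:
            # Literal shorter than n: the filter cannot prune, so scan.
            return True
        return all(
            (self.bits >> pos) & 1
            for gram in grams
            for pos in self._positions(gram)
        )


# Usage: build one filter per block of rows (hypothetical granularity).
block = NgramBloomFilter()
for row in ["databend query engine", "bloom filter index"]:
    block.add(row)

print(block.might_contain("filter"))  # True: a bloom filter has no false negatives
```

A `False` answer lets the engine skip the block without reading it. A `True` answer only means the block might match, so the rows still have to be checked against the original LIKE predicate; false positives cost extra scanning but never produce wrong results.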

Benefits of an N-gram bloom filter index:

  • Faster LIKE Queries: Reduces query execution time for LIKE queries, especially those with leading wildcards or complex patterns.
  • Reduced Resource Consumption: By minimizing full table scans, the index reduces CPU and I/O usage.
