Summary
Currently, LIKE queries in Databend often result in full table scans, particularly when the pattern includes a leading wildcard (e.g., LIKE '%keyword') or is otherwise complex. On large datasets this can lead to unacceptable query latencies.
An N-gram bloom index addresses this problem by splitting the text data into fixed-length substrings (N-grams) and recording them in a bloom filter ahead of time. The query engine can then check the N-grams of a LIKE pattern against the index to quickly identify potential matches, drastically reducing the number of rows that need to be scanned.
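To make the idea concrete, here is a minimal Python sketch of the technique. It is only an illustration under assumptions, not Databend's implementation: the names `NgramBloom`, `ngrams`, and `candidate_blocks`, the trigram size, the filter size, and the SHA-256-based hashing are all hypothetical choices made for this example. It builds one filter per block of rows and uses it to skip blocks that cannot contain the literal part of a LIKE '%literal%' pattern.

```python
import hashlib


def ngrams(text, n=3):
    """Return the set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramBloom:
    """Toy bloom filter over the N-grams of the strings added to it."""

    def __init__(self, n=3, num_bits=4096, num_hashes=3):
        self.n = n
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, gram):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text):
        for gram in ngrams(text, self.n):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def might_contain(self, literal):
        grams = ngrams(literal, self.n)
        if not grams:
            # Literal shorter than n: the filter cannot rule anything out.
            return True
        # False means "definitely absent"; True means "possibly present".
        return all(
            (self.bits >> pos) & 1
            for gram in grams
            for pos in self._positions(gram)
        )


def candidate_blocks(filters, literal):
    """Indexes of the blocks that still have to be scanned for literal."""
    return [i for i, f in enumerate(filters) if f.might_contain(literal)]


# Hypothetical data: two blocks of rows, one filter per block.
blocks = [
    ["error: disk full", "warning: low memory"],
    ["user login ok", "user logout ok"],
]
filters = []
for rows in blocks:
    bloom = NgramBloom()
    for row in rows:
        bloom.add(row)
    filters.append(bloom)

# Blocks to scan for LIKE '%logout%'; the others are skipped.
print(candidate_blocks(filters, "logout"))
```

A bloom filter never produces false negatives, so a block that really contains the literal is always kept. It can produce false positives, so the surviving blocks still have to be checked with the actual LIKE predicate. Literals shorter than the N-gram size cannot be pruned at all in this sketch.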
Benefits of N-gram bloom index:
- Significant Performance Improvement for LIKE Queries: Dramatically reduces query execution time for LIKE queries, especially those with leading wildcards or complex patterns.
- Reduced Resource Consumption: By minimizing full table scans, N-gram bloom index reduces CPU and I/O usage, leading to more efficient resource utilization.