v0.0.14

@Mini256 Mini256 released this 04 Feb 02:44
5b170c3

🐛 Bug fixes

NULL Vector handling Bug

  • fix: refactor NULL vector handling to avoid Vector Index invalidation by @Mini256 in #257

Bug description

In PyTiDB 0.0.13, to work around the NULL vector issue, the client automatically appended a clause such as HAVING embedding IS NOT NULL to filter out NULL vectors. However, this clause prevented vector search queries from using the Vector Index.

Bug Fix

PyTiDB 0.0.14 introduces the following changes:

  1. NULL vector filtering is disabled by default

  2. A .skip_null_vectors(True) option is provided, allowing developers to control whether NULL vectors should be filtered

  3. To keep filters from disabling the Vector Index, PyTiDB now uses post-filtering mode by default for vector search:

    • The ANN query is executed in the inner subquery
    • Filtering is applied in the outer query

    In PyTiDB 0.0.13, the NULL vector filtering condition was placed in the inner query, which caused the Vector Index to be bypassed. In PyTiDB 0.0.14, the filtering is moved to the outer query.
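
The inner/outer split described above can be illustrated with a small, self-contained sketch. It is not the SQL that PyTiDB generates: it uses SQLite from the Python standard library as a stand-in for TiDB, a single REAL column as a one-dimensional stand-in for a vector, and ABS(embedding - ?) as a stand-in for a vector distance function. The table name items and the helper post_filtered_search are hypothetical.

```python
import sqlite3


def post_filtered_search(conn, query, k):
    """Post-filtering sketch: rank and LIMIT in the inner query, filter outside.

    Assumption: ABS(embedding - ?) stands in for a real vector distance,
    and SQLite stands in for TiDB. This only illustrates the query shape.
    """
    sql = """
        SELECT id, distance
        FROM (
            -- Inner subquery: the "ANN" step, only ORDER BY + LIMIT.
            SELECT id, ABS(embedding - ?) AS distance
            FROM items
            ORDER BY distance ASC
            LIMIT ?
        ) AS candidates
        -- Outer query: the NULL filter is applied after ranking.
        WHERE distance IS NOT NULL
        ORDER BY distance ASC
    """
    return conn.execute(sql, (query, k)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, embedding REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 0.1), (2, None), (3, 0.4), (4, 0.9)],  # row 2 is not embedded yet
)

rows = post_filtered_search(conn, 0.0, 3)
ids = [row[0] for row in rows]
print(ids)
```

In this sketch the NULL row takes one of the three candidate slots in the inner query and is then dropped by the outer filter, so only two rows come back. That is the usual trade-off of post-filtering: the ranking step stays free of filter conditions, but the final result can contain fewer than k rows.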

What is the NULL Vector issue?

In real-world RAG application development, the vector column is often populated asynchronously: the database record is created first, and the embedding is computed and written afterwards. Until the embedding is complete, the vector column holds NULL.

ANN queries are typically executed with ORDER BY … ASC. In MySQL semantics, NULL values sort before all non-NULL values in ascending order, so when many NULL vectors are present, those rows can occupy the top of the result set and crowd out real matches, severely degrading vector search results.
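
The ordering behaviour can be reproduced with a minimal sketch. It assumes SQLite (from the Python standard library) as a stand-in, since SQLite also places NULL first in ascending order; the docs table and its precomputed distance column are hypothetical and only simulate the distance an ANN query would sort by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, distance REAL)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        (1, 0.30),
        (2, None),  # embedding not computed yet, so the distance is NULL
        (3, 0.05),  # the closest real match
        (4, None),  # embedding not computed yet
    ],
)

# Ascending sort puts the NULL rows ahead of every real distance.
top2 = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM docs ORDER BY distance ASC, id ASC LIMIT 2"
    )
]
print(top2)
```

Here both of the top-2 slots go to rows without an embedding, and the closest real match (id 3) is pushed out of the result.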

📝 Documentation & Examples

New Contributors

Full Changelog: v0.0.13...v0.0.14