Release v0.0.14 · pingcap/pytidb

🐛 Bug fixes

NULL Vector handling Bug

fix: refactor NULL vector handling to avoid Vector Index invalidation by @Mini256 in #257

Bug description

In PyTiDB 0.0.13, to work around the NULL Vector issue, the client automatically appended a clause like HAVING embedding IS NOT NULL to filter out NULL vectors. However, this clause prevented vector search queries from using the Vector Index.

Bug Fix

PyTiDB 0.0.14 introduces the following changes:

- NULL vector filtering is disabled by default
- A .skip_null_vectors(True) option is provided, allowing developers to control whether NULL vectors should be filtered
- To avoid filters causing vector indexes to become ineffective, PyTiDB now uses post-filtering mode by default for vector search:
  - The ANN query is executed in the inner subquery
  - Filtering is applied in the outer query

In PyTiDB 0.0.13, the NULL vector filtering condition was placed in the inner query, which caused the Vector Index to be bypassed. In PyTiDB 0.0.14, the filtering is moved to the outer query.
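
The post-filtering query shape described above can be sketched as a small string builder. This is only an illustration of the inner/outer split, not PyTiDB's actual query generation: the helper name build_vector_search_sql, the table and column names, the _distance alias, and the choice of VEC_COSINE_DISTANCE as the distance function are all assumptions made for the example.

```python
def build_vector_search_sql(table, vector_column, query_vector, limit,
                            skip_null_vectors=False):
    """Hypothetical helper: build a post-filtering vector search query.

    Illustration only. It interpolates values directly into SQL, which is
    unsafe for untrusted input, and it is not PyTiDB's real implementation.
    """
    vec_literal = "[" + ",".join(str(x) for x in query_vector) + "]"

    # Inner subquery: only the ANN part (ORDER BY distance + LIMIT),
    # with no extra conditions that could stop the Vector Index being used.
    inner = (
        f"SELECT *, VEC_COSINE_DISTANCE({vector_column}, '{vec_literal}') "
        f"AS _distance FROM {table} ORDER BY _distance ASC LIMIT {limit}"
    )

    # Outer query: filtering is applied to the candidates afterwards.
    outer = f"SELECT * FROM ({inner}) AS candidates"
    if skip_null_vectors:
        outer += f" WHERE {vector_column} IS NOT NULL"
    return outer


if __name__ == "__main__":
    print(build_vector_search_sql("chunks", "embedding", [0.1, 0.2], 3,
                                  skip_null_vectors=True))
```

One consequence of this shape is that the outer filter runs after LIMIT, so a query that asks for 3 rows can return fewer than 3 when some candidates are filtered out. In PyTiDB itself, the switch for NULL filtering is the .skip_null_vectors(True) option listed above.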

What is the NULL Vector issue?

In real-world RAG applications, the vector column is often populated asynchronously: the database record is created first, and the embedding is written later, once the embedding process completes. Until then, the vector column is NULL.

Since ANN queries are typically executed with ORDER BY … ASC, and in MySQL semantics NULL values are sorted before all non-NULL values in ascending order, a large number of NULL vectors can crowd real matches out of the top results and severely degrade vector search quality.
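
The sorting behaviour can be reproduced with a small stand-in. The sketch below uses Python's built-in sqlite3 module rather than TiDB, because SQLite also places NULL first under ORDER BY … ASC. The chunks table and its precomputed distance column are assumptions for the example, with NULL standing in for the distance of a row whose embedding has not been written yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, distance REAL)")

# Rows 2 and 4 model records whose embedding is still NULL,
# so their distance to the query vector is NULL as well.
conn.executemany(
    "INSERT INTO chunks (id, distance) VALUES (?, ?)",
    [(1, 0.12), (2, None), (3, 0.05), (4, None)],
)

# Unfiltered top-2: the NULL rows sort first and take both slots.
top2 = conn.execute(
    "SELECT id, distance FROM chunks ORDER BY distance ASC LIMIT 2"
).fetchall()

# With NULLs skipped: the two genuinely closest rows are returned.
top2_filtered = conn.execute(
    "SELECT id, distance FROM chunks "
    "WHERE distance IS NOT NULL ORDER BY distance ASC LIMIT 2"
).fetchall()

print(top2)
print(top2_filtered)
```

In the unfiltered query both returned rows have a NULL distance, so the real nearest neighbours (ids 3 and 1) never appear. The filtered query returns them in that order.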

📝 Documentation & Examples

docs: add vector index example by @Mini256 in #258
docs: add example of vector search with realtime data by @Icemap in #199
docs: use tidb_client.db_engine in README example (fixes #193) #195 by @haseebpvt in #196

New Contributors

@haseebpvt made their first contribution in #196

Full Changelog: v0.0.13...v0.0.14
