Releases: pingcap/pytidb
v0.0.14
🐛 Bug fixes
NULL Vector handling Bug
Bug description
In PyTiDB 0.0.13, to address the NULL vector issue, the client automatically appended a clause like `HAVING embedding IS NOT NULL` to filter out NULL vectors. However, this prevented vector search queries from using the vector index.
Bug Fix
PyTiDB 0.0.14 introduces the following changes:
- NULL vector filtering is disabled by default.
- A `.skip_null_vectors(True)` option is provided, allowing developers to control whether NULL vectors should be filtered out.
- To prevent filters from making the vector index ineffective, PyTiDB now uses post-filtering mode by default for vector search:
  - The ANN query is executed in the inner subquery.
  - Filtering is applied in the outer query.
In PyTiDB 0.0.13, the NULL vector filtering condition was placed in the inner query, which caused the Vector Index to be bypassed. In PyTiDB 0.0.14, the filtering is moved to the outer query.
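To make the inner/outer split concrete, the sketch below builds both query shapes as SQL strings. It is an illustration under stated assumptions, not the SQL PyTiDB actually generates: the helper name `build_vector_search_sql`, the `_distance` alias, and the choice of TiDB's `VEC_COSINE_DISTANCE()` as the distance function are all assumptions.

```python
from typing import List, Optional


def build_vector_search_sql(
    table: str,
    vector_col: str,
    query_vec: List[float],
    limit: int,
    where: Optional[str],
    post_filter: bool,
) -> str:
    """Return an illustrative vector search query (hypothetical, not PyTiDB's real SQL)."""
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    distance = f"VEC_COSINE_DISTANCE({vector_col}, '{vec_literal}')"
    # The plain ANN query: order by distance, keep the top `limit` rows.
    ann = (
        f"SELECT *, {distance} AS _distance FROM {table} "
        f"ORDER BY _distance ASC LIMIT {limit}"
    )
    if where is None:
        return ann
    if post_filter:
        # Post-filtering: the ANN query runs unfiltered in the inner subquery,
        # and the filter is applied to its candidates in the outer query.
        return f"SELECT * FROM ({ann}) AS candidates WHERE {where} ORDER BY _distance ASC"
    # Pre-filtering: the filter sits in the same query as the ANN ordering,
    # the shape that 0.0.13 used and that bypassed the vector index.
    return (
        f"SELECT *, {distance} AS _distance FROM {table} WHERE {where} "
        f"ORDER BY _distance ASC LIMIT {limit}"
    )


post = build_vector_search_sql("chunks", "embedding", [0.1, 0.2], 5, "embedding IS NOT NULL", True)
pre = build_vector_search_sql("chunks", "embedding", [0.1, 0.2], 5, "embedding IS NOT NULL", False)
print(post)
print(pre)
```

One general consequence of post-filtering: because the filter runs after the inner `LIMIT`, a query can return fewer rows than requested when some candidates fail the filter.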
What is the NULL Vector issue?
In real-world RAG application development, the vector column is often populated asynchronously by the embedding process after the database record is created. Until the embedding is completed, the vector column is NULL.
Since ANN queries are typically executed with ORDER BY … ASC, and in MySQL semantics NULL values are sorted before all non-NULL values, the presence of a large number of NULL vectors can severely degrade vector search results.
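The effect can be reproduced without a database. The pure-Python simulation below mimics MySQL's ascending order, where a NULL distance sorts before every number, and shows NULL rows crowding real matches out of the top-k. The rows, the query vector, and the helper names are made up for illustration; this is not PyTiDB code.

```python
import math
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[List[float]]]  # (id, embedding or None if not yet embedded)


def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def order_by_distance_asc(rows: List[Row], query: List[float]) -> List[Tuple[int, Optional[float]]]:
    """Mimic MySQL `ORDER BY distance ASC`: a NULL distance sorts first."""
    # A NULL embedding yields a NULL distance, as it would in SQL.
    scored = [(rid, None if vec is None else cosine_distance(vec, query)) for rid, vec in rows]
    # Key (False, ...) sorts before (True, ...), so NULL rows come first.
    return sorted(scored, key=lambda r: (r[1] is not None, r[1] if r[1] is not None else 0.0))


rows: List[Row] = [(1, [1.0, 0.0]), (2, None), (3, [0.0, 1.0]), (4, None), (5, [0.9, 0.1])]
query = [1.0, 0.0]

top3 = order_by_distance_asc(rows, query)[:3]
non_null_top3 = order_by_distance_asc([r for r in rows if r[1] is not None], query)[:3]
print([rid for rid, _ in top3])           # [2, 4, 1]: two of three slots go to NULL rows
print([rid for rid, _ in non_null_top3])  # [1, 5, 3]
```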
📝 Documentation & Examples
- docs: add vector index example by @Mini256 in #258
- docs: add example of vector search with realtime data by @Icemap in #199
- docs: use tidb_client.db_engine in README example (fixes #193) #195 by @haseebpvt in #196
New Contributors
- @haseebpvt made their first contribution in #196
Full Changelog: v0.0.13...v0.0.14
v0.0.13
✨ What's New
- `EmbeddingFunction` supports the `dimensions` config for server-side embedding #184 by @Mini256

```python
embed_fn = EmbeddingFunction(model_name="text-embedding-3-small", dimensions=1024)
```

You can use this parameter to reduce the dimensionality of the vectors generated by the embedding model, which reduces storage consumption and can, to some extent, improve vector search query efficiency.
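A quick back-of-the-envelope calculation shows the storage side of that trade-off. This sketch assumes each vector element is stored as a 32-bit float (4 bytes) and counts only the raw element payload, ignoring row, column, and index overhead; the 1,000,000-row table size is made up for illustration. `text-embedding-3-small` returns 1536 dimensions by default.

```python
FLOAT32_BYTES = 4  # assumption: one 32-bit float per vector element


def vector_payload_bytes(dimensions: int, rows: int = 1) -> int:
    """Raw vector payload only; excludes per-row, per-column, and index overhead."""
    return dimensions * FLOAT32_BYTES * rows


default_dims = 1536  # default output size of text-embedding-3-small
reduced_dims = 1024  # the value passed as `dimensions` above

full = vector_payload_bytes(default_dims, rows=1_000_000)
reduced = vector_payload_bytes(reduced_dims, rows=1_000_000)
saving = 1 - reduced / full

print(full, reduced, f"{saving:.1%}")  # 6144000000 4096000000 33.3%
```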
- `table.search()` API supports returning Relationship fields #180 by @Mini256

For example:

```python
from sqlmodel import Relationship
from pytidb.schema import TableModel, Field

# Assumes `db` (a TiDBClient) and `text_embed` (an EmbeddingFunction) are defined earlier.
class Entity(TableModel):
    __tablename__ = "entities"
    id: int = Field(primary_key=True)
    name: str = Field()

entity_table = db.create_table(schema=Entity, if_exists="skip")

class Relation(TableModel):
    __tablename__ = "relations"
    id: int = Field(primary_key=True)
    description: str = Field()
    source_entity_id: int = Field(foreign_key="entities.id")
    target_entity_id: int = Field(foreign_key="entities.id")
    embedding: list[float] = text_embed.VectorField(source_field="description")
    # Two relationships to the same table, disambiguated by explicit join conditions;
    # "joined" loading fetches the related entities in the same query.
    source_entity: Entity = Relationship(
        sa_relationship_kwargs={
            "primaryjoin": "Relation.source_entity_id == Entity.id",
            "lazy": "joined",
        },
    )
    target_entity: Entity = Relationship(
        sa_relationship_kwargs={
            "primaryjoin": "Relation.target_entity_id == Entity.id",
            "lazy": "joined",
        },
    )

relation_table = db.create_table(schema=Relation, if_exists="skip")
```
Now, `relation_table.search("xxxx").limit(1).to_pydantic()` will return the `source_entity` and `target_entity` fields, which helps you build GraphRAG applications with fewer lines of code.
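As a sketch of what those fields enable, the snippet below turns search results into `(source, relation, target)` triples and a plain-text context block for an LLM prompt. It does not call PyTiDB: the dataclasses are hypothetical stand-ins that assume each result exposes `description`, `source_entity.name`, and `target_entity.name` as attributes, as the schema above suggests. The `to_triples`/`to_context` helpers and the `A -[rel]-> B` format are illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:  # hypothetical stand-in for the Entity model above
    id: int
    name: str


@dataclass
class Relation:  # hypothetical stand-in for a search result with joined entities
    id: int
    description: str
    source_entity: Entity
    target_entity: Entity


def to_triples(relations: List[Relation]) -> List[Tuple[str, str, str]]:
    """Flatten each relation into (source name, description, target name)."""
    return [(r.source_entity.name, r.description, r.target_entity.name) for r in relations]


def to_context(relations: List[Relation]) -> str:
    """Render triples as one line each, e.g. for inclusion in a prompt."""
    return "\n".join(f"{s} -[{d}]-> {t}" for s, d, t in to_triples(relations))


results = [Relation(1, "founded", Entity(1, "Alice"), Entity(2, "Acme"))]
print(to_context(results))  # Alice -[founded]-> Acme
```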
📝 Documentation & Examples
- Rename "Serverless" to "Starter" #179 by @sykp241095
- Add auto embedding example for multiple model providers #181 by @Mini256
- Add custom embedding function example #167 by @Icemap
🧰 Refactor
- Move table creation logic out of the `Table` constructor #169 by @Mini256

Initializing `table = Table(schema=TableModel)` will no longer create the table in the database; use `table.create()` or `db.create_table(schema=TableModel)` instead.
🔗 Full Changelog: v0.0.12...v0.0.13
v0.0.12
✨ What's New
Breaking Changes
Starting with this version, `EmbeddingFunction` uses server-side embedding for text by default: instead of sending embedding requests from the client, it leaves embedding to the database, which generates the vectors automatically.
You need to configure the API key using `tidb_client.configure_embedding_provider()` or use the built-in embedding model on TiDB Cloud.
If you want to fall back to client-side embedding, pass `use_server=False` when initializing the `EmbeddingFunction`.
```python
import os
from typing import Optional

from app.db import tidb_client  # assumes your app exposes a TiDBClient instance here
from pytidb.embeddings import EmbeddingFunction
from pytidb.schema import TableModel, Field

# Set API key globally
tidb_client.configure_embedding_provider("openai", os.getenv("OPENAI_API_KEY"))

# Define table schema with auto embedding config.
class Chunk(TableModel):
    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: Optional[list[float]] = EmbeddingFunction(
        "openai/text-embedding-3-small"
    ).VectorField(source_field="text")

# Create table
tbl = tidb_client.create_table(schema=Chunk, if_exists="overwrite")

# Insert data (the embedding for `text` is generated on the server side)
tbl.insert(Chunk(id=1, text="foo"))

# Search
results = tbl.search("bar").limit(1).to_pydantic(with_score=True)
```

Full Changelog: v0.0.11...v0.0.12
v0.0.11
A patch version for v0.0.10
Full Changelog: v0.0.9...v0.0.11
v0.0.10
Warning: Please use v0.0.10.post1 instead.
✨ What's New
- `table.create()` API uses `if_exists` instead of `mode` #152 by @breezewish
  - `mode="overwrite"` -> `if_exists="overwrite"`
  - `mode="create"` -> `if_exists="raise"`
  - `mode="exist_ok"` -> `if_exists="skip"`
- Add `use_database()` and `current_database()` APIs, and enable concurrent test execution #154 by @breezewish
- Support `table.save()` API for convenient data persistence #147 by @Icemap
- Rename `table_names`/`database_names` to `list_tables`/`list_databases` #151 by @breezewish
- Expose `build_tidb_connection_url()` function #151 by @breezewish
- Refine exposed column types, use capitalization for type names #161 by @Mini256
  - using `TEXT` instead of `Text`
  - using `VECTOR` instead of `Vector`
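For code written against the old `mode=` keyword, the mapping above can be applied mechanically. A minimal sketch: `migrate_create_kwargs` is a hypothetical helper for rewriting keyword arguments, not part of PyTiDB.

```python
from typing import Any, Dict

# Old `mode` values mapped to the new `if_exists` values, per the list above.
MODE_TO_IF_EXISTS = {"overwrite": "overwrite", "create": "raise", "exist_ok": "skip"}


def migrate_create_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a pre-0.0.10 `mode=` keyword into the equivalent `if_exists=` keyword."""
    new = dict(kwargs)  # leave the caller's dict untouched
    if "mode" in new:
        mode = new.pop("mode")
        if mode not in MODE_TO_IF_EXISTS:
            raise ValueError(f"unknown mode: {mode!r}")
        new["if_exists"] = MODE_TO_IF_EXISTS[mode]
    return new


print(migrate_create_kwargs({"mode": "exist_ok"}))  # {'if_exists': 'skip'}
```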
🐛 Bug Fixes
🧰 Refactor
🔗 Full Changelog: v0.0.9...v0.0.10
v0.0.9
✨ What's New
- Enable automatic embedding for images (Beta) #137 by @Mini256
- Set connection pool recycle time to 300s by default for serverless mode #131 by @Icemap
- Support `insert` and `bulk_insert` with Python `dict` input #142 by @Icemap
🐛 Bug Fixes
📚 Documentation & Examples
- Add an example for image search #149 by @Mini256
- Add a memory-based chat application with Web UI #150 by @Mini256
🔗 Full Changelog: v0.0.8.post2...v0.0.9
v0.0.8.post2
What's Changed
- fix: make check_vector_column to use the proper method for identifying vector columns by @Icemap in #126
Full Changelog: v0.0.8.post1...v0.0.8.post2
v0.0.8.post1
What's Changed
Full Changelog: v0.0.8...v0.0.8.post1
v0.0.8
🆕 What's New
- Declarative Indexes
- Hybrid Search Enhancements
- Query Features
- Table Management
🐛 Bug Fixes
- Fixed `label got an empty value` error #100 by @yahonda
- Fixed `pull model manifest: file does not exist` error in docs #102 by @yahonda
- Fixed default logic for choosing text column in search #120 by @Mini256
- Added missing `pytidb[models]` dependency to in-memory example #101 by @yahonda
📚 Documentation
- Added "Add to Cursor" button for one-click MCP installation #95 by @Mini256
- Redirected to the new documentation site #96 by @Mini256
- Removed old documentation #114 by @Mini256
- Refined and added full-text search examples, updated sample items #117, #118 by @Mini256
- Refined hybrid search example #121 by @Mini256
- Refined RAG example #110 by @Mini256
New Contributors
Full Changelog: v0.0.7...v0.0.8
v0.0.7
What's Changed
- feat: add database management API by @Mini256 in #82
- feat: support ensure db exists parameter for TiDBClient by @Mini256 in #88
- fix: clean up is_serverless logic by @yihong0618 in #83
- chore: aiohttp to 3.11.18 to drop the warning by @yihong0618 in #81
- refactor: move sql result class by @Mini256 in #86
New Contributors
- @yihong0618 made their first contribution in #83
Full Changelog: v0.0.6...v0.0.7