Releases · pingcap/pytidb

04 Feb 02:44

Mini256

v0.0.14

5b170c3

v0.0.14 Latest

🐛 Bug fixes

NULL Vector handling Bug

fix: refactor NULL vector handling to avoid Vector Index invalidation by @Mini256 in #257

Bug description

In PyTiDB 0.0.13, to address the NULL Vector issue, the client automatically appended a clause such as HAVING embedding IS NOT NULL to filter out NULL vectors. However, this clause prevented vector search queries from using the Vector Index.

Bug Fix

PyTiDB 0.0.14 introduces the following changes:

NULL vector filtering is disabled by default
A .skip_null_vectors(True) option is provided, allowing developers to control whether NULL vectors should be filtered
To avoid filters causing vector indexes to become ineffective, PyTiDB now uses post-filtering mode by default for vector search:
- The ANN query is executed in the inner subquery
- Filtering is applied in the outer query

In PyTiDB 0.0.13, the NULL vector filtering condition was placed in the inner query, which caused the Vector Index to be bypassed. In PyTiDB 0.0.14, the filtering is moved to the outer query.
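
The post-filtering query shape described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the SQL that PyTiDB generates: SQLite stands in for TiDB, the `chunks` table and its `category` column are hypothetical, and `distance` is a precomputed number standing in for the vector distance that the database would compute during ANN search.

```python
import sqlite3

# Hypothetical stand-in data: (id, category, distance).
ROWS = [
    (1, "doc", 0.10),
    (2, "faq", 0.20),
    (3, "doc", 0.30),
    (4, "doc", 0.40),
]

# Post-filtering shape: the inner subquery only orders and limits,
# and the outer query applies the filter to those candidates.
POST_FILTER_SQL = """
SELECT id FROM (
    SELECT id, category, distance
    FROM chunks
    ORDER BY distance ASC
    LIMIT :k
) AS candidates
WHERE category = :category
ORDER BY distance ASC
"""


def post_filter_search(k: int, category: str) -> list[int]:
    """Return the ids that survive the outer filter, nearest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE chunks (id INTEGER PRIMARY KEY, category TEXT, distance REAL)"
        )
        conn.executemany("INSERT INTO chunks VALUES (?, ?, ?)", ROWS)
        rows = conn.execute(POST_FILTER_SQL, {"k": k, "category": category}).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


print(post_filter_search(k=3, category="doc"))
```

With `k=3` the inner query picks ids 1, 2, and 3, and the outer filter keeps only the two "doc" rows among them. This shows the general trade-off of post-filtering: the inner query stays simple, but the final result can contain fewer than `k` rows.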

What is the NULL Vector issue?

In real-world RAG applications, the vector column is often populated asynchronously by an embedding process after the database record has been created. Until the embedding completes, the vector column is NULL.

Since ANN queries are typically executed with ORDER BY … ASC, and in MySQL semantics NULL values are sorted before all non-NULL values in ascending order, a large number of NULL vectors can crowd real matches out of the top results and severely degrade vector search quality.
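
The ordering effect can be reproduced in a small sketch. It uses SQLite as a stand-in because SQLite also sorts NULL first in ascending order. The `chunks` table is hypothetical, and the `distance` column is an assumption that models a NULL vector as producing a NULL distance.

```python
import sqlite3

# Hypothetical rows: (id, distance). A NULL distance models a row whose
# embedding has not been generated yet.
ROWS = [(1, None), (2, 0.20), (3, None), (4, 0.10)]


def top_k(k: int, skip_nulls: bool) -> list[int]:
    """Return the ids of the first k rows ordered by ascending distance."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, distance REAL)")
        conn.executemany("INSERT INTO chunks VALUES (?, ?)", ROWS)
        where = "WHERE distance IS NOT NULL" if skip_nulls else ""
        # The id tiebreaker keeps the order of equal (or NULL) distances stable.
        sql = f"SELECT id FROM chunks {where} ORDER BY distance ASC, id ASC LIMIT ?"
        return [row[0] for row in conn.execute(sql, (k,))]
    finally:
        conn.close()


print(top_k(2, skip_nulls=False))  # NULL rows take both slots
print(top_k(2, skip_nulls=True))   # nearest real rows are returned
```

Without the filter, the two rows that have no embedding fill the whole top-2 result, which is the degradation described above.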

📝 Documentation & Examples

docs: add vector index example by @Mini256 in #258
docs: add example of vector search with realtime data by @Icemap in #199
docs: use tidb_client.db_engine in README example (fixes #193) #195 by @haseebpvt in #196

New Contributors

@haseebpvt made their first contribution in #196

Full Changelog: v0.0.13...v0.0.14

Contributors

Mini256, haseebpvt, and Icemap

29 Aug 03:10

Mini256

v0.0.13

686ea92

v0.0.13

✨ What's New

Make pytidb compatible with TiDB v8.5 #171 by @Mini256
EmbeddingFunction supports a dimensions config for server-side embedding #184 by @Mini256
```
embed_fn = EmbeddingFunction(model_name="text-embedding-3-small", dimensions=1024)
```
You can use this parameter to reduce the dimensionality of the vectors generated by the embedding model, which reduces storage consumption and can improve vector search query efficiency to some extent.
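
As a rough, back-of-the-envelope sketch of the storage effect, the snippet below compares raw vector payload sizes. It assumes each element is stored as a 4-byte float, takes 1536 as the model's default output dimensionality, and ignores per-row and index overhead. The helper name is illustrative and not part of PyTiDB.

```python
BYTES_PER_ELEMENT = 4  # assumption: one 4-byte float per vector element


def vector_payload_bytes(dimensions: int, rows: int = 1) -> int:
    """Raw bytes needed to store `rows` vectors of the given dimensionality."""
    return dimensions * BYTES_PER_ELEMENT * rows


# Compare the assumed default size with the reduced size for one million rows.
full = vector_payload_bytes(1536, rows=1_000_000)
reduced = vector_payload_bytes(1024, rows=1_000_000)
saving = 1 - reduced / full

print(f"{full} -> {reduced} bytes ({saving:.0%} smaller)")
```

Under these assumptions, going from 1536 to 1024 dimensions cuts the raw vector payload by about one third.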

table.search() API supports returning Relationship fields #180 by @Mini256

For example:

# Assumes `db` (a connected TiDBClient) and `text_embed` (an EmbeddingFunction)
# are already defined, and that TableModel, Field, and Relationship are imported.
class Entity(TableModel):
    __tablename__ = "entities"
    id: int = Field(primary_key=True)
    name: str = Field()

entity_table = db.create_table(schema=Entity, if_exists="skip")

class Relation(TableModel):
    __tablename__ = "relations"
    id: int = Field(primary_key=True)
    description: str = Field()
    source_entity_id: int = Field(foreign_key="entities.id")
    target_entity_id: int = Field(foreign_key="entities.id")
    embedding: list[float] = text_embed.VectorField(source_field="description")
    # Eagerly join both endpoints so that search results carry the related entities.
    source_entity: Entity = Relationship(
        sa_relationship_kwargs={
            "primaryjoin": "Relation.source_entity_id == Entity.id",
            "lazy": "joined",
        },
    )
    target_entity: Entity = Relationship(
        sa_relationship_kwargs={
            "primaryjoin": "Relation.target_entity_id == Entity.id",
            "lazy": "joined",
        },
    )

relation_table = db.create_table(schema=Relation, if_exists="skip")

Now relation_table.search("xxxx").limit(1).to_pydantic() returns the source_entity and target_entity fields, which helps you build GraphRAG applications with fewer lines of code.

📝 Documentation & Examples

Rename "Serverless" to "Starter" #179 by @sykp241095
Add auto embedding example for multiple model providers #181 by @Mini256
Add custom embedding function example #167 by @Icemap

🧰 Refactor

Move table creation logic out of Table constructor #169 by @Mini256
Initializing table = Table(schema=TableModel) no longer creates the table in the database; use table.create() or db.create_table(schema=TableModel) instead.

🔗 Full Changelog: v0.0.12...v0.0.13

Contributors

sykp241095, Mini256, and Icemap

08 Aug 05:22

Mini256

v0.0.12

d4fc552

v0.0.12

✨ What's New

feat: support server side auto embedding by @Mini256 in #159

Breaking Changes

Starting with this version, EmbeddingFunction for text uses server-side embedding by default: the client no longer sends embedding requests itself, and the database generates the embeddings automatically.

You need to configure the API Key using tidb_client.configure_embedding_provider() or use the built-in Embedding model on TiDB Cloud.

If you want to fall back to client-side embedding, pass use_server=False when initializing the EmbeddingFunction.

import os
from typing import Optional

from app.db import tidb_client
from pytidb.embeddings import EmbeddingFunction
from pytidb.schema import TableModel, Field

# Set API key globally
tidb_client.configure_embedding_provider("openai", os.getenv("OPENAI_API_KEY"))

# Define table schema with auto embedding config.
class Chunk(TableModel):
    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: Optional[list[float]] = EmbeddingFunction(
        "openai/text-embedding-3-small"
    ).VectorField(source_field="text")

# Create table
tbl = tidb_client.create_table(schema=Chunk, if_exists="overwrite")

# Insert data
tbl.insert(Chunk(id=1, text="foo"))

# Search
results = tbl.search("bar").limit(1).to_pydantic(with_score=True)

Full Changelog: v0.0.11...v0.0.12

Contributors

Mini256

05 Aug 06:40

Mini256

v0.0.11

4f6c1c2

v0.0.11

A patch version for v0.0.10

Full Changelog: v0.0.9...v0.0.11

05 Aug 02:04

Mini256

v0.0.10

0849046

v0.0.10

Warning: Please use v0.0.10.post1

✨ What's New

table.create() API uses if_exists instead of mode #152 by @breezewish
- mode="overwrite" -> if_exists="overwrite"
- mode="create" -> if_exists="raise"
- mode="exist_ok" -> if_exists="skip"
Add use_database() and current_database() APIs, and enable concurrent test execution #154 by @breezewish
Support table.save() API for convenient data persistence #147 by @Icemap
Rename table_names / database_names to list_tables / list_databases #151 by @breezewish
Expose build_tidb_connection_url() function #151 by @breezewish
Refine exposed column types, use capitalization for type name #161 by @Mini256
- Use TEXT instead of Text
- Use VECTOR instead of Vector
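
For callers that still pass the old `mode` argument to table.create(), the mode to if_exists mapping listed above can be captured in a small helper. This is a hypothetical migration aid, not part of PyTiDB: `MODE_TO_IF_EXISTS` and `migrate_create_kwargs` are illustrative names.

```python
# Old `mode` values and their `if_exists` replacements, as listed in the notes.
MODE_TO_IF_EXISTS = {
    "overwrite": "overwrite",
    "create": "raise",
    "exist_ok": "skip",
}


def migrate_create_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with a legacy `mode` key rewritten to `if_exists`."""
    migrated = dict(kwargs)
    if "mode" in migrated:
        mode = migrated.pop("mode")
        if mode not in MODE_TO_IF_EXISTS:
            raise ValueError(f"unknown mode: {mode!r}")
        migrated["if_exists"] = MODE_TO_IF_EXISTS[mode]
    return migrated


print(migrate_create_kwargs({"mode": "exist_ok"}))
```

The helper copies its input rather than mutating it, so existing call sites can be converted one at a time.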

🐛 Bug Fixes

Fix automatic image embedding for Amazon Bedrock #148 by @Icemap

🧰 Refactor

Remove tidb-vector dependency #161 by @Mini256

🔗 Full Changelog: v0.0.9...v0.0.10

17 Jul 02:27

Mini256

v0.0.9

828865c

v0.0.9

✨ What's New

Enable automatic embedding for images (Beta) #137 by @Mini256
Set connection pool recycle time to 300s by default for serverless mode #131 by @Icemap
Support insert and bulk_insert with Python dict input #142 by @Icemap

🐛 Bug Fixes

Skip None values during auto embedding #140 by @Mini256

📚 Documentation & Examples

Add an example for image search #149 by @Mini256
Add a memory-based chat application with Web UI #150 by @Mini256

🔗 Full Changelog: v0.0.8.post2...v0.0.9

01 Jul 18:04

Mini256

v0.0.8.post2

716c78c

v0.0.8.post2

What's Changed

fix: make check_vector_column to use the proper method for identifying vector columns by @Icemap in #126

Full Changelog: v0.0.8.post1...v0.0.8.post2

Contributors

Icemap

01 Jul 18:04

Mini256

v0.0.8.post1

3f42bcf

v0.0.8.post1

What's Changed

fix: make metadata filter works well for table.search() by @Mini256 in #128

Full Changelog: v0.0.8...v0.0.8.post1

Contributors

Mini256

01 Jul 10:00

Mini256

v0.0.8

eb6e941

v0.0.8

🆕 What's New

Declarative Indexes
- Added declarative API for vector and full-text indexes #109 by @Mini256
Hybrid Search Enhancements
- Added support for weighted fusion in table.search(search_type='hybrid') #105 by @Icemap
- Added prefilter support for table.search() #119 by @Icemap
Query Features
- table.query() now supports pagination and sorting #91 by @Mini256
- table.query() now supports SQL-style filters #91 by @Mini256
Table Management
- client.create_table() now supports a create mode parameter (create / exist_ok / overwrite) #90 by @Mini256
- TableModel uses table=True by default #94 by @Mini256
- Added support to drop tables via SQLAlchemy’s drop_table() API #122 by @Mini256

🐛 Bug Fixes

Fixed label got an empty value error #100 by @yahonda
Fixed pull model manifest: file does not exist error in docs #102 by @yahonda
Fixed default logic for choosing text column in search #120 by @Mini256
Added missing pytidb[models] dependency to in-memory example #101 by @yahonda

📚 Documentation

Added "Add to Cursor" button for one-click MCP installation #95 by @Mini256
Redirected to the new documentation site #96 by @Mini256
Removed old documentation #114 by @Mini256
Refined and added full-text search examples, updated sample items #117, #118 by @Mini256
Refined hybrid search example #121 by @Mini256
Refined RAG example #110 by @Mini256

New Contributors

@yahonda made their first contribution in #102

Full Changelog: v0.0.7...v0.0.8

Contributors

yahonda, Mini256, and Icemap

04 Jun 13:56

Mini256

v0.0.7

293e81e

v0.0.7

What's Changed

feat: add database management API by @Mini256 in #82
feat: support ensure db exists parameter for TiDBClient by @Mini256 in #88
fix: clean up is_serverless logic by @yihong0618 in #83
chore: aiohttp to 3.11.18 to drop the warning by @yihong0618 in #81
refactor: move sql result class by @Mini256 in #86

New Contributors

@yihong0618 made their first contribution in #83

Full Changelog: v0.0.6...v0.0.7

Contributors

Mini256 and yihong0618
