Releases: pingcap/pytidb

v0.0.14

04 Feb 02:44
5b170c3

🐛 Bug fixes

NULL Vector handling Bug

  • fix: refactor NULL vector handling to avoid Vector Index invalidation by @Mini256 in #257

Bug description

In PyTiDB 0.0.13, to work around the NULL vector issue, the client automatically appended a clause such as HAVING embedding IS NOT NULL to filter out NULL vectors. However, this clause prevented vector search queries from using the Vector Index.

Bug Fix

PyTiDB 0.0.14 introduces the following changes:

  1. NULL vector filtering is disabled by default

  2. A .skip_null_vectors(True) option is provided, allowing developers to control whether NULL vectors should be filtered

  3. To avoid filters causing vector indexes to become ineffective, PyTiDB now uses post-filtering mode by default for vector search:

    • The ANN query is executed in the inner subquery
    • Filtering is applied in the outer query

    In PyTiDB 0.0.13, the NULL vector filtering condition was placed in the inner query, which caused the Vector Index to be bypassed. In PyTiDB 0.0.14, the filtering is moved to the outer query.
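
The query shape described above can be sketched as a small SQL string builder. This is an illustration only, not PyTiDB's actual SQL generation: the helper name build_ann_query, the table and column names, and the exact SQL text are assumptions; only VEC_COSINE_DISTANCE is a real TiDB function. The point is that the ANN ordering and LIMIT stay in the inner subquery, while the optional NULL filter is applied in the outer query.

```python
def build_ann_query(table, vector_column, query_vector, k, skip_null_vectors=False):
    """Build a post-filtering ANN query string (hypothetical helper, illustration only)."""
    # Render the query vector as a TiDB vector literal, e.g. '[0.1,0.2]'.
    vec_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    distance = f"VEC_COSINE_DISTANCE({vector_column}, '{vec_literal}')"

    # Inner subquery: plain ANN search with no extra filter.
    inner = (
        f"SELECT *, {distance} AS _distance "
        f"FROM {table} ORDER BY {distance} ASC LIMIT {int(k)}"
    )

    # Outer query: filtering happens here, after the ANN candidates are fetched.
    outer = f"SELECT * FROM ({inner}) AS candidates"
    if skip_null_vectors:
        outer += f" WHERE {vector_column} IS NOT NULL"
    return outer + " ORDER BY _distance ASC"


default_sql = build_ann_query("chunks", "embedding", [0.1, 0.2], k=5)
filtered_sql = build_ann_query("chunks", "embedding", [0.1, 0.2], k=5, skip_null_vectors=True)
print(default_sql)
print(filtered_sql)
```

A real implementation would pass the vector as a bound parameter rather than interpolating it into the SQL string; the interpolation here only keeps the sketch readable.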

What is the NULL Vector issue?

In real-world RAG applications, the vector column is often populated asynchronously by the embedding process after the database record is created. Until the embedding completes, the vector column is NULL.

Since ANN queries are typically executed with ORDER BY … ASC, and MySQL sorts NULL values before all non-NULL values in ascending order, rows with NULL vectors come first in the result set. A large number of NULL vectors can therefore crowd out the actual nearest neighbors and severely degrade vector search results.
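
The sorting behavior can be reproduced without a TiDB cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in, since SQLite also sorts NULL before non-NULL values in ascending order. The chunks table and its precomputed distance column are assumptions made for the demo: a NULL distance stands for a row whose embedding has not been generated yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, distance REAL)")
conn.executemany(
    "INSERT INTO chunks (id, distance) VALUES (?, ?)",
    # Rows 2 and 4 have no embedding yet, so their distance is NULL.
    [(1, 0.12), (2, None), (3, 0.40), (4, None)],
)

# Ascending order puts the NULL rows first, so they fill the top-2 slots.
top2 = conn.execute(
    "SELECT id, distance FROM chunks ORDER BY distance ASC LIMIT 2"
).fetchall()

# Excluding NULL rows returns the actual nearest neighbors.
top2_non_null = conn.execute(
    "SELECT id, distance FROM chunks "
    "WHERE distance IS NOT NULL ORDER BY distance ASC LIMIT 2"
).fetchall()

print(top2)
print(top2_non_null)
```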

Full Changelog: v0.0.13...v0.0.14

v0.0.13

29 Aug 03:10
686ea92

✨ What's New

  • Make pytidb compatible with TiDB v8.5 #171 by @Mini256

  • EmbeddingFunction supports a dimensions config for server-side embedding #184 by @Mini256

    embed_fn = EmbeddingFunction(model_name="text-embedding-3-small", dimensions=1024)

    You can use this parameter to reduce the dimensionality of the vectors generated by the embedding model, which reduces storage consumption and can improve vector search query efficiency to some extent.

  • table.search() API supports returning Relationship fields #180 by @Mini256

    For example:

    class Entity(TableModel):
        __tablename__ = "entities"
        id: int = Field(primary_key=True)
        name: str = Field()

    entity_table = db.create_table(schema=Entity, if_exists="skip")

    class Relation(TableModel):
        __tablename__ = "relations"
        id: int = Field(primary_key=True)
        description: str = Field()
        source_entity_id: int = Field(foreign_key="entities.id")
        target_entity_id: int = Field(foreign_key="entities.id")
        embedding: list[float] = text_embed.VectorField(source_field="description")
        source_entity: Entity = Relationship(
            sa_relationship_kwargs={"primaryjoin": "Relation.source_entity_id == Entity.id", "lazy": "joined"},
        )
        target_entity: Entity = Relationship(
            sa_relationship_kwargs={"primaryjoin": "Relation.target_entity_id == Entity.id", "lazy": "joined"},
        )

    relation_table = db.create_table(schema=Relation, if_exists="skip")

    Now, relation_table.search("xxxx").limit(1).to_pydantic() also returns the source_entity and target_entity fields, which helps you build GraphRAG applications with fewer lines of code.

🧰 Refactor

  • Move table creation logic out of Table constructor #169 by @Mini256
    Initializing table = Table(schema=TableModel) no longer creates the table in the database; use table.create() or db.create_table(schema=TableModel) instead.

🔗 Full Changelog: v0.0.12...v0.0.13

v0.0.12

08 Aug 05:22
d4fc552

✨ What's New

  • feat: support server side auto embedding by @Mini256 in #159

Breaking Changes

In the new version, EmbeddingFunction for text uses server-side embedding by default: the client no longer sends the embedding request itself, and the database generates the embeddings automatically.

You need to configure the API Key using tidb_client.configure_embedding_provider() or use the built-in Embedding model on TiDB Cloud.

If you want to fall back to client-side embedding, pass use_server=False when initializing the EmbeddingFunction.

import os
from typing import Optional

from app.db import tidb_client
from pytidb.embeddings import EmbeddingFunction
from pytidb.schema import TableModel, Field

# Set API key globally
tidb_client.configure_embedding_provider("openai", os.getenv("OPENAI_API_KEY"))

# Define table schema with auto embedding config.
class Chunk(TableModel):
    id: int = Field(primary_key=True)
    text: str = Field()
    text_vec: Optional[list[float]] = EmbeddingFunction(
        "openai/text-embedding-3-small"
    ).VectorField(source_field="text")

# Create table
tbl = tidb_client.create_table(schema=Chunk, if_exists="overwrite")

# Insert data
tbl.insert(Chunk(id=1, text="foo"))

# Search
results = tbl.search("bar").limit(1).to_pydantic(with_score=True)

Full Changelog: v0.0.11...v0.0.12

v0.0.11

05 Aug 06:40
4f6c1c2

A patch version for v0.0.10

Full Changelog: v0.0.9...v0.0.11

v0.0.10

05 Aug 02:04
0849046

Warning: Please use v0.0.10.post1 instead.

✨ What's New

  • table.create() API uses if_exists instead of mode #152 by @breezewish
    • mode="overwrite" -> if_exists="overwrite"
    • mode="create" -> if_exists="raise"
    • mode="exist_ok" -> if_exists="skip"
  • Add use_database() and current_database() APIs, and enable concurrent test execution #154 by @breezewish
  • Support table.save() API for convenient data persistence #147 by @Icemap
  • Rename table_names / database_names to list_tables / list_databases #151 by @breezewish
  • Expose build_tidb_connection_url() function #151 by @breezewish
  • Refine exposed column types, use capitalization for type name #161 by @Mini256
    • Use TEXT instead of Text
    • Use VECTOR instead of Vector
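
The mode to if_exists renaming listed above can be captured in a small migration helper for code that still passes the old keyword. This is a sketch under stated assumptions: migrate_create_kwargs and LEGACY_MODE_TO_IF_EXISTS are hypothetical names and not part of PyTiDB; only the three value mappings come from the release notes.

```python
# Mapping from the legacy `mode` values to the new `if_exists` values,
# taken from the v0.0.10 release notes.
LEGACY_MODE_TO_IF_EXISTS = {
    "overwrite": "overwrite",
    "create": "raise",
    "exist_ok": "skip",
}


def migrate_create_kwargs(kwargs):
    """Return a copy of kwargs with a legacy `mode` replaced by `if_exists`.

    Hypothetical helper for illustration; it does not call PyTiDB.
    """
    migrated = dict(kwargs)
    if "mode" in migrated:
        mode = migrated.pop("mode")
        if mode not in LEGACY_MODE_TO_IF_EXISTS:
            raise ValueError(f"unknown legacy mode: {mode!r}")
        # An explicitly passed if_exists takes precedence over the legacy value.
        migrated.setdefault("if_exists", LEGACY_MODE_TO_IF_EXISTS[mode])
    return migrated


print(migrate_create_kwargs({"mode": "exist_ok"}))
```

The resulting dictionary can then be unpacked into the create call, for example table.create(**migrate_create_kwargs(old_kwargs)).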

🐛 Bug Fixes

  • Fix automatic image embedding for Amazon Bedrock #148 by @Icemap

🔗 Full Changelog: v0.0.9...v0.0.10

v0.0.9

17 Jul 02:27
828865c

✨ What's New

  • Enable automatic embedding for images (Beta) #137 by @Mini256
  • Set connection pool recycle time to 300s by default for serverless mode #131 by @Icemap
  • Support insert and bulk_insert with Python dict input #142 by @Icemap

🔗 Full Changelog: v0.0.8.post2...v0.0.9

v0.0.8.post2

01 Jul 18:04
716c78c

What's Changed

  • fix: make check_vector_column to use the proper method for identifying vector columns by @Icemap in #126

Full Changelog: v0.0.8.post1...v0.0.8.post2

v0.0.8.post1

01 Jul 18:04
3f42bcf

What's Changed

  • fix: make metadata filter works well for table.search() by @Mini256 in #128

Full Changelog: v0.0.8...v0.0.8.post1

v0.0.8

01 Jul 10:00
eb6e941

🆕 What's New

  • Declarative Indexes

    • Added declarative API for vector and full-text indexes #109 by @Mini256
  • Hybrid Search Enhancements

    • Added support for weighted fusion in table.search(search_type='hybrid') #105 by @Icemap
    • Added prefilter support for table.search() #119 by @Icemap
  • Query Features

    • table.query() now supports pagination and sorting #91 by @Mini256
    • table.query() now supports SQL-style filters #91 by @Mini256
  • Table Management

    • client.create_table() now supports a create mode parameter (create / exist_ok / overwrite) #90 by @Mini256
    • TableModel uses table=True by default #94 by @Mini256
    • Added support to drop tables via SQLAlchemy’s drop_table() API #122 by @Mini256

🐛 Bug Fixes

  • Fixed label got an empty value error #100 by @yahonda
  • Fixed pull model manifest: file does not exist error in docs #102 by @yahonda
  • Fixed default logic for choosing text column in search #120 by @Mini256
  • Added missing pytidb[models] dependency to in-memory example #101 by @yahonda

Full Changelog: v0.0.7...v0.0.8

v0.0.7

04 Jun 13:56
293e81e

Full Changelog: v0.0.6...v0.0.7