A Python SDK for interacting with the Cosdata Vector Database.
pip install cosdata-client
from cosdata import Client # Import the Client class
# Initialize the client (all parameters are optional)
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",  # Default username
    password="admin",  # Default password
    verify=False  # SSL verification
)
# Create a collection
collection = client.create_collection(
    name="my_collection",
    dimension=768,  # Vector dimension
    description="My vector collection"
)
# Create an index (all parameters are optional)
index = collection.create_index(
    distance_metric="cosine",  # Default: cosine
    num_layers=10,  # Default: 10
    max_cache_size=1000,  # Default: 1000
    ef_construction=128,  # Default: 128
    ef_search=64,  # Default: 64
    neighbors_count=32,  # Default: 32
    level_0_neighbors_count=64  # Default: 64
)
# Generate some vectors (example with random data)
import numpy as np
def generate_random_vector(id: int, dimension: int) -> dict:
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{id}",
        "dense_values": values,
        "document_id": f"doc_{id//10}",  # Group vectors into documents
        "metadata": {  # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }
# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]
# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:], max_workers=8, max_retries=3)
# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,  # Number of nearest neighbors
    return_raw_text=True
)
# Fetch a specific vector
vector = collection.vectors.get("vec_1")
# Get collection information
collection_info = collection.get_info()
print(f"Collection info: {collection_info}")
# List all collections
print("Available collections:")
for coll in client.collections():
    print(f" - {coll.name}")
# Version management
current_version = collection.versions.get_current()
print(f"Current version: {current_version}")
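Before upserting, it can help to check that each vector dictionary matches the collection's dimension. The following is a minimal sketch, not part of the SDK: `validate_vector` is a hypothetical helper, and it assumes only the dictionary layout used in the example above (`id` and `dense_values`). The server or SDK may perform its own validation.

```python
from typing import Any, Dict


def validate_vector(vector: Dict[str, Any], dimension: int) -> None:
    """Raise ValueError if the dict is not usable for a dense upsert.

    Hypothetical helper for illustration; not provided by the SDK.
    """
    vector_id = vector.get("id")
    if not isinstance(vector_id, str) or not vector_id:
        raise ValueError("vector needs a non-empty string 'id'")
    values = vector.get("dense_values")
    if not isinstance(values, list):
        raise ValueError(f"{vector_id}: 'dense_values' must be a list")
    if len(values) != dimension:
        raise ValueError(
            f"{vector_id}: expected {dimension} values, got {len(values)}"
        )


# A 4-dimensional vector passes; a 3-dimensional one is rejected.
validate_vector({"id": "vec_0", "dense_values": [0.1, 0.2, 0.3, 0.4]}, 4)
try:
    validate_vector({"id": "vec_1", "dense_values": [0.1, 0.2, 0.3]}, 4)
except ValueError as exc:
    print(exc)
```

Running the check over the whole list before opening a transaction keeps a malformed vector from aborting a partially filled batch.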
The main client for interacting with the Vector Database API.
client = Client(
    host="http://127.0.0.1:8443",  # Optional
    username="admin",  # Optional
    password="admin",  # Optional
    verify=False  # Optional
)
Methods:
create_collection(...) -> Collection
- Returns a Collection object. Collection info can be accessed via collection.get_info():
  { "name": str, "description": str, "dense_vector": {"enabled": bool, "dimension": int}, "sparse_vector": {"enabled": bool}, "tf_idf_options": {"enabled": bool} }
collections() -> List[Collection]
- Returns a list of Collection objects.
get_collection(name: str) -> Collection
- Returns a Collection object for the given name.
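The three client methods can be combined into a "get or create" pattern so that a script is safe to run more than once. This is a sketch under assumptions: `get_or_create_collection` is a hypothetical helper, not an SDK function, and it relies only on `collections()`, `get_collection()`, `create_collection()` and the `name` attribute used elsewhere in this README.

```python
from typing import Any


def get_or_create_collection(
    client: Any, name: str, dimension: int, description: str = ""
) -> Any:
    """Return the collection called `name`, creating it if it is not listed.

    Hypothetical helper; `client` is expected to be a cosdata Client.
    """
    existing_names = {coll.name for coll in client.collections()}
    if name in existing_names:
        return client.get_collection(name)
    return client.create_collection(
        name=name, dimension=dimension, description=description
    )


# Usage (requires a running server):
# collection = get_or_create_collection(client, "my_collection", 768)
```

The check and the create are two separate requests, so two processes racing on the same name could still collide; the sketch does not try to handle that case.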
The Collection class provides access to all collection-specific operations.
collection = client.create_collection(
    name="my_collection",
    dimension=768,
    description="My collection"
)
Methods:
create_index(...) -> Index
- Returns an Index object. Index info can be fetched (if implemented) as: { "dense": {...}, "sparse": {...}, "tf-idf": {...} }
create_sparse_index(...) -> Index
create_tf_idf_index(...) -> Index
get_index(name: str) -> Index
get_info() -> dict
- Returns collection metadata as above.
delete() -> None
load() -> None
unload() -> None
transaction() -> Transaction (context manager)
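Because `load()` and `unload()` come as a pair, they can be wrapped so that `unload()` always runs, even when the work in between fails. The sketch below is an illustration only: `loaded` is a hypothetical helper, and this README does not describe what loading does, so the assumption that the two calls bracket a period of active use is mine.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def loaded(collection: Any) -> Iterator[Any]:
    """Call collection.load() on entry and collection.unload() on exit.

    Hypothetical helper; not part of the SDK.
    """
    collection.load()
    try:
        yield collection
    finally:
        # Runs even if the body raises, so the collection is always unloaded.
        collection.unload()


# Usage (requires a running server):
# with loaded(collection) as coll:
#     results = coll.search.dense(query_vector=query, top_k=5)
```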
The Transaction class provides methods for vector operations.
with collection.transaction() as txn:
    txn.upsert_vector(vector)  # Single vector
    txn.batch_upsert_vectors(vectors, max_workers=8, max_retries=3)  # Multiple vectors, with parallelism and retries
Methods:
upsert_vector(vector: Dict[str, Any]) -> None
batch_upsert_vectors(vectors: List[Dict[str, Any]], max_workers: Optional[int] = None, max_retries: int = 3) -> None
- vectors: List of vector dictionaries to upsert
- max_workers: Number of threads to use for parallel upserts (default: all available CPU threads)
- max_retries: Number of times to retry a failed batch (default: 3)
commit() -> None
abort() -> None
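The best practices in this README state a maximum of 200 vectors per transaction. One way to respect that limit for larger datasets is to split the input and open one transaction per chunk. This is a sketch, not SDK code: `chunked` and `upsert_in_chunks` are hypothetical helpers, and whether `batch_upsert_vectors` already splits its input internally is not stated here.

```python
from typing import Any, Dict, Iterator, List

# Limit quoted in this README's best practices.
MAX_VECTORS_PER_TRANSACTION = 200


def chunked(
    vectors: List[Dict[str, Any]], size: int = MAX_VECTORS_PER_TRANSACTION
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `vectors` with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(vectors), size):
        yield vectors[start:start + size]


def upsert_in_chunks(
    collection: Any,
    vectors: List[Dict[str, Any]],
    size: int = MAX_VECTORS_PER_TRANSACTION,
) -> int:
    """Upsert with one transaction per chunk; return the transaction count.

    Hypothetical helper; `collection` is expected to be a cosdata Collection.
    """
    transactions = 0
    for chunk in chunked(vectors, size):
        with collection.transaction() as txn:
            txn.batch_upsert_vectors(chunk)
        transactions += 1
    return transactions


# Chunk sizes for 450 vectors.
print([len(chunk) for chunk in chunked([{"id": str(i)} for i in range(450)])])
```

With one transaction per chunk, a failure aborts only the chunk in flight; earlier chunks stay committed, so a caller that needs all-or-nothing behavior would need a different approach.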
The Search class provides methods for vector similarity search.
results = collection.search.dense(
    query_vector=vector,
    top_k=5,
    return_raw_text=True
)
Methods:
dense(query_vector: List[float], top_k: int = 5, return_raw_text: bool = False) -> dict
- Returns: { "results": [ { "id": str, "document_id": str, "score": float, "text": str | None }, ... ] }
sparse(query_terms: List[dict], top_k: int = 5, early_terminate_threshold: float = 0.0, return_raw_text: bool = False) -> dict
- Same structure as above.
text(query_text: str, top_k: int = 5, return_raw_text: bool = False) -> dict
- Same structure as above.
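All three search methods return the same dictionary shape, so one helper can post-process any of them. The sketch below pulls out `(id, score)` pairs and drops weak matches. `top_matches` is a hypothetical helper, and it assumes that a higher `score` means a closer match, which this README does not state explicitly.

```python
from typing import Any, Dict, List, Tuple


def top_matches(
    response: Dict[str, Any], min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs with score >= min_score, best score first.

    Hypothetical helper; assumes higher scores indicate closer matches.
    """
    hits = response.get("results", [])
    pairs = [
        (hit["id"], hit["score"]) for hit in hits if hit["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up response in the documented shape.
sample = {
    "results": [
        {"id": "vec_3", "document_id": "doc_0", "score": 0.71, "text": None},
        {"id": "vec_0", "document_id": "doc_0", "score": 0.99, "text": None},
    ]
}
print(top_matches(sample, min_score=0.8))
```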
The Vectors class provides methods for vector operations.
vector = collection.vectors.get("vec_1")
exists = collection.vectors.exists("vec_1")
Methods:
get(vector_id: str) -> Vector
- Returns a Vector dataclass object with attributes:
  vector.id: str
  vector.document_id: Optional[str]
  vector.dense_values: Optional[List[float]]
  vector.sparse_indices: Optional[List[int]]
  vector.sparse_values: Optional[List[float]]
  vector.text: Optional[str]
get_by_document_id(document_id: str) -> List[Vector]
- Returns a list of Vector objects as above.
exists(vector_id: str) -> bool
- Returns True if the vector exists, else False.
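Since operations raise on failure, a lookup for an ID that may be absent can be guarded with `exists()` first. The following is a sketch only: `fetch_if_exists` is a hypothetical helper, and how `get()` behaves for a missing ID is an assumption, not something this README specifies.

```python
from typing import Any, Optional


def fetch_if_exists(vectors_api: Any, vector_id: str) -> Optional[Any]:
    """Return the Vector for `vector_id`, or None if exists() reports False.

    Hypothetical helper; `vectors_api` is expected to be `collection.vectors`.
    """
    if not vectors_api.exists(vector_id):
        return None
    return vectors_api.get(vector_id)


# Usage (requires a running server):
# vector = fetch_if_exists(collection.vectors, "vec_1")
# if vector is not None:
#     print(vector.id, vector.document_id)
```

This costs two requests per lookup; when most IDs are expected to exist, calling `get()` directly inside a try/except block may be cheaper.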
The Versions class provides methods for version management.
current_version = collection.versions.get_current()
all_versions = collection.versions.list()
Methods:
list() -> dict
- Returns: { "versions": [ { "hash": str, "version_number": int, "timestamp": int, "vector_count": int }, ... ], "current_hash": str }
get_current() -> Version
- Returns a Version dataclass object with attributes:
  version.hash: str
  version.version_number: int
  version.timestamp: int
  version.vector_count: int
  version.created_at: datetime  # property for creation time
get(version_hash: str) -> Version
- Same as above.
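The dictionary returned by `list()` can be inspected without further requests, for example to find the entry that `current_hash` points to. In this sketch both functions are hypothetical helpers, and treating `timestamp` as seconds since the Unix epoch is an assumption; the README only says it is an `int`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def current_version_entry(listing: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entry whose hash equals listing['current_hash'], if any."""
    current_hash = listing.get("current_hash")
    for entry in listing.get("versions", []):
        if entry["hash"] == current_hash:
            return entry
    return None


def created_at_utc(entry: Dict[str, Any]) -> datetime:
    """Convert an entry's timestamp to an aware UTC datetime.

    Assumption: the timestamp is in seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc)


# Made-up listing in the documented shape.
sample_listing = {
    "versions": [
        {"hash": "abc", "version_number": 1, "timestamp": 0, "vector_count": 100},
        {"hash": "def", "version_number": 2, "timestamp": 86400, "vector_count": 250},
    ],
    "current_hash": "def",
}
entry = current_version_entry(sample_listing)
print(entry["version_number"], created_at_utc(entry).isoformat())
```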
- Connection Management
  - Reuse the client instance across your application
  - The client automatically handles authentication and token management
- Vector Operations
  - Use transactions for batch operations
  - The context manager (with statement) automatically handles commit/abort
  - Maximum batch size is 200 vectors per transaction
- Error Handling
  - All operations raise exceptions on failure
  - Use try/except blocks for error handling
  - Transactions automatically abort on exceptions when using the context manager
- Performance
  - Adjust index parameters based on your use case
  - Use appropriate vector dimensions
  - Consider batch sizes for large operations
- Version Management
  - Create versions before major changes
  - Use versions to track collection evolution
  - Clean up old versions when no longer needed
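The error handling advice above can be sketched as a small retry wrapper around any SDK call. This is an illustration, not SDK functionality: `with_retries` is a hypothetical helper, and it catches the broad `Exception` class only because this README does not name the specific exception types the SDK raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.5
) -> T:
    """Call operation(); on an exception, wait and retry.

    Makes at most `attempts` calls in total. The wait doubles after each
    failure (base_delay, 2 * base_delay, ...). Hypothetical helper.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage (requires a running server):
# results = with_retries(
#     lambda: collection.search.dense(query_vector=query, top_k=5)
# )
```

Retrying suits read operations such as searches. For writes, the context manager already aborts the transaction on an exception, so a retry should reopen a fresh transaction rather than reuse the failed one.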
This project is licensed under the MIT License - see the LICENSE file for details.