Important
This repo is experimental. Use it as an example to implement your own solutions, or clone and install it as a local dependency.
/ˈkaɪ ˈdʒiː/ – Kai rhymes with sky, and G like the letter G.
Hi! Let me handle the DB needs of your AI project. If you need vector search or graph queries, I've got you covered. I use SurrealDB under the hood, a multi-model DB that keeps your vectors, graph edges, and documents in one place, which simplifies your architecture.
```python
# Set up your vector indexes and graph relations.
# username, password, ns, and db are values you define beforehand;
# note that the name `db` is rebound to the DB instance here.
db = DB(
    "ws://localhost:8000/rpc",
    username,
    password,
    ns,
    db,
    Embedder("all-minilm:22m", "F32"),
    LLM(),
    vector_tables=[
        VectorTableDefinition("document", "HNSW", "COSINE"),
        VectorTableDefinition("keyword", "HNSW", "COSINE"),
        VectorTableDefinition("category", "HNSW", "COSINE"),
    ],
    graph_relations=[
        Relation("has_keyword", "document", "keyword"),
        Relation("in_category", "document", "category"),
        Relation("stored_in", "document|container", "container"),
    ],
)
db.init_db()
```

This generates the schema, which you can inspect in the Designer tab of Surrealist.
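As a rough idea of what such a schema could contain, the sketch below renders SurrealQL `DEFINE` statements from the same table and relation names. It is not this library's actual output: the `VectorTable` and `Edge` classes, the `embedding` field name, the index naming scheme, and the 384-dimension setting (the usual output size of all-MiniLM models) are assumptions made for illustration, and the generated statements should be checked against the SurrealQL documentation before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorTable:
    # Hypothetical stand-in for VectorTableDefinition; not the library's class.
    name: str
    index_type: str  # e.g. "HNSW"
    distance: str    # e.g. "COSINE"


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for Relation; not the library's class.
    name: str
    src: str   # one table, or several separated by "|"
    dest: str


def vector_index_ddl(table: VectorTable, dimension: int, vector_type: str) -> str:
    """Render a vector index definition on an assumed `embedding` field."""
    return (
        f"DEFINE INDEX {table.name}_embedding_idx ON {table.name} "
        f"FIELDS embedding {table.index_type} DIMENSION {dimension} "
        f"DIST {table.distance} TYPE {vector_type};"
    )


def relation_ddl(edge: Edge) -> str:
    """Render a relation table definition that constrains the edge endpoints."""
    sources = " | ".join(part.strip() for part in edge.src.split("|"))
    return f"DEFINE TABLE {edge.name} TYPE RELATION IN {sources} OUT {edge.dest};"


statements = [
    vector_index_ddl(VectorTable("document", "HNSW", "COSINE"), 384, "F32"),
    relation_ddl(Edge("has_keyword", "document", "keyword")),
    relation_ddl(Edge("stored_in", "document|container", "container")),
]
for statement in statements:
    print(statement)
```

Each vector table gets one index and each relation gets one edge table, which is why the setup call only needs the table names, the index type, and the distance metric.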
This sample code inserts documents into the vector store and builds a graph that relates documents to keywords.

```python
keywords: set[str] = set()
doc_to_keywords: dict[str, set[str]] = {}

for doc in documents:
    # This function generates the embeddings for the document
    db.embed_and_insert(doc)
    # Collect keywords
    keywords.update(doc.keywords)
    # Link documents with keywords
    if doc.id not in doc_to_keywords:
        doc_to_keywords[doc.id] = set()
    for keyword in doc.keywords:
        doc_to_keywords[doc.id].add(keyword)

# This function generates embeddings for the keywords (destination nodes)
db.add_graph_nodes_with_embeddings(
    src_table="document",
    dest_table="keyword",
    destinations=keywords,
    edge_name="has_keyword",
    relations=doc_to_keywords,
)
```

To search, embed the query text and run a vector search:

```python
res, time = db.vector_search_from_text(
    Document,  # results are validated against and cast to this type
    "Dalinar Kholin",
    table="document",
    k=5,
    score_threshold=0.5,
    effort=40,
)
for x, score in res:
    print(f"• {score:.0%}: {x.content}")
print(f"Query took {time}ms")
```

| Setup functions | Description |
|---|---|
| init_db | initialize DB schema/indexes (vector tables, graph relations, analytics/docs tables) |
| clear | drop tables/indexes created/used by this instance |
| original_docs_table | name of the original documents table |
| async_conn | get an authenticated async connection (lazy) |
| sync_conn | get an authenticated sync connection (lazy) |

| Data functions | Description |
|---|---|
| execute | run a SurrealQL query loaded from a .surql file (sync) |
| async_execute | run a SurrealQL query loaded from a .surql file (async) |
| query | query a list of records and validate them as the expected type |
| query_one | query a single record and validate it as the expected type |
| count | count how many records match a query (optionally grouped) |
| exists | check if a record exists by record id |
| insert_analytics_data | insert a record in the analytics table |
| safe_insert_error | insert a record in the errors table (async, best-effort) |
| error_exists | check if an error record exists for a given id (async) |
| store_original_document | store an original file (as bytes) and dedupe by hash |
| store_original_document_from_bytes | store an original file from bytes and dedupe by hash |
| get_document | get a document/chunk by id (async) |
| list_documents | list documents/chunks with pagination (async) |
| async_insert_document | insert a document/chunk asynchronously |
| insert_document | insert a document/chunk synchronously |
| embed_and_insert | generate an embedding (if needed) and insert the document/chunk |
| vector_search_from_text | embed query text and run a vector search |
| vector_search | run a vector search with a provided embedding |
| async_vector_search | run a vector search with a provided embedding (async) |
| relate | create graph edges between records |
| add_graph_nodes | upsert destination nodes and relate them |
| add_graph_nodes_with_embeddings | embed + upsert destination nodes and relate them |
| recursive_graph_query | fetch children recursively up to N levels |
| graph_query_inward | fetch parent nodes (optionally using an embedding for ranking) |
| graph_siblings | fetch nodes that share the same parent |

| LLM functions | Description |
|---|---|
| gen_name_from_desc | generate a name from a description |
| gen_answer | generate an answer from a question and a context |
| infer_attributes | use a pydantic BaseModel to have the LLM infer the attributes |
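To make the `k` and `score_threshold` parameters of the vector search functions more concrete, here is a minimal brute-force sketch of cosine-similarity ranking. It is not how this library or SurrealDB performs the search: an HNSW index is approximate and avoids comparing against every record, so there is no counterpart to `effort` here. The record ids and vectors are made up, and treating the score as a plain cosine similarity is an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: list[float],
    records: dict[str, list[float]],
    k: int,
    score_threshold: float,
) -> list[tuple[str, float]]:
    """Score every record, drop those below the threshold, keep the k best."""
    scored = [(rid, cosine_similarity(query, emb)) for rid, emb in records.items()]
    kept = [(rid, score) for rid, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical two-dimensional embeddings, for illustration only.
records = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.6, 0.8],
    "doc:c": [0.0, 1.0],
}
results = top_k([1.0, 0.0], records, k=5, score_threshold=0.5)
for rid, score in results:
    print(f"{rid}: {score:.0%}")
```

With these vectors, `doc:c` is orthogonal to the query and falls below the threshold, so fewer than `k` results come back.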
- Take a look at the packages folder.
- Get familiar with SurrealQL:
  - Using Surrealist
  - Example query that selects all documents along with the nodes they reach through any edge (`?`) to any node (`?`):

    ```sql
    SELECT *, ->?->? FROM document;
    ```
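Conceptually, the query above follows every outgoing edge from each document, whatever the edge is called and whatever table it leads to. The in-memory sketch below imitates that on plain Python data. The record ids are hypothetical, the `follow_any_edge` helper is not part of this library, and the real query runs inside SurrealDB rather than in Python.

```python
# Hypothetical record ids; the mapping has the same shape as `doc_to_keywords`.
documents = ["document:1", "document:2", "document:3"]
doc_to_keywords = {
    "document:1": {"keyword:honor", "keyword:war"},
    "document:2": {"keyword:war"},
}

# Edges are (source, edge name, destination) triples.
edges = [
    (doc, "has_keyword", keyword)
    for doc, doc_keywords in doc_to_keywords.items()
    for keyword in sorted(doc_keywords)
]
edges.append(("document:1", "in_category", "category:fantasy"))


def follow_any_edge(
    sources: list[str], edges: list[tuple[str, str, str]]
) -> dict[str, set[str]]:
    """For each source, collect every node reachable over one outgoing edge."""
    reached: dict[str, set[str]] = {source: set() for source in sources}
    for src, _edge_name, dest in edges:
        if src in reached:
            reached[src].add(dest)
    return reached


reached = follow_any_edge(documents, edges)
for doc, nodes in reached.items():
    print(doc, sorted(nodes))
```

The edge name is ignored on purpose, mirroring the first `?`, and no filter is applied to the destination, mirroring the second. A document without edges still appears, with an empty result.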

