Replies: 3 comments
LlamaIndex's WeaviateVectorStore does not natively support Weaviate's multi-tenancy feature: there's no tenant or tenant_id parameter exposed in its API or documented usage. Instead, the recommended approach is to store a tenant_id as a metadata property on each object and use metadata filtering to isolate data per tenant. This gives you tenant-like isolation, but it is not the same as Weaviate's built-in multi-tenancy mechanism and doesn't use Weaviate's tenant management or isolation guarantees. The official LlamaIndex docs and notebooks show metadata filtering, for example using MetadataFilters to filter queries by a specific key/value such as tenant_id [example notebook]. There is no mention of native multi-tenancy support in recent code, docs, or PRs as of August 2025 [source]. If you need the strict tenant isolation that Weaviate's multi-tenancy provides, you would need to manage this outside of LlamaIndex for now.
LlamaIndex's
Approach 1: Use Weaviate's native multi-tenancy directly
Create your Weaviate collection with multi-tenancy enabled, then pass a tenant-aware client:
import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()

# Create collection with multi-tenancy
collection = client.collections.create(
    name="Documents",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)

# Add tenants
collection.tenants.create([
    Tenant(name="tenant_A"),
    Tenant(name="tenant_B"),
])

# Use tenant-specific collection with LlamaIndex
tenant_collection = client.collections.get("Documents").with_tenant("tenant_A")
# Note: tenant_collection is not passed to the store below; only the client and index name are
vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name="Documents",
)
The limitation here is that
Approach 2: Metadata filtering (simpler, works today)
Store tenant_id as metadata and filter at query time:
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# Add documents with tenant metadata
docs = [
    Document(text="...", metadata={"tenant_id": "tenant_A"}),
    Document(text="...", metadata={"tenant_id": "tenant_B"}),
]
# Route the index through the Weaviate store; from_documents takes it via a StorageContext
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# Query filtered by tenant
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[MetadataFilter(key="tenant_id", value="tenant_A")]
    )
)
response = query_engine.query("your question")
Approach 2 is recommended until LlamaIndex adds native multi-tenancy support to the Weaviate integration. It's simpler and works reliably with the current API. The main tradeoff is that all tenant data shares the same vector space, so it's slightly less efficient than Weaviate's native tenant isolation at very large scale.
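Since the metadata approach only works if every stored document really carries a tenant_id, it can help to stamp the metadata in one place before documents are built. The following is a minimal sketch of that idea in plain Python; `with_tenant` is a hypothetical helper name, not part of LlamaIndex or Weaviate, and the metadata keys other than tenant_id are made up for illustration.

```python
from typing import Any, Dict, Optional


def with_tenant(metadata: Optional[Dict[str, Any]], tenant_id: str) -> Dict[str, Any]:
    """Return a copy of `metadata` that is guaranteed to carry `tenant_id`.

    Hypothetical helper: rejects empty tenant ids and metadata that is
    already tagged with a different tenant, so mistakes fail at write time.
    """
    if not tenant_id:
        raise ValueError("tenant_id must be a non-empty string")
    stamped = dict(metadata or {})  # copy so the caller's dict is not mutated
    existing = stamped.get("tenant_id")
    if existing is not None and existing != tenant_id:
        raise ValueError(f"metadata already belongs to tenant {existing!r}")
    stamped["tenant_id"] = tenant_id
    return stamped


# Illustrative metadata; "source" is an assumed key, not required by anything.
doc_metadata = with_tenant({"source": "handbook.pdf"}, "tenant_A")
```

The returned dict could then be passed as the metadata argument when constructing each Document, so that no document reaches the index without a tenant tag.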
For a multi-tenant Weaviate setup I would make the tenant id part of the write path and the read path, not just a normal metadata filter. The important distinction is:
If all tenants share one schema/index, every insert should require
Something like:
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

def retriever_for_tenant(index, tenant_id: str):
    # Build a retriever that only sees nodes tagged with this tenant
    return index.as_retriever(
        filters=MetadataFilters(filters=[
            ExactMatchFilter(key="tenant_id", value=tenant_id),
        ])
    )
I would also add one integration test that inserts nodes for tenant A and tenant B, then asserts tenant A cannot retrieve tenant B even with a very similar query. That test is boring, but it catches the class of bugs that matter most in SaaS RAG systems.
If Weaviate native multi-tenancy is enabled in your collection, prefer using that boundary as well. Metadata filters are still useful, but they should not be your only protection against cross-tenant leakage.
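To make the shape of that isolation test concrete, here is a rough sketch against a toy in-memory store rather than a real Weaviate instance. Everything in it (`Node`, `ToyTenantStore`, the two-dimensional embeddings) is a stand-in invented for illustration; a real integration test would run the same assertions against the actual index and retriever.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Node:
    text: str
    embedding: Sequence[float]
    metadata: Dict[str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyTenantStore:
    """Toy stand-in for a vector store: tenant_id is required on write and on read."""

    def __init__(self) -> None:
        self._nodes: List[Node] = []

    def insert(self, node: Node) -> None:
        # Write path: refuse nodes that are not tagged with a tenant.
        if not node.metadata.get("tenant_id"):
            raise ValueError("every node must carry a tenant_id")
        self._nodes.append(node)

    def retrieve(self, query_embedding: Sequence[float], tenant_id: str, top_k: int = 3) -> List[Node]:
        # Read path: filter by tenant first, then rank by similarity.
        candidates = [n for n in self._nodes if n.metadata["tenant_id"] == tenant_id]
        candidates.sort(key=lambda n: cosine(query_embedding, n.embedding), reverse=True)
        return candidates[:top_k]


def test_tenant_a_cannot_see_tenant_b() -> None:
    store = ToyTenantStore()
    store.insert(Node("A's pricing notes", [0.9, 0.1], {"tenant_id": "tenant_A"}))
    store.insert(Node("B's pricing notes", [1.0, 0.0], {"tenant_id": "tenant_B"}))

    # The query vector matches tenant B's node exactly, the worst case for leakage.
    hits = store.retrieve([1.0, 0.0], tenant_id="tenant_A")

    assert hits, "tenant A should still get its own node"
    assert all(n.metadata["tenant_id"] == "tenant_A" for n in hits)


test_tenant_a_cannot_see_tenant_b()
```

The point of querying with tenant B's exact vector is that similarity ranking alone would put B's node first, so the test only passes if the tenant filter is applied before ranking.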
Weaviate offers a feature called "multi-tenancy", which lets you keep the data of different customers (or tenants) in the same collection and then query only one tenant's subset of that collection. Is this somehow supported by LlamaIndex or the "WeaviateVectorStore"?