How to use Weaviate multi-tenancy with LlamaIndex? #19607

torbenw · 2025-08-07T12:15:15Z

Weaviate offers a feature called "multi-tenancy" that lets you store the data of different customers (tenants) in the same collection and then query only one tenant's subset of that collection. Is this supported in any way by LlamaIndex or its "WeaviateVectorStore"?

dosubot[bot] · 2025-08-07T12:21:12Z

LlamaIndex's WeaviateVectorStore does not natively support Weaviate's multi-tenancy feature—there's no tenant or tenant_id parameter exposed in its API or documented usage. Instead, the recommended approach is to store a tenant_id as a metadata property on each object and use metadata filtering to isolate data per tenant. This lets you achieve tenant-like isolation, but it's not the same as Weaviate's built-in multi-tenancy mechanism and doesn't leverage Weaviate's tenant management or isolation guarantees.

You can see examples of metadata filtering in the official LlamaIndex docs and notebooks, such as using MetadataFilters to filter queries by a specific key/value (e.g., tenant_id) [example notebook]. There is no mention of native multi-tenancy support in recent code, docs, or PRs as of August 2025 [source].

If you need strict tenant isolation as provided by Weaviate's multi-tenancy, you would need to manage this outside of LlamaIndex for now.

omni-front · 2026-03-09T00:23:55Z

LlamaIndex's WeaviateVectorStore doesn't have built-in multi-tenancy support, but there are two practical approaches:

Approach 1: Use Weaviate's native multi-tenancy directly

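Create your Weaviate collection with multi-tenancy enabled and get a tenant-scoped handle from the Weaviate client. Note that WeaviateVectorStore itself only receives the client and the collection name:

import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()

# Create collection with multi-tenancy enabled
collection = client.collections.create(
    name="Documents",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)

# Add tenants
collection.tenants.create([
    Tenant(name="tenant_A"),
    Tenant(name="tenant_B"),
])

# Tenant-scoped handle: reads and writes made through it only touch tenant_A.
# It works with Weaviate's own client API (e.g. tenant_collection.data.insert(...)),
# but WeaviateVectorStore does not take it as an argument.
tenant_collection = client.collections.get("Documents").with_tenant("tenant_A")

# The vector store only receives the client and the collection name,
# so it has no way to know which tenant to use.
vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name="Documents",
)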

The limitation here is that WeaviateVectorStore doesn't pass a tenant through its inserts or queries, and Weaviate rejects reads and writes on a multi-tenancy-enabled collection that don't name a tenant. So you'd need to either:

Subclass WeaviateVectorStore and override its add/query methods to pass the tenant, or
Use Approach 2 instead.

Approach 2: Metadata filtering (simpler, works today)

Store tenant_id as metadata and filter at query time:

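import weaviate
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters
from llama_index.vector_stores.weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()

# Point at a collection WITHOUT multi-tenancy enabled: all tenants share it.
# (Tenant-less requests against the multi-tenant "Documents" collection would be rejected.)
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="SharedDocuments")
# from_documents expects the vector store wrapped in a StorageContext
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Add documents with tenant metadata
docs = [
    Document(text="...", metadata={"tenant_id": "tenant_A"}),
    Document(text="...", metadata={"tenant_id": "tenant_B"}),
]

index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# Query filtered by tenant
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[MetadataFilter(key="tenant_id", value="tenant_A")]
    )
)
response = query_engine.query("your question")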

Approach 2 is recommended until LlamaIndex adds native multi-tenancy support to the Weaviate integration. It's simpler and works reliably with the current API. The main tradeoff is that all tenants' data shares one collection and one vector index, so each query searches a larger index than a per-tenant shard would, which makes it less efficient than Weaviate's native tenant isolation at very large scale. Isolation also depends on every query carrying the tenant filter.
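To make that tradeoff concrete, here is a toy, pure-Python model contrasting the two layouts. It is not how Weaviate is implemented, and `SharedStore` / `PartitionedStore` are hypothetical names: in the shared store, tenant scoping is a filter over every record (and forgetting it returns everyone's data), while the partitioned store keeps one bucket per tenant and rejects a request with no tenant, roughly mirroring how Weaviate refuses tenant-less requests on a multi-tenancy collection.

```python
from collections import defaultdict
from typing import Optional


class SharedStore:
    """Toy model of Approach 2: one pool of records, tenant scoping is just a filter."""

    def __init__(self) -> None:
        self.records: list = []

    def add(self, text: str, tenant_id: str) -> None:
        self.records.append({"text": text, "tenant_id": tenant_id})

    def search(self, tenant_id: Optional[str] = None) -> tuple:
        # Every query walks the whole shared pool.
        scanned = len(self.records)
        # Omitting tenant_id silently returns every tenant's records.
        hits = [r["text"] for r in self.records
                if tenant_id is None or r["tenant_id"] == tenant_id]
        return hits, scanned


class PartitionedStore:
    """Toy model of native multi-tenancy: one bucket per tenant, tenant is mandatory."""

    def __init__(self) -> None:
        self.buckets: dict = defaultdict(list)

    def add(self, text: str, tenant_id: str) -> None:
        self.buckets[tenant_id].append(text)

    def search(self, tenant_id: Optional[str] = None) -> tuple:
        if tenant_id is None:
            # There is no "all tenants" view to fall back to.
            raise ValueError("tenant is required on a multi-tenant collection")
        bucket = self.buckets.get(tenant_id, [])
        # Only this tenant's records are ever scanned.
        return list(bucket), len(bucket)
```

With records ("a1", A), ("a2", A), ("b1", B) loaded into both, `SharedStore().search("A")` scans all three records to return tenant A's two, and `search()` with no tenant leaks all three; `PartitionedStore` scans only A's two records and raises if no tenant is given.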

yudin-s · 2026-05-14T08:28:43Z

For a multi-tenant Weaviate setup I would make the tenant id part of the write path and the read path, not just a normal metadata filter.

The important distinction is:

metadata filter: useful for business filtering
tenant boundary: should be treated as an isolation boundary

If all tenants share one schema/index, every insert should require tenant_id, and every retriever should be created from a tenant-scoped factory so it is impossible to accidentally query without that scope.

Something like:

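from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

def retriever_for_tenant(index, tenant_id: str):
    # Refuse to build a retriever without a tenant scope
    if not tenant_id:
        raise ValueError("tenant_id is required")
    return index.as_retriever(
        filters=MetadataFilters(filters=[
            MetadataFilter(key="tenant_id", value=tenant_id),
        ])
    )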
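The write path can be scoped the same way. A minimal, hypothetical sketch in plain Python (`TenantWriter` and its `sink` callable are illustrative stand-ins, not LlamaIndex APIs): the tenant is fixed when the writer is created and stamped onto every record's metadata, and a record already tagged for a different tenant is rejected instead of silently re-tagged. In a real setup, `sink` would be a thin wrapper around your index's insert call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TenantWriter:
    """Hypothetical tenant-scoped writer: the tenant is fixed at construction time."""

    tenant_id: str
    sink: Callable[[str, dict], None]  # e.g. a wrapper around your index's insert

    def __post_init__(self) -> None:
        if not self.tenant_id:
            raise ValueError("tenant_id is required")

    def write(self, text: str, metadata: Optional[dict] = None) -> None:
        # Copy so the caller's dict is never mutated.
        meta: dict[str, Any] = dict(metadata or {})
        claimed = meta.get("tenant_id")
        # Reject records already tagged for a different tenant instead of re-tagging them.
        if claimed is not None and claimed != self.tenant_id:
            raise ValueError(f"record tagged {claimed!r}, writer scoped to {self.tenant_id!r}")
        meta["tenant_id"] = self.tenant_id
        self.sink(text, meta)
```

Callers never pass a tenant to `write`, so there is no per-call argument to forget.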

I would also add one integration test that inserts nodes for tenant A and tenant B, then asserts tenant A cannot retrieve tenant B even with a very similar query. That test is boring, but it catches the class of bugs that matter most in SaaS RAG systems.
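A self-contained sketch of that test that runs without a Weaviate server. `ToyTenantIndex` is a hypothetical in-memory stand-in, and a bag-of-words cosine similarity replaces a real embedding model; in an actual suite you would swap in your real index and tenant-scoped retriever, but the shape of the assertions stays the same. The query is deliberately a near-copy of tenant B's document.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in "embedding": lowercase bag of words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyTenantIndex:
    def __init__(self) -> None:
        self.nodes: list = []  # (tenant_id, text, vector)

    def insert(self, tenant_id: str, text: str) -> None:
        self.nodes.append((tenant_id, text, embed(text)))

    def retrieve(self, tenant_id: str, query: str, top_k: int = 5) -> list:
        q = embed(query)
        # Tenant scope is applied before ranking, like a metadata filter on the query.
        scoped = [n for n in self.nodes if n[0] == tenant_id]
        ranked = sorted(scoped, key=lambda n: cosine(q, n[2]), reverse=True)
        return [(t, txt) for t, txt, _ in ranked[:top_k]]


def test_tenant_a_cannot_retrieve_tenant_b() -> None:
    index = ToyTenantIndex()
    index.insert("tenant_A", "Q3 revenue report for Acme")
    index.insert("tenant_B", "Q3 revenue report for Globex")
    # The query matches tenant B's document almost exactly.
    results = index.retrieve("tenant_A", "Q3 revenue report for Globex")
    assert results, "tenant A should still get its own document"
    assert all(tenant == "tenant_A" for tenant, _ in results)
    assert not any("Globex" in text for _, text in results)
```

Without the tenant scope, tenant B's document would be the top hit for this query, which is exactly the leak the test is meant to catch.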

If Weaviate native multi-tenancy is enabled in your collection, prefer using that boundary as well. Metadata filters are still useful, but they should not be your only protection against cross-tenant leakage.
