Bug: concurrent AsyncMemory writes corrupt Qdrant HNSW index ('index N is out of bounds') #4892

@MattGyver

Summary

When multiple coroutines call AsyncMemory.add(), AsyncMemory.update(), or AsyncMemory.delete() concurrently (e.g. an async agent processing events in parallel), the Qdrant HNSW index becomes corrupted, producing:

IndexError: index N is out of bounds for axis 0 with size M

Subsequent reads (search(), get()) either return wrong results or raise the same error until the process is restarted and the index is rebuilt.

Root cause

AsyncMemory's write methods dispatch the synchronous vector_store.insert/update/delete calls via asyncio.to_thread(). When multiple callers hit these paths concurrently, the default thread pool runs several of these writes in parallel on separate worker threads.

Qdrant's upsert (and the underlying HNSW graph builder) is not re-entrant: it can return before the internal graph construction is complete. A second writer that arrives while the first graph update is still in progress reads a partially built HNSW structure, which triggers an out-of-bounds index access.
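To illustrate the dispatch pattern described above, here is a minimal sketch showing that synchronous calls sent through asyncio.to_thread() from concurrent coroutines really do overlap on worker threads. FakeStore and overlap_demo are hypothetical names made up for this example; the store is a stand-in that only counts overlapping writers and does not reproduce Qdrant's behaviour or the corruption itself.

```python
import asyncio
import threading
import time


class FakeStore:
    """Hypothetical stand-in for a synchronous vector store (not Qdrant)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the counters below only
        self.active = 0        # writers currently inside insert()
        self.max_active = 0    # highest overlap observed
        self.items = []

    def insert(self, item):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.05)  # simulate a slow, blocking write
        with self._guard:
            self.items.append(item)
            self.active -= 1


async def overlap_demo(n=8):
    store = FakeStore()
    # Same shape as the write path described above: each coroutine hands
    # a blocking call to the default thread pool.
    await asyncio.gather(*[asyncio.to_thread(store.insert, i) for i in range(n)])
    return store.max_active


if __name__ == "__main__":
    print("max overlapping writers:", asyncio.run(overlap_demo()))
```

Any value above 1 means several blocking writes were inside the store at the same time, which is the precondition for the race described above.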

Reproduction

import asyncio
from mem0 import AsyncMemory

m = AsyncMemory()  # default Qdrant embedded

async def writer(i):
    await m.add(f"fact {i}", user_id="test")

async def main():
    # 20 concurrent writers → reproducible corruption
    await asyncio.gather(*[writer(i) for i in range(20)])
    results = await m.search("fact", filters={"user_id": "test"})
    print(results)

asyncio.run(main())

Fix

Add self._write_lock = asyncio.Lock() in AsyncMemory.__init__ and acquire it around the three vector-store write sites:

  • Phase 6 batch insert in _add_to_vector_store
  • vector_store.update in _update_memory
  • vector_store.delete in _delete_memory

Reads (search, get, get_all), embeddings, and LLM calls are unaffected — only the actual write to the vector store is serialised.
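The locking approach can be sketched as follows. This is an illustrative outline under stated assumptions, not the code from the PR: SerializedWriter, FakeStore, and serialized_demo are hypothetical names, and the real change applies to AsyncMemory's internal write sites listed above.

```python
import asyncio
import threading
import time


class FakeStore:
    """Hypothetical stand-in for a synchronous vector store (not Qdrant)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the counters below only
        self.active = 0
        self.max_active = 0
        self.items = []

    def insert(self, item):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate a blocking write
        with self._guard:
            self.items.append(item)
            self.active -= 1


class SerializedWriter:
    """Hypothetical wrapper showing the pattern; not mem0's AsyncMemory."""

    def __init__(self, store):
        self._store = store
        self._write_lock = asyncio.Lock()

    async def add(self, item):
        # Embedding and LLM work would run here, outside the lock,
        # so it still proceeds concurrently across callers.
        async with self._write_lock:
            # Only the vector-store write itself is serialised.
            await asyncio.to_thread(self._store.insert, item)


async def serialized_demo(n=8):
    store = FakeStore()
    writer = SerializedWriter(store)
    await asyncio.gather(*[writer.add(i) for i in range(n)])
    return store.max_active, len(store.items)


if __name__ == "__main__":
    print(asyncio.run(serialized_demo()))
```

Two caveats on this design. An asyncio.Lock only serialises writers that share the same instance and event loop, so it does not protect against a second AsyncMemory object or another process writing to the same store. Also, on Python 3.9 an asyncio.Lock binds to the event loop that is current when it is constructed, so an instance created at module level before asyncio.run() may need the lock created lazily; from Python 3.10 the lock binds on first use.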

A PR with this fix is attached.

Discovery

This bug was identified and the fix developed collaboratively by @MattGyver and Cipher (an AI agent running on top of this stack). The corruption was observed in sustained production workloads with 20+ simultaneous add() calls over several months before the root cause was pinpointed.
