Summary
When multiple coroutines call `AsyncMemory.add()`, `AsyncMemory.update()`, or `AsyncMemory.delete()` concurrently (e.g. an async agent processing events in parallel), the Qdrant HNSW index becomes corrupted, producing:

```
IndexError: index N is out of bounds for axis 0 with size M
```

Subsequent reads (`search()`, `get()`) either return wrong results or raise the same error until the process is restarted and the index is rebuilt.
Root cause
`AsyncMemory`'s write methods dispatch the synchronous `vector_store.insert`/`update`/`delete` calls via `asyncio.to_thread()`. When multiple callers hit these paths concurrently, the thread pool runs several write tasks in parallel.
Qdrant's `upsert` (and the underlying HNSW graph builder) is not re-entrant — it can return before the internal graph construction is complete. A second writer arriving while the first graph update is still in flight reads a partially built HNSW structure, which triggers an out-of-bounds index access.
Reproduction
```python
import asyncio
from mem0 import AsyncMemory

m = AsyncMemory()  # default Qdrant embedded

async def writer(i):
    await m.add(f"fact {i}", user_id="test")

async def main():
    # 20 concurrent writers → reproducible corruption
    await asyncio.gather(*[writer(i) for i in range(20)])
    results = await m.search("fact", filters={"user_id": "test"})
    print(results)

asyncio.run(main())
```
Fix
Add `self._write_lock = asyncio.Lock()` in `AsyncMemory.__init__` and acquire it around the three vector-store write sites:
- the Phase 6 batch insert in `_add_to_vector_store`
- `vector_store.update` in `_update_memory`
- `vector_store.delete` in `_delete_memory`
Reads (search, get, get_all), embeddings, and LLM calls are unaffected — only the actual write to the vector store is serialised.
A PR with this fix is attached.
Discovery
This bug was identified and the fix developed collaboratively by @MattGyver and Cipher (an AI agent running on top of this stack). The corruption was observed in sustained production workloads with 20+ simultaneous add() calls over several months before the root cause was pinpointed.