Summary
Two distinct shapes of on-disk corruption in persisted HNSW vector segments cause chromadb_rust_bindings to crash the whole process with SIGSEGV (KERN_INVALID_ADDRESS at 0x84, exit 139) instead of raising a Python exception. The crash fires on the first collection.count() (and any other call that loads the vector segment), in the Rust segment-loader worker threads. PersistentClient(...) and list_collections() succeed, so the failure is undiagnosable from Python without a crash reporter / PYTHONFAULTHANDLER.
Related (same "segfault instead of error" class, different corruption shapes): #6949, #7069, #6984.
Environment
- ChromaDB: 1.5.8 (chromadb_rust_bindings.abi3.so)
- Python: 3.14.5 (Homebrew, pipx venv)
- OS: macOS 26.3.1 ARM64 (Apple Silicon, Mac16,5)
- Embedding function: DefaultEmbeddingFunction (all-MiniLM-L6-v2, 384 dims)
- PRAGMA integrity_check on chroma.sqlite3: ok (sqlite metadata fully intact in both cases)
Corruption shape 1 — truncated segment (interrupted flush)
Vector segment directory left in a partially-flushed state, most likely by the writing process being killed mid-flush:
data_level0.bin 167600 bytes (~100 vectors present)
header.bin 100 bytes
length.bin 400 bytes
link_lists.bin 0 bytes <-- empty
index_metadata.pickle <-- MISSING entirely
The segment also had no row in max_seq_id, so its collection's WAL entries (627) were never purged. Loading this segment → SIGSEGV.
Corruption shape 2 — corrupt graph in a fully-present segment
A 392 MB segment with all five files present and plausible sizes (data_level0.bin 385 MB, index_metadata.pickle 22 MB, length.bin 920 KB ≈ 230K vectors, link_lists.bin 1.9 MB). Loading it crashes on a byte read from the near-NULL address 0x84:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000084
far: 0x0000000000000084 esr: 0x92000006 (Data Abort) byte read Translation fault
Thread 23 Crashed:
0 chromadb_rust_bindings.abi3.so 0x11a908000 + 25958564
1 chromadb_rust_bindings.abi3.so 0x11a908000 + 25943260
2 chromadb_rust_bindings.abi3.so 0x11a908000 + 25931232
3 chromadb_rust_bindings.abi3.so 0x11a908000 + 9018588
4 chromadb_rust_bindings.abi3.so 0x11a908000 + 10211600
...
15 chromadb_rust_bindings.abi3.so 0x11a908000 + 24364112
16 libsystem_pthread.dylib _pthread_start + 136
Several sibling worker threads were in the same loader code path concurrently (parallel segment load); two of them show the same faulting frames. Binary UUID: 0140fcd8-8f81-3a8b-80fa-0fec930e00d6.
Python-side stack at crash (faulthandler):
File "chromadb/api/rust.py", line 397 in _count
File "chromadb/api/models/Collection.py", line 55 in count
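For anyone trying to capture the same Python-side stack, here is a minimal sketch using the standard-library faulthandler module. The helper name enable_crash_traces and the log path are hypothetical. This does not prevent the crash; it only records where Python was when the signal arrived.

```python
import faulthandler


def enable_crash_traces(log_path):
    """Write Python tracebacks for fatal signals (SIGSEGV etc.) to log_path."""
    # The file object must stay open for the life of the process, because
    # faulthandler writes to its file descriptor from the signal handler.
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Setting PYTHONFAULTHANDLER=1 in the environment, or starting the interpreter with -X faulthandler, has the same effect with output going to stderr.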
Reproduction
Shape 1 reproduces by simulating an interrupted flush on any persisted collection:
# After creating and persisting a collection with a few hundred docs:
import os

import chromadb

seg = "/path/to/persist_dir/<vector-segment-uuid>"

# Simulate an interrupted flush: empty link_lists.bin and drop the pickle.
open(os.path.join(seg, "link_lists.bin"), "w").close()  # truncate to 0 bytes
os.remove(os.path.join(seg, "index_metadata.pickle"))

client = chromadb.PersistentClient(path="/path/to/persist_dir")
col = client.get_collection("my_collection")
col.count()  # SIGSEGV, exit 139; no Python exception
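Because the crash takes down the whole interpreter, one way to detect it without losing the calling process is to run the first count() in a child interpreter and inspect its exit status. A rough sketch, assuming a POSIX system; run_isolated and probe_count are hypothetical helper names, not part of chromadb:

```python
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter.

    Returns (returncode, stderr). On POSIX, subprocess reports death by
    signal as a negative return code (e.g. -11 for SIGSEGV), so the parent
    process survives the child's crash.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def probe_count(persist_dir, collection_name):
    """Call collection.count() in a child process for the given store."""
    snippet = (
        "import chromadb\n"
        f"client = chromadb.PersistentClient(path={persist_dir!r})\n"
        f"print(client.get_collection({collection_name!r}).count())\n"
    )
    return run_isolated(snippet)
```

A negative return code from probe_count("/path/to/persist_dir", "my_collection") would indicate the child was killed by a signal, and the captured stderr then holds the faulthandler traceback.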
Expected behavior
Malformed/truncated HNSW segment files should produce a catchable Python exception (e.g. InternalError: vector segment <uuid> failed to load: link_lists.bin truncated), ideally with guidance to rebuild. A validation pass over the five segment files (sizes/consistency against header.bin + pickle) before the graph walk would catch both shapes cheaply.
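To illustrate the cheapest part of such a validation pass, here is a user-side sketch that only checks that the five files exist and are non-empty. It is not Chroma's loader logic: the file names come from the listings in this report, check_segment_dir is a hypothetical name, and treating a zero-byte file as suspicious is an assumption. It would flag shape 1 but not shape 2, which needs real consistency checks against header.bin and the pickle.

```python
import os

# File names taken from the segment listings in this report.
SEGMENT_FILES = (
    "header.bin",
    "data_level0.bin",
    "length.bin",
    "link_lists.bin",
    "index_metadata.pickle",
)


def check_segment_dir(seg_dir):
    """Return a list of readable problems; an empty list means none found."""
    problems = []
    for name in SEGMENT_FILES:
        path = os.path.join(seg_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
        elif os.path.getsize(path) == 0:
            # Assumption: a zero-byte file is treated as suspicious.
            problems.append(f"{name}: empty (0 bytes)")
    return problems
```

Run against the shape 1 directory above, this reports link_lists.bin as empty and index_metadata.pickle as missing.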
Workaround that recovered both collections (no data loss)
- Shape 1: quarantine (move out) the corrupt segment dir → loader starts fresh and replays the un-purged WAL → collection self-heals.
- Shape 2 (WAL already purged): export ids/documents/metadatas directly from chroma.sqlite3 (embeddings + embedding_metadata, document under the chroma:document key), quarantine the segment dir, delete_collection + create_collection, re-upsert (re-embed).
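The quarantine step in both workarounds is a plain directory move. A small sketch, assuming the segment directory sits directly under the persist dir as in the reproduction path; quarantine_segment and its arguments are hypothetical names:

```python
import os
import shutil
import time


def quarantine_segment(persist_dir, segment_uuid, quarantine_root):
    """Move a vector segment directory out of the persist dir.

    Returns the new location so the move can be undone by hand.
    """
    src = os.path.join(persist_dir, segment_uuid)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    os.makedirs(quarantine_root, exist_ok=True)
    # Timestamp suffix (one-second resolution) keeps repeated quarantines
    # of the same segment from landing on the same destination.
    dst = os.path.join(quarantine_root, f"{segment_uuid}.{int(time.time())}")
    shutil.move(src, dst)
    return dst
```

It seems safest to do this with no process holding the persist dir open, and to keep the quarantined copy until the rebuilt collection has been verified.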