This project is the follow-up to the HIDE paper ("The Geometry of Forgetting: Memory Phenomena as Mathematical Inevitabilities of High-Dimensional Retrieval"), which demonstrated that a single embedding space (a flat array of vectors retrieved by cosine similarity) reproduces four hallmark phenomena of human memory: power-law forgetting, DRM false recall, the spacing effect, and tip-of-tongue states.
HIDE showed these phenomena emerge from the geometry of similarity-based retrieval. But it left the strongest question unanswered: is this specific to cosine similarity over embeddings, or is it unavoidable for ANY useful memory system?
This project answers that question definitively: it is unavoidable.
We prove — formally, with three theorems and a corollary — that any memory system satisfying a minimal definition of "usefulness" (the Semantic Proximity Property: semantically related items must be represented more similarly than unrelated items) must exhibit:
- Power-law forgetting driven by interference from competing memories
- False recall of semantically related lures
- Partial retrieval states (tip-of-tongue phenomena)
These are not engineering failures. They are mathematical consequences of organising information by meaning.
We test the claim on five architecturally distinct memory systems:
- Vector Database (BGE-large, 1024-dim) — cosine similarity retrieval. The calibration baseline against HIDE.
- Attention-Based Context Memory (Qwen2.5-7B) — facts placed in an LLM's context window, retrieved via generation.
- Filesystem Agent Memory (BM25 + Qwen re-ranking) — JSON records retrieved by keyword search then LLM relevance scoring.
- Graph-Based Memory (MiniLM + NetworkX PageRank) — sentence embeddings as nodes, edges weighted by cosine similarity, retrieval via personalised PageRank.
- Parametric Memory (Qwen2.5-7B weights) — factual knowledge stored in model weights, probed via direct Q&A without RAG.
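As a rough illustration of the simplest of these systems, the sketch below implements cosine-similarity retrieval over a small in-memory store. It is not the project's code: the function name `cosine_top_k` is hypothetical, and random Gaussian vectors stand in for BGE-large embeddings.

```python
import numpy as np

def cosine_top_k(query, memory, k=3):
    """Return indices and scores of the k stored vectors most similar to the query."""
    # Normalise rows so that a dot product equals cosine similarity.
    mem = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = mem @ q
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]

rng = np.random.default_rng(0)
memory = rng.normal(size=(200, 64))              # stand-in for stored embeddings (assumption)
query = memory[17] + 0.1 * rng.normal(size=64)   # noisy cue for stored item 17
top_idx, top_scores = cosine_top_k(query, memory)
```

The other stored vectors also receive non-zero scores for the same cue. Those competing scores are the raw material for interference as the store grows.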
The no-escape theorem operates at two levels, and the distinction between them is the paper's central contribution:
At the geometric level, every system that satisfies the SPP has low effective dimensionality (d_eff = 17.9 for Qwen hidden states, despite a nominal dimensionality of d_nom = 3,584), non-negligible spherical cap volumes, and representation-space vulnerability to interference. This is proven mathematically and confirmed in all five architectures; no SPP-satisfying system escapes it.
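One common estimator of effective dimensionality is the participation ratio of the covariance spectrum, (Σλ)² / Σλ². The sketch below computes it on synthetic data; whether this is the exact estimator behind the figures quoted here is an assumption, and the helper name `participation_ratio` is hypothetical.

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the proportionality constant cancels in the ratio.
    lam = np.linalg.svd(Xc, compute_uv=False) ** 2
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 5))   # 5 true degrees of freedom (synthetic)
mixing = rng.normal(size=(5, 512))    # embedded in 512 nominal dimensions
X = latent @ mixing
d_eff = participation_ratio(X)        # bounded above by the rank, here 5
```

The nominal dimension is 512, but the estimate cannot exceed the number of independent directions the data actually uses. That is the sense in which d_eff can sit far below d_nom.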
At the behavioural level, the manifestation splits into three categories:
- Pure geometric systems (Vector DB, Graph): Express the vulnerability directly as smooth power-law forgetting (b = 0.440 and 0.478 respectively, both within the human range) and graded false recall.
- Systems with reasoning overlays (Attention, Parametric): Can behaviourally override some symptoms — an LLM can parse a word list and correctly reject a semantic lure. But interference manifests differently: phase transitions (perfect → catastrophic at ~100 competitors) and parametric interference (accuracy drops from 1.000 to 0.113 as neighbour density increases).
- Systems that abandon SPP (BM25 keyword matching): Achieve complete immunity (b = 0.000, FA = 0.000) at the cost of semantic usefulness (15.5% retrieval agreement with cosine similarity).
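A forgetting exponent b of the kind quoted above can be estimated by fitting y = a · x^(−b) with a straight line in log-log space. The sketch below does this on a noise-free synthetic curve, not on the experimental data. The helper name `fit_power_law` and the choice of the number of competing memories as the x-axis are assumptions.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**(-b) by least squares in log-log space; return (a, b)."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic retention curve generated with b = 0.44; NOT data from the experiments.
competitors = np.array([1, 10, 100, 1000, 10000], dtype=float)
retention = 0.95 * competitors ** -0.44
a, b = fit_power_law(competitors, retention)
```

A system showing a phase transition rather than a smooth decline would not be well described by a single straight line in log-log space, so a fit like this is only meaningful for the first category.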
We tested four proposed "cures":
- Increase dimensionality: Zero-padding doesn't help (d_eff unchanged)
- Keyword retrieval: Eliminates false recall but destroys semantic usefulness
- Orthogonalisation: Eliminates interference but destroys semantic structure
- Compression: Reduces interference but degrades specific-fact retrieval
No solution achieves both immunity and usefulness. The no-escape corollary holds.
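Two of these cures can be illustrated on toy vectors. In the sketch below, zero-padding leaves every pairwise cosine similarity unchanged, while orthogonalisation drives all of them to zero, including the ones that encoded relatedness. QR decomposition is used here as one possible orthogonalisation procedure; the source does not say which was used, and all names and data are hypothetical.

```python
import numpy as np

def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

rng = np.random.default_rng(1)
# Items 0 and 1 share a common component ("related"); item 2 is unrelated.
base = rng.normal(size=32)
items = np.stack([
    base + 0.3 * rng.normal(size=32),
    base + 0.3 * rng.normal(size=32),
    rng.normal(size=32),
])

# Cure 1: raise the nominal dimension from 32 to 1024 by zero-padding.
padded = np.hstack([items, np.zeros((3, 992))])

# Cure 2: orthogonalise the items (QR is an assumed choice of procedure).
q, _ = np.linalg.qr(items.T)   # q has orthonormal columns, shape (32, 3)
orthogonal = q.T

sim_original = cosine_matrix(items)
sim_padded = cosine_matrix(padded)
sim_orthogonal = cosine_matrix(orthogonal)
```

After orthogonalisation the related pair is no more similar than the unrelated pair, so the interference is gone only because the semantic structure is gone.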
| Architecture | Forgetting exponent b | DRM false-alarm rate (FA) | d_eff |
|---|---|---|---|
| Vector DB | 0.440 ± 0.030 | 0.583 | 158 (PR) / 10.6 (LB) |
| Graph | 0.478 ± 0.028 | 0.208 | 127 |
| Attention | phase transition | 0.000 (behavioural) | 17.9 |
| Parametric | 0.215 (PopQA) | 0.000 (behavioural) | 17.9 |
| Filesystem | 0.000 | 0.000 | 158 |
| Human | ~0.5 | ~0.55 | 100-500 |
- Single NVIDIA A100-SXM4-80GB GPU
- ~10 GPU-hours total
- All models open-weight (BGE-large, MiniLM, Qwen2.5-7B)
- All datasets public (Wikipedia, DRM word lists, PopQA)
For AI system designers: every RAG system, every agent memory store, every knowledge graph that organises by semantic similarity is subject to the no-escape theorem. The engineering question is not "how do we prevent interference?" but "how do we manage a system that will inevitably interfere?"
For cognitive science: the "flaws" of human memory — forgetting, false recall, tip-of-tongue — are not errors. They are the system working correctly under the constraints of meaning. Any system that organises by similarity must exhibit them.
Ashwin Gopinath (Sentra.app / MIT)
Computational experiments and manuscript preparation were assisted by Claude (Anthropic).