Skip to content

Latest commit

 

History

History
82 lines (54 loc) · 5.13 KB

File metadata and controls

82 lines (54 loc) · 5.13 KB

No Escape: Project Context and Motivation

Why This Project Exists

This project is the follow-up to the HIDE paper ("The Geometry of Forgetting: Memory Phenomena as Mathematical Inevitabilities of High-Dimensional Retrieval"), which demonstrated that a single embedding space — a flat array of cosine-similarity vectors — reproduces four hallmark phenomena of human memory: power-law forgetting, DRM false recall, the spacing effect, and tip-of-tongue states.

HIDE showed these phenomena emerge from the geometry of similarity-based retrieval. But it left the strongest question unanswered: is this specific to cosine similarity over embeddings, or is it unavoidable for ANY useful memory system?

This project answers that question definitively: it is unavoidable.

The Central Claim

We prove — formally, with three theorems and a corollary — that any memory system satisfying a minimal definition of "usefulness" (the Semantic Proximity Property: semantically related items must be represented more similarly than unrelated items) must exhibit:

  1. Power-law forgetting driven by interference from competing memories
  2. False recall of semantically related lures
  3. Partial retrieval states (tip-of-tongue phenomena)

These are not engineering failures. They are mathematical consequences of organising information by meaning.

What We Tested

Five architecturally distinct memory systems:

  1. Vector Database (BGE-large, 1024-dim) — cosine similarity retrieval. The calibration baseline against HIDE.
  2. Attention-Based Context Memory (Qwen2.5-7B) — facts placed in an LLM's context window, retrieved via generation.
  3. Filesystem Agent Memory (BM25 + Qwen re-ranking) — JSON records retrieved by keyword search then LLM relevance scoring.
  4. Graph-Based Memory (MiniLM + NetworkX PageRank) — sentence embeddings as nodes, edges weighted by cosine similarity, retrieval via personalised PageRank.
  5. Parametric Memory (Qwen2.5-7B weights) — factual knowledge stored in model weights, probed via direct Q&A without RAG.

What We Found

The no-escape theorem operates at two levels, and the distinction between them is the paper's central contribution:

Level 1: Geometric (Universal)

Every SPP-satisfying system has low effective dimensionality (d_eff = 17.9 even from d_nom = 3,584 for Qwen hidden states), non-negligible spherical cap volumes, and representation-space vulnerability to interference. This is proven mathematically and confirmed in all five architectures. There is literally no escape from this.

Level 2: Behavioural (Architecture-Dependent)

The behavioural manifestation splits into three categories:

  • Pure geometric systems (Vector DB, Graph): Express the vulnerability directly as smooth power-law forgetting (b = 0.440, 0.478 — in the human range) and graded false recall.
  • Systems with reasoning overlays (Attention, Parametric): Can behaviourally override some symptoms — an LLM can parse a word list and correctly reject a semantic lure. But interference manifests differently: phase transitions (perfect → catastrophic at ~100 competitors) and parametric interference (accuracy drops from 1.000 to 0.113 as neighbour density increases).
  • Systems that abandon SPP (BM25 keyword matching): Achieve complete immunity (b = 0.000, FA = 0.000) at the cost of semantic usefulness (15.5% retrieval agreement with cosine similarity).

The Solution Analysis

We tested four proposed "cures":

  1. Increase dimensionality: Zero-padding doesn't help (d_eff unchanged)
  2. Keyword retrieval: Eliminates false recall but destroys semantic usefulness
  3. Orthogonalisation: Eliminates interference but destroys semantic structure
  4. Compression: Reduces interference but degrades specific-fact retrieval

No solution achieves both immunity and usefulness. The no-escape corollary holds.

Key Numbers (All From Real Experiments)

Architecture Forgetting b DRM FA d_eff
Vector DB 0.440 ± 0.030 0.583 158 (PR) / 10.6 (LB)
Graph 0.478 ± 0.028 0.208 127
Attention phase transition 0.000 (behavioral) 17.9
Parametric 0.215 (PopQA) 0.000 (behavioral) 17.9
Filesystem 0.000 0.000 158
Human ~0.5 ~0.55 100-500

Compute Resources

  • Single NVIDIA A100-SXM4-80GB GPU
  • ~10 GPU-hours total
  • All models open-weight (BGE-large, MiniLM, Qwen2.5-7B)
  • All datasets public (Wikipedia, DRM word lists, PopQA)

Implications

For AI system designers: every RAG system, every agent memory store, every knowledge graph that organises by semantic similarity is subject to the no-escape theorem. The engineering question is not "how do we prevent interference?" but "how do we manage a system that will inevitably interfere?"

For cognitive science: the "flaws" of human memory — forgetting, false recall, tip-of-tongue — are not errors. They are the system working correctly under the constraints of meaning. Any system that organises by similarity must exhibit them.

Authors

Ashwin Gopinath (Sentra.app / MIT)

Computational experiments and manuscript preparation were assisted by Claude (Anthropic).