No Escape: Project Context and Motivation

Why This Project Exists

This project is the follow-up to the HIDE paper ("The Geometry of Forgetting: Memory Phenomena as Mathematical Inevitabilities of High-Dimensional Retrieval"), which demonstrated that a single embedding space — a flat array of cosine-similarity vectors — reproduces four hallmark phenomena of human memory: power-law forgetting, DRM false recall, the spacing effect, and tip-of-tongue states.

HIDE showed these phenomena emerge from the geometry of similarity-based retrieval. But it left the strongest question unanswered: is this specific to cosine similarity over embeddings, or is it unavoidable for ANY useful memory system?

This project answers that question definitively: it is unavoidable.

The Central Claim

We prove — formally, with three theorems and a corollary — that any memory system satisfying a minimal definition of "usefulness" (the Semantic Proximity Property: semantically related items must be represented more similarly than unrelated items) must exhibit:

Power-law forgetting driven by interference from competing memories
False recall of semantically related lures
Partial retrieval states (tip-of-tongue phenomena)

These are not engineering failures. They are mathematical consequences of organising information by meaning.

What We Tested

Five architecturally distinct memory systems:

Vector Database (BGE-large, 1024-dim) — cosine similarity retrieval. The calibration baseline against HIDE.
Attention-Based Context Memory (Qwen2.5-7B) — facts placed in an LLM's context window, retrieved via generation.
Filesystem Agent Memory (BM25 + Qwen re-ranking) — JSON records retrieved by keyword search then LLM relevance scoring.
Graph-Based Memory (MiniLM + NetworkX PageRank) — sentence embeddings as nodes, edges weighted by cosine similarity, retrieval via personalised PageRank.
Parametric Memory (Qwen2.5-7B weights) — factual knowledge stored in model weights, probed via direct Q&A without RAG.

What We Found

The no-escape theorem operates at two levels, and the distinction between them is the paper's central contribution:

Level 1: Geometric (Universal)

Every SPP-satisfying system has low effective dimensionality (d_eff = 17.9 even from d_nom = 3,584 for Qwen hidden states), non-negligible spherical cap volumes, and representation-space vulnerability to interference. This is proven mathematically and confirmed in all five architectures. There is literally no escape from this.

Level 2: Behavioural (Architecture-Dependent)

The behavioural manifestation splits into three categories:

Pure geometric systems (Vector DB, Graph): Express the vulnerability directly as smooth power-law forgetting (b = 0.440, 0.478 — in the human range) and graded false recall.
Systems with reasoning overlays (Attention, Parametric): Can behaviourally override some symptoms — an LLM can parse a word list and correctly reject a semantic lure. But interference manifests differently: phase transitions (perfect → catastrophic at ~100 competitors) and parametric interference (accuracy drops from 1.000 to 0.113 as neighbour density increases).
Systems that abandon SPP (BM25 keyword matching): Achieve complete immunity (b = 0.000, FA = 0.000) at the cost of semantic usefulness (15.5% retrieval agreement with cosine similarity).

The Solution Analysis

We tested four proposed "cures":

Increase dimensionality: Zero-padding doesn't help (d_eff unchanged)
Keyword retrieval: Eliminates false recall but destroys semantic usefulness
Orthogonalisation: Eliminates interference but destroys semantic structure
Compression: Reduces interference but degrades specific-fact retrieval

No solution achieves both immunity and usefulness. The no-escape corollary holds.

Key Numbers (All From Real Experiments)

Architecture	Forgetting b	DRM FA	d_eff
Vector DB	0.440 ± 0.030	0.583	158 (PR) / 10.6 (LB)
Graph	0.478 ± 0.028	0.208	127
Attention	phase transition	0.000 (behavioral)	17.9
Parametric	0.215 (PopQA)	0.000 (behavioral)	17.9
Filesystem	0.000	0.000	158
Human	~0.5	~0.55	100-500

Compute Resources

Single NVIDIA A100-SXM4-80GB GPU
~10 GPU-hours total
All models open-weight (BGE-large, MiniLM, Qwen2.5-7B)
All datasets public (Wikipedia, DRM word lists, PopQA)

Implications

For AI system designers: every RAG system, every agent memory store, every knowledge graph that organises by semantic similarity is subject to the no-escape theorem. The engineering question is not "how do we prevent interference?" but "how do we manage a system that will inevitably interfere?"

For cognitive science: the "flaws" of human memory — forgetting, false recall, tip-of-tongue — are not errors. They are the system working correctly under the constraints of meaning. Any system that organises by similarity must exhibit them.

Authors

Ashwin Gopinath (Sentra.app / MIT)

Computational experiments and manuscript preparation were assisted by Claude (Anthropic).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No Escape: Project Context and Motivation

Why This Project Exists

The Central Claim

What We Tested

What We Found

Level 1: Geometric (Universal)

Level 2: Behavioural (Architecture-Dependent)

The Solution Analysis

Key Numbers (All From Real Experiments)

Compute Resources

Implications

Authors

FilesExpand file tree

PROJECT_CONTEXT.md

Latest commit

History

PROJECT_CONTEXT.md

File metadata and controls

No Escape: Project Context and Motivation

Why This Project Exists

The Central Claim

What We Tested

What We Found

Level 1: Geometric (Universal)

Level 2: Behavioural (Architecture-Dependent)

The Solution Analysis

Key Numbers (All From Real Experiments)

Compute Resources

Implications

Authors