Information conservation vs. contraction in the selective scan: has this been measured? (#958)

Mamba's selective scan is the cleanest path to linear scaling I've seen. The compression isn't arbitrary: it's content-routed, because the step size Δ and the projections B and C are computed from the current input. That's the move that makes the architecture actually work at long context.

A question I've been carrying for weeks: is the compression *destroying* information (Liouville-style: phase-space volume contracts, distinct histories get squeezed together, and once they are closer than numerical precision they can't be separated again), or is it *passing over* information (volume preserved, but the readout never learned to surface what's there)? The architectural fixes differ. Capacity scaling helps the second case; only structural conservation helps the first.

Direct comparison from my own experiments: a vanilla transformer vs. a symplectic-flow variant whose layer update is one Störmer-Verlet (leapfrog) step under a learned Hamiltonian H(q, p, a). Volume preservation by construction, so Liouville holds at every depth.
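A minimal NumPy sketch of why a leapfrog layer preserves volume, assuming a *separable* Hamiltonian H(q, p) = ½|p|² + V(q) with a hypothetical potential V; the learned H(q, p, a) from the repo is not reproduced here, and a non-separable H would need an implicit (generalized) Störmer-Verlet update. Each half-kick and the drift is a shear with unit Jacobian determinant, so the composed step has log|det J| = 0. The helper names and the damped comparison step are illustrative only.

```python
import numpy as np

def grad_V(q):
    # Hypothetical potential V(q) = 0.5*|q|^2 + 0.25*sum(q^4); this is its gradient.
    return q + q**3

def leapfrog_step(q, p, h=0.1):
    """One Störmer-Verlet step for the separable H(q, p) = 0.5*|p|^2 + V(q)."""
    p_half = p - 0.5 * h * grad_V(q)           # half kick (shear in p)
    q_new = q + h * p_half                     # full drift (shear in q), dT/dp = p
    p_new = p_half - 0.5 * h * grad_V(q_new)   # half kick (shear in p)
    return q_new, p_new

def damped_step(q, p, h=0.1, gamma=0.5):
    # Hypothetical contracting comparison: leapfrog followed by friction on p.
    q_new, p_new = leapfrog_step(q, p, h)
    return q_new, p_new * np.exp(-gamma * h)

def jacobian_logdet(step, q, p, eps=1e-6):
    """Central-difference Jacobian of (q, p) -> step(q, p); returns log|det J|."""
    z = np.concatenate([q, p])
    n = q.size
    J = np.zeros((2 * n, 2 * n))
    for i in range(2 * n):
        dz = np.zeros(2 * n)
        dz[i] = eps
        qp, pp = step(*np.split(z + dz, 2))
        qm, pm = step(*np.split(z - dz, 2))
        J[:, i] = (np.concatenate([qp, pp]) - np.concatenate([qm, pm])) / (2 * eps)
    return np.linalg.slogdet(J)[1]

rng = np.random.default_rng(0)
q0, p0 = rng.normal(size=4), rng.normal(size=4)
print(jacobian_logdet(leapfrog_step, q0, p0))  # ~0: volume preserved
print(jacobian_logdet(damped_step, q0, p0))    # ~-0.2 = 4 * (-gamma * h)
```

The damped step is there only as a contrast: the same integrator plus friction gives a strictly negative log-determinant, which is the "contraction" case from the question above.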

Same data, same optimizer, same seed, 15M params, base-pretrained only, evaluated with chunked attention across 1K → 64K context:

- vanilla: **+29.1%** val_ppl drift
- plasma (symplectic variant): **+12.6%**

Ratio of drifts: 29.1 / 12.6 ≈ 2.31. In this single-seed, 15M-parameter setup, volume preservation appears to translate into long-context stability.
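The post does not define "val_ppl drift"; one plausible reading, assumed here and not confirmed by the source, is the relative change in validation perplexity from the 1K to the 64K context. Under that assumption the arithmetic is:

```python
def relative_drift(ppl_short, ppl_long):
    """Relative val_ppl change in percent (hypothetical definition of 'drift')."""
    return 100.0 * (ppl_long - ppl_short) / ppl_short

# Drifts (percent) as reported in the post.
vanilla_drift, plasma_drift = 29.1, 12.6
ratio = vanilla_drift / plasma_drift
print(f"{ratio:.2f}")  # 2.31
```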

The question for Mamba, plainly. Has anyone measured the information-theoretic side of the selective scan separately from the task side? Is there a regime where Mamba's hidden state is mathematically reversible (information conserved up to readout), or is the scan fundamentally a contraction in state-space — small enough that capacity scaling compensates, but real?
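One way to put a number on the contraction side is the log-volume change of the state map h_{t-1} → h_t. A minimal sketch, assuming a Mamba-1-style diagonal parameterization (A = -exp(A_log), Δ = softplus(·) per channel, zero-order-hold Ā = exp(ΔA)); the function names are hypothetical and this is not the `mamba_ssm` code. For a fixed input, ∂h_t/∂h_{t-1} = diag(Ā_t), so the per-step log|det| is Σ Δ_t A, which is strictly negative whenever Δ > 0 and A < 0.

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + e^x)

def scan_log_volume(A_log, delta_raw):
    """Cumulative log-volume change of h_{t-1} -> h_t over a sequence.

    Assumed shapes: A_log (D, N), delta_raw (T, D).
    A = -exp(A_log) is strictly negative; Delta_t = softplus(delta_raw[t]) > 0.
    Per step, log|det diag(exp(Delta_t[:, None] * A))| = sum(Delta_t[:, None] * A).
    """
    A = -np.exp(A_log)
    delta = softplus(delta_raw)                                  # (T, D)
    per_step = (delta[:, :, None] * A[None]).sum(axis=(1, 2))    # (T,), all < 0
    return np.cumsum(per_step)                                   # strictly decreasing

D, N, T = 4, 16, 32
# S4D-real-style init A = -(1..N) per channel (an assumption for illustration).
A_log = np.log(np.tile(np.arange(1, N + 1, dtype=float), (D, 1)))
delta_raw = np.random.default_rng(0).normal(size=(T, D))
vol = scan_log_volume(A_log, delta_raw)
print(vol[-1])  # negative: state-space volume has shrunk
```

Under these assumptions the volume shrinks at every step, yet each step stays invertible given x_t because no entry of Ā is exactly zero. Whether information is actually lost then depends on how far the contraction goes relative to floating-point precision.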

If anyone has run an "is the state recoverable from its successor?" test on Mamba checkpoints, I'd love a pointer, or a recommendation for how you'd set one up. And if the honest answer is "we haven't checked the conservation side because the accuracy side was the bottleneck," that is itself an interesting answer.
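A toy version of the recoverability test: run one diagonal scan step forward in float32, undo it exactly, and measure how much of h_{t-1} comes back. This is a minimal sketch, assuming a single step with a known input x_t, one shared Δ across channels, an S4D-real-style A, and the Δ·B discretization of B used in the Mamba-1 reference scan; `scan_step`, `invert_step`, and `recovery_error` are hypothetical names, not library functions. It does not touch real checkpoints.

```python
import numpy as np

def scan_step(h, x, A, delta, B):
    """One diagonal selective-scan step: h (D, N), x (D,), A (D, N) < 0, delta (D,), B (N,)."""
    A_bar = np.exp(delta[:, None] * A)     # ZOH for A
    B_bar = delta[:, None] * B[None, :]    # Delta * B for the input term
    return A_bar * h + B_bar * x[:, None]

def invert_step(h_next, x, A, delta, B):
    """Recover h_{t-1} from h_t and x_t; exact in real arithmetic when A_bar != 0."""
    A_bar = np.exp(delta[:, None] * A)
    B_bar = delta[:, None] * B[None, :]
    return (h_next - B_bar * x[:, None]) / A_bar

def recovery_error(delta_scale, D=4, N=16, dtype=np.float32, seed=0):
    """Relative error of the reconstructed state after one forward/inverse round trip."""
    rng = np.random.default_rng(seed)
    A = -np.tile(np.arange(1, N + 1), (D, 1)).astype(dtype)  # assumed S4D-real-style init
    h_prev = rng.normal(size=(D, N)).astype(dtype)
    x = rng.normal(size=D).astype(dtype)
    B = rng.normal(size=N).astype(dtype)
    delta = np.full(D, delta_scale, dtype=dtype)
    h_rec = invert_step(scan_step(h_prev, x, A, delta, B), x, A, delta, B)
    return float(np.linalg.norm(h_rec - h_prev) / np.linalg.norm(h_prev))

for s in (0.01, 0.1, 1.0, 5.0):
    print(s, recovery_error(s))
```

With small Δ the round trip is accurate to roughly float32 precision; as Δ grows, the fast-decaying state components fall below the rounding error of the input term and come back as zeros. That is one concrete way to separate "mathematically invertible" from "recoverable in practice."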

Repo: https://github.com/Prime-007-hash/phi-plasma-core

*Compression is the right answer to "linear scaling." Conservation is a different answer to "what survives the depth."*

