Information conservation vs. contraction in the selective scan — has this been measured? #958

@Prime-007-hash

Mamba's selective scan is the cleanest path to linear scaling I've seen. The compression isn't arbitrary — it's content-routed. That's the move that makes the architecture actually work at length.

A question I've been carrying for weeks. Is the compression destroying information (dissipative: phase-space volume contracts, which is exactly what Liouville's theorem rules out for Hamiltonian flow, so the trajectory is irreversible, or at best reversible only with rapidly growing numerical error), or is it passing over information (volume preserved, but the readout never learned to surface what's there)? The architectural fixes are different. Capacity scaling helps the second case. Only structural conservation helps the first.

Direct comparison from my own experiments. A vanilla transformer vs. a symplectic-flow variant — layer update is one Störmer-Verlet leapfrog step under a learned Hamiltonian H(q, p, a). Volume preservation by construction. Liouville at depth.
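For readers who want the layer update above in concrete form, here is a minimal sketch of one Störmer-Verlet step and a numerical check that it preserves phase-space volume. It assumes a separable toy Hamiltonian H(q, p) = p²/2 + V(q) with a hypothetical quartic V and scalar q, p. The learned H(q, p, a) from the experiments, including its extra argument a, is not reproduced here, and a non-separable H would need the implicit form of the scheme.

```python
def grad_V(q):
    # Hypothetical potential V(q) = q**4 / 4, standing in for the
    # q-dependence of a learned Hamiltonian (an assumption, not the repo's H).
    return q ** 3

def leapfrog_step(q, p, dt=0.1):
    """One Stormer-Verlet (kick-drift-kick) step for H = p**2 / 2 + V(q)."""
    p_half = p - 0.5 * dt * grad_V(q)          # half kick
    q_new = q + dt * p_half                    # full drift
    p_new = p_half - 0.5 * dt * grad_V(q_new)  # half kick
    return q_new, p_new

def leapfrog_inverse(q_new, p_new, dt=0.1):
    """Undo a step using time symmetry: flip momentum, step, flip back."""
    q, p = leapfrog_step(q_new, -p_new, dt)
    return q, -p

def jacobian_det(q, p, dt=0.1, eps=1e-6):
    """Central-difference estimate of det d(q', p') / d(q, p)."""
    qa, pa = leapfrog_step(q + eps, p, dt)
    qb, pb = leapfrog_step(q - eps, p, dt)
    qc, pc = leapfrog_step(q, p + eps, dt)
    qd, pd = leapfrog_step(q, p - eps, dt)
    dq_dq = (qa - qb) / (2 * eps)
    dp_dq = (pa - pb) / (2 * eps)
    dq_dp = (qc - qd) / (2 * eps)
    dp_dp = (pc - pd) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq
```

Each kick and each drift is a shear, so the composed map has unit Jacobian determinant regardless of V, and `leapfrog_inverse` recovers the previous state up to floating-point roundoff. That is the sense in which volume preservation holds "by construction" in this toy setting.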

Same data, same optimizer, same seed, 15M params, base-pretrained only, evaluated with chunked attention across 1K → 64K context:

  • vanilla: +29.1% val_ppl drift
  • plasma (the symplectic-flow variant): +12.6% val_ppl drift

Ratio: 29.1 / 12.6 ≈ 2.31. In this single-seed comparison, volume preservation appears to translate to long-context stability.

The question for Mamba, plainly. Has anyone measured the information-theoretic side of the selective scan separately from the task side? Is there a regime where Mamba's hidden state is mathematically reversible (information conserved up to readout), or is the scan fundamentally a contraction in state-space — small enough that capacity scaling compensates, but real?
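One way to make the question concrete, as a sketch under stated assumptions rather than a measurement: take a diagonal state recurrence of the kind described for the selective scan, treat the input x_t as known, and look at the Jacobian of one step with respect to the previous state. The symbols below are my reading of the discretized recurrence, and the sign conventions (real negative diagonal A_i, positive step Δ_t) are assumptions I have not re-verified against the current code.

```latex
h_t = \bar{A}_t \, h_{t-1} + \bar{B}_t \, x_t,
\qquad
\bar{A}_t = \exp(\Delta_t A),
\qquad
\left| \det \frac{\partial h_t}{\partial h_{t-1}} \right|
  = \prod_i \exp(\Delta_t A_i)
  = \exp\!\Big( \Delta_t \sum_i A_i \Big).
```

Under those assumptions the determinant lies strictly between 0 and 1 whenever Δ_t > 0, so each step contracts state-space volume. It is still nonzero, so the step is invertible in exact arithmetic given x_t. The practical question then becomes how quickly the inverse becomes ill-conditioned over many steps, and whether the recurrence can be inverted at all when x_t is not available.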

If anyone has run an "is the state recoverable from its successor" test on Mamba checkpoints, I'd love a pointer, or a recommendation for how to set one up. And if the honest answer is "we haven't checked the conservation side because the accuracy side was the bottleneck," that itself is an interesting answer.
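To show the shape of the test I have in mind, here is a toy version on a scalar decaying recurrence. It is not a Mamba checkpoint: the fixed values of `a`, `b`, and `dt` are hypothetical stand-ins for the learned, input-dependent parameters, and the inputs are assumed known when running the recurrence backward.

```python
import math
import random

def scan_forward(h0, xs, a=-1.0, b=1.0, dt=0.1):
    """Toy scalar recurrence h_t = exp(dt * a) * h_{t-1} + dt * b * x_t."""
    decay = math.exp(dt * a)
    h = h0
    for x in xs:
        h = decay * h + dt * b * x
    return h

def scan_backward(hT, xs, a=-1.0, b=1.0, dt=0.1):
    """Undo the recurrence step by step, assuming the inputs xs are known."""
    decay = math.exp(dt * a)
    h = hT
    for x in reversed(xs):
        h = (h - dt * b * x) / decay
    return h

def recovery_error(T, seed=0):
    """Absolute error in recovering h0 = 1.0 from the state after T steps."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(T)]
    h0 = 1.0
    return abs(scan_backward(scan_forward(h0, xs), xs) - h0)

if __name__ == "__main__":
    for T in (10, 100, 500):
        print(T, recovery_error(T))
```

In this toy, each backward step divides by a decay factor below 1, so roundoff in the final state is amplified roughly like exp(-dt · a · T). Short horizons recover the initial state almost exactly, while long ones do not, even though every step is invertible on paper. A real version would swap in the checkpoint's per-step parameters and report the recovery error as a function of context length.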

Repo: https://github.com/Prime-007-hash/phi-plasma-core

Compression is the right answer to "linear scaling." Conservation is a different answer to "what survives the depth."
