
bassrehab/mhc-visualizer

Manifold Dial: Visualizing mHC Stability


An interactive visualization demonstrating why Manifold-Constrained Hyper-Connections (mHC) stabilize deep neural networks. Check out the live demo.

Author: Subhadip Mitra (contact@subhadipmitra.com)

[Figure: hero plot]

What This Shows

DeepSeek's mHC paper reveals a critical insight:

| Method | Behavior | Composite Gain @ Depth 64 |
|---|---|---|
| Baseline | Identity skip connections | 1.0 |
| HC | Unconstrained mixing matrices | 10^16 (explosion!) |
| mHC | Doubly stochastic matrices | ~1.0 (stable!) |

The Problem: Hyper-Connections use learnable matrices to mix residual streams. When signals propagate through many layers, the per-layer gains of these matrices multiply, so the composite gain grows exponentially with depth and training becomes catastrophically unstable.
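To get a feel for how quickly per-layer gains compound, the short calculation below asks what constant per-layer gain would produce the 10^16 figure quoted in the table at depth 64. It is only back-of-the-envelope arithmetic: real layers do not share one constant gain, and the comparison value 1.1 is chosen purely for illustration.

```python
# Back-of-the-envelope: which constant per-layer gain compounds to 1e16 over 64 layers?
depth = 64
composite_gain = 1e16  # HC figure quoted in the table above

per_layer_gain = composite_gain ** (1 / depth)
print(f"implied per-layer gain: {per_layer_gain:.3f}")

# Compounding a few per-layer gains over the same depth (1.1 is illustrative only)
for g in (1.0, 1.1, per_layer_gain):
    print(f"g = {g:.3f} -> g**{depth} = {g ** depth:.2e}")
```

The implied per-layer gain is roughly 1.78, so each layer needs to amplify by less than a factor of two for the composite to reach 10^16.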

The Solution: mHC projects mixing matrices onto the Birkhoff polytope (the set of doubly stochastic matrices) using Sinkhorn-Knopp. Because doubly stochastic matrices are closed under multiplication and have spectral norm at most 1, the composite mapping stays bounded regardless of depth.

The Manifold Dial: Our visualization lets you sweep Sinkhorn iterations from 0→30 and watch the transition from explosive HC behavior to stable mHC behavior in real time.

Quick Start

Interactive Demo

cd react-demo
npm install
npm run dev
# Open http://localhost:5173

Python Exploration

cd python
pip install -r requirements.txt
pytest tests/  # Run 32 tests
python generate_plots.py  # Generate figures

Colab Notebook

Open In Colab

Run the full exploration in your browser; no setup required.

Marimo Notebook

pip install marimo
marimo run notebook/mhc_exploration.py

A reactive Python notebook with real-time sliders and instant feedback; no cell re-execution needed.

Key Visualizations

The Manifold Dial

The core interactive element: a slider controlling Sinkhorn iterations (0-30).

  • k=0: No projection → HC-like behavior → explosion
  • k=5: Partial projection → partial stability
  • k=20: Full projection → mHC stability
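The dial can be imitated offline with a few lines of NumPy. The sketch below is not the repository's simulation: it reuses the `sinkhorn_knopp` routine from the Sinkhorn-Knopp section of this README, draws Gaussian logits as an assumed stand-in for learned matrices, and takes "gain" to be the largest absolute row sum of the composite. The helper name `composite_gain` and the seed are hypothetical, so the numbers will not match the demo.

```python
import numpy as np

def sinkhorn_knopp(M, iterations=20):
    """Alternate row and column normalization of exp(M), as in this README."""
    P = np.exp(M)
    for _ in range(iterations):
        P = P / P.sum(axis=1, keepdims=True)  # rows sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P

def composite_gain(k, depth=64, n=4, seed=0):
    """Largest absolute row sum of H_depth @ ... @ H_1 with k Sinkhorn iterations per layer."""
    rng = np.random.default_rng(seed)
    composite = np.eye(n)
    for _ in range(depth):
        # Assumption: i.i.d. Gaussian logits stand in for learned mixing matrices
        H = sinkhorn_knopp(rng.standard_normal((n, n)), iterations=k)
        composite = H @ composite
    return np.abs(composite).sum(axis=1).max()

gains = {k: composite_gain(k) for k in (0, 5, 20)}
for k, g in gains.items():
    print(f"k = {k:2d}: composite gain = {g:.3e}")
```

One detail worth noticing in this particular formulation: every iteration ends with a column normalization, so for any k ≥ 1 the columns of each matrix sum to exactly 1 and the largest row sum of the composite cannot exceed n. The remaining iterations tighten the row sums toward 1 as well.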

Composite Gain Plot

Shows the forward gain of the composite mapping $H_L \cdot H_{L-1} \cdots H_1$:

  • Green (Baseline): Flat at 1 (identity matrices)
  • Red (HC): Exponential explosion
  • Blue (mHC): Bounded near 1
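The three curves can be approximated with a small simulation. This is a sketch under stated assumptions rather than the repository's `run_comparison`: forward gain is taken to be the largest absolute row sum of the composite, HC layers are modelled as unconstrained Gaussian matrices, and mHC layers as Sinkhorn-projected Gaussian logits. The names `forward_gain` and `gain_curve` are hypothetical.

```python
import numpy as np

def sinkhorn_knopp(M, iterations=20):
    P = np.exp(M)
    for _ in range(iterations):
        P = P / P.sum(axis=1, keepdims=True)
        P = P / P.sum(axis=0, keepdims=True)
    return P

def forward_gain(A):
    # Assumed metric: largest absolute row sum (the induced infinity norm)
    return np.abs(A).sum(axis=1).max()

def gain_curve(make_layer, depth=64, n=4):
    """Gain of H_l @ ... @ H_1 for l = 1..depth, drawing a fresh layer at each step."""
    composite = np.eye(n)
    gains = []
    for _ in range(depth):
        composite = make_layer() @ composite
        gains.append(forward_gain(composite))
    return gains

n = 4
rng = np.random.default_rng(42)
curves = {
    "baseline": gain_curve(lambda: np.eye(n)),
    # HC stand-in (assumption): unconstrained Gaussian mixing matrices
    "hc": gain_curve(lambda: rng.standard_normal((n, n))),
    "mhc": gain_curve(lambda: sinkhorn_knopp(rng.standard_normal((n, n)))),
}

for depth in (1, 16, 64):
    row = ", ".join(f"{name} = {c[depth - 1]:.2e}" for name, c in curves.items())
    print(f"depth {depth:2d}: {row}")
```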

Matrix Heatmaps

Side-by-side comparison of HC vs mHC residual matrices:

  • Row/column sums displayed
  • Visual confirmation of doubly stochastic property
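The same check the heatmaps make visually can be done numerically. The sketch below compares row and column sums for an unconstrained matrix and its Sinkhorn-projected counterpart; the random 4x4 input and the seed are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_knopp(M, iterations=20):
    P = np.exp(M)
    for _ in range(iterations):
        P = P / P.sum(axis=1, keepdims=True)
        P = P / P.sum(axis=0, keepdims=True)
    return P

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))

H_hc = M                   # unconstrained stand-in (assumption)
H_mhc = sinkhorn_knopp(M)  # projected counterpart

for name, H in (("HC", H_hc), ("mHC", H_mhc)):
    print(name)
    print("  row sums:", np.round(H.sum(axis=1), 4))
    print("  col sums:", np.round(H.sum(axis=0), 4))

# Largest deviation of the projected matrix from the doubly stochastic conditions
row_err = np.abs(H_mhc.sum(axis=1) - 1).max()
col_err = np.abs(H_mhc.sum(axis=0) - 1).max()
print(f"max row-sum error: {row_err:.2e}, max col-sum error: {col_err:.2e}")
```

Because the loop ends on a column normalization, the column error sits at floating-point precision while the row error reflects how far the iteration has converged.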

Interactive Features

Guided Tour

First-time visitors get an automatic walkthrough highlighting key controls and visualizations. Replay anytime via the "Tour" button.

Tabbed Charts

Switch between Gain, Eigenvalues, Uniformity, or view All charts at once. On mobile, charts are shown one at a time for better usability.

Eigenvalue Decay

The eigenvalue plot shows |λ₂| (second-largest eigenvalue magnitude) vs depth. Toggle the theoretical decay overlay (|λ₂|^L) to compare predicted vs actual convergence.
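The |λ₂|^L overlay is exact in one idealized case: applying the same matrix H at every layer, since the eigenvalues of H^L are the L-th powers of the eigenvalues of H. The sketch below checks that case numerically; for a product of different matrices the measured curve need not coincide with the overlay, which is presumably why the demo shows both. The helper `second_eig_magnitude` and the 50-iteration setting are choices made here, not taken from the repository.

```python
import numpy as np

def sinkhorn_knopp(M, iterations=20):
    P = np.exp(M)
    for _ in range(iterations):
        P = P / P.sum(axis=1, keepdims=True)
        P = P / P.sum(axis=0, keepdims=True)
    return P

def second_eig_magnitude(A):
    """Second-largest eigenvalue magnitude of a square matrix."""
    mags = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return mags[1]

rng = np.random.default_rng(0)
H = sinkhorn_knopp(rng.standard_normal((4, 4)), iterations=50)

lam2 = second_eig_magnitude(H)
for L in (1, 2, 4):
    actual = second_eig_magnitude(np.linalg.matrix_power(H, L))
    predicted = lam2 ** L
    print(f"L = {L}: actual = {actual:.6f}, predicted |lam2|^L = {predicted:.6f}")
```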

Animation Controls

Click Play to animate Sinkhorn iterations from 0→30 and watch the transition in real time. Choose from Slow, Normal, or Fast playback speeds.

Spectral Gap Indicator

The metrics panel shows the spectral gap (1 - |λ₂|) with color-coded convergence speed: green (fast), yellow (moderate), red (slow).
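One way to read the spectral gap is as a convergence budget: under the idealization that the same matrix is applied at every layer, the non-uniform component shrinks like |λ₂|^L, so a larger gap means fewer layers until it is negligible. The sketch below turns that into a layer count. The function names and the 1% tolerance are hypothetical, and the demo's actual color thresholds are not stated in this README, so they are not reproduced here.

```python
import math

def spectral_gap(lam2_magnitude):
    """Spectral gap 1 - |lambda_2|."""
    return 1.0 - lam2_magnitude

def layers_to_decay(lam2_magnitude, tol=0.01):
    """Smallest L with lam2_magnitude**L <= tol, for 0 < lam2_magnitude < 1."""
    return math.ceil(math.log(tol) / math.log(lam2_magnitude))

for lam2 in (0.5, 0.9, 0.99):
    print(f"|lam2| = {lam2}: gap = {spectral_gap(lam2):.2f}, "
          f"layers until |lam2|^L <= 1%: {layers_to_decay(lam2)}")
```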

Mobile Support

Larger touch targets, collapsible controls, and responsive chart sizing for comfortable use on phones and tablets.

The Mathematics

Why Doubly Stochastic?

A matrix is doubly stochastic if:

  1. All entries are non-negative
  2. All rows sum to 1
  3. All columns sum to 1

Key properties:

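  • Spectral norm ≤ 1: A doubly stochastic matrix cannot amplify the norm of a signal
  • Closed under multiplication: Products of doubly stochastic matrices remain doubly stochastic
  • Birkhoff-von Neumann: Every doubly stochastic matrix is a convex combination of permutation matrices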
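The first two properties follow in a few lines. As a sketch, write $\mathbf{1}$ for the all-ones vector (notation introduced here for convenience), so that a non-negative matrix $P$ is doubly stochastic exactly when $P\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\top P = \mathbf{1}^\top$. For two doubly stochastic matrices $P$ and $Q$, the product has non-negative entries and

$$
(PQ)\mathbf{1} = P(Q\mathbf{1}) = P\mathbf{1} = \mathbf{1},
\qquad
\mathbf{1}^\top (PQ) = (\mathbf{1}^\top P)\,Q = \mathbf{1}^\top Q = \mathbf{1}^\top ,
$$

so $PQ$ is again doubly stochastic. For the norm, with $\|P\|_1$ the largest column sum and $\|P\|_\infty$ the largest row sum,

$$
\|P\|_2 \le \sqrt{\|P\|_1 \, \|P\|_\infty} = \sqrt{1 \cdot 1} = 1 ,
\qquad\text{hence}\qquad
\|H_L \cdot H_{L-1} \cdots H_1\|_2 \le \prod_{l=1}^{L} \|H_l\|_2 \le 1 .
$$

The bound is independent of the depth $L$, which is the stability claim in the table at the top.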

Sinkhorn-Knopp Algorithm

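import numpy as np

def sinkhorn_knopp(M, iterations=20):
    """Alternately normalize rows and columns of exp(M)."""
    P = np.exp(M)  # Element-wise exponential makes every entry positive
    for _ in range(iterations):
        P = P / P.sum(axis=1, keepdims=True)  # Normalize rows to sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # Normalize columns to sum to 1
    return P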

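This iteration maps any real square matrix to an approximately doubly stochastic one. With a finite number of iterations the column sums are exact (the loop ends on a column normalization) and the row sums approach 1 as the iteration count grows.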

Use in Your Code

Python

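import numpy as np
from mhc import sinkhorn_knopp, run_comparison

# Project a matrix (a random 4x4 array is used here as example input)
random_matrix = np.random.randn(4, 4)
H_mhc = sinkhorn_knopp(random_matrix, iterations=20)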

# Run full simulation
results = run_comparison(depth=64, n=4, sinkhorn_iters=20, seed=42)
print(f"HC final gain: {results['hc']['composite'][-1]['forward_gain']:.2e}")
print(f"mHC final gain: {results['mhc']['composite'][-1]['forward_gain']:.2f}")

PyTorch

from mhc import mHCResidual, mHCBlock

# Drop-in residual replacement
residual = mHCResidual(dim=512, n_streams=4, sinkhorn_iters=20)
output = residual(hidden_states, layer_output)

# Full transformer block
block = mHCBlock(dim=512, n_heads=8, n_streams=4)
output = block(x)

Repository Structure

mhc-visualizer/
├── python/
│   ├── mhc/
│   │   ├── sinkhorn.py      # Sinkhorn-Knopp implementation
│   │   ├── metrics.py       # Stability metrics
│   │   ├── simulation.py    # Depth simulation
│   │   └── torch_module.py  # PyTorch modules
│   ├── tests/               # 32 unit tests
│   ├── examples/            # Usage examples
│   └── generate_plots.py    # Figure generation
├── react-demo/
│   ├── src/
│   │   ├── components/      # React components
│   │   └── lib/             # TypeScript core library
│   └── dist-embed/          # Embeddable widget
└── notebook/
    ├── mhc_exploration.ipynb  # Colab notebook
    └── mhc_exploration.py     # Marimo reactive notebook

Requirements

Python

  • Python 3.8+
  • NumPy
  • PyTorch (optional, for neural network modules)
  • Matplotlib (for plot generation)
  • pytest (for testing)

React Demo

  • Node.js 18+
  • npm or yarn

References

  • mHC Paper: DeepSeek-AI, arXiv:2512.24880
  • Original HC Paper: Hyper-Connections
  • Sinkhorn-Knopp: Sinkhorn, R., & Knopp, P. (1967). "Concerning nonnegative matrices and doubly stochastic matrices"
  • Birkhoff-von Neumann: Birkhoff, G. (1946). "Three observations on linear algebra"

License

MIT License - see LICENSE for details.

Acknowledgments

Based on research by DeepSeek-AI. This visualization was built to make their insights more accessible and interactive.


Built by Subhadip Mitra | Blog Post | Colab Notebook
