This repository contains a PyTorch reimplementation of HOPE (High-order Optimization & Perception Engine), based on the paper "Nested Learning: The Illusion of Deep Learning".
Unlike traditional Deep Learning which stacks layers, Nested Learning (NL) decomposes the architecture into "levels" based on update frequency.
- Code Location: `src/utils/trainer.py`
- Implementation Note: The training loop must handle Multi-Time Scale Updates. Parameters in different levels are updated at different frequencies, so you cannot call a single `optimizer.step()` for all parameters every iteration. The trainer tracks the global step and only updates a given parameter group (e.g., "High Frequency Neurons" vs. "Low Frequency Neurons") when `step % frequency == 0`.
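For illustration, here is a minimal sketch of that scheduling logic, assuming one optimizer per level; the function name and dict keys are hypothetical, not the repository's actual API:

```python
import torch

# Hedged sketch: one optimizer per level. Gradients accumulate between a level's
# updates and are applied only when the global step hits that level's frequency.
def multi_timescale_step(global_step: int, levels: list) -> None:
    """levels: dicts like {"optimizer": torch.optim.Optimizer, "frequency": int} (illustrative)."""
    for level in levels:
        if global_step % level["frequency"] == 0:
            level["optimizer"].step()       # update only this parameter group
            level["optimizer"].zero_grad()  # clear its accumulated gradients
```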
The CMS (Continuum Memory System) replaces the traditional Feed-Forward Network (FFN). It is a chain of MLP blocks where each block is updated at its own frequency.
- Code Location: `src/memory/cms.py`
- Key Equation: Implement Equation 30 for the forward pass and Equation 31 for the update logic.
- Structure: An `nn.ModuleList` of MLPs. The forward pass is $y_t = \mathrm{MLP}^{(f_k)}(\cdots \mathrm{MLP}^{(f_1)}(x_t))$.
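A minimal sketch of that chained forward pass, assuming plain two-layer MLP blocks; the real `src/memory/cms.py` also attaches the per-level update frequencies described above:

```python
import torch.nn as nn

class CMS(nn.Module):
    """Chain of MLP blocks: y_t = MLP^{(f_k)}( ... MLP^{(f_1)}(x_t) )."""
    def __init__(self, dim: int, num_levels: int, hidden_mult: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(dim, hidden_mult * dim),
                nn.GELU(),
                nn.Linear(hidden_mult * dim, dim),
            )
            for _ in range(num_levels)
        ])

    def forward(self, x):
        for block in self.blocks:  # apply levels in order f_1 ... f_k
            x = block(x)
        return x
```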
This is the sequence-mixing core. It uses a "Neural Learning Module" that learns to modify itself by learning its own update algorithm.
- Code Location: `src/models/titans.py`
- Internal Optimizer: The model uses a variant of gradient descent with a regression loss as its internal forward-pass mechanism.
- Key Equation: The update rule for the weight $W_{t+1}$ follows Equation 28:
  $$W_{t+1} = W_t(I - x_t x_t^\top) - \eta_{t+1} \nabla_{y_t} \mathcal{L}(W_t; x_t) \otimes x_t$$
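As a reference point, here is a hedged, single-sample sketch of that update in PyTorch, assuming `W` has shape `(d_out, d_in)`, `x_t` has shape `(d_in,)`, and the regression loss is squared error against a target `v_t` (the target name is an assumption):

```python
import torch

def titans_update(W, x_t, v_t, eta):
    # Internal "forward pass" and regression loss L = ||W x_t - v_t||^2
    y_t = W @ x_t
    grad_y = 2.0 * (y_t - v_t)                        # ∇_{y_t} L
    outer = torch.outer(grad_y, x_t)                  # ∇_{y_t} L ⊗ x_t, shape (d_out, d_in)
    eye = torch.eye(x_t.numel(), dtype=x_t.dtype, device=x_t.device)
    return W @ (eye - torch.outer(x_t, x_t)) - eta * outer   # Equation 28
```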
The paper argues that optimizers (like Adam or Momentum) are associative memory modules.
- Code Location: `src/optimizers/deep_opt.py`
- Deep Momentum Gradient Descent (DMGD): Implement Equation 23, where the momentum term is replaced by a neural network (e.g., an MLP) that compresses gradients.
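An illustrative sketch of the idea, assuming the momentum buffer is replaced by a small MLP over the flattened gradient; the class and function names are hypothetical and the exact parameterization of Equation 23 may differ:

```python
import torch
import torch.nn as nn

class DeepMomentum(nn.Module):
    """Learned 'momentum': an MLP that compresses the gradient into an update direction."""
    def __init__(self, num_params: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_params, hidden),
            nn.SiLU(),
            nn.Linear(hidden, num_params),
        )

    def forward(self, grad_flat):
        return self.net(grad_flat)

def dmgd_step(param, momentum_net, lr):
    # Replace the usual momentum buffer with the network's output (DMGD-style update).
    with torch.no_grad():
        update = momentum_net(param.grad.flatten()).view_as(param)
        param.add_(update, alpha=-lr)
```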
You need to define the frequency hierarchy for the CMS layers:

```yaml
model:
  dim: 768
  depth: 12
  vocab_size: 50257
  cms:
    # Define levels, ordered from most to least frequently updated
    # Frequencies are relative to the unit time step
    levels:
      - name: "level_1"
        frequency: 1     # Updates every step (High Freq)
        chunk_size: 16
      - name: "level_2"
        frequency: 16    # Updates every 16 steps
        chunk_size: 1000 # 1M in paper, scaled down for testing
      - name: "level_3"
        frequency: 1000  # Updates rarely (Low Freq)
        chunk_size: 0    # 0 implies rarely updated (pre-training knowledge)
```

To train the HOPE model on Wikitext:

```bash
python train.py --config config/hope_config.yaml --dataset wikitext-103
```

Note: If you are on a less powerful machine, you can train with `config/hope_mini_config.yaml` instead.
When implementing `src/models/hope.py`, ensure you distinguish it from a standard Transformer (a minimal block sketch follows this list):
- No fixed FFN: Replace standard FFN blocks with the `CMS` module.
- Dynamic Projections: The $Q, K, V$ projections are not static; they are part of the optimization flow described in the "Self-Modifying Titans" section.
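A hypothetical block skeleton showing where those two changes land; `mixer` and `cms` stand in for the Self-Modifying Titans and CMS modules described above, and the pre-norm residual layout is an assumption rather than the repository's actual design:

```python
import torch.nn as nn

class HOPEBlock(nn.Module):
    def __init__(self, dim: int, mixer: nn.Module, cms: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mixer = mixer   # e.g. the Self-Modifying Titans module (Q/K/V live in its inner optimization flow)
        self.norm2 = nn.LayerNorm(dim)
        self.cms = cms       # the CMS chain replaces the standard FFN

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        x = x + self.cms(self.norm2(x))
        return x
```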