
Tiny Recursive Model (5M Parameters)

Architecture reference: *Less is More: Recursive Reasoning with Tiny Networks* (https://arxiv.org/pdf/2510.04871)


Project Structure

  • configs/: YAML configuration files for training.
  • dataset/: scripts for building and loading the Sudoku dataset.
  • models/: core model implementation, including layers, embeddings, and the main recursive reasoning architecture.
  • utils/: helper functions.
  • pretrain.py: main training entry point.
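
The core idea from the paper is that a single tiny network is applied recursively: it repeatedly refines a latent state from the input and current answer, then refines the answer from the latent state. The sketch below illustrates that loop under stated assumptions; the class and parameter names are hypothetical and do not correspond to the actual classes in models/.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Illustrative TRM-style recursion (hypothetical names, not this
    repo's actual implementation). One small network is reused to
    refine a latent state z, then the answer embedding y."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # A single shared tiny network; the recursion, not depth,
        # provides the effective reasoning capacity.
        self.net = nn.Sequential(
            nn.Linear(dim * 3, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x, y, z, n_latent: int = 6):
        # Refine the latent state several times from (input, answer, latent).
        for _ in range(n_latent):
            z = self.net(torch.cat([x, y, z], dim=-1))
        # Then update the answer once from the refined latent state.
        y = self.net(torch.cat([x, y, z], dim=-1))
        return y, z
```

In training, this whole forward pass is itself typically repeated for several outer steps, carrying (y, z) between them.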

Training Notes

The 5M-parameter TRM was trained on a single NVIDIA P100 GPU with 16 GB of VRAM. Training ran successfully for 888 steps, after which the process crashed with a CUDA out-of-memory (OOM) error. I am still investigating the root cause.

The recursive carry state is the primary suspect. Even though it is explicitly moved to the CPU after each step, a reference to the GPU tensor may still be held by the computation graph or elsewhere.
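One common cause of this pattern: moving a tensor to the CPU copies its data but does not release the autograd graph that produced it, so the GPU activations stay alive as long as the carry keeps its `grad_fn`. A minimal sketch of a fix, assuming the carry is a tensor or a nested dict/list of tensors (the helper name is hypothetical, not a function in this repo):

```python
import torch

def detach_carry(carry):
    """Hypothetical helper: break autograd references in a recursive
    carry state between training steps. Calling .detach() drops the
    grad_fn link, letting the old computation graph (and the GPU
    activations it retains) be freed; .cpu() alone does not do this."""
    if isinstance(carry, torch.Tensor):
        return carry.detach()
    if isinstance(carry, dict):
        return {k: detach_carry(v) for k, v in carry.items()}
    if isinstance(carry, (list, tuple)):
        return type(carry)(detach_carry(v) for v in carry)
    return carry
```

Applying this to the carry at the end of each step (before or instead of the CPU move) should let CUDA memory plateau rather than grow; if the OOM persists, the leak is elsewhere.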


Usage

Build the dataset:

python -m dataset.build_sudoku_dataset \
  --output-dir data/sudoku-extreme-1k-aug-1000 \
  --subsample-size 1000 \
  --num-aug 1000

Then run training:

python pretrain.py

Accuracy Log

[Accuracy chart image]
