Transformer²: Self-Adaptive LLMs with Docker Support

The Revolution in AI Adaptation

Transformer² changes how Large Language Models (LLMs) adapt to new tasks. Traditional LLMs keep the same weights for every input once training ends; Transformer² instead adjusts its weights at inference time, bringing the concept of "living intelligence" to AI systems through the following innovations:

🧠 Dynamic Neural Architecture

At its core, Transformer² introduces a sophisticated approach to neural weight manipulation through Singular Value Decomposition (SVD). Unlike traditional static weight matrices, Transformer² decomposes these matrices into independent components, each representing different aspects of the model's knowledge. This decomposition enables the system to selectively enhance or suppress specific components in real-time, similar to how biological neural networks reconfigure themselves for different tasks.
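The decomposition and rescaling described above can be sketched with NumPy. This is a minimal illustration on a toy matrix, not the repository's code: the matrix shape and the scaling vector `z` are made up for the example.

```python
import numpy as np

# Toy weight matrix standing in for one layer of an LLM (hypothetical shape).
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# Thin SVD: W = U @ diag(S) @ Vt, with S sorted in descending order.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Sanity check: the three factors reconstruct W.
assert np.allclose(U @ np.diag(S) @ Vt, W)

# Hypothetical scaling vector: amplify the first component, suppress the last.
z = np.ones_like(S)
z[0] = 1.5
z[-1] = 0.0

# Adapted matrix: only the singular values change; U and Vt stay fixed.
W_adapted = U @ np.diag(S * z) @ Vt

# The singular values of the adapted matrix are the rescaled originals.
new_S = np.linalg.svd(W_adapted, compute_uv=False)
print(np.round(S * z, 3))
print(np.round(new_S, 3))
```

Setting every entry of `z` to 1 leaves the matrix unchanged, which is why a learned scaling vector can be read as a per-component adjustment relative to the base model.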

🔄 Two-Pass Adaptive Processing

The system employs a revolutionary two-pass mechanism that mimics biological cognitive processes:

Task Analysis Phase:
- The system first analyzes and identifies task properties
- Employs sophisticated pattern recognition to understand task requirements
- Uses one of three increasingly powerful adaptation methods
Dynamic Adaptation Phase:
- Combines specialized "expert" vectors trained through reinforcement learning
- Optimizes neural pathways specifically for the current task
- Achieves real-time weight matrix modification without retraining

🎯 Singular Value Finetuning (SVF)

SVF represents a breakthrough in parameter-efficient training:

Uses reinforcement learning to develop task-specific expertise
Creates compact "expert" z-vectors for different domains
Requires orders of magnitude fewer parameters than traditional methods
Enables natural compositionality for complex task adaptation
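To make the parameter comparison concrete, the sketch below counts trainable parameters for a single weight matrix. It assumes SVF learns one scale per singular value and that the "traditional method" is a LoRA adapter of a given rank; the matrix shape and rank are illustrative values, not settings taken from this repository.

```python
def svf_params(m: int, n: int) -> int:
    """One learned scale per singular value of an m x n matrix."""
    return min(m, n)


def lora_params(m: int, n: int, rank: int) -> int:
    """LoRA trains two factors: B (m x rank) and A (rank x n)."""
    return rank * (m + n)


# Illustrative values only: a square 4096 x 4096 projection and LoRA rank 16.
m, n, rank = 4096, 4096, 16
print("SVF :", svf_params(m, n))
print("LoRA:", lora_params(m, n, rank))
print("ratio:", lora_params(m, n, rank) / svf_params(m, n))
```

The size of the gap depends on the LoRA rank and on which matrices are adapted, so the ratio printed here should be read as an example rather than a general figure.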

🔬 Three-Tier Adaptation Framework

The system implements three increasingly sophisticated adaptation strategies:

Prompt Engineering:
- Uses carefully crafted prompts for task classification
- Dynamically selects appropriate pre-trained expertise
- Provides efficient baseline adaptation
Classification Expert:
- Employs a specialized SVF-tuned classifier
- Offers more nuanced task identification
- Enables more precise adaptation selection
Few-shot Adaptation:
- Represents the most advanced adaptation strategy
- Combines multiple expert vectors through weighted interpolation
- Uses Cross-Entropy Method (CEM) for optimal weight discovery
- Achieves superior performance through sophisticated blending of expertise
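The few-shot strategy can be sketched as a small Cross-Entropy Method loop that searches for interpolation weights over expert vectors. Everything below is a toy under stated assumptions: the expert vectors are invented, and the score function is a stand-in (distance to a known target) for what would really be few-shot evaluation accuracy.

```python
import numpy as np


def cem_maximize(score_fn, dim, iters=40, pop=100, elite_frac=0.2, seed=0):
    """Cross-Entropy Method: repeatedly refit a Gaussian to the best samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score_fn(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # highest-scoring samples
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor keeps exploration alive
    return mean


# Hypothetical expert z-vectors, one row per expert.
experts = np.array([
    [2.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 1.0],
])

# Stand-in objective: pretend the best vector for the new task is a known mix.
target_z = np.array([0.6, 0.3, 0.1]) @ experts


def score(alpha):
    """Higher is better; a real run would score few-shot task performance."""
    return -np.sum((alpha @ experts - target_z) ** 2)


alpha = cem_maximize(score, dim=experts.shape[0])
z_mixed = alpha @ experts  # weighted interpolation of the expert vectors
print("weights:", np.round(alpha, 2))
```

CEM needs only a score per candidate, not gradients, which suits a setting where the signal is performance on a handful of examples.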

🌟 Technical Breakthroughs

The system achieves several technical innovations:

Real-time Adaptation: Modifies behavior during inference without retraining
Compositionality: Combines different types of expertise for novel tasks
Efficiency: Maintains high performance with minimal parameter overhead
Cross-Model Transfer: Enables knowledge sharing between different models
Biological Inspiration: Mirrors natural adaptive systems

💡 Implementation Excellence

The technical implementation showcases several architectural innovations:

Sophisticated SVD-based weight matrix decomposition (U⋅Σ⋅V^T)
Selective modification of singular values through z-vectors
Maintenance of full rank information unlike low-rank approaches
Built-in regularization through controlled component modification
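The full-rank point can be checked numerically. The sketch below compares the rank of the weight change produced by rescaling singular values with the rank of a LoRA-style update; the shapes, the scaling vector, and the LoRA rank are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Hypothetical scales, none equal to 1, so every component is modified.
z = np.linspace(0.8, 1.25, 6)
delta_svf = U @ np.diag(S * z) @ Vt - W

# LoRA-style update: product of two thin factors, so its rank is at most r.
r = 2
B = rng.standard_normal((8, r))
A = rng.standard_normal((r, 6))
delta_lora = B @ A

rank_svf = np.linalg.matrix_rank(delta_svf)
rank_lora = np.linalg.matrix_rank(delta_lora)
print("rank of singular-value update:", rank_svf)
print("rank of LoRA-style update:    ", rank_lora)
```

The singular-value update can touch every direction of the original matrix while training only six numbers here, whereas the low-rank update is confined to an r-dimensional subspace.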

🎯 Practical Advantages

The system delivers numerous practical benefits:

Outperforms LoRA on text-based tasks, as reported in the original paper
Functions effectively with limited training data
Avoids catastrophic forgetting in continuous learning
Enables efficient knowledge transfer between models
Supports sustainable AI development practices

🚀 Future Implications

Transformer² points toward a future of truly adaptive AI:

Enables continuous, lifelong learning capabilities
Supports dynamic task adaptation without retraining
Provides a framework for self-organizing AI systems
Opens new possibilities for efficient model development
Paves the way for more sustainable AI scaling

This implementation provides a Docker-based deployment of Transformer², specifically designed for Windows environments while maintaining full GPU support through NVIDIA Container Toolkit.

Introduction

Transformer² (Transformer-squared) represents a paradigm shift in how Large Language Models (LLMs) adapt to diverse tasks. Traditional fine-tuning approaches often struggle with computational intensity and static behavior across varied tasks. This implementation introduces dynamic, real-time adaptation by selectively modifying singular components of weight matrices, enabling LLMs to optimize their behavior for specific tasks without extensive retraining.

Core Innovation: Two-Pass Adaptation Mechanism

The framework employs a sophisticated two-pass mechanism during inference:

Task Analysis: A dispatch system identifies task properties and requirements
Dynamic Adaptation: Task-specific "expert" vectors, trained through reinforcement learning, are combined to optimize model behavior for the incoming prompt
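The two passes can be sketched as a dispatch step followed by an adapted generation step. This is a hypothetical skeleton: the keyword classifier stands in for the real dispatch system, and the expert table, `classify_task`, `adapt_and_answer`, and `echo_generate` are names invented for this example.

```python
from typing import Callable, Dict, List

# Hypothetical expert z-vectors (one scale per singular value), keyed by task.
EXPERTS: Dict[str, List[float]] = {
    "math": [1.2, 0.9, 1.0],
    "code": [0.8, 1.1, 1.3],
    "other": [1.0, 1.0, 1.0],  # all ones: base model left unchanged
}


def classify_task(prompt: str) -> str:
    """First pass (stand-in): a keyword rule instead of a real dispatcher."""
    text = prompt.lower()
    if any(word in text for word in ("solve", "integral", "equation")):
        return "math"
    if any(word in text for word in ("python", "function", "bug")):
        return "code"
    return "other"


def adapt_and_answer(prompt: str,
                     generate: Callable[[str, List[float]], str]) -> str:
    """Second pass: pick the expert vector and generate with it applied."""
    z = EXPERTS[classify_task(prompt)]
    return generate(prompt, z)


def echo_generate(prompt: str, z: List[float]) -> str:
    """Placeholder for a model call that would run with z-scaled weights."""
    return f"[scales={z}] {prompt}"


print(adapt_and_answer("Solve the equation 2x = 6", echo_generate))
```

Keeping the dispatch step separate from generation means the classifier can be swapped (prompt-based, trained classifier, or few-shot search) without touching the adaptation code.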

Technical Architecture

At its core, Transformer² leverages Singular Value Decomposition (SVD) to decompose the LLM's weight matrices into independent components. This decomposition allows for:

Identification of principal components in the model's knowledge representation
Selective enhancement/suppression of specific components for task optimization
Minimal parameter overhead while maintaining adaptability

The framework introduces Singular Value Finetuning (SVF), which uses reinforcement learning to learn task-specific z-vectors. These vectors act as "amplifiers" or "dampeners" for different components of the weight matrices, enabling precise task-specific adaptations.
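In symbols, the "amplifier" and "dampener" behaviour can be written as follows. This is a sketch in standard SVD notation; the symbols $r$, $\sigma_i$, and $z_i$ are introduced here for illustration.

```latex
W = U \Sigma V^{\top}, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r)

W' = U \Sigma' V^{\top}, \qquad \Sigma' = \operatorname{diag}(z_1 \sigma_1, \dots, z_r \sigma_r)
```

A value $z_i > 1$ amplifies component $i$, a value $0 \le z_i < 1$ dampens it, and $z_i = 1$ leaves it unchanged, so a vector of all ones recovers the base model.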

This Implementation

This repository provides my Docker-based implementation of Transformer², specifically designed for Windows environments. The containerized approach ensures consistent behavior across different systems while maintaining full GPU support through NVIDIA Container Toolkit.

Key Features

Containerized Linux environment for Windows compatibility
CUDA-enabled runtime for GPU acceleration
Persistent model caching
Streamlined deployment process
Support for all original Transformer² evaluation methods

Prerequisites

Windows 10/11 with WSL2 enabled
Docker Desktop for Windows
NVIDIA Container Toolkit
NVIDIA GPU with CUDA support
At least 16GB RAM recommended
Hugging Face account with access to the Llama model family
Hugging Face API token with read access

Quick Start

Clone the repository:

git clone https://github.com/HarleyCoops/self-adaptive-llms.git
cd self-adaptive-llms

Set up Hugging Face authentication:
- Create a .env file in the root directory
- Add your Hugging Face token: HUGGING_FACE_TOKEN=your_token_here
Build and run the container:

# Build the container
docker-compose build

# Start an interactive shell
docker-compose run --rm self-adaptive-llm

Run evaluations:

# Few-shot evaluation
./run.sh bash scripts/eval_few_shot.sh

# Prompt-based evaluation
./run.sh bash scripts/eval_prompt_based.sh

Technical Documentation

📊 Interactive Function Flowcharts

This implementation includes a comprehensive suite of interactive flowcharts located in the /docs directory. These flowcharts use flowchart.js to provide detailed visualizations of the system's architecture and processes.

Viewing the Flowcharts

The flowcharts are interactive HTML files that can be viewed in several ways:

Option 1: Direct Browser Access
After cloning the repository, open any flowchart HTML file directly in your browser:

# Windows
start docs/math_flowchart.html

# macOS
open docs/math_flowchart.html

Option 2: Local HTTP Server
To serve the flowcharts over HTTP from a local static server:

cd docs
python -m http.server 8000
# Visit http://localhost:8000 in your browser

Each flowchart provides an interactive visualization of different system components:

Math Module: Implementation flow of the math task handler

Base Classes: Core system architecture and interfaces

SVD Reinforcement: Weight matrix manipulation and RL loop

And many more...

🔄 Core System Components

SVD Reinforcement Learning Loop: Visualizes the main reinforcement learning loop and SVD-based weight matrix manipulation
Optimization Modules: Details the optimization modules for z-vector training
Weighted Combination: Illustrates the weighted interpolation mechanism for combining expert vectors

📝 Task-Specific Implementations

Mathematical Reasoning:
Abstract Reasoning:
- ARC Framework
- AI2 ARC Implementation
Specialized Math:
- Language Restricted Math
- Competition Math

🛠 Utility and Infrastructure

Model Integration:
- vLLM Integration: Model loading and serving through vLLM
- Tokenization Utils: Token processing
Core Utils:
- Utility Functions
- Logging System

Each interactive flowchart provides:

📋 Detailed function signatures and parameter descriptions
🔄 Control flow visualization with animated transitions
🔗 Component interdependencies with clickable navigation
⚡ Data transformation pipeline visualization
🚨 Error handling pathways and edge cases
💡 Inline documentation and implementation notes

The flowcharts are designed to be both educational and practical:

🎓 Perfect for understanding the system architecture
🔍 Useful for debugging and development
📚 Valuable for academic research and documentation
🤝 Helpful for new contributors

Dynamic Visualizations

The implementation includes sophisticated animations that visualize the model's internal processes:

Animation Components

Located in animations/transformer2_animations.py, the visualization system provides:

Real-time SVD decomposition visualization
Z-vector adaptation trajectories
Weight matrix transformation animations
Task-specific adaptation visualization
Performance metric evolution

The animations are rendered using state-of-the-art visualization libraries and can be used for:

Research presentations
Educational purposes
Debugging and analysis
Performance monitoring

Media assets in animations/media/ support these visualizations with:

Component diagrams
State transition animations
Performance graphs
Architecture schematics

Container Structure

The Docker implementation includes:

Ubuntu 22.04 base image with CUDA 12.1 support
Conda environment with Python 3.11
PyTorch with CUDA support
All project dependencies pre-configured
Mounted volumes for code and model caching

Evaluation Methods

Transformer² supports three adaptation methods:

Prompt-based Adaptation
- Uses specific prompts to classify tasks
- Selects appropriate pre-trained z-vectors
Classifier-based Adaptation
- Employs a trained task classifier
- Automatically identifies tasks during inference
Few-shot Adaptation
- Combines multiple pre-trained z-vectors through weighted interpolation
- Optimizes weights based on few-shot evaluation performance

Configuration

Key configuration files:

environment.yml: Conda environment specification
docker-compose.yml: Container orchestration settings
Dockerfile: Container build instructions
requirements.txt: Python dependencies

Performance Considerations

As noted in the original paper, the framework shows significant improvements across various tasks:

Outperforms LoRA on text-based tasks
Shows strong performance in vision-language tasks
Demonstrates effective cross-model knowledge transfer

For detailed performance metrics and comparisons, refer to the original paper.

Extending Compute Resources

For users with limited local GPU resources, several cloud platforms offer free or cost-effective GPU access:

🌩️ Google Colab Integration

Setup Steps:

!git clone https://github.com/HarleyCoops/self-adaptive-llms.git
%cd self-adaptive-llms
!pip install -r requirements.txt

Environment Variables:

import os
os.environ['HUGGING_FACE_TOKEN'] = 'your_token_here'

Running Evaluations:

!python svd_reinforce_hydra.py --config-dir=cfgs --config-name=config \
    base_model@_global_=llama3i8b optimization@_global_=cem \
    task@_global_=few_shot_math

📊 Kaggle Notebooks

Setup:
- Create a new Notebook
- Select a GPU (T4 or P100) as the accelerator
- Enable internet access

Installation:

!git clone https://github.com/HarleyCoops/self-adaptive-llms.git
%cd self-adaptive-llms
!pip install -r requirements.txt

Configuration:

import os
os.environ['HUGGING_FACE_TOKEN'] = 'your_token_here'

☁️ Vast.ai (Pay-as-you-go Option)

Create Instance:
- Select an instance with 12+ GB VRAM
- Choose Ubuntu 22.04 with CUDA support

Setup Commands:

git clone https://github.com/HarleyCoops/self-adaptive-llms.git
cd self-adaptive-llms
pip install -r requirements.txt

Environment Setup:

export HUGGING_FACE_TOKEN='your_token_here'

🔄 Code Modifications for Cloud

When using cloud resources, consider these adjustments:

Memory Optimization:

# In tasks/math.py, adjust GPU memory usage based on available VRAM
gpu_memory_utilization=0.8  # Increase if more VRAM available

Batch Size Adjustment:

# Increase for better performance with more VRAM
max_num_batched_tokens=4096

Checkpoint Saving:

# Add to your training loop to save progress
model.save_checkpoint('/content/checkpoints/')

📝 Best Practices

Resource Management:
- Monitor GPU memory usage with nvidia-smi
- Use persistent storage for model checkpoints
- Implement early stopping for efficient resource use
Data Handling:
- Cache downloaded models and datasets
- Use efficient data loading techniques
- Implement proper cleanup procedures
Cost Optimization:
- Use free tiers when possible (Colab, Kaggle)
- Monitor usage on pay-as-you-go platforms
- Implement automatic shutdown on completion
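The early-stopping advice above can be sketched as a small helper that watches a validation score. This is a generic pattern, not code from this repository: the class name, the `patience` and `min_delta` parameters, and the scores are all hypothetical.

```python
class EarlyStopper:
    """Stop when the score has not improved for `patience` checks in a row."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_rounds = 0

    def should_stop(self, score: float) -> bool:
        if score > self.best + self.min_delta:
            self.best = score      # new best: reset the counter
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1   # no improvement this round
        return self.bad_rounds >= self.patience


# Made-up validation scores, one per evaluation round.
scores = [0.50, 0.55, 0.54, 0.55, 0.53, 0.60]
stopper = EarlyStopper(patience=3)
stopped_at = None
for step, score in enumerate(scores):
    if stopper.should_stop(score):
        stopped_at = step
        break
print("stopped at step:", stopped_at)
```

On a pay-as-you-go instance, the same check can gate the shutdown command, so the machine is released as soon as training stops improving.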

Citation

@misc{sun2025texttransformer2selfadaptivellms,
      title={$\text{Transformer}^2$: Self-adaptive LLMs}, 
      author={Qi Sun and Edoardo Cetin and Yujin Tang},
      year={2025},
      eprint={2501.06252},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

This implementation builds upon the original work by Sakana AI, adapting it for Windows environments through containerization.
