Transformer² is a self-adaptive framework for Large Language Models (LLMs). Whereas conventional LLMs keep their weights fixed after training, Transformer² adjusts its weights at inference time to suit the task at hand, an idea the project describes as "living intelligence". The main ideas are outlined below.
At its core, Transformer² manipulates neural network weights through Singular Value Decomposition (SVD). Rather than treating weight matrices as static, it decomposes them into independent components, each capturing a different aspect of the model's knowledge. The system can then selectively amplify or suppress specific components at inference time, loosely analogous to how biological neural networks reconfigure themselves for different tasks.
The system employs a revolutionary two-pass mechanism that mimics biological cognitive processes:
- Task Analysis Phase:
  - The system first analyzes and identifies task properties
  - Employs sophisticated pattern recognition to understand task requirements
  - Uses one of three increasingly powerful adaptation methods
- Dynamic Adaptation Phase:
  - Combines specialized "expert" vectors trained through reinforcement learning
  - Optimizes neural pathways specifically for the current task
  - Achieves real-time weight matrix modification without retraining
Singular Value Finetuning (SVF) is the framework's parameter-efficient training method:
- Uses reinforcement learning to develop task-specific expertise
- Creates compact "expert" z-vectors for different domains
- Requires orders of magnitude fewer parameters than traditional methods
- Enables natural compositionality for complex task adaptation
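To make the parameter-efficiency point concrete, here is a rough back-of-the-envelope comparison for a single weight matrix. It assumes SVF learns one scalar per singular value and that LoRA learns two low-rank factors. The matrix size and LoRA rank are illustrative values I picked, not figures from the paper, and the function names are hypothetical.

```python
def svf_param_count(rows: int, cols: int) -> int:
    """One learned scale per singular value of a rows x cols matrix."""
    return min(rows, cols)


def lora_param_count(rows: int, cols: int, rank: int) -> int:
    """Two low-rank factors: (rows x rank) and (rank x cols)."""
    return rank * (rows + cols)


# Illustrative sizes only (assumption): a square 4096 x 4096 projection, LoRA rank 16.
rows, cols, rank = 4096, 4096, 16
svf = svf_param_count(rows, cols)
lora = lora_param_count(rows, cols, rank)
print(f"SVF: {svf} parameters")
print(f"LoRA (rank {rank}): {lora} parameters")
print(f"LoRA / SVF ratio: {lora / svf:.0f}x")
```

For these sizes the sketch reports 4,096 versus 131,072 parameters, a 32x difference for this one matrix. The actual gap depends on the LoRA rank and on which matrices are adapted.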
The system implements three increasingly sophisticated adaptation strategies:
- Prompt Engineering:
  - Uses carefully crafted prompts for task classification
  - Dynamically selects appropriate pre-trained expertise
  - Provides efficient baseline adaptation
- Classification Expert:
  - Employs a specialized SVF-tuned classifier
  - Offers more nuanced task identification
  - Enables more precise adaptation selection
- Few-shot Adaptation:
  - Represents the most advanced adaptation strategy
  - Combines multiple expert vectors through weighted interpolation
  - Uses Cross-Entropy Method (CEM) for optimal weight discovery
  - Achieves superior performance through sophisticated blending of expertise
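The few-shot strategy above can be illustrated with a small, self-contained Cross-Entropy Method loop. This is only a sketch under stated assumptions: the three expert vectors are made up, and `toy_score` is a hypothetical stand-in for few-shot task accuracy (it simply rewards getting close to a hidden target blend). It is not the repository's optimizer.

```python
import random
import statistics
from typing import Callable, List

Vector = List[float]


def combine(experts: List[Vector], alphas: Vector) -> Vector:
    """Weighted interpolation: z = sum_k alpha_k * z_k."""
    dim = len(experts[0])
    return [sum(a * z[i] for a, z in zip(alphas, experts)) for i in range(dim)]


def cem(score: Callable[[Vector], float], num_weights: int,
        iters: int = 30, pop: int = 64, elite_frac: float = 0.25,
        seed: int = 0) -> Vector:
    """Search for interpolation weights that maximize `score`."""
    rng = random.Random(seed)
    mean = [1.0 / num_weights] * num_weights   # start from a uniform blend
    std = [0.5] * num_weights
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        # Sample candidate weight vectors around the current estimate.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        # Keep the best-scoring candidates and refit the sampling distribution.
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elite)]
    return mean


# Made-up expert z-vectors and a hidden "ideal" blend (assumptions for the demo).
experts = [[1.2, 0.8, 1.0], [0.9, 1.1, 1.3], [1.0, 1.0, 0.7]]
target = combine(experts, [0.6, 0.3, 0.1])


def toy_score(alphas: Vector) -> float:
    """Stand-in for few-shot performance: negative squared error to the target."""
    z = combine(experts, alphas)
    return -sum((a - b) ** 2 for a, b in zip(z, target))


best = cem(toy_score, num_weights=len(experts))
print("interpolation weights:", [round(a, 3) for a in best])
print("score:", toy_score(best))
```

In a real run the score would come from evaluating the adapted model on the few-shot examples, which is far more expensive than this toy objective, so the population size and iteration count would need to be chosen with that cost in mind.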
The system achieves several technical innovations:
- Real-time Adaptation: Modifies behavior during inference without retraining
- Compositionality: Combines different types of expertise for novel tasks
- Efficiency: Maintains high performance with minimal parameter overhead
- Cross-Model Transfer: Enables knowledge sharing between different models
- Biological Inspiration: Mirrors natural adaptive systems
The technical implementation showcases several architectural innovations:
- Sophisticated SVD-based weight matrix decomposition (U⋅Σ⋅V^T)
- Selective modification of singular values through z-vectors
- Maintenance of full rank information unlike low-rank approaches
- Built-in regularization through controlled component modification
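The decomposition and z-vector scaling listed above can be sketched in a few lines of NumPy. This is a minimal illustration on a small random matrix, assuming the adapted weight is formed by rescaling each singular value with the matching z entry. The helper name `adapt_weight` and the example z values are hypothetical.

```python
import numpy as np


def adapt_weight(W: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Return U @ diag(sigma * z) @ Vt, i.e. W with rescaled singular values."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    if z.shape != sigma.shape:
        raise ValueError(f"expected z of shape {sigma.shape}, got {z.shape}")
    # Multiplying the columns of U by (sigma * z) equals U @ diag(sigma * z).
    return (U * (sigma * z)) @ Vt


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))              # toy stand-in for a weight matrix

z_identity = np.ones(4)                      # leaves W unchanged
z_expert = np.array([1.5, 1.0, 0.5, 1.0])    # hypothetical "expert" vector

W_same = adapt_weight(W, z_identity)
W_new = adapt_weight(W, z_expert)
new_sigma = np.linalg.svd(W_new, compute_uv=False)

print("unchanged with z = 1:", np.allclose(W_same, W))
print("rank after adaptation:", np.linalg.matrix_rank(W_new))
print("new singular values:", np.round(new_sigma, 3))
```

Because every singular direction is kept and only its scale changes, the adapted matrix keeps its full rank as long as no entry of z is zero, which is the contrast with low-rank updates drawn in the list above.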
The system delivers numerous practical benefits:
- Outperforms LoRA on the tasks reported in the original paper
- Functions effectively with limited training data
- Avoids catastrophic forgetting in continuous learning
- Enables efficient knowledge transfer between models
- Supports sustainable AI development practices
Transformer² points toward a future of truly adaptive AI:
- Enables continuous, lifelong learning capabilities
- Supports dynamic task adaptation without retraining
- Provides a framework for self-organizing AI systems
- Opens new possibilities for efficient model development
- Paves the way for more sustainable AI scaling
This implementation provides a Docker-based deployment of Transformer², specifically designed for Windows environments while maintaining full GPU support through NVIDIA Container Toolkit.
Transformer² (Transformer-squared) represents a paradigm shift in how Large Language Models (LLMs) adapt to diverse tasks. Traditional fine-tuning approaches often struggle with computational intensity and static behavior across varied tasks. This implementation introduces dynamic, real-time adaptation by selectively modifying singular components of weight matrices, enabling LLMs to optimize their behavior for specific tasks without extensive retraining.
The framework employs a sophisticated two-pass mechanism during inference:
- Task Analysis: A dispatch system identifies task properties and requirements
- Dynamic Adaptation: Task-specific "expert" vectors, trained through reinforcement learning, are combined to optimize model behavior for the incoming prompt
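The control flow of the two passes can be sketched as follows. Everything here is a toy stand-in: the keyword-based `classify_task`, the `EXPERT_Z` table, and `echo_generate` are hypothetical names used only to show the shape of the dispatch, not the repository's API.

```python
from typing import Callable, Dict, List

ZVector = List[float]

# Hypothetical expert z-vectors; in practice these would be learned per task.
EXPERT_Z: Dict[str, ZVector] = {
    "math": [1.3, 0.9, 1.0],
    "code": [0.8, 1.4, 1.0],
    "other": [1.0, 1.0, 1.0],  # identity scaling, leaves base weights unchanged
}


def classify_task(prompt: str) -> str:
    """Pass 1 (toy stand-in): pick a task label from keywords in the prompt."""
    text = prompt.lower()
    if any(word in text for word in ("solve", "integral", "equation")):
        return "math"
    if any(word in text for word in ("function", "python", "compile")):
        return "code"
    return "other"


def two_pass_generate(prompt: str,
                      generate: Callable[[str, ZVector], str]) -> str:
    """Pass 1 selects a z-vector; pass 2 generates with the adapted model."""
    task = classify_task(prompt)
    z = EXPERT_Z[task]
    return generate(prompt, z)


def echo_generate(prompt: str, z: ZVector) -> str:
    """Placeholder for the adapted model's forward pass."""
    return f"z={z} | {prompt}"


print(two_pass_generate("Solve the equation x + 2 = 5", echo_generate))
```

In the real framework the first pass is itself performed by the model (or by a classifier expert), and the second pass runs the LLM with weights rescaled by the selected or blended z-vector.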
At its core, Transformer² leverages Singular Value Decomposition (SVD) to decompose the LLM's weight matrices into independent components. This decomposition allows for:
- Identification of principal components in the model's knowledge representation
- Selective enhancement/suppression of specific components for task optimization
- Minimal parameter overhead while maintaining adaptability
The framework introduces Singular Value Finetuning (SVF), which uses reinforcement learning to learn task-specific z-vectors. These vectors act as "amplifiers" or "dampeners" for different components of the weight matrices, enabling precise task-specific adaptations.
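In symbols, the "amplifier/dampener" role of a z-vector and the reinforcement learning objective can be written roughly as below. This is my reading of the original paper rather than something taken from this repository's code, so treat the exact form as an assumption and check the paper before relying on it. Here x_i is a prompt, ŷ_i a sampled answer, y_i the reference answer, r the reward, and λ the weight of a KL penalty that keeps the adapted model close to the base model.

```latex
\[
W = U \Sigma V^{\top}, \qquad
W' = U \, \Sigma \operatorname{diag}(z) \, V^{\top}
\]
\[
J(\theta_z) =
  \mathbb{E}\!\left[ \log \pi_{\theta_{W'}}\!\left(\hat{y}_i \mid x_i\right) \, r\!\left(\hat{y}_i, y_i\right) \right]
  - \lambda \, D_{\mathrm{KL}}\!\left( \pi_{\theta_{W'}} \,\middle\|\, \pi_{\theta_{W}} \right)
\]
```

Only the entries of z are trained. U, Σ, and V come from the frozen base weights, so an entry of z above 1 amplifies the corresponding component and an entry below 1 dampens it.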
This repository provides my Docker-based implementation of Transformer², specifically designed for Windows environments. The containerized approach ensures consistent behavior across different systems while maintaining full GPU support through NVIDIA Container Toolkit.
- Containerized Linux environment for Windows compatibility
- CUDA-enabled runtime for GPU acceleration
- Persistent model caching
- Streamlined deployment process
- Support for all original Transformer² evaluation methods
- Windows 10/11 with WSL2 enabled
- Docker Desktop for Windows
- NVIDIA Container Toolkit
- NVIDIA GPU with CUDA support
- At least 16GB RAM recommended
- Hugging Face account with access to the Llama model family
- Hugging Face API token with read access
- Clone the repository:
git clone https://github.com/HarleyCoops/self-adaptive-llms.git
cd self-adaptive-llms
- Set up Hugging Face authentication:
  - Create a .env file in the root directory
  - Add your Hugging Face token:
    HUGGING_FACE_TOKEN=your_token_here
- Build and run the container:
# Build the container
docker-compose build
# Start an interactive shell
docker-compose run --rm self-adaptive-llm
- Run evaluations:
# Few-shot evaluation
./run.sh bash scripts/eval_few_shot.sh
# Prompt-based evaluation
./run.sh bash scripts/eval_prompt_based.sh
This implementation includes a comprehensive suite of interactive flowcharts located in the /docs directory. These flowcharts use flowchart.js to provide detailed visualizations of the system's architecture and processes.
Viewing the Flowcharts
The flowcharts are interactive HTML files that can be viewed in several ways:
Option 1: Direct Browser Access
After cloning the repository, open any flowchart HTML file directly in your browser:
# Windows
start docs/math_flowchart.html
# macOS
open docs/math_flowchart.html
Option 2: Local Development Server
To serve the flowcharts from a local web server instead:
cd docs
python -m http.server 8000
# Visit http://localhost:8000 in your browser
Each flowchart provides an interactive visualization of different system components:
- Math Module: Implementation flow of the math task handler
- Base Classes: Core system architecture and interfaces
- SVD Reinforcement: Weight matrix manipulation and RL loop
- And many more...
- SVD Reinforcement Learning Loop: Visualizes the main reinforcement learning loop and SVD-based weight matrix manipulation
- Optimization Modules: Details the optimization modules for z-vector training
- Weighted Combination: Illustrates the weighted interpolation mechanism for combining expert vectors
- Mathematical Reasoning:
- Abstract Reasoning:
- Specialized Math:
- Model Integration:
- VLLM Integration: VLLM integration and model serving
- Tokenization Utils: Token processing
- Core Utils:
Each interactive flowchart provides:
- 📋 Detailed function signatures and parameter descriptions
- 🔄 Control flow visualization with animated transitions
- 🔗 Component interdependencies with clickable navigation
- ⚡ Data transformation pipeline visualization
- 🚨 Error handling pathways and edge cases
- 💡 Inline documentation and implementation notes
The flowcharts are designed to be both educational and practical:
- 🎓 Perfect for understanding the system architecture
- 🔍 Useful for debugging and development
- 📚 Valuable for academic research and documentation
- 🤝 Helpful for new contributors
The implementation includes sophisticated animations that visualize the model's internal processes:
Located in animations/transformer2_animations.py, the visualization system provides:
- Real-time SVD decomposition visualization
- Z-vector adaptation trajectories
- Weight matrix transformation animations
- Task-specific adaptation visualization
- Performance metric evolution
The animations are rendered using state-of-the-art visualization libraries and can be used for:
- Research presentations
- Educational purposes
- Debugging and analysis
- Performance monitoring
Media assets in animations/media/ support these visualizations with:
- Component diagrams
- State transition animations
- Performance graphs
- Architecture schematics
The Docker implementation includes:
- Ubuntu 22.04 base image with CUDA 12.1 support
- Conda environment with Python 3.11
- PyTorch with CUDA support
- All project dependencies pre-configured
- Mounted volumes for code and model caching
Transformer² supports three adaptation methods:
- Prompt-based Adaptation
  - Uses specific prompts to classify tasks
  - Selects appropriate pre-trained z-vectors
- Classifier-based Adaptation
  - Employs a trained task classifier
  - Automatically identifies tasks during inference
- Few-shot Adaptation
  - Combines multiple pre-trained z-vectors through weighted interpolation
  - Optimizes weights based on few-shot evaluation performance
Key configuration files:
- environment.yml: Conda environment specification
- docker-compose.yml: Container orchestration settings
- Dockerfile: Container build instructions
- requirements.txt: Python dependencies
As noted in the original paper, the framework shows significant improvements across various tasks:
- Outperforms LoRA on text-based tasks
- Shows strong performance in vision-language tasks
- Demonstrates effective cross-model knowledge transfer
For detailed performance metrics and comparisons, refer to the original paper.
For users with limited local GPU resources, several cloud platforms offer free or cost-effective GPU access:
- Setup Steps:
!git clone https://github.com/HarleyCoops/self-adaptive-llms.git
# Use %cd rather than !cd so the directory change persists across cells
%cd self-adaptive-llms
!pip install -r requirements.txt
- Environment Variables:
import os
os.environ['HUGGING_FACE_TOKEN'] = 'your_token_here'
- Running Evaluations:
!python svd_reinforce_hydra.py --config-dir=cfgs --config-name=config \
    base_model@_global_=llama3i8b optimization@_global_=cem \
    task@_global_=few_shot_math
- Setup:
  - Create a new Notebook with GPU (T4/P100)
  - Select "Docker" as the accelerator
  - Enable internet access
- Installation:
!git clone https://github.com/HarleyCoops/self-adaptive-llms.git
# Use %cd rather than !cd so the directory change persists across cells
%cd self-adaptive-llms
!pip install -r requirements.txt
- Configuration:
import os
os.environ['HUGGING_FACE_TOKEN'] = 'your_token_here'
- Create Instance:
  - Select an instance with 12+ GB VRAM
  - Choose Ubuntu 22.04 with CUDA support
- Setup Commands:
git clone https://github.com/HarleyCoops/self-adaptive-llms.git
cd self-adaptive-llms
pip install -r requirements.txt
- Environment Setup:
export HUGGING_FACE_TOKEN='your_token_here'
When using cloud resources, consider these adjustments:
- Memory Optimization:
# In tasks/math.py, adjust GPU memory usage based on available VRAM
gpu_memory_utilization=0.8  # Increase if more VRAM available
- Batch Size Adjustment:
# Increase for better performance with more VRAM
max_num_batched_tokens=4096
- Checkpoint Saving:
# Add to your training loop to save progress
model.save_checkpoint('/content/checkpoints/')
- Resource Management:
  - Monitor GPU memory usage with nvidia-smi
  - Use persistent storage for model checkpoints
  - Implement early stopping for efficient resource use
- Data Handling:
  - Cache downloaded models and datasets
  - Use efficient data loading techniques
  - Implement proper cleanup procedures
- Cost Optimization:
  - Use free tiers when possible (Colab, Kaggle)
  - Monitor usage on pay-as-you-go platforms
  - Implement automatic shutdown on completion
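For the GPU memory monitoring mentioned above, a small helper like the following can poll nvidia-smi from Python. This is a sketch, not part of the repository: the function names are hypothetical, and it assumes nvidia-smi reports plain integer MiB values for each GPU (some setups print "[N/A]" instead, in which case the helper returns an empty list).

```python
import shutil
import subprocess
from typing import List, Tuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' rows (MiB, one row per GPU) into integer pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory() -> List[Tuple[int, int]]:
    """Return (used_mib, total_mib) per GPU, or [] if the query is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        return parse_gpu_memory(result.stdout)
    except (subprocess.CalledProcessError, OSError, ValueError):
        return []


if __name__ == "__main__":
    for index, (used, total) in enumerate(gpu_memory()):
        print(f"GPU {index}: {used}/{total} MiB ({used / total:.0%} used)")
```

Calling `gpu_memory()` periodically during a run gives a rough signal for tuning values such as gpu_memory_utilization, and returning an empty list on machines without a GPU keeps the same script usable locally.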
@misc{sun2025texttransformer2selfadaptivellms,
title={$\text{Transformer}^2$: Self-adaptive LLMs},
author={Qi Sun and Edoardo Cetin and Yujin Tang},
year={2025},
eprint={2501.06252},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This implementation builds upon the original work by Sakana AI, adapting it for Windows environments through containerization.