Successfully reorganized the monolithic llm.py (735 lines) into a clean, modular structure:
```
blueberry-llm-t4-gpu/
├── configs/
│   ├── __init__.py
│   └── moe_config.py        # MoEModelConfig class
├── models/
│   ├── __init__.py
│   ├── components.py        # Expert, TopKRouter, MixtureOfExperts
│   ├── layers.py            # MultiHeadAttention, Rotary, MoETransformerBlock
│   └── moe_llm.py           # MoEMinimalLLM main model
├── optimizers/
│   ├── __init__.py
│   └── muon.py              # Muon optimizer + zeropower_via_newtonschulz5
├── data/
│   ├── __init__.py
│   ├── dataset.py           # TextTokenDataset
│   └── loader.py            # load_and_cache_data function
├── training/
│   ├── __init__.py
│   ├── trainer.py           # train_moe_model function
│   └── evaluation.py        # evaluate_model function
├── utils/
│   ├── __init__.py
│   └── helpers.py           # set_seed function
├── legacy/
│   └── llm_original.py      # Original monolithic file (backup)
└── train_moe.py             # New simplified main training script
```
- Clean Separation: Each module has a single, clear responsibility
- No Bloat: Only essential files, no unnecessary abstractions
- Easy Navigation: Clear naming conventions make code easy to find
- Maintainable: Easy to modify specific components without affecting others
- Scalable: Can easily add new components without restructuring
- Importable: All modules are properly configured as packages via `__init__.py` files (see the sketch below)
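As an illustration of that last point, a minimal `models/__init__.py` could simply re-export the public classes so the package-level imports shown further down work. This is a sketch inferred from the module and class names in the tree above, not the repo's actual file:

```python
# models/__init__.py — sketch of package re-exports (assumed, based on
# the module/class names listed in the directory tree above).
from .components import Expert, TopKRouter, MixtureOfExperts
from .layers import MultiHeadAttention, Rotary, MoETransformerBlock
from .moe_llm import MoEMinimalLLM

__all__ = [
    "Expert", "TopKRouter", "MixtureOfExperts",
    "MultiHeadAttention", "Rotary", "MoETransformerBlock",
    "MoEMinimalLLM",
]
```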
Run the new organized training:

```bash
python train_moe.py
```

Import specific components:

```python
from configs import MoEModelConfig
from models import MoEMinimalLLM
from data import TextTokenDataset
from optimizers import Muon
from training import train_moe_model
```

- configs/: Configuration management (1 file)
- models/: Neural network components (3 files)
- optimizers/: Custom optimizers (1 file)
- data/: Data handling (2 files)
- training/: Training logic (2 files)
- utils/: Helper functions (1 file)
- legacy/: Original monolithic backup (1 file)
- Main script: Clean orchestration (1 file)
Total: 12 focused files vs 1 monolithic file
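To show how the pieces compose, here is a plausible skeleton of `train_moe.py` under this layout. Only the module and symbol names come from the tree above; every call signature is an assumption for illustration:

```python
# train_moe.py — hypothetical orchestration skeleton. Module and symbol
# names come from the directory tree; all call signatures are assumed.
from configs import MoEModelConfig
from data import TextTokenDataset, load_and_cache_data
from models import MoEMinimalLLM
from optimizers import Muon
from training import train_moe_model, evaluate_model
from utils import set_seed


def main():
    set_seed(42)                          # utils/helpers.py
    config = MoEModelConfig()             # configs/moe_config.py

    tokens = load_and_cache_data(config)  # data/loader.py (assumed signature)
    dataset = TextTokenDataset(tokens)    # data/dataset.py (assumed signature)

    model = MoEMinimalLLM(config)         # models/moe_llm.py
    optimizer = Muon(model.parameters())  # optimizers/muon.py (assumed signature)

    train_moe_model(model, dataset, optimizer, config)  # training/trainer.py (assumed)
    evaluate_model(model, dataset)        # training/evaluation.py (assumed)


if __name__ == "__main__":
    main()
```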
The code is now well-organized, maintainable, and ready for further development! 🎯