This document outlines the features implemented in ZigFormer and the future goals for the project.
> [!IMPORTANT]
> This roadmap is a work in progress and is subject to change.
- Tokenization (word-based)
- Vocabulary building
- Embedding layer (token + positional)
- Multi-head self-attention
- Feed-forward network
- Layer normalization
- Residual connections
- Optimizer (Adam)
- Gradient clipping
- Cross-entropy loss
- Training loop (pretraining and fine-tuning)
- Learning rate scheduling
- Model checkpointing (save and load)
- Mini-batch training
- Gradient accumulation
- Greedy decoding
- KV caching
- Top-k and top-p sampling
- Beam search
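The decoding items above (greedy decoding, top-k and top-p sampling) can be illustrated with a short sketch. This is a minimal Python illustration of the general technique, not ZigFormer's Zig implementation; the function name and signature are invented for this example:

```python
import math

def top_k_top_p_filter(logits, k=0, p=1.0):
    """Return sampling probabilities after top-k and top-p (nucleus) filtering.

    logits: raw scores, one per vocabulary token.
    k: keep only the k highest-scoring tokens (0 disables the top-k cutoff).
    p: keep the smallest set of tokens whose cumulative probability reaches p.
    """
    # Softmax over the raw logits (shift by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least probable, keeping until a cutoff hits.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for rank, idx in enumerate(order):
        if k and rank >= k:       # top-k cutoff
            break
        keep.add(idx)
        cumulative += probs[idx]
        if cumulative >= p:       # top-p (nucleus) cutoff
            break

    # Zero out the filtered tokens and renormalize the survivors.
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]
```

With `k=1` (or a small `p`) this degenerates to greedy decoding, since only the single most probable token can be sampled.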
- Command line interface
- Multi-threading support
- SIMD optimizations
- Model loading from a checkpoint
- Configuration file
- Improved error handling and validation
- Web server
- Web interface
- Model loading from a checkpoint
- Configuration file
- Improved error handling and validation
- Markdown rendering and syntax highlighting
- Interactive sampling controls (Top-k and Top-p)
- Dark and light mode toggle
- Model statistics display