Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[2.0.1] - 2025-10-26

Fixed

Position Embedding Bug Fix

  • Critical Fix: Resolved IndexError: index out of range in self, raised when validation sequences exceeded the model's max_length
  • Added automatic sequence truncation in GPT model forward method with warning logs
  • Implemented max_length parameter in DataLoader with custom collate function
  • Added position embedding size validation in checkpoint loading
  • Enhanced error handling with detailed diagnostic messages and actionable solutions
  • Fixed tensor contiguity issues by using reshape() instead of view() for loss calculation
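
The truncation behaviour described above can be sketched as a framework-agnostic collate function. This is a minimal illustration, not the project's actual code: the name collate_with_max_length, the pad_id default, and the list-based return type are all assumptions made for the example.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)


def collate_with_max_length(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Truncate each sequence to max_length, then pad to the longest kept length.

    Hypothetical helper: the name, pad_id default, and return shape are
    assumptions for illustration, not the project's real collate function.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")

    truncated = []
    for seq in batch:
        if len(seq) > max_length:
            # Warn instead of failing, so no position index can reach
            # beyond the position embedding table.
            logger.warning(
                "Truncating sequence of length %d to max_length=%d",
                len(seq),
                max_length,
            )
            seq = seq[:max_length]
        truncated.append(list(seq))

    # Pad to the longest sequence that survived truncation.
    width = max((len(seq) for seq in truncated), default=0)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in truncated]
    attention_mask = [
        [1] * len(seq) + [0] * (width - len(seq)) for seq in truncated
    ]
    return input_ids, attention_mask
```

With a PyTorch DataLoader, a function like this could be bound with functools.partial and passed as collate_fn, converting the returned lists to tensors before they reach the model.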

Improvements

  • Added model configuration logging in Evaluator class (displays max_length, vocab_size, position embedding size)
  • Evaluation script now extracts and uses max_length from loaded model
  • Enhanced error messages provide clear guidance on fixing sequence length issues
  • Added comprehensive unit and integration tests for sequence length validation
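
As a rough sketch of the position embedding validation and the actionable error messages mentioned above, the check below compares the embedding row count stored in a checkpoint with the configured max_length. The function name, the parameter key position_embedding.weight, and the message wording are assumptions for illustration; the project's real names may differ.

```python
from typing import Mapping, Sequence


def check_position_embedding_size(
    state_shapes: Mapping[str, Sequence[int]],
    max_length: int,
    key: str = "position_embedding.weight",  # hypothetical parameter name
) -> None:
    """Raise a descriptive error if the checkpoint and config disagree.

    state_shapes maps parameter names to their shapes, e.g. built from
    {name: tuple(tensor.shape)} over a loaded state dict.
    """
    if key not in state_shapes:
        raise KeyError(f"Checkpoint has no parameter named {key!r}")

    rows = state_shapes[key][0]
    if rows != max_length:
        # Say what is wrong and what the user can do about it.
        raise ValueError(
            f"Position embedding has {rows} rows but the config sets "
            f"max_length={max_length}. Either set max_length={rows} in the "
            f"config, or truncate inputs to {min(rows, max_length)} tokens."
        )
```

Calling it with matching sizes returns silently; a mismatch such as a 256-row embedding against max_length=512 raises a ValueError that names both numbers.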

Documentation

  • Updated README with position embedding troubleshooting section
  • Added implementation guide for applying fixes to existing projects
  • Created detailed test results documentation
  • Updated smart defaults section to mention automatic sequence length handling

Testing

  • Added 3 unit tests for sequence length validation
  • Added 5 integration tests for evaluation with various sequence lengths
  • All tests pass with sequences at, exceeding, and far beyond max_length

[1.0.0] - 2025-01-24

Added

Templates

  • NANO Template (1M params) - For learning and quick experiments
  • TINY Template (6M params) - For prototyping and small projects
  • SMALL Template (100M params) - For production applications
  • BASE Template (1B params) - For research and high-quality models

Core Features

  • Complete PyTorch training infrastructure
  • Data preprocessing pipeline
  • Tokenizer training (BPE, WordPiece, Unigram)
  • Checkpoint management with auto-save
  • TensorBoard integration
  • Live training dashboard
  • Interactive chat interface
  • Model comparison tools
  • Deployment scripts

Smart Features

  • Automatic vocab size detection from tokenizer
  • Model/data size mismatch warnings
  • Overfitting detection during training
  • Cross-platform path handling
  • UTF-8 encoding support for Windows
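
One simple way to picture the overfitting detection listed above is a check for training loss that keeps falling while validation loss keeps rising. The sketch below is only a heuristic under that assumption; the function name, the patience parameter, and the strict-monotonic rule are illustrative and not taken from the project.

```python
from typing import Sequence


def is_overfitting(
    train_losses: Sequence[float],
    val_losses: Sequence[float],
    patience: int = 3,
) -> bool:
    """Return True if, over the last `patience` steps, training loss fell
    at every step while validation loss rose at every step.

    Hypothetical heuristic for illustration only.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must have the same length")
    if len(train_losses) < patience + 1:
        return False  # not enough history to judge

    # patience steps means patience + 1 consecutive measurements.
    recent_train = train_losses[-(patience + 1):]
    recent_val = val_losses[-(patience + 1):]

    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    return train_falling and val_rising
```

A stricter or smoother rule (for example, comparing moving averages) may suit noisy loss curves better; the strict comparison here keeps the example short.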

Plugins

  • WandB integration for experiment tracking
  • HuggingFace Hub integration for model sharing
  • SynthexAI integration for synthetic data

Documentation

  • Comprehensive README with examples
  • Detailed project READMEs for scaffolded projects
  • Contributing guidelines
  • Troubleshooting guides

Technical Details

  • Node.js 18+ required
  • Python 3.8+ required
  • PyTorch 2.0+ required
  • Cross-platform support (Windows, macOS, Linux)

Bug Fixes

  • Fixed data loading with 2D tensors
  • Fixed vocab size mismatch (hardcoded 32K replaced with auto-detection)
  • Fixed Windows UTF-8 encoding issues
  • Fixed deploy.py unicode escape errors
  • Fixed chat.py cross-platform path handling
  • Fixed model forward method to accept attention_mask
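
The Windows encoding and path fixes above follow a common pattern, sketched here under assumptions: paths are built with pathlib rather than string literals, and every file is opened with an explicit encoding. A frequent cause of unicode escape errors is a Windows path written as a plain string literal such as "C:\Users\...", where \U starts an escape sequence; whether that was the cause in deploy.py is not stated in this changelog. The helper names below are hypothetical.

```python
import tempfile
from pathlib import Path


def write_text_utf8(path: Path, text: str) -> None:
    """Write text with an explicit encoding instead of the platform default."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")


def read_text_utf8(path: Path) -> str:
    """Read text back with the same explicit encoding."""
    return path.read_text(encoding="utf-8")


def demo() -> str:
    """Round-trip non-ASCII text through a path built with the / operator."""
    with tempfile.TemporaryDirectory() as tmp:
        # Path / "part" joins with the right separator on every platform,
        # so no backslash-laden string literal is needed.
        target = Path(tmp) / "data" / "sample.txt"
        write_text_utf8(target, "naïve café ✓")
        return read_text_utf8(target)
```

Without encoding="utf-8", Python on Windows may fall back to a legacy code page, which is one way non-ASCII text ends up garbled or raises UnicodeEncodeError.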

Known Issues

  • Dashboard may show garbled emojis in Windows PowerShell (functionality works)
  • PyTorch FutureWarning about torch.load (will be addressed in PyTorch 2.x)

[Unreleased]

Planned Features

  • More model architectures (BERT, T5)
  • Distributed training support
  • Model quantization tools
  • Fine-tuning templates
  • Web UI for project management
  • Automatic hyperparameter tuning

Version History

  • 2.0.1 (2025-10-26) - Position embedding bug fix
  • 1.0.0 (2025-01-24) - Initial release
  • 0.1.0 (2025-01-20) - Beta release (internal)

For more details, see the full commit history.