Initialization on devices; shape inference and projections with strides
Report by Claude covering the main developments since the start of July:
- Major Release 0.6.0 with comprehensive new features for deep learning
- Added support for Bfloat16 and FP8 precisions, critical for modern ML training efficiency
- Implemented convolution support with affine indexing expressions in projections, einsum notation, and shape inference
- Added counter-based randomness via the Threefry4x32 operation for reproducible random number generation
- Introduced bidirectional precision inference (both top-down and bottom-up) for automatic type optimization
- Enhanced %cd syntax with .forward, .backprop, and .zero_grads support, and automatic comment generation
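To illustrate what affine indexing expressions in projections mean for convolutions, here is a minimal sketch (plain Python/NumPy, not OCANNL's API): the output position `o` maps to input positions via the affine expression `stride * o + k`, and the output dimension is inferred from the input and kernel dimensions.

```python
import numpy as np

def conv1d_affine(x, w, stride=1):
    """1-D convolution via the affine index map out[o] = sum_k x[stride*o + k] * w[k]."""
    k = w.shape[0]
    # Shape inference for the output dimension from input, kernel, and stride.
    out_len = (x.shape[0] - k) // stride + 1
    out = np.zeros(out_len, dtype=x.dtype)
    for o in range(out_len):
        for j in range(k):
            out[o] += x[stride * o + j] * w[j]
    return out
```

For example, `conv1d_affine(np.arange(6.0), np.array([1.0, 1.0]), stride=2)` pairs up adjacent elements with stride 2, yielding `[1.0, 5.0, 9.0]`.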
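The point of counter-based randomness is that the generator is a pure function of a key and a counter, so random streams are reproducible and trivially parallelizable. The sketch below follows the Threefry-4x32 structure with rotation constants as published in the Random123 paper, but it has not been checked against official test vectors and is illustrative only, not OCANNL's implementation.

```python
MASK = 0xFFFFFFFF
C240 = 0x1BD11BDA  # key-schedule parity constant
# Rotation constants for Threefry-4x32 (per the Random123 paper).
ROT = [(10, 26), (11, 21), (13, 27), (23, 5), (6, 20), (17, 11), (25, 10), (18, 20)]

def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def threefry4x32(key, counter, rounds=20):
    """Counter-based RNG: a stateless function (key, counter) -> 4 uint32 words."""
    k = list(key) + [C240 ^ key[0] ^ key[1] ^ key[2] ^ key[3]]
    x = [(counter[i] + k[i]) & MASK for i in range(4)]
    for r in range(rounds):
        r0, r1 = ROT[r % 8]
        # Even rounds mix word pairs (0,1) and (2,3); odd rounds (0,3) and (2,1).
        pairs = ((0, 1, r0), (2, 3, r1)) if r % 2 == 0 else ((0, 3, r0), (2, 1, r1))
        for a, b, rot in pairs:
            x[a] = (x[a] + x[b]) & MASK
            x[b] = _rotl(x[b], rot) ^ x[a]
        if r % 4 == 3:  # inject the key schedule every four rounds
            s = r // 4 + 1
            for i in range(4):
                x[i] = (x[i] + k[(s + i) % 5]) & MASK
            x[3] = (x[3] + s) & MASK
    return tuple(x)
```

Because the same (key, counter) pair always produces the same words, and distinct counters produce independent-looking words, no sequential state needs to be threaded through parallel device code.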
- Added support for Bfloat16 and FP8 precisions, critical for modern ML training
- New Datasets and Examples
- Added MNIST and CIFAR10 datasets (borrowed from Raven)
- Created Names dataset with bigram use-case helper for language modeling
- Implemented Half-moons synthetic dataset for classification tasks
- Developed comprehensive test examples including bigram language models
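A bigram use-case helper over a names dataset typically turns each name into (context, target) character-index pairs. The sketch below is a generic version of that idea, not OCANNL's helper; the `.` start/end token and the `stoi` vocabulary mapping are assumptions of this sketch.

```python
def bigram_pairs(names):
    """Build (context, target) index pairs for a bigram language model.
    '.' marks both the start and end of a name (an assumption of this sketch)."""
    chars = sorted(set("".join(names)))
    stoi = {c: i + 1 for i, c in enumerate(chars)}  # character -> index
    stoi["."] = 0                                   # boundary token
    pairs = []
    for name in names:
        tokens = ["."] + list(name) + ["."]
        for a, b in zip(tokens, tokens[1:]):
            pairs.append((stoi[a], stoi[b]))
    return pairs, stoi
```

For a single name "ab" this yields the pairs (., a), (a, b), (b, .), i.e. `[(0, 1), (1, 2), (2, 0)]`, which can feed directly into a bigram model's training loop.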
- Performance and Memory Improvements
- Fixed critical memory leak in builtins.c
- Resolved bus error on large datasets
- Migrated from heap-local to on-stack allocation by default
- Improved virtual nodes and inlining to work across routines
- Enhanced shape inference with better Total_elems constraint handling and LUB support
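Two of the shape-inference mechanisms mentioned above can be sketched abstractly (function names here are hypothetical, not OCANNL's): a least-upper-bound join of dimensions under broadcasting, and a Total_elems-style constraint that solves for a single unknown dimension from the total element count.

```python
def lub_dim(a, b):
    """Least upper bound of two dimensions under broadcasting:
    a size-1 dim unifies with anything; equal dims unify; others conflict."""
    if a == 1:
        return b
    if b == 1 or a == b:
        return a
    raise ValueError(f"incompatible dims {a} and {b}")

def solve_unknown_dim(total_elems, dims):
    """Total_elems-style constraint: fill the single unknown (None) dim
    so that the product of all dims equals total_elems."""
    known, unknown = 1, 0
    for d in dims:
        if d is None:
            unknown += 1
        else:
            known *= d
    if unknown != 1 or total_elems % known != 0:
        raise ValueError("cannot solve for the unknown dimension")
    return [total_elems // known if d is None else d for d in dims]
```

For example, `lub_dim(1, 5)` yields `5`, and `solve_unknown_dim(24, [2, None, 3])` fills in the unknown dimension as `4`.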
- Backend Stabilization
- Fixed numerous CUDA backend regressions and missing constructs
- Resolved session-level bugs in the Metal backend
- Added Float16 emulation for systems without native _Float16 support
- Fixed host-device synchronization issues with proper devices_not_lagging_host semantics
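Float16 emulation on systems without native _Float16 boils down to decoding and encoding IEEE-754 binary16 bit patterns in software. The decoder below is a minimal sketch of the idea (plain Python, not OCANNL's code), handling the three binary16 cases: subnormals, infinities/NaNs, and normal numbers.

```python
def half_to_float(h):
    """Decode a 16-bit IEEE-754 binary16 pattern into a float,
    as software emulation might on systems lacking native _Float16."""
    sign = -1.0 if (h >> 15) & 1 else 1.0
    exp = (h >> 10) & 0x1F   # 5-bit exponent
    mant = h & 0x3FF         # 10-bit mantissa
    if exp == 0:             # subnormal (or signed zero)
        return sign * mant * 2.0 ** -24
    if exp == 0x1F:          # infinity or NaN
        return sign * float("inf") if mant == 0 else float("nan")
    return sign * (1.0 + mant / 1024.0) * 2.0 ** (exp - 15)
```

For instance, the pattern `0x3C00` (exponent 15, mantissa 0) decodes to `1.0`, and `0xC000` decodes to `-2.0`; the encoding direction additionally needs round-to-nearest-even handling.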