Initialization on devices; shape inference and projections with strides

@lukstafi lukstafi released this 20 Aug 15:00
· 461 commits to master since this release

Report by Claude, focusing mainly on changes since the start of July:

  • Major release 0.6.0 with new features for deep learning
    • Added support for Bfloat16 and FP8 precisions, which are important for efficient
      modern ML training
    • Implemented convolution support with affine indexing expressions in projections,
      einsum notation, and shape inference
    • Added counter-based randomness via Threefry4x32 operation for reproducible random
      number generation
    • Introduced bidirectional precision inference (both top-down and bottom-up) for
      automatic type optimization
    • Enhanced %cd syntax with .forward, .backprop, .zero_grads support and automatic
      comment generation
  • New Datasets and Examples
    • Added MNIST and CIFAR10 datasets (borrowed from Raven)
    • Created Names dataset with bigram use-case helper for language modeling
    • Implemented Half-moons synthetic dataset for classification tasks
    • Developed comprehensive test examples including bigram language models
  • Performance and Memory Improvements
    • Fixed critical memory leak in builtins.c
    • Resolved bus error on large datasets
    • Migrated from heap-local to on-stack allocation by default
    • Improved virtual nodes and inlining to work across routines
    • Enhanced shape inference with better Total_elems constraint handling and LUB
      (least upper bound) support
  • Backend Stabilization
    • Fixed numerous CUDA backend regressions and missing constructs
    • Resolved session-level bugs in the Metal backend
    • Added Float16 emulation for systems without native _Float16 support
    • Fixed host-device synchronization issues by giving devices_not_lagging_host
      proper semantics
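
For readers unfamiliar with the Bfloat16 precision listed among the 0.6.0 features, the following is a minimal C sketch of how a 32-bit float can be converted to and from a 16-bit bfloat16 bit pattern in software. It is an illustration only and is not taken from this project's builtins.c: the helper names `float_to_bf16` and `bf16_to_float` are hypothetical, and the sketch assumes IEEE 754 single-precision `float` and round-to-nearest-even on conversion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not the project's actual code. */

/* Convert a float32 to bfloat16 bits (the upper 16 bits of the float32
   encoding), rounding to nearest with ties to even. */
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret without aliasing issues */
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: set a mantissa bit that survives truncation, so the
           result stays a NaN instead of collapsing to infinity. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the lowest kept bit: this rounds halfway cases
       toward the value whose lowest kept bit is 0. */
    uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    bits += rounding_bias;
    return (uint16_t)(bits >> 16);
}

/* Convert bfloat16 bits back to float32 by zero-filling the low 16 bits. */
static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because bfloat16 keeps the full 8-bit exponent of float32 and only shortens the mantissa to 7 bits, the conversion back to float32 is exact, while the conversion to bfloat16 loses precision but not range.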