v0.9.0 - Performance Optimizations and Architectural Improvements
DUAL API BATCH PROCESSING SYSTEM
- NEW: Auto-dispatch batch processing API with intelligent strategy selection
- NEW: Power-user explicit strategy control for fine-tuned performance optimization
- SIMD and parallel processing strategies automatically selected based on data size and CPU capabilities
- Performance hints system for guiding optimization decisions (MINIMIZE_LATENCY, MAXIMIZE_THROUGHPUT, etc.)
- Thread-safe batch operations: getProbability(), getLogProbability(), getCumulativeProbability()
- Comprehensive strategy options: SCALAR, SIMD_BATCH, PARALLEL_SIMD, WORK_STEALING, CACHE_AWARE
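The size- and capability-based selection described above can be pictured with a minimal sketch. Everything in it is an assumption for illustration: `selectStrategy`, `CpuInfo`, `Hint::NONE`, and all thresholds are hypothetical and are not the library's API or its tuned values. Only the strategy and hint names are taken from these notes.

```cpp
#include <cassert>
#include <cstddef>

// Strategy and hint names follow the release notes; the rest is hypothetical.
enum class Strategy { SCALAR, SIMD_BATCH, PARALLEL_SIMD, WORK_STEALING, CACHE_AWARE };
enum class Hint { NONE, MINIMIZE_LATENCY, MAXIMIZE_THROUGHPUT };

// Hypothetical summary of what the CPU offers.
struct CpuInfo {
    bool hasSimd;    // any usable SIMD instruction set detected
    unsigned cores;  // number of hardware threads available
};

// Pick a strategy from batch size, CPU capabilities, and an optional hint.
// The thresholds are made-up placeholders, not measured values.
inline Strategy selectStrategy(std::size_t n, const CpuInfo& cpu, Hint hint = Hint::NONE) {
    assert(cpu.cores >= 1 && "at least one core is required");

    constexpr std::size_t kSimdMin = 8;              // below this, vector setup is not worth it
    constexpr std::size_t kCacheAwareMin = 1000000;  // very large batches get cache-aware tiling
    std::size_t parallelMin = 10000;                 // below this, stay on one thread

    // A latency hint delays going parallel; a throughput hint does it sooner.
    if (hint == Hint::MINIMIZE_LATENCY) parallelMin *= 4;
    if (hint == Hint::MAXIMIZE_THROUGHPUT) parallelMin /= 2;

    if (n < kSimdMin) return Strategy::SCALAR;

    const Strategy singleThread = cpu.hasSimd ? Strategy::SIMD_BATCH : Strategy::SCALAR;
    if (cpu.cores < 2 || n < parallelMin) return singleThread;

    if (n >= kCacheAwareMin) return Strategy::CACHE_AWARE;
    return cpu.hasSimd ? Strategy::PARALLEL_SIMD : Strategy::WORK_STEALING;
}
```

In this sketch a call such as `selectStrategy(50000, CpuInfo{true, 8u})` picks `PARALLEL_SIMD`, while adding `Hint::MINIMIZE_LATENCY` to a 20000-element batch keeps it on `SIMD_BATCH`. An explicit-strategy API would bypass this function and use the caller's choice directly.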
HEADER ARCHITECTURE CONSOLIDATION
- MAJOR: Consolidated header architecture reducing redundant includes by ~60%
- NEW: Modular header system with clear dependency levels (0-6)
- NEW: Consolidated convenience headers: distribution_common.h, distribution_platform_common.h
- Enhanced build performance through better header organization and dependency management
- Maintained backward compatibility while optimizing compilation efficiency
DOCUMENTATION OVERHAUL
- MAJOR: Condensed README.md to a concise overview that links to the comprehensive documentation
- NEW: Four detailed documentation guides covering all aspects of the library:
  - BUILD_SYSTEM_GUIDE.md - Complete build system, cross-platform support, SIMD detection
  - HEADER_ARCHITECTURE_GUIDE.md - Modular headers, dependency management, usage patterns
  - PARALLEL_BATCH_PROCESSING_GUIDE.md - High-performance APIs, optimization guidelines
  - WINDOWS_SUPPORT_GUIDE.md - Windows development environment support
- Clear separation between quick-start content and detailed reference material
BUILD SYSTEM ENHANCEMENTS
- Enhanced CMake configuration with better error handling and cross-platform support
- Improved parallel build detection and automatic optimization
- Better SIMD detection and configuration across platforms
- Comprehensive threading system detection (TBB, OpenMP, pthreads, GCD, Windows Thread Pool)
PERFORMANCE IMPROVEMENTS
- Intelligent auto-dispatch eliminates the need for manual performance optimization in most cases
- SIMD optimization: 2-70x speedup for suitable operations depending on distribution complexity
- Parallel processing: Up to N× speedup where N = CPU cores for large batch operations
- Work-stealing thread pools provide superior load balancing for irregular workloads
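The "up to N×" figure for parallel processing is an upper bound. As a rough illustration of why, Amdahl's law gives the ideal speedup when only a fraction of the work runs in parallel. The function name `amdahlSpeedup` is hypothetical and not part of the library. The model also ignores scheduling and memory-bandwidth overheads, so measured results can be lower.

```cpp
#include <cassert>

// Ideal speedup under Amdahl's law:
//   S = 1 / ((1 - p) + p / N)
// where p is the fraction of the work that parallelizes and N is the core count.
inline double amdahlSpeedup(double parallelFraction, unsigned cores) {
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
}
```

With 8 cores, a fully parallel batch (`p = 1.0`) reaches the full 8x, while a batch that is 90% parallel tops out at roughly 4.7x under this model.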