v0.9.1 - Critical Performance Fix Release
This release addresses critical performance issues and provides a foundation for future GPU acceleration:
🎯 CRITICAL FIXES:
- Replaced the CACHE_AWARE strategy, which caused a 100x performance regression, with GPU_ACCELERATED
- Until a GPU backend is implemented, GPU_ACCELERATED safely falls back to WORK_STEALING for optimal CPU performance
- Fixed auto-dispatch first-call initialization for consistent performance
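
The fallback described above can be sketched as follows. This is a minimal illustration, not the library's actual dispatcher: only the strategy names WORK_STEALING and GPU_ACCELERATED come from these notes, while `Strategy`, `gpu_backend_available`, and `resolve_strategy` are hypothetical names chosen for the example.

```cpp
#include <iostream>

// Strategy names taken from these release notes; the enum itself is hypothetical.
enum class Strategy { WORK_STEALING, GPU_ACCELERATED };

// Assumed probe: no GPU backend exists yet, so this always reports false.
constexpr bool gpu_backend_available() { return false; }

// Map the requested strategy to the one that will actually run.
inline Strategy resolve_strategy(Strategy requested) {
    if (requested == Strategy::GPU_ACCELERATED && !gpu_backend_available()) {
        // Log only on the first fallback so repeated calls do not flood the output.
        static const bool logged = [] {
            std::clog << "GPU_ACCELERATED unavailable; falling back to WORK_STEALING\n";
            return true;
        }();
        (void)logged;
        return Strategy::WORK_STEALING;
    }
    return requested;
}
```

In this sketch, a caller that requests `Strategy::GPU_ACCELERATED` gets `Strategy::WORK_STEALING` back, and any other request passes through unchanged. Keeping the decision in one function means a future GPU backend would only need to change the probe.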
🔧 TECHNICAL IMPROVEMENTS:
- Enhanced performance dispatcher with GPU fallback logic and informative logging
- Updated all distribution implementations with consistent fallback behavior
- Comprehensive strategy migration across entire codebase (29 files changed)
📈 PERFORMANCE BENEFITS:
- Eliminates the 100x performance regression caused by cache-aware parallelism
- Maintains optimal CPU performance through work-stealing fallback
- Positions library for future GPU acceleration without breaking changes
🛡️ QUALITY ASSURANCE:
- 100% API compatibility maintained
- All tests, examples, tools, and documentation updated consistently
- Legitimate cache-related code (hardware caches) preserved unchanged
This release restores CPU performance while establishing the foundation for GPU acceleration in future versions.