New Features
- Event API: Added an event API for fine-grained timing and synchronization of GPU kernels. This enables more detailed performance profiling and better control over asynchronous operations. (Dr.Jit PR #441, Dr.Jit-Core PR #174).
- OpenGL Interoperability: Improved CUDA-OpenGL interoperability with simplified APIs, enabling efficient data sharing between CUDA kernels and OpenGL rendering. (Dr.Jit PR #429, Dr.Jit-Core PR #164, contributed by Merlin Nimier-David).
- Enhanced Int8/UInt8 Support: Improved support for 8-bit integer types, including improved casting and bitcast operations. (Dr.Jit PR #428, Dr.Jit-Core PR #163, contributed by Merlin Nimier-David).
Performance Improvements
- Register Spilling to Shared Memory: The CUDA backend now supports spilling registers to shared memory, improving performance for kernels with high register pressure. (Dr.Jit-Core commit fdc7cae7).
- Memory View Support: Arrays can now be converted to Python memoryview objects for efficient zero-copy data access. (commit b7039184).
- DLPack GIL Release: The dr.ArrayBase.dlpack() method now releases the GIL while waiting, improving multi-threaded performance. (commit 0adf9b4a).
- Thread Synchronization: dr.sync_thread() now releases the GIL while waiting, so other Python threads are no longer blocked unnecessarily. (commit 956d2f57).
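These notes do not describe the Dr.Jit-specific details of memoryview conversion (for example, whether a GPU-resident array is evaluated and copied to host memory before the view is created). As a hedged stand-in, the sketch below uses the standard library's array module rather than Dr.Jit to show the buffer-protocol semantics a memoryview provides: the view shares memory with its source, so creating or slicing it makes no copy, and writes through it are visible in the original buffer.

```python
from array import array

# Stand-in buffer of 32-bit floats (plain Python, not a Dr.Jit array).
data = array("f", [1.0, 2.0, 3.0, 4.0])

# Wrap the buffer without copying it.
view = memoryview(data)
print(view.format, view.itemsize, view.nbytes)  # f 4 16

# Writes through the view modify the underlying buffer in place.
view[0] = 10.0

# Slicing a memoryview is also zero-copy: the slice references the same memory.
tail = view[2:]
tail[0] = 30.0

print(data.tolist())  # [10.0, 2.0, 30.0, 4.0]
```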
API Improvements
- Spherical Direction Utilities: Added a Python implementation of the spherical direction utilities (dr.sphdir). (PR #432, contributed by Sébastien Speierer).
- Matrix Conversions: Added support for converting between 3×3 and 4×4 matrices: Matrix4f can now be constructed from a 3×3 matrix, and Matrix3f from a 4×4 matrix. (commit 7f8ea890).
- Quaternion API: Improved the usability and consistency of the quaternion Python API. (commit 282da88a).
- Type Casts: Casting between Dr.Jit types now properly supports conversions between AD and non-AD types where required. (commit 72f1e6b2).
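The spherical direction and matrix conversion items can be illustrated in plain Python. This sketch assumes the common convention in which dr.sphdir(theta, phi) maps a polar angle theta (measured from +z) and an azimuth phi to the unit vector (cos phi sin theta, sin phi sin theta, cos theta). It also assumes that converting a 3×3 matrix to 4×4 copies it into the upper-left block and fills the new row and column from the identity, while the reverse conversion keeps the upper-left 3×3 block. Both conventions are assumptions to check against the Dr.Jit reference, and to_4x4 and to_3x3 are hypothetical helpers, not Dr.Jit functions.

```python
import math


def sphdir(theta, phi):
    """Unit vector for polar angle theta (from +z) and azimuth phi.

    Pure-Python sketch of the convention assumed for dr.sphdir.
    """
    sin_theta, cos_theta = math.sin(theta), math.cos(theta)
    sin_phi, cos_phi = math.sin(phi), math.cos(phi)
    return (cos_phi * sin_theta, sin_phi * sin_theta, cos_theta)


def to_4x4(m3):
    """Hypothetical helper: embed a 3x3 matrix (list of rows) in a 4x4 one.

    Assumes the upper-left block is copied and the extra row/column
    come from the identity matrix.
    """
    m4 = [[float(i == j) for j in range(4)] for i in range(4)]
    for i in range(3):
        for j in range(3):
            m4[i][j] = m3[i][j]
    return m4


def to_3x3(m4):
    """Hypothetical helper: keep the upper-left 3x3 block of a 4x4 matrix
    (dropping, e.g., the translation column of an affine transform)."""
    return [row[:3] for row in m4[:3]]


# theta = pi/2, phi = 0 points along +x (up to rounding).
d = sphdir(math.pi / 2, 0.0)
print([round(c, 6) for c in d])  # [1.0, 0.0, 0.0]

# 90-degree rotation about z as a 3x3 matrix.
rot_z = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
m4 = to_4x4(rot_z)
print(m4[3])                # [0.0, 0.0, 0.0, 1.0]
print(to_3x3(m4) == rot_z)  # True
```

Round-tripping 3×3 → 4×4 → 3×3 is lossless under these assumptions, whereas 4×4 → 3×3 discards the last row and column (for an affine transform, its translation).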
Bug Fixes
- Fixed deadlocks in the @dr.freeze decorator. (commit e8fc555e).
- Fixed gradient tracking in Texture.tensor() so that gradients are never inadvertently dropped. (PR #444).
- Fixed AD support for the C++ repeat and tile operations, which now propagate gradients correctly. (commits fd693056, 282da88a).
- Fixed Python object traversal to check that __dict__ exists before accessing it, preventing crashes with certain object types. (commit 433adaf0).
- Fixed the symbolic loop size calculation to properly account for side effects. (Dr.Jit-Core commit 31bf911).
- Fixed a read-after-free issue when loading OptiX shader binding table (SBT) data. (Dr.Jit-Core commit 009adef, contributed by Merlin Nimier-David).
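The __dict__ fix concerns a general Python pitfall: not every object has an instance dictionary (for example, instances of classes that define __slots__, and many built-in types), so reading obj.__dict__ unconditionally can raise AttributeError. The sketch below shows the guarded pattern in plain Python. The traverse_fields function is hypothetical and is not Dr.Jit's actual traversal code.

```python
def traverse_fields(obj):
    """Hypothetical helper: yield (name, value) pairs for an object's
    instance attributes, skipping objects without a __dict__."""
    # Guard first: objects using __slots__ (and many built-ins) have no
    # instance dictionary, so obj.__dict__ would raise AttributeError.
    if not hasattr(obj, "__dict__"):
        return
    for name, value in vars(obj).items():
        yield name, value


class WithDict:
    def __init__(self):
        self.x = 1
        self.y = 2


class WithSlots:
    __slots__ = ("x",)

    def __init__(self):
        self.x = 1


print(list(traverse_fields(WithDict())))   # [('x', 1), ('y', 2)]
print(list(traverse_fields(WithSlots())))  # []
print(list(traverse_fields(42)))           # []
```

A real traversal would typically also need to visit __slots__ entries; this sketch only demonstrates the existence check.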
Other Improvements
- Updated to nanobind v2.9.2.
- Improved error messages for vectorized calls, which now include the function name. (Dr.Jit-Core PR #165, contributed by Sébastien Speierer).
- Added missing checks for JIT leak warnings. (Dr.Jit-Core PR #166, contributed by Sébastien Speierer).
- Added a warning for LLVM API initialization failures. (Dr.Jit-Core PR #168, contributed by Sébastien Speierer).
- Fixed pytest warnings and improved the test infrastructure. (PR #436).