Release (v1.2.0)

@njroussel released this 17 Sep 20:40
· 47 commits to master since this release

New Features

  • Event API: Added an event API for fine-grained timing and synchronization of GPU kernels. This enables more detailed performance profiling and better control over asynchronous operations.
    (Dr.Jit PR #441, Dr.Jit-Core PR #174).

  • OpenGL Interoperability: Improved CUDA-OpenGL interoperability with simplified APIs. This enables efficient sharing of data between CUDA kernels and OpenGL rendering.
    (Dr.Jit PR #429, Dr.Jit-Core PR #164, contributed by Merlin Nimier-David).

  • Enhanced Int8/UInt8 Support: Improved support for 8-bit integer types with better casting and bitcast operations.
    (Dr.Jit PR #428, Dr.Jit-Core PR #163, contributed by Merlin Nimier-David).
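The last item distinguishes casting from bitcasting. As a minimal sketch of that difference for 8-bit values, the snippet below uses only the Python standard library rather than Dr.Jit itself, so the variable names and the `struct`-based approach are illustrative assumptions and not the library's API.

```python
import struct

raw = 200  # fits in an unsigned 8-bit integer (0..255)

# Bitcast: keep the same 8 bits and reinterpret them as a signed integer.
as_int8 = struct.unpack("b", struct.pack("B", raw))[0]

# Value cast: convert the number itself (float -> int truncates toward zero).
truncated = int(3.7)
```

A bitcast preserves the bit pattern and changes its interpretation, while a cast preserves the value as far as the target type allows.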

Performance Improvements

  • Register Spilling to Shared Memory: The CUDA backend now supports spilling registers to shared memory, improving performance for kernels with high register pressure. (Dr.Jit-Core commit fdc7cae7).

  • Memory View Support: Arrays can now be converted to Python memoryview objects for efficient zero-copy data access. (commit b7039184).

  • DLPack GIL Release: The dr.ArrayBase.dlpack() method now releases the GIL while waiting, improving multi-threaded performance. (commit 0adf9b4a).

  • Thread Synchronization: dr.sync_thread() now releases the GIL while waiting, preventing unnecessary blocking in multi-threaded applications. (commit 956d2f57).
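To illustrate what zero-copy access through a memoryview means, here is a small sketch that uses a standard-library `array` as a stand-in for a Dr.Jit array. This is an assumption made for the sake of a self-contained example: the release notes do not say how Dr.Jit arrays are wrapped, nor whether the resulting views are writable.

```python
from array import array

# Stand-in buffer: a stdlib array instead of a Dr.Jit array (assumption).
data = array("f", [1.0, 2.0, 3.0, 4.0])

view = memoryview(data)   # no copy: the view aliases the array's storage
view[0] = 10.0            # the write goes straight through to `data`

first = data[0]           # reads back the value written through the view
itemsize = view.itemsize  # bytes per float32 element
tail = view[2:].tolist()  # slicing a memoryview does not copy either
```

Because the view shares storage with its source, consumers such as `bytes()` or file writes can read the data without an intermediate conversion step.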

API Improvements

  • Spherical Direction Utilities: Added a Python implementation of the spherical direction utilities (dr.sphdir). (PR #432, contributed by Sébastien Speierer).

  • Matrix Conversions: Added support for converting between 3D and 4D matrices: Matrix4f can be constructed from a 3D matrix and Matrix3f from a 4D matrix. (commit 7f8ea890).

  • Quaternion API: Improved the quaternion Python API for better usability and consistency. (commit 282da88a).

  • Type Casts: Casting between Dr.Jit types is now allowed, so AD<->non-AD conversions work where required. (commit 72f1e6b2).
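For readers unfamiliar with spherical direction utilities, the following is a scalar sketch of the usual mapping from spherical angles to a unit vector. It assumes the common convention in which theta is the polar angle measured from +z and phi is the azimuth around z. The function name `sphdir_sketch` is hypothetical, and the code is a plain-Python stand-in, not the Dr.Jit implementation.

```python
import math

def sphdir_sketch(theta, phi):
    """Map spherical angles to a unit direction (assumed convention:
    theta = polar angle from +z, phi = azimuth around z)."""
    sin_theta, cos_theta = math.sin(theta), math.cos(theta)
    sin_phi, cos_phi = math.sin(phi), math.cos(phi)
    return (cos_phi * sin_theta, sin_phi * sin_theta, cos_theta)

pole = sphdir_sketch(0.0, 0.0)            # theta = 0 points along +z
x_axis = sphdir_sketch(math.pi / 2, 0.0)  # approximately the +x axis
```

The result always has unit length, since sin²(theta)·(cos²(phi) + sin²(phi)) + cos²(theta) = 1.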

Bug Fixes

  • Fixed deadlock issues in the @dr.freeze decorator. (commit e8fc555e).

  • Fixed gradient tracking in Texture.tensor() to ensure gradients are never dropped inadvertently. (PR #444).

  • Fixed AD support for the C++ repeat and tile operations so that gradients propagate correctly. (commits fd693056, 282da88a).

  • Fixed Python object traversal to check that __dict__ exists before accessing it, preventing crashes with certain object types. (commit 433adaf0).

  • Fixed symbolic loop size calculation to properly account for side-effects. (Dr.Jit-Core commit 31bf911).

  • Fixed a read-after-free issue in OptiX SBT data loading. (Dr.Jit-Core commit 009adef, contributed by Merlin Nimier-David).
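The __dict__ fix above concerns objects that have no per-instance attribute dictionary, such as instances of classes that define `__slots__` or built-in values like integers. The sketch below shows the general guard pattern in plain Python. The helper `attribute_names` and both classes are hypothetical and are not taken from the Dr.Jit code base.

```python
class Plain:
    def __init__(self, value):
        self.value = value

class Slotted:
    __slots__ = ("value",)  # instances of this class have no __dict__

    def __init__(self, value):
        self.value = value

def attribute_names(obj):
    """Hypothetical traversal helper: list instance attributes, if any."""
    # Guard first: vars(obj) raises TypeError when __dict__ is missing.
    if not hasattr(obj, "__dict__"):
        return []
    return sorted(vars(obj))

plain_names = attribute_names(Plain(1))      # ['value']
slotted_names = attribute_names(Slotted(2))  # []
int_names = attribute_names(3)               # []
```

Checking for the attribute up front lets a traversal skip such objects instead of failing on them.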

Other Improvements

  • Updated to nanobind v2.9.2.

  • Improved error messages by adding function names to vectorized call errors. (Dr.Jit-Core PR #165, contributed by Sébastien Speierer).

  • Added missing checks for JIT leak warnings. (Dr.Jit-Core PR #166, contributed by Sébastien Speierer).

  • Added a warning for LLVM API initialization failures. (Dr.Jit-Core PR #168, contributed by Sébastien Speierer).

  • Fixed pytest warnings and improved test infrastructure. (PR #436).