Commit 6f12fd1

Enable inlining for dlp_execute_kernel with LTO
Add always_inline to dlp_execute_kernel to ensure it is inlined when building with LTO. With LLVM 19, this attribute only takes effect under LTO; in non-LTO builds, inlining is not guaranteed. This improves performance for tiny shapes.
1 parent 1956162 commit 6f12fd1

1 file changed

Lines changed: 9 additions & 0 deletions

src/frame/bindings/c_wrappers/capi_kernel_frame_wrappers.cc

@@ -239,6 +239,15 @@ dlp_init_and_get_kernel_hndl(kernel_datatype_t k_dtype,
 // Experimentally derived alignment, needs further analysis but gives
 // consistent good performance on zen5 machines.
 [[gnu::aligned(512)]]
+// Force inlining of dlp_execute_kernel to ensure optimal performance, especially when
+// building with Link Time Optimization (LTO). Without the always_inline attribute,
+// some compilers may not inline this function even with LTO enabled, which can lead
+// to suboptimal performance in tiny shape scenarios. Explicitly marking this
+// function as always_inline guarantees that the optimizer can inline it as intended
+// when LTO is enabled.
+// Note: With LLVM 19, this attribute has no effect unless LTO is enabled; in non-LTO
+// builds, the compiler may still choose not to inline this function.
+__attribute__((always_inline))
 void
 dlp_execute_kernel(dlp_kernel_hndl_t kernel_hndl,
 md_t m,
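
For readers unfamiliar with the attribute, the pattern in this change can be sketched in isolation. The snippet below is a minimal illustration only: `kernel_handle`, `scale_by_two`, `execute_kernel`, `run_tiny_shape`, and the `SKETCH_ALWAYS_INLINE` macro are hypothetical names that do not appear in the repository, and the real `dlp_execute_kernel` signature is not reproduced. Unlike the function in this commit, the sketch keeps the wrapper and its caller in one translation unit, so the compiler can see the definition and inline it without LTO. A likely reason the commit depends on LTO is that a function defined in a `.cc` file is only visible to callers in other translation units at link time, though the commit itself does not state this.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a kernel handle and a kernel; illustration only.
using kernel_fn = void (*)(const float*, float*, std::size_t);

struct kernel_handle {
    kernel_fn fn;
};

// A trivial "kernel": writes 2 * in[i] into out[i].
static void scale_by_two(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = 2.0f * in[i];
    }
}

// always_inline is a GCC/Clang extension; expand to nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE
#endif

// Thin dispatch wrapper. For tiny inputs the call overhead of a wrapper like
// this can be a noticeable share of the total cost, which is the motivation
// for asking the compiler to inline it at every call site.
SKETCH_ALWAYS_INLINE inline void execute_kernel(const kernel_handle& h,
                                                const float* in,
                                                float* out,
                                                std::size_t n) {
    assert(h.fn != nullptr);
    h.fn(in, out, n);
}

// Caller on the hot path: runs the wrapper on a 4-element input and returns
// the sum of the outputs.
float run_tiny_shape() {
    const float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4] = {};
    const kernel_handle h{scale_by_two};
    execute_kernel(h, in, out, 4);
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling `run_tiny_shape()` exercises the wrapper once. Whether the wrapper was actually inlined in a given build is best confirmed by inspecting the generated assembly or the optimizer's remarks rather than assumed from the attribute alone.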
