Commit 0c5077e

Enable inlining for dlp_execute_kernel with llvm19&LTO
Add always_inline to dlp_execute_kernel so that it is inlined when building with LLVM 19 and LTO. The attribute is applied only when compiling with Clang 19, and it only takes effect under LTO; in non-LTO builds, inlining is not guaranteed. This improves performance for tiny shapes.
1 parent 1956162 commit 0c5077e

1 file changed

Lines changed: 11 additions & 0 deletions

src/frame/bindings/c_wrappers/capi_kernel_frame_wrappers.cc

@@ -239,6 +239,17 @@ dlp_init_and_get_kernel_hndl(kernel_datatype_t k_dtype,
 // Experimentally derived alignment, needs further analysis but gives
 // consistent good performance on zen5 machines.
 [[gnu::aligned(512)]]
+// Force inlining of dlp_execute_kernel to ensure optimal performance, especially when
+// building with Link Time Optimization (LTO). Without the always_inline attribute,
+// some compilers may not inline this function even with LTO enabled, which can lead
+// to suboptimal performance in tiny shape scenarios. Explicitly marking this
+// function as always_inline guarantees that the optimizer can inline it as intended
+// when LTO is enabled.
+// Note: With LLVM 19, this attribute has no effect unless LTO is enabled; in non-LTO
+// builds, the compiler may still choose not to inline this function.
+#if defined(__clang__) && __clang_major__ == 19
+__attribute__((always_inline))
+#endif
 void
 dlp_execute_kernel(dlp_kernel_hndl_t kernel_hndl,
                    md_t m,
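
The preprocessor guard in this hunk can be tried in isolation. The following is a minimal sketch, not code from the repository: the macro `SKETCH_FORCE_INLINE` and the functions `sketch_execute_kernel`, `sketch_run_tiny_shape`, and `sketch_self_check` are hypothetical names invented for illustration. It only mirrors the pattern of applying `always_inline` when the compiler is Clang 19 and leaving the declaration unchanged otherwise; whether the call is actually inlined across translation units still depends on the build (for example, on LTO being enabled), as the commit message notes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro (not part of the commit): expands to always_inline only
// on Clang 19, mirroring the preprocessor guard used in the diff. On every
// other compiler or version it expands to nothing.
#if defined(__clang__) && __clang_major__ == 19
#define SKETCH_FORCE_INLINE __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE
#endif

// Hypothetical stand-in for a small, hot dispatch function. The real
// dlp_execute_kernel takes a kernel handle and shape arguments; this sketch
// just multiplies two dimensions so that it stays self-contained.
SKETCH_FORCE_INLINE
inline std::int64_t sketch_execute_kernel(std::int64_t m, std::int64_t n) {
    return m * n;
}

// Hypothetical caller representing a "tiny shape" call site, where call
// overhead is large relative to the work done.
inline std::int64_t sketch_run_tiny_shape() {
    return sketch_execute_kernel(4, 8);
}

// The attribute changes code generation only, never the computed result.
inline void sketch_self_check() {
    assert(sketch_execute_kernel(4, 8) == 32);
    assert(sketch_run_tiny_shape() == 32);
}
```

Wrapping the attribute in a macro is one possible design; the commit instead places the `#if` block directly above the function definition, which keeps the change local to the one function it targets.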
