Commit 74f438f

Enable inlining for dlp_execute_kernel with LLVM 19 and LTO
Add always_inline to dlp_execute_kernel so that it is inlined when building with LLVM 19 and LTO. The attribute is applied only when compiling with Clang 19 or newer, and it only takes effect under LTO; in non-LTO builds, inlining is not guaranteed. This improves performance for tiny shapes.
1 parent 3256da1 commit 74f438f

1 file changed

Lines changed: 11 additions & 0 deletions

src/frame/bindings/c_wrappers/capi_kernel_frame_wrappers.cc

@@ -239,6 +239,17 @@ dlp_init_and_get_kernel_hndl(kernel_datatype_t k_dtype,
 // Experimentally derived alignment, needs further analysis but gives
 // consistent good performance on zen5 machines.
 [[gnu::aligned(64)]]
+// Force inlining of dlp_execute_kernel to ensure optimal performance,
+// especially when building with Link Time Optimization (LTO). Without the
+// always_inline attribute, some compilers may not inline this function even
+// with LTO enabled, which can lead to suboptimal performance in tiny shape
+// scenarios. Explicitly marking this function as always_inline guarantees that
+// the optimizer can inline it as intended when LTO is enabled. Note: With LLVM
+// 19, this attribute has no effect unless LTO is enabled; in non-LTO builds,
+// the compiler may still choose not to inline this function.
+#if defined(__clang__) && __clang_major__ >= 19
+__attribute__((always_inline))
+#endif
 void
 dlp_execute_kernel(dlp_kernel_hndl_t kernel_hndl,
                    md_t m,
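
The guard in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the project's code: the macro DEMO_FORCE_INLINE, the type demo_kernel_fn, and the functions demo_mul_kernel and demo_execute_kernel are hypothetical names introduced only for illustration. It assumes the same condition as the commit (Clang 19 or newer) and leaves the function unannotated on every other compiler.

```cpp
#include <cstdint>

// Hypothetical macro (not part of the project): expands to always_inline
// only under the same condition the commit uses, Clang 19 or newer.
#if defined(__clang__) && __clang_major__ >= 19
#define DEMO_FORCE_INLINE __attribute__((always_inline))
#else
#define DEMO_FORCE_INLINE
#endif

// Hypothetical stand-in for a kernel entry point taking two dimensions.
using demo_kernel_fn = std::int64_t (*)(std::int64_t, std::int64_t);

// Hypothetical kernel: trivial work, so call overhead would dominate,
// loosely mirroring the "tiny shapes" case named in the commit message.
inline std::int64_t demo_mul_kernel(std::int64_t m, std::int64_t n) {
    return m * n;
}

// Hypothetical thin dispatch wrapper, standing in for the role that
// dlp_execute_kernel plays; its real signature is not reproduced here.
DEMO_FORCE_INLINE
inline std::int64_t demo_execute_kernel(demo_kernel_fn kernel,
                                        std::int64_t m,
                                        std::int64_t n) {
    return kernel(m, n);
}
```

In this sketch the wrapper and its caller share one translation unit, so the compiler can see the body without LTO. The commit's function is defined in a .cc file, and a caller in another translation unit can only have the body inlined when link-time optimization makes it visible (for example with Clang's -flto), which is consistent with the note in the comment that the attribute has no effect without LTO.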
