meta_kernel::compile is very slow, even for cached source #768

Open
@Ulfgard

Hi,

I am currently benchmarking the overhead of boost::compute. My test program computes nothing big, and the equivalent CPU program finishes in no measurable run-time. The OpenCL version takes several seconds, so I decided to check with valgrind/callgrind where this time is lost. My program uses 11 different kernels, and the OpenCL runtime is the Intel CPU implementation. I am running callgrind on a build compiled with O1 optimizations.

71% of total run-time is spent in 8149 calls to meta_kernel::compile.
Summed over all 8149 calls, the largest contributors inside it are:
% of total run-time   function
43.21%                clCreateKernel
14.5%                 compute::detail::sha1::sha1
8.7%                  program::build_with_source

As a sanity check:
1.6% meta_kernel::source
1.2% clEnqueueNDRangeKernel

So roughly 58% of total run-time (43.21% for clCreateKernel plus 14.5% for sha1) is spent on figuring out whether the program is already built and on creating the kernel afterwards. Compiling and caching work fine: there are exactly 11 calls to build_with_source.

The problem is that compute assumes that kernels are throw-away objects. Unfortunately, this is not the case. On NVIDIA, for example, clCreateKernel is reported to have an overhead of roughly 1 ms, and the overhead I measure seems to be of the same order. The programs built with meta_kernel contain only one kernel function, so one easy workaround would be to cache the kernel together with the program.
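
The workaround could look roughly like the sketch below. It is not boost::compute code: `kernel_cache`, `fake_kernel` and `count_kernel_creations` are hypothetical names made up for illustration, and the factory callback stands in for whatever actually calls clCreateKernel. It only shows the idea of creating each kernel once and handing out the cached object afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper, not part of boost::compute: keeps one kernel
// object per cache key so the expensive creation step runs only once.
template <class Kernel>
class kernel_cache {
public:
    using factory = std::function<Kernel(const std::string&)>;

    explicit kernel_cache(factory create) : create_(std::move(create)) {}

    // Returns the cached kernel for `key`, creating it on first use.
    // In real code the factory would wrap clCreateKernel and `key`
    // would identify the already-built program (assumption).
    Kernel& get(const std::string& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, create_(key)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    factory create_;
    std::unordered_map<std::string, Kernel> cache_;
};

// Stand-in for a real kernel handle, used only for the demonstration.
struct fake_kernel {
    std::string name;
};

// Replays `launches` kernel lookups spread over `distinct_kernels`
// different keys and returns how often the factory actually ran.
inline int count_kernel_creations(int launches, int distinct_kernels) {
    int creations = 0;
    kernel_cache<fake_kernel> cache([&creations](const std::string& key) {
        ++creations;
        return fake_kernel{key};
    });
    for (int i = 0; i < launches; ++i) {
        cache.get("kernel_" + std::to_string(i % distinct_kernels));
    }
    return creations;
}
```

With the numbers from this report, `count_kernel_creations(8149, 11)` runs the factory 11 times instead of 8149. One caveat for a real implementation: a cached kernel keeps its argument state between launches, and setting arguments on the same kernel object from several threads is not safe, so the cache would probably have to be per thread or guarded.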

Also, sha1 is terribly slow. The hash function should not take an order of magnitude longer than creating the source in the first place.
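
For a sense of how little work a cache key needs, here is a minimal sketch of a non-cryptographic alternative, 64-bit FNV-1a. This is a swapped-in technique, not what boost::compute does, and `fnv1a_64` is a hypothetical name. It assumes the key only has to distinguish kernel sources within one process; since such a hash can collide, a real cache would still have to compare the full source on a hit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a: XOR each input byte into the state, then multiply
// by the FNV prime. One XOR and one multiply per byte of source.
inline std::uint64_t fnv1a_64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}
```

Alternatively, keying a `std::unordered_map` by the source string itself gives both the hashing and the full comparison without any custom hash code.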
