Description
The ServerGC threads in my application spend approx. 50% of their time in SVR::memcopy.
On x64, the relevant loop looks like this:
@loop:
mov rax, qword ptr [r10+r9*1]
mov qword ptr [r9], rax
lea r9, ptr [r9+0x8]
sub r11, 0x1
jnz 0x18016b901 <loop>
If I'm not mistaken, this is a regular memcpy that moves 8 bytes per iteration, without vectorization or other optimizations. Perhaps SVR::memcopy could be implemented with memcpy instead, which is more heavily optimized?
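To make the suggestion concrete, here is a rough C++ sketch of the two variants. It is not the actual CoreCLR source: word_copy, library_copy and check_copies_agree are hypothetical names, and I'm assuming word-aligned, non-overlapping buffers whose size is a whole number of 8-byte words, which is what the disassembly seems to imply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the disassembled loop (not the CoreCLR source):
// moves one 8-byte word per iteration. Assumes non-overlapping buffers.
void word_copy(std::uint64_t* dest, const std::uint64_t* src, std::size_t words) {
    while (words != 0) {
        *dest++ = *src++;  // load one word, store it, advance both pointers by 8
        --words;           // count down the remaining words
    }
}

// Same contract, delegated to the C library, which is free to use wider
// vector moves or other platform-specific fast paths.
void library_copy(std::uint64_t* dest, const std::uint64_t* src, std::size_t words) {
    std::memcpy(dest, src, words * sizeof(std::uint64_t));
}

// Sanity check that both variants produce an exact copy of the source.
// Intended for words > 0.
void check_copies_agree(std::size_t words) {
    std::vector<std::uint64_t> src(words), a(words, 0), b(words, 0);
    for (std::size_t i = 0; i < words; ++i) {
        src[i] = i * 0x9E3779B97F4A7C15ull;  // arbitrary non-trivial pattern
    }
    word_copy(a.data(), src.data(), words);
    library_copy(b.data(), src.data(), words);
    assert(a == src);
    assert(b == src);
}
```

Whether the second variant is actually faster for the GC's typical object sizes would need measuring, since very small copies may not benefit from the library's wider moves.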