@JDarnley
Contributor

This time I managed to get things right. Again, the "add a basic benchmark for ubuf_pic_clear" commit is not meant to be merged.

It compiles to what I expect on gcc as far back as 4.8: a hot loop of 4 instructions (movdqu, add, cmp, conditional jump).
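For readers who want a picture of the kind of loop being described, here is a minimal sketch, not the code from this patch set. It assumes the fill colour has already been expanded into a 16-byte pattern, and `fill_pattern16` is a hypothetical name. On x86 with SSE2 the loop body is a single unaligned 16-byte store. Whether a particular compiler reduces it to exactly the four instructions listed above is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not part of upipe): fill `size` bytes at `dst`
 * by repeating a 16-byte pattern. */
static void fill_pattern16(uint8_t *dst, size_t size, const uint8_t pattern[16])
{
    assert(dst != NULL && pattern != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    /* Load the pattern once; each iteration is then one unaligned store. */
    const __m128i v = _mm_loadu_si128((const __m128i *)pattern);
    for (; i + 16 <= size; i += 16)
        _mm_storeu_si128((__m128i *)(dst + i), v);
#else
    /* Portable fallback for targets without SSE2. */
    for (; i + 16 <= size; i += 16)
        memcpy(dst + i, pattern, 16);
#endif
    /* Tail shorter than one vector: copy the leading bytes of the pattern. */
    if (i < size)
        memcpy(dst + i, pattern, size - i);
}
```

Calling `fill_pattern16(buf, 37, pattern)` writes two full 16-byte copies followed by the first 5 bytes of the pattern, and leaves everything past byte 36 untouched.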

The performance increase is much more modest than that of the original improvements: from ~6500 to ~7000 calls per second on an AMD Ryzen 7 3700X desktop, from ~2600 to ~2900 on an Intel Xeon E3-1245 v5 server, and from ~2000 to ~2200 on an Intel Xeon E3-1265L v3 server with just 1 memory channel populated.
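As a rough illustration of how a calls-per-second figure like the ones above can be obtained, here is a minimal sketch. It is not the benchmark commit from this patch set. `clear_frame` is a hypothetical stand-in for the function under test, and the frame size and call count are assumptions chosen by the caller. It uses the standard C `clock()`, which measures CPU time rather than wall-clock time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the function being measured. */
static void clear_frame(uint8_t *buf, size_t size)
{
    memset(buf, 0x10, size);
}

/* Run `calls` clears of a `frame_size`-byte buffer and return the rate.
 * Returns -1.0 on invalid arguments or allocation failure, and 0.0 if the
 * run was too short for clock() to register any elapsed time. */
static double calls_per_second(size_t frame_size, unsigned calls)
{
    if (frame_size == 0 || calls == 0)
        return -1.0;
    uint8_t *buf = malloc(frame_size);
    if (buf == NULL)
        return -1.0;

    volatile uint8_t sink = 0;
    clock_t start = clock();
    for (unsigned i = 0; i < calls; i++) {
        clear_frame(buf, frame_size);
        /* Read one byte back to discourage the compiler from
         * discarding the stores. */
        sink ^= buf[i % frame_size];
    }
    clock_t end = clock();
    (void)sink;
    free(buf);

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? calls / seconds : 0.0;
}
```

For example, `calls_per_second(1920 * 1080, 200)` times 200 clears of a buffer the size of one 8-bit 1080p plane. The result depends heavily on memory bandwidth, which is consistent with the spread between the machines reported above.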

@nto, if your previous measurements were done on an x86 system, would you like to look at this patch set?

JDarnley added 5 commits July 27, 2022 17:20
…buf_pic_plane_set_color on x86

Using gcc 12 it manages to get compiled into a hot loop of: movups, add,
cmp, jb.  Increases the number of calls to ubuf_pic_clear from ~6200 to
~7000 per second.

Other compilers are less successful.
…sics

Compiles to what is desired on gcc as old as 4.8
@nto nto self-requested a review July 27, 2022 16:30