more optimization for ubuf_pic and upipe_set_color #869
This time I managed to get things right. As before, the commit "add a basic benchmark for ubuf_pic_clear" is not meant to be merged.
It compiles to what I expect with gcc as far back as 4.8: a hot loop of 4 instructions (`movdqu`, `add`, `cmp` and a conditional jump).
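For readers unfamiliar with this kind of fill loop, here is a minimal sketch of a 16-byte pattern fill written so a compiler *can* reduce its body to one unaligned store plus the loop counter update, compare and branch. It is not the code from this patch set: the function name `fill_pattern16` and its signature are hypothetical, and the generated assembly depends on the compiler version and optimisation flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the upipe API): fill `len` bytes of `dst`
 * by repeating a 16-byte pattern. */
static void fill_pattern16(uint8_t *dst, size_t len, const uint8_t pattern[16])
{
    /* Copy the pattern into a local so stores to dst cannot alias it;
     * this lets the compiler keep it in a vector register across the loop. */
    uint8_t local[16];
    memcpy(local, pattern, sizeof local);

    size_t i = 0;
    /* Main loop: each iteration is one 16-byte (possibly unaligned) store,
     * which x86-64 compilers may emit as a single movdqu/movups. */
    for (; i + 16 <= len; i += 16)
        memcpy(dst + i, local, 16);

    /* Tail: fewer than 16 bytes remain; i is a multiple of 16,
     * so copying from the start of the pattern keeps the repetition intact. */
    memcpy(dst + i, local, len - i);
}
```

Compiling with `gcc -O2 -S` and inspecting the loop is a quick way to check whether a given compiler produces the short loop described above.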
The performance increase is much more modest than with the original improvements, roughly 8 to 12 %:

- AMD Ryzen 7 3700X desktop: from ~6500 to ~7000 calls per second
- Intel Xeon E3-1245 v5 server: from ~2600 to ~2900 calls per second
- Intel Xeon CPU E3-1265L v3 server (only 1 memory channel populated): from ~2000 to ~2200 calls per second
@nto, if your previous measurements were done on an x86 system, would you like to look at this patch set?