Open
Description
It is common, especially in floating-point code, to unroll loops by a factor of two or four so that independent instructions can execute concurrently on the same CPU core.
Here is an example of a program that does this:
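A minimal sketch in C, assuming a simple dot-product kernel. The function name `dot_unrolled4` and the choice of four accumulators are illustrative assumptions, not taken from the original program. Each accumulator would typically live in its own register, so the four multiply-adds in one iteration do not depend on each other:

```c
#include <stddef.h>

/* Hypothetical example: dot product unrolled by four.
 * The four accumulators s0..s3 are independent of one another, so a core
 * that can issue several instructions per cycle may overlap the four
 * multiply-adds of one iteration instead of waiting on a single chain. */
double dot_unrolled4(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop: handle the last n % 4 elements. */
    for (; i < n; ++i)
        s0 += a[i] * b[i];

    /* Combine the partial sums. */
    return (s0 + s1) + (s2 + s3);
}
```

Because the partial sums are combined in a different order than in a plain loop, the floating-point result may differ slightly from the non-unrolled version.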
Notice how many instructions are repeated, each time using different registers.
The task is to execute several of these independent instructions at once, which should improve performance considerably for such programs.