Description
I've looked again at the fastest mandelbrot implementation on benchmarksgame and made a list of things we need to get before we can even begin to think about challenging the top.
-
the x values are the same for each line, only the y changes. We should generate them in the beginning instead of dynamically recalculating them in the loop.
-
the benchmark only requires that we output a black/white image to stdout. Currently, our output code determines at runtime whether to print in colors or in B/W. We should either remove this feature, or keep it behind a feature flag. We can make massive gains in performance if we only store a true/false bitmap result.
-
after that is done, we can simply
write
the raw bytes tostdout
if it's a B/W image. -
their code manages to convert the Mandelbrot result to the image's bytes very quickly by creating a mask (with
movmskpd
), then doing some bit fiddling with it. -
the fastest code beats the second fastest one by having a "pruning" mechanism. The final image is very uneven, some pixels diverge almost immediately, others take quite a while to diverge. There are some heuristics we can use to guess if the current SIMD block is likely to diverge.
All of these are already implemented by the currently fastest Rust implementation, which is a simple, unidiomatic port of the fastest C/C++ code; it's just lacking SIMD :)