fix Rust fannkuch redux performance #16
Open
I saw some oddities in the graphs comparing the performance of Rust and C, so I decided to fix it at least a little. The main problem seemed to be the many i32 to usize conversions. Rust wasn't too happy about those, and it also doesn't make sense to keep non-decrementing integers as i32, since they cannot be negative.
Another one was having `self.t[..current_max_n].copy_from_slice(&self.s[..current_max_n]);` in a hot loop with tiny ranges. A simple for loop is just faster here and is actually closer to the other implementations.

I also removed the allocation for the input args, since they don't need to be allocated and they aren't allocated in the C implementation.

I got a ~20% performance improvement with just these simple changes.
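To illustrate the two changes, here is a minimal sketch rather than the actual diff. The function names and the sample data are made up for illustration. It shows the slice copy next to the equivalent plain loop, and an index lookup with i32 (which needs a cast) next to the same lookup with usize. Whether the loop wins will depend on the compiler version and on how small the ranges are, so it is worth measuring.

```rust
/// Copies the first `n` elements of `src` into `dst` with `copy_from_slice`.
fn copy_prefix_slice(dst: &mut [usize], src: &[usize], n: usize) {
    dst[..n].copy_from_slice(&src[..n]);
}

/// The same copy written as a plain indexed loop.
fn copy_prefix_loop(dst: &mut [usize], src: &[usize], n: usize) {
    for i in 0..n {
        dst[i] = src[i];
    }
}

/// Indexing with i32 values needs an `as usize` cast on every access.
fn lookup_i32(perm: &[i32]) -> i32 {
    perm[perm[0] as usize]
}

/// With usize values the element can be used as an index directly.
fn lookup_usize(perm: &[usize]) -> usize {
    perm[perm[0]]
}

fn main() {
    // Hypothetical sample permutation, not taken from the benchmark.
    let s: [usize; 7] = [3, 1, 4, 0, 2, 6, 5];
    let mut a = [0usize; 7];
    let mut b = [0usize; 7];

    copy_prefix_slice(&mut a, &s, 4);
    copy_prefix_loop(&mut b, &s, 4);

    // Both variants must produce the same prefix.
    assert_eq!(a, b);
    println!("{:?}", a);

    println!("{} {}", lookup_i32(&[2, 0, 1]), lookup_usize(&[2, 0, 1]));
}
```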
I also noticed that the benchmarks include printing. This is pretty bad for short-running programs. For example, in Rust fannkuch-redux with an input of `6`, the actual calculation takes ~25µs but printing takes 530µs, so about 95% of the benchmark time is just the printing. This seems pretty inaccurate when it comes to comparing languages: you are essentially just comparing the printing functions of each.

I didn't do anything to the print function, but you really should run the benchmarks in a way where they do not include any printing or similar I/O.
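One way to see the split between calculation and printing is to time the two phases separately. The sketch below uses a deliberately simple brute-force flip counter (`flips` and `max_flips` are my own illustrative helpers, not the benchmark's implementation) and `std::time::Instant`. The timings it reports will differ per machine.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Counts pancake flips for one zero-based permutation.
fn flips(perm: &[usize]) -> usize {
    let mut p = perm.to_vec();
    let mut count = 0;
    while let Some(&k) = p.first() {
        if k == 0 {
            break;
        }
        // A leading value k means: reverse the first k + 1 elements.
        p[..=k].reverse();
        count += 1;
    }
    count
}

/// Brute-force maximum flip count over all permutations of 0..n.
fn max_flips(n: usize) -> usize {
    fn rec(p: &mut Vec<usize>, k: usize, best: &mut usize) {
        if k == p.len() {
            *best = (*best).max(flips(p.as_slice()));
            return;
        }
        for i in k..p.len() {
            p.swap(k, i);
            rec(p, k + 1, best);
            p.swap(k, i);
        }
    }
    let mut p: Vec<usize> = (0..n).collect();
    let mut best = 0;
    rec(&mut p, 0, &mut best);
    best
}

fn main() {
    let n = 6;

    // Phase 1: the calculation only. black_box keeps the optimizer
    // from folding the whole computation away.
    let t0 = Instant::now();
    let result = black_box(max_flips(black_box(n)));
    let compute = t0.elapsed();

    // Phase 2: the printing only.
    let t1 = Instant::now();
    println!("Pfannkuchen({}) = {}", n, result);
    let print = t1.elapsed();

    // Report timings on stderr so they stay out of the measured output.
    eprintln!("compute: {:?}, print: {:?}", compute, print);
}
```

A benchmark harness could report only the first phase, or return the result to the caller instead of printing it inside the timed region.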
Also, I saw that you are running the benchmarks with `-O2` optimizations but running Zig with `-OReleaseSafe`. These are most certainly not 1-to-1 comparable: Zig's ReleaseSafe is more like `-O3` + bounds checks, so all the benchmarks are twisted in Zig's favor. Since Rust has bounds checks in normal release builds, `-O3` for Rust is much more comparable to `-OReleaseSafe`.

I would honestly just run every one of them with `-O3/release` but have an additional benchmark for Zig with `-OReleaseSafe`. This would be much closer to real-life performance than `-O2`.
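For context on the bounds-check point, here is a small sketch showing that safe slice indexing in Rust is checked at runtime regardless of optimization level. The helper names are made up for illustration, and `index_panics` assumes the default `panic = "unwind"` setting, since `catch_unwind` cannot catch anything under `panic = "abort"`.

```rust
use std::panic;

/// Checked access that returns None instead of panicking.
fn checked_index(data: &[u32], i: usize) -> Option<u32> {
    data.get(i).copied()
}

/// Returns true if plain indexing `data[i]` panics.
/// The bounds check behind this panic is present in release builds too.
fn index_panics(data: Vec<u32>, i: usize) -> bool {
    panic::catch_unwind(move || data[i]).is_err()
}

fn main() {
    let v = vec![1, 2, 3];
    println!("{:?}", checked_index(&v, 1));
    println!("{:?}", checked_index(&v, 10));
    // The panic message for the out-of-range index goes to stderr.
    println!("{}", index_panics(v, 10));
}
```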