Description
I noticed that when running the gold models like SIR multiple times, they can take anywhere from 100 seconds to 150 seconds on my machine with the develop branch. That's a pretty big variance! For the performance benchmarks we report back via Jenkins, I think it would be a good idea to run each model 15-20 times and then report the mean and variance of the benchmark for each model.
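As a rough sketch of what per-model statistics could look like (the `benchmark` helper and its interface below are hypothetical, not existing code in compare-hash.py):

```python
import statistics
import subprocess
import time


def benchmark(cmd, runs=15):
    """Run `cmd` `runs` times; return (mean, sd) of wall-clock seconds.

    A minimal sketch: real benchmarking would likely reuse the timing
    the performance tests already collect rather than re-timing here.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raise if the model run fails
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)
```

Reporting the standard deviation alongside the mean would make it clear whether a slowdown between two hashes is a real regression or just run-to-run noise.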
@serban-nicusor-toptal can you point me to the code that Jenkins uses to kick off the performance tests? I can also modify the compare-hash.py file to report the mean and sd. If I remember right, you had something like this in the works?
Current Version:
v2.20.0