When comparing two results, it would be helpful to output the probability that one result is faster than the other.
If times1 and times2 are sets of measured times, then the probability of the first benchmark being faster than the second one is estimated as:
sum(x < y for x in times1 for y in times2)/len(times1)/len(times2)
As an optimization, you can sort one of the sets and use binary search, which brings the cost down from O(n*m) comparisons to roughly O((n + m) log n).
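A minimal sketch of the sort-and-binary-search variant, assuming the times are plain lists of numbers. The function name prob_faster is hypothetical and only for illustration. It counts strict x < y pairs, so ties are not counted as wins, same as the expression above.

```python
from bisect import bisect_left


def prob_faster(times1, times2):
    """Estimate the probability that a time from times1 is less than one from times2."""
    if not times1 or not times2:
        raise ValueError("both samples must be non-empty")
    sorted1 = sorted(times1)
    # For each y, bisect_left returns how many elements of sorted1
    # are strictly less than y, i.e. the number of x with x < y.
    wins = sum(bisect_left(sorted1, y) for y in times2)
    return wins / (len(times1) * len(times2))
```

For example, prob_faster([1, 2, 3], [2, 3, 4]) counts 6 winning pairs out of 9. The result should match the brute-force expression for any input; only the running time differs.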