Description
This PR: #760, introduced a bug that proved to be notoriously hard to diagnose.
The issue at hand seems to be that the metric used to determine whether something should run once or go through the pilot phase, is flaky. In fact, it appears that the JIT time is taken into account.
In short: if one invocation is ~7ns (this example) and the IterationTime
is set to 120ms, the change in #760 assumes that it can skip the pilot phase and sets invocation and unroll to 1
.
I think JIT'ing is skewing the metric, because once you change sprintf "test %A" "test"
to sprintf "test %s" "test"
, the problem goes away (and the difference is the JIT time). It also goes away when giving foo
a body with sprintf "test %A" "test"
. I don't know why.
Proposal
@adamsitnik, I can see a few ways forward:
-
Add a line in the output why the pilot phase is skipped. I spend two days searching for a cause, it is very puzzling if one irrelevant change in your own code leads to attributes being ignored
-
Improve the metric. Hopefully the two examples here and in Trivial methods can cause BDN to run only 1 iteration, skip pilot phase, and ignore most attributes #1337 will help analyzing the cause
-
Perhaps simplest of all: add an attribute, like
ForcePilotPhase
to override behavior of don't execute long operations more than once per iteration #760, or allowUnrollFactor
as an attribute, which would also override it, if I understand the PR correctly.
Minimal working example
Here's an example:
namespace BenchMarkBug
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open System.Globalization
module Test =
let getFunctionToTestOneTimeOverhead =
let foo = sprintf "test %A" "test"
fun() -> true
// you may need a slightly lower threshold to see the behavior mentioned
[<IterationTime(120.)>]
[<SimpleJob()>]
type BenchmarkPatterns() =
let foo =
()
[<Benchmark>]
member _.basicTest() =
Test.getFunctionToTestOneTimeOverhead ()
|> ignore
module Main =
[<EntryPoint>]
let main _ =
BenchmarkRunner.Run<BenchmarkPatterns>()
|> ignore
0 // return success
Output
The output shows that the pilot stage is skipped. However, the iteration time was set to 120ms, and there isn't any metric in the output below that tops that.
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.8.3928.0
// GC=Concurrent Workstation
// Job: Job-KABBQW(IterationTime=120.0000 ms)
OverheadJitting 1: 1 op, 234052.26 ns, 234.0523 us/op
WorkloadJitting 1: 1 op, 86470300.92 ns, 86.4703 ms/op
WorkloadWarmup 1: 1 op, 4311.49 ns, 4.3115 us/op
WorkloadWarmup 2: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup 3: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup 4: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup 5: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadWarmup 6: 1 op, 615.93 ns, 615.9270 ns/op
// BeforeActualRun
WorkloadActual 1: 1 op, 615.93 ns, 615.9270 ns/op
WorkloadActual 2: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadActual 3: 1 op, 615.93 ns, 615.9270 ns/op
WorkloadActual 4: 1 op, 307.96 ns, 307.9635 ns/op
WorkloadActual 5: 1 op, 307.96 ns, 307.9635 ns/op
Once I change the IterationTime
to 150ms, I get the expected output:
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.8.3928.0
// GC=Concurrent Workstation
// Job: Job-OTJYUL(IterationTime=150.0000 ms)
OverheadJitting 1: 1 op, 226969.10 ns, 226.9691 us/op
WorkloadJitting 1: 1 op, 89779676.75 ns, 89.7797 ms/op
WorkloadPilot 1: 4 op, 7699.09 ns, 1.9248 us/op
WorkloadPilot 2: 77931 op, 542323.73 ns, 6.9590 ns/op
WorkloadPilot 3: 21554745 op, 149653325.48 ns, 6.9429 ns/op
WorkloadPilot 4: 21604677 op, 149687201.47 ns, 6.9285 ns/op
WorkloadPilot 5: 21649824 op, 149388476.87 ns, 6.9002 ns/op
WorkloadPilot 6: 21738448 op, 152644575.01 ns, 7.0219 ns/op
WorkloadPilot 7: 21361828 op, 148062078.05 ns, 6.9312 ns/op
WorkloadPilot 8: 21641424 op, 149757725.11 ns, 6.9200 ns/op
WorkloadPilot 9: 21676435 op, 151683112.94 ns, 6.9976 ns/op
WorkloadPilot 10: 21435908 op, 147993094.23 ns, 6.9040 ns/op
WorkloadPilot 11: 21726596 op, 150739820.73 ns, 6.9380 ns/op
WorkloadWarmup 1: 21726596 op, 152429924.44 ns, 7.0158 ns/op
WorkloadWarmup 2: 21726596 op, 151122003.44 ns, 6.9556 ns/op
WorkloadWarmup 3: 21726596 op, 151391163.54 ns, 6.9680 ns/op
WorkloadWarmup 4: 21726596 op, 150467580.99 ns, 6.9255 ns/op
WorkloadWarmup 5: 21726596 op, 149482405.74 ns, 6.8802 ns/op
WorkloadWarmup 6: 21726596 op, 150461729.68 ns, 6.9252 ns/op
WorkloadWarmup 7: 21726596 op, 150266172.86 ns, 6.9162 ns/op
// BeforeActualRun
WorkloadActual 1: 21726596 op, 150254162.28 ns, 6.9157 ns/op
WorkloadActual 2: 21726596 op, 152880783.02 ns, 7.0366 ns/op
WorkloadActual 3: 21726596 op, 149523672.85 ns, 6.8821 ns/op
WorkloadActual 4: 21726596 op, 149548617.89 ns, 6.8832 ns/op
WorkloadActual 5: 21726596 op, 150369032.67 ns, 6.9210 ns/op