
perf(benchmark): add executor-aware metrics, automated benchmarking pipeline, and performance report #1533

Open
neetance wants to merge 1 commit into hyperledger-labs:main from neetance:perf/benchmark-executor-metrics-and-report

Conversation

@neetance
Contributor

Summary

Extends the benchmarking framework to support executor-aware analysis and improves visibility into system behavior under different execution strategies.

Changes

run_benchmarks.py:

  • New --executor flag (serial|unbounded|pool|all, default all) loops over
    all three executor strategies for every parallel benchmark
  • New --proof_type flag (bf|csp|all, default bf) loops over proof systems
  • New --duration and --cpus flags for easy CLI control
  • Column naming: TestParallelBenchmarkSender[pool]/8 tps encodes the
    executor so all strategies coexist in one CSV row
  • Goroutine count parsed from the 'Goroutines Created' field in the runner
    output and stored as TestParallelBenchmarkSender[pool]/8 goroutines
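
The parsing and column-naming logic described above can be sketched roughly like this (the helper and pattern names are illustrative assumptions, not the actual run_benchmarks.py code):

```python
import re

# Hypothetical sketch of the parsing/naming scheme described above;
# names and details may differ from the real run_benchmarks.py.
GOROUTINES_RE = re.compile(r"Goroutines Created\s+(\d+)")

def parse_goroutines(runner_output):
    """Pull the net goroutine count out of the runner's System Health block."""
    m = GOROUTINES_RE.search(runner_output)
    return int(m.group(1)) if m else None

def column_name(test, executor, workers, metric):
    """Encode the executor in the column key, e.g. 'Test[pool]/8 tps'."""
    return f"{test}[{executor}]/{workers} {metric}"

sample = " Goroutines Created  201  Net goroutines above baseline during recording"
print(parse_goroutines(sample))  # 201
print(column_name("TestParallelBenchmarkSender", "pool", 8, "tps"))
# TestParallelBenchmarkSender[pool]/8 tps
```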

plot_benchmark_results.py:

  • Left plot: TPS vs workers with one coloured line per executor strategy
  • Right plot: TPS vs mean latency (error bars = std, X = p95) with worker
    count annotations, coloured by executor strategy
  • Backward compatible: skips gracefully if executor columns are absent
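
The backward-compatibility check amounts to recognizing whether a column name carries an executor tag; a minimal sketch (the regex and function name are assumptions, not the actual plot_benchmark_results.py code):

```python
import re

# Assumed column pattern matching the naming scheme above; legacy CSV
# columns without an executor tag fail the match and are skipped.
COL_RE = re.compile(
    r"^(?P<test>.+?)\[(?P<executor>serial|unbounded|pool)\]/(?P<workers>\d+)\s+(?P<metric>.+)$"
)

def split_column(col):
    """Return (test, executor, workers, metric), or None for legacy columns."""
    m = COL_RE.match(col)
    if m is None:
        return None
    return m.group("test"), m.group("executor"), int(m.group("workers")), m.group("metric")

print(split_column("TestParallelBenchmarkSender[pool]/8 tps"))
# ('TestParallelBenchmarkSender', 'pool', 8, 'tps')
print(split_column("TestParallelBenchmarkSender/8 tps"))  # legacy column: None
```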

runner.go:

  • GoRoutinesCreated field added to Result, captured as net delta of runtime.NumGoroutine() across the recording window
  • Printed in System Health section as Goroutines Created
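
The net-delta idea behind GoRoutinesCreated (sample the live-routine count just before and just after the recording window, then report the difference) has a rough Python analogue using threading.active_count() in place of runtime.NumGoroutine(). This is a sketch of the measurement idea only, not the runner.go code:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def net_threads_during(pool_size=4, tasks=4):
    """Sample live threads before/after submitting work, mirroring how the
    Go side takes a net delta of runtime.NumGoroutine() across the window."""
    baseline = threading.active_count()
    pool = ThreadPoolExecutor(max_workers=pool_size)
    # Long-enough tasks so each submit spawns a fresh worker thread.
    futures = [pool.submit(time.sleep, 0.2) for _ in range(tasks)]
    delta = threading.active_count() - baseline  # net threads above baseline
    for f in futures:
        f.result()
    pool.shutdown(wait=True)
    return delta

print(net_threads_during())  # equals the pool size (4) here
```

A pooled executor shows a stable positive delta like this, while a serial executor would read near zero by the same measurement.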

Benchmark Results

I ran the benchmark across the three executor strategies with 10 workers to show the number of goroutines created; the results are below:

  • Serial

go test ./token/core/zkatdlog/nogh/v1/validator \
    -test.run=TestParallelBenchmarkValidatorTransfer \
    -test.v -test.timeout 0 \
    -bits="32" -curves="BLS12_381_BBS_GURVY" \
    -num_inputs="2" -num_outputs="2" \
    -workers="10" -duration="30s" -setup_samples=128 \
    -executor="serial"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        10395     (Robust Sample)
Duration         30.024s   (Good Duration)
Real Throughput  346.22/s  Observed Ops/sec (Wall Clock)
Pure Throughput  346.48/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           19.877017ms  
 P50 (Median)  28.467144ms  
 Average       28.861544ms  
 P95           34.022228ms  
 P99           39.539328ms  
 P99.9         48.759763ms  
 Max           54.36025ms   (Stable Tail)

Stability Metrics:
 Std Dev  3.181664ms  
 IQR      3.871098ms  Interquartile Range
 Jitter   2.765293ms  Avg delta per worker
 CV       11.02%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              711008 B/op     Allocated bytes per operation
 Allocs              7755 allocs/op  Allocations per operation
 Alloc Rate          233.02 MB/s     Memory pressure on system
 GC Overhead         4.50%           (High GC Pressure)
 GC Pause            1.350151185s    Total Stop-The-World time
 GC Cycles           3414            Full garbage collection cycles
 Goroutines Created  49              Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 19.877017ms-20.902475ms  1      (0.0%)
 20.902475ms-21.980837ms  4      (0.0%)
 21.980837ms-23.114832ms  50     (0.5%)
 23.114832ms-24.307329ms  300   █████ (2.9%)
 24.307329ms-25.561348ms  920   ██████████████████ (8.9%)
 25.561348ms-26.880062ms  1596  ███████████████████████████████ (15.4%)
 26.880062ms-28.266809ms  2028  ████████████████████████████████████████ (19.5%)
 28.266809ms-29.725098ms  1919  █████████████████████████████████████ (18.5%)
 29.725098ms-31.258621ms  1629  ████████████████████████████████ (15.7%)
 31.258621ms-32.871258ms  1035  ████████████████████ (10.0%)
 32.871258ms-34.567091ms  503   █████████ (4.8%)
 34.567091ms-36.350413ms  190   ███ (1.8%)
 36.350413ms-38.225736ms  92    █ (0.9%)
 38.225736ms-40.197808ms  37     (0.4%)
 40.197808ms-42.271619ms  38     (0.4%)
 42.271619ms-44.452418ms  19     (0.2%)
 44.452418ms-46.745726ms  14     (0.1%)
 46.745726ms-49.157345ms  10     (0.1%)
 49.157345ms-51.69338ms   5      (0.0%)
 51.69338ms-54.36025ms    5      (0.0%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7755/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆] (Max: 370 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (55.99s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (55.97s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      56.018s

  • Pool

go test ./token/core/zkatdlog/nogh/v1/validator \
    -test.run=TestParallelBenchmarkValidatorTransfer \
    -test.v -test.timeout 0 \
    -bits="32" -curves="BLS12_381_BBS_GURVY" \
    -num_inputs="2" -num_outputs="2" \
    -workers="10" -duration="30s" -setup_samples=128 \
    -executor="pool"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        11106     (Robust Sample)
Duration         30.018s   (Good Duration)
Real Throughput  369.98/s  Observed Ops/sec (Wall Clock)
Pure Throughput  370.31/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           17.50233ms   
 P50 (Median)  26.55083ms   
 Average       27.00462ms   
 P95           32.95457ms   
 P99           37.392969ms  
 P99.9         52.041343ms  
 Max           71.166264ms  (Stable Tail)

Stability Metrics:
 Std Dev  3.583152ms  
 IQR      4.125824ms  Interquartile Range
 Jitter   3.26569ms   Avg delta per worker
 CV       13.27%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              711489 B/op     Allocated bytes per operation
 Allocs              7766 allocs/op  Allocations per operation
 Alloc Rate          249.11 MB/s     Memory pressure on system
 GC Overhead         5.09%           (Severe GC Thrashing)
 GC Pause            1.528197863s    Total Stop-The-World time
 GC Cycles           3396            Full garbage collection cycles
 Goroutines Created  201             Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 17.50233ms-18.773912ms   11     (0.1%)
 18.773912ms-20.137877ms  49     (0.4%)
 20.137877ms-21.600938ms  235   ███ (2.1%)
 21.600938ms-23.170293ms  847   ████████████ (7.6%)
 23.170293ms-24.853665ms  1907  ████████████████████████████ (17.2%)
 24.853665ms-26.659337ms  2659  ████████████████████████████████████████ (23.9%)
 26.659337ms-28.596196ms  2418  ████████████████████████████████████ (21.8%)
 28.596196ms-30.673772ms  1557  ███████████████████████ (14.0%)
 30.673772ms-32.902288ms  855   ████████████ (7.7%)
 32.902288ms-35.29271ms   355   █████ (3.2%)
 35.29271ms-37.856802ms   117   █ (1.1%)
 37.856802ms-40.607181ms  38     (0.3%)
 40.607181ms-43.557381ms  15     (0.1%)
 43.557381ms-46.721919ms  13     (0.1%)
 46.721919ms-50.116368ms  13     (0.1%)
 50.116368ms-53.757431ms  9      (0.1%)
 53.757431ms-57.663025ms  5      (0.0%)
 57.663025ms-61.852368ms  1      (0.0%)
 66.346077ms-71.166264ms  2      (0.0%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7766/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█▇▇▇▇▆▇▇] (Max: 388 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (44.47s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (44.45s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      44.492s

  • Unbounded

go test ./token/core/zkatdlog/nogh/v1/validator \
    -test.run=TestParallelBenchmarkValidatorTransfer \
    -test.v -test.timeout 0 \
    -bits="32" -curves="BLS12_381_BBS_GURVY" \
    -num_inputs="2" -num_outputs="2" \
    -workers="10" -duration="30s" -setup_samples=128 \
    -executor="unbounded"
=== RUN   TestParallelBenchmarkValidatorTransfer
=== RUN   TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers
Metric           Value     Description
------           -----     -----------
Workers          10        
Total Ops        10595     (Robust Sample)
Duration         30.022s   (Good Duration)
Real Throughput  352.91/s  Observed Ops/sec (Wall Clock)
Pure Throughput  353.17/s  Theoretical Max (Low Overhead)

Latency Distribution:
 Min           18.687629ms  
 P50 (Median)  27.975221ms  
 Average       28.315086ms  
 P95           34.032515ms  
 P99           38.579637ms  
 P99.9         60.769941ms  
 Max           70.516977ms  (Stable Tail)

Stability Metrics:
 Std Dev  3.66511ms   
 IQR      4.111613ms  Interquartile Range
 Jitter   3.322761ms  Avg delta per worker
 CV       12.94%      Moderate Variance (10-20%)

System Health & Reliability:
 Error Rate          0.0000%         (100% Success) (0 errors)
 Memory              713795 B/op     Allocated bytes per operation
 Allocs              7755 allocs/op  Allocations per operation
 Alloc Rate          237.49 MB/s     Memory pressure on system
 GC Overhead         5.29%           (Severe GC Thrashing)
 GC Pause            1.589542611s    Total Stop-The-World time
 GC Cycles           3497            Full garbage collection cycles
 Goroutines Created  0               Net goroutines above baseline during recording

Latency Heatmap (Dynamic Range):
Range                     Freq  Distribution Graph
 18.687629ms-19.970602ms  15     (0.1%)
 19.970602ms-21.341657ms  79    █ (0.7%)
 21.341657ms-22.80684ms   223   ███ (2.1%)
 22.80684ms-24.372613ms   704   ███████████ (6.6%)
 24.372613ms-26.045882ms  1653  ███████████████████████████ (15.6%)
 26.045882ms-27.834027ms  2434  ████████████████████████████████████████ (23.0%)
 27.834027ms-29.744934ms  2431  ███████████████████████████████████████ (22.9%)
 29.744934ms-31.787033ms  1688  ███████████████████████████ (15.9%)
 31.787033ms-33.969329ms  827   █████████████ (7.8%)
 33.969329ms-36.301447ms  363   █████ (3.4%)
 36.301447ms-38.793674ms  79    █ (0.7%)
 38.793674ms-41.457002ms  34     (0.3%)
 41.457002ms-44.303176ms  15     (0.1%)
 44.303176ms-47.344751ms  13     (0.1%)
 47.344751ms-50.595141ms  12     (0.1%)
 50.595141ms-54.068682ms  5      (0.0%)
 54.068682ms-57.780695ms  4      (0.0%)
 57.780695ms-61.747551ms  7      (0.1%)
 61.747551ms-65.986745ms  2      (0.0%)
 65.986745ms-70.516977ms  7      (0.1%)

--- Analysis & Recommendations ---
[INFO] High Allocations (7755/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------

--- Throughput Timeline ---
Timeline: [▇▇▇▇█▇▇▇▇▇▇▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇] (Max: 374 ops/s)

--- PASS: TestParallelBenchmarkValidatorTransfer (44.57s)
    --- PASS: TestParallelBenchmarkValidatorTransfer/Setup(bits_32,_curve_BLS12_381_BBS_GURVY,_#i_2,_#o_2)_with_10_workers (44.55s)
PASS
ok      github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/validator      44.594s

Comparison report

I have attached a PDF comparison report of the results across the different benchmark configurations:
benchmark_results.pdf

Let me know if this is good 🙏

@adecaro
Contributor

adecaro commented Apr 13, 2026

Hi @neetance, great effort. I really appreciate it.
Please open a GitHub Issue about this feature.

Thanks a lot 🙏

@adecaro adecaro self-requested a review April 13, 2026 05:49
@adecaro adecaro self-assigned this Apr 13, 2026
@adecaro adecaro added this to the Q2/26 milestone Apr 13, 2026
…d comparison plots

Signed-off-by: Ankit Basu <ankitbasu14@gmail.com>
@adecaro adecaro force-pushed the perf/benchmark-executor-metrics-and-report branch from a4921ee to 72b7588 Compare April 13, 2026 05:49
@neetance
Contributor Author

Thanks for the feedback @adecaro 🙏
I have opened issue #1542 for this.

