| 1 | +# ZKAT DLog No Graph Hiding Benchmark |
| 2 | + |
| 3 | +Packages with benchmark tests: |
| 4 | + |
| 5 | +- `token/core/zkatdlog/nogh/v1/transfer`: |
| 6 | + - `BenchmarkSender`, `BenchmarkVerificationSenderProof`, `TestParallelBenchmarkSender`, and `TestParallelBenchmarkVerificationSenderProof` benchmark the generation of a transfer action, including the generation of the ZK proof for the transfer operation. |
| 7 | + - `BenchmarkTransferProofGeneration` and `TestParallelBenchmarkTransferProofGeneration` benchmark the generation of the ZK proof alone. |
| 8 | +- `token/core/zkatdlog/nogh/v1/issue`: `BenchmarkIssuer` and `BenchmarkProofVerificationIssuer` |
| 9 | +- `token/core/zkatdlog/nogh/v1`: `BenchmarkTransfer` |
| 10 | + |
| 11 | +The steps necessary to run the benchmarks are very similar. |
| 12 | +We give two examples here: |
| 13 | +- `token/core/zkatdlog/nogh/v1/transfer#BenchmarkSender`, and |
| 14 | +- `token/core/zkatdlog/nogh/v1/transfer#TestParallelBenchmarkSender` |
| 15 | + |
| 16 | +## Benchmark: `token/core/zkatdlog/nogh/v1/transfer#BenchmarkSender` |
| 17 | + |
| 18 | +In this section, we go through the steps necessary to run the benchmark and interpret the results. |
| 19 | +For the other benchmarks the process is the same. |
| 20 | + |
| 21 | +### Overview |
| 22 | + |
| 23 | +`BenchmarkSender` measures the cost of generating a zero-knowledge transfer (ZK transfer) using the DLog no-graph-hiding sender implementation and serializing the resulting transfer object. |
| 24 | +Concretely, each benchmark iteration constructs the required sender environment, invokes `GenerateZKTransfer(...)`, and calls `Serialize()` on the returned transfer, so the measured time includes both ZK transfer construction and serialization. |
| 25 | + |
| 26 | +The benchmark is implemented to run the same workload across a matrix of parameters (bit sizes, curve choices, number of inputs and outputs). |
| 27 | +A helper inside the test (`generateBenchmarkCases`) programmatically generates all combinations of the selected parameters. |
| 28 | + |
| 29 | +### Parameters |
| 30 | + |
| 31 | +The benchmark accepts the following tunable parameters: |
| 32 | + |
| 33 | +- Bits: integer bit sizes used for some setup (e.g., 32, 64). This is passed to the test setup code. |
| 34 | +- CurveID: the `math.CurveID` used (examples: `BN254`, `BLS12_381_BBS_GURVY`). |
| 35 | +- NumInputs: number of input tokens provided to the sender (1, 2, ...). |
| 36 | +- NumOutputs: number of outputs produced by the transfer (1, 2, ...). |
| 37 | + |
| 38 | +These parameters can be configured from the command line using the following flags: |
| 39 | + |
| 40 | +```shell |
| 41 | + -bits string |
| 42 | + a comma-separated list of bit sizes (32, 64,...) |
| 43 | + -curves string |
| 44 | + comma-separated list of curves. Supported curves are: FP256BN_AMCL, BN254, FP256BN_AMCL_MIRACL, BLS12_381_BBS, BLS12_381_BBS_GURVY, BLS12_381_BBS_GURVY_FAST_RNG |
| 45 | + -num_inputs string |
| 46 | + a comma-separate list of number of inputs (1,2,3,...) |
| 47 | + -num_outputs string |
| 48 | + a comma-separate list of number of outputs (1,2,3,...) |
| 49 | +``` |
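As an illustration of how a comma-separated flag value such as `-bits="32, 64"` can be turned into a list of integers, here is a minimal Go sketch. The helper name `parseIntList` is hypothetical and is not taken from the benchmark code; the actual test may parse its flags differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIntList is a hypothetical helper: it converts a comma-separated
// flag value (for example "32, 64") into a slice of ints.
// Surrounding whitespace and empty fields are ignored.
func parseIntList(s string) ([]int, error) {
	var out []int
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		n, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid integer %q: %w", field, err)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	bits, err := parseIntList("32, 64")
	fmt.Println(bits, err)
}
```

Returning an error for a malformed entry, rather than skipping it silently, makes a mistyped flag value visible before a long benchmark run starts.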
| 50 | + |
| 51 | +### Default parameter set used in the benchmark |
| 52 | + |
| 53 | +If no flag is used, the test file currently uses the following parameter slices (so the resulting combinations are the Cartesian product of these lists): |
| 54 | + |
| 55 | +- bits: [32, 64] |
| 56 | +- curves: [BN254, BLS12_381_BBS_GURVY, BLS12_381_BBS_GURVY_FAST_RNG] |
| 57 | +- inputs: [1, 2, 3] |
| 58 | +- outputs: [1, 2, 3] |
| 59 | + |
| 60 | +This produces 2 (bits) * 3 (curves) * 3 (inputs) * 3 (outputs) = 54 sub-benchmarks. |
| 61 | +Each sub-benchmark runs the standard `b.N` iterations and reports time and allocation statistics. |
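The Cartesian product described above can be sketched in Go as follows. This is an illustration under stated assumptions: the `benchmarkCase` type and the `generateCases` function are hypothetical names, and curves are represented as plain strings rather than `math.CurveID` values. It is not the actual `generateBenchmarkCases` helper.

```go
package main

import "fmt"

// benchmarkCase is a hypothetical container for one parameter combination.
type benchmarkCase struct {
	Bits       int
	Curve      string
	NumInputs  int
	NumOutputs int
}

// generateCases returns every combination of the given parameter lists.
func generateCases(bits []int, curves []string, inputs, outputs []int) []benchmarkCase {
	cases := make([]benchmarkCase, 0, len(bits)*len(curves)*len(inputs)*len(outputs))
	for _, b := range bits {
		for _, c := range curves {
			for _, in := range inputs {
				for _, out := range outputs {
					cases = append(cases, benchmarkCase{Bits: b, Curve: c, NumInputs: in, NumOutputs: out})
				}
			}
		}
	}
	return cases
}

func main() {
	cases := generateCases(
		[]int{32, 64},
		[]string{"BN254", "BLS12_381_BBS_GURVY", "BLS12_381_BBS_GURVY_FAST_RNG"},
		[]int{1, 2, 3},
		[]int{1, 2, 3},
	)
	fmt.Println(len(cases)) // prints 54
}
```

With the default lists, the sketch yields the same 54 combinations as the count given above.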
| 62 | + |
| 63 | +### How to run |
| 64 | + |
| 65 | +Run the benchmark for the package containing the sender benchmarks: |
| 66 | + |
| 67 | +```sh |
| 68 | +# run the BenchmarkSender benchmarks in the transfer package |
| 69 | +go test ./token/core/zkatdlog/nogh/v1/transfer -bench=BenchmarkSender -benchmem -count=1 -cpu=1 -timeout 0 -run=^$ |
| 70 | +``` |
| 71 | +> Notice that: |
| 72 | +> - `-run=^$` prevents any of the unit tests in the package from running. |
| 73 | +> - `-timeout 0` disables the test timeout. |
| 74 | + |
| 75 | +If you want to run the benchmark repeatedly and save results to a file: |
| 76 | + |
| 77 | +```sh |
| 78 | +go test ./token/core/zkatdlog/nogh/v1/transfer -bench=BenchmarkSender -benchmem -count=10 -cpu=1 -timeout 0 -run=^$ | tee bench.txt |
| 79 | +``` |
| 80 | + |
| 81 | +Note: `-count` controls how many times each benchmark is run (useful to reduce variance); `-benchmem` reports allocation statistics. |
| 82 | + |
| 83 | +You can also change the parameters: |
| 84 | + |
| 85 | +```shell |
| 86 | +go test ./token/core/zkatdlog/nogh/v1/transfer -test.bench=BenchmarkSender -test.benchmem -test.count=10 -test.cpu=1 -test.timeout 0 -test.run=^$ -bits="32" -curves="BN254" -num_inputs="2" -num_outputs="2" | tee bench.txt |
| 87 | +``` |
| 88 | + |
| 89 | +> Notice that in the above case, the `go test` options must be prefixed with `test.`; otherwise the tool will fail. |
| 90 | + |
| 91 | + |
| 92 | + |
| 93 | +### Notes and best practices |
| 94 | + |
| 95 | +- Be mindful of the Cartesian explosion: combining many bit sizes, curves, input counts and output counts can produce many sub-benchmarks. |
| 96 | + For CI or quick local runs, reduce the parameter lists to a small subset (for example: one bit size, one curve, and 1-2 input/output sizes). |
| 97 | +- The benchmark creates `b.N` independent sender environments (via `NewBenchmarkSenderEnv`) and runs `GenerateZKTransfer` for each environment in the inner loop, so memory and setup cost scale with `b.N`. |
| 98 | +- To measure only the transfer-generation time and exclude setup, keep expensive one-time setup outside the measured region and call `b.ResetTimer()` after it. The current benchmark already calls `b.ResetTimer()` before the inner loop. |
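The pattern of building `b.N` environments up front and resetting the timer before the measured loop can be sketched as below. Everything here is a stand-in: `env`, `newEnv`, and `work` are hypothetical, and a chain of SHA-256 hashes replaces the real `GenerateZKTransfer` workload. `testing.Benchmark` is used only so the sketch runs as a regular program outside `go test`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// env is a hypothetical stand-in for a per-iteration sender environment.
type env struct{ payload []byte }

// newEnv plays the role of the setup step (hypothetical).
func newEnv(i int) *env {
	return &env{payload: []byte(fmt.Sprintf("transfer-%d", i))}
}

// sink keeps the result alive so the compiler cannot discard the work.
var sink [32]byte

// work is a placeholder workload: a chain of SHA-256 hashes.
func work(e *env) [32]byte {
	sum := sha256.Sum256(e.payload)
	for i := 0; i < 1000; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum
}

// benchmarkWork builds b.N environments first, then times only the loop.
func benchmarkWork(b *testing.B) {
	envs := make([]*env, b.N)
	for i := range envs {
		envs[i] = newEnv(i)
	}
	b.ReportAllocs()
	b.ResetTimer() // everything above is excluded from the measurement
	for i := 0; i < b.N; i++ {
		sink = work(envs[i])
	}
}

func main() {
	res := testing.Benchmark(benchmarkWork)
	fmt.Println(res.String(), res.MemString())
}
```

Because the environments are allocated before `b.ResetTimer()`, their construction cost and allocations do not appear in the reported figures, but memory use still grows with `b.N`.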
| 99 | + |
| 100 | +### Collecting and interpreting results |
| 101 | + |
| 102 | +A typical run prints timings per sub-benchmark (ns/op) and allocation statistics. Example command to persist results: |
| 103 | + |
| 104 | +```sh |
| 105 | +go test ./token/core/zkatdlog/nogh/v1/transfer -bench=BenchmarkSender -benchmem -count=10 -cpu=1 -timeout 0 -run=^$ | tee bench.txt |
| 106 | +``` |
| 107 | + |
| 108 | +You can then aggregate the output with a tool such as `benchstat` to compute averages across the `-count` repetitions. |
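If `benchstat` is not available, the averaging step can be approximated with a few lines of Go. The sketch below assumes the standard `go test -bench` line layout (name, iteration count, value, `ns/op`, ...). The function name `meanNsPerOp` is hypothetical, and the sample lines in `main` are made-up numbers used only to exercise the parser, not measured results.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// meanNsPerOp parses `go test -bench` output and returns the mean ns/op
// for each benchmark name across all repetitions found in the text.
func meanNsPerOp(output string) map[string]float64 {
	sum := map[string]float64{}
	count := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Expected layout: <name> <iterations> <value> ns/op ...
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") || f[3] != "ns/op" {
			continue
		}
		v, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			continue
		}
		sum[f[0]] += v
		count[f[0]]++
	}
	means := make(map[string]float64, len(sum))
	for name, s := range sum {
		means[name] = s / float64(count[name])
	}
	return means
}

func main() {
	// Made-up sample lines, for illustration only.
	sample := "BenchmarkSender/example 10 100 ns/op 8 B/op 1 allocs/op\n" +
		"BenchmarkSender/example 10 300 ns/op 8 B/op 1 allocs/op\n" +
		"PASS\n"
	for name, mean := range meanNsPerOp(sample) {
		fmt.Printf("%s: %.1f ns/op\n", name, mean)
	}
}
```

A plain mean hides variance between repetitions; `benchstat` additionally reports the spread, which is why it remains the better choice when it is available.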
| 109 | + |
| 110 | +### Results |
| 111 | + |
| 112 | +Example results have been produced on an Apple M1 Max and can be consulted [here](./transfer_BenchmarkSender_results.md). |
| 113 | + |
| 114 | +## Benchmark: `token/core/zkatdlog/nogh/v1/transfer#TestParallelBenchmarkSender` |
| 115 | + |
| 116 | +This is a test that runs multiple instances of the above benchmark in parallel. |
| 117 | +This helps reveal whether shared data structures become bottlenecks under concurrency. |
| 118 | + |
| 119 | +It uses a custom-made runner whose documentation can be found [here](../../../token/core/common/benchmark/runner.md). |
| 120 | + |
| 121 | +```shell |
| 122 | +go test ./token/core/zkatdlog/nogh/v1/transfer -test.run=TestParallelBenchmarkSender -test.v -test.benchmem -test.timeout 0 -bits="32" -curves="BN254" -num_inputs="2" -num_outputs="2" -workers="1,10" -duration="10s" | tee bench.txt |
| 123 | +``` |
| 124 | + |
| 125 | +The test supports the following flags: |
| 126 | +```shell |
| 127 | + -bits string |
| 128 | + a comma-separated list of bit sizes (32, 64,...) |
| 129 | + -curves string |
| 130 | + comma-separated list of curves. Supported curves are: FP256BN_AMCL, BN254, FP256BN_AMCL_MIRACL, BLS12_381_BBS, BLS12_381_BBS_GURVY, BLS12_381_BBS_GURVY_FAST_RNG |
| 131 | + -duration duration |
| 132 | + test duration (1s, 1m, 1h,...) (default 1s) |
| 133 | + -num_inputs string |
| 134 | + a comma-separate list of number of inputs (1,2,3,...) |
| 135 | + -num_outputs string |
| 136 | + a comma-separate list of number of outputs (1,2,3,...) |
| 137 | + -workers string |
| 138 | + a comma-separate list of workers (1,2,3,...,NumCPU), where NumCPU is converted to the number of available CPUs |
| 139 | +``` |
| 140 | + |
| 141 | +### Results |
| 142 | + |
| 143 | +```shell |
| 144 | +=== RUN TestParallelBenchmarkSender |
| 145 | +=== RUN TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_1_workers |
| 146 | +Metric Value Description |
| 147 | +------ ----- ----------- |
| 148 | +Workers 1 |
| 149 | +Total Ops 168 (Low Sample Size) |
| 150 | +Duration 10.023390959s (Good Duration) |
| 151 | +Real Throughput 16.76/s Observed Ops/sec (Wall Clock) |
| 152 | +Pure Throughput 17.77/s Theoretical Max (Low Overhead) |
| 153 | + |
| 154 | +Latency Distribution: |
| 155 | + Min 55.180375ms |
| 156 | + P50 (Median) 55.945812ms |
| 157 | + Average 56.290356ms |
| 158 | + P95 58.108814ms |
| 159 | + P99 58.758087ms |
| 160 | + Max 59.089958ms (Stable Tail) |
| 161 | + |
| 162 | +Stability Metrics: |
| 163 | + Std Dev 898.087µs |
| 164 | + IQR 1.383083ms Interquartile Range |
| 165 | + Jitter 590.076µs Avg delta per worker |
| 166 | + CV 1.60% Excellent Stability (<5%) |
| 167 | + |
| 168 | +Memory 1301420 B/op Allocated bytes per operation |
| 169 | +Allocs 18817 allocs/op Allocations per operation |
| 170 | + |
| 171 | +Latency Heatmap (Dynamic Range): |
| 172 | +Range Freq Distribution Graph |
| 173 | + 55.180375ms-55.369563ms 17 █████████████████████████ (10.1%) |
| 174 | + 55.369563ms-55.5594ms 18 ██████████████████████████ (10.7%) |
| 175 | + 55.5594ms-55.749887ms 27 ████████████████████████████████████████ (16.1%) |
| 176 | + 55.749887ms-55.941028ms 20 █████████████████████████████ (11.9%) |
| 177 | + 55.941028ms-56.132824ms 13 ███████████████████ (7.7%) |
| 178 | + 56.132824ms-56.325277ms 9 █████████████ (5.4%) |
| 179 | + 56.325277ms-56.51839ms 4 █████ (2.4%) |
| 180 | + 56.51839ms-56.712165ms 6 ████████ (3.6%) |
| 181 | + 56.712165ms-56.906605ms 9 █████████████ (5.4%) |
| 182 | + 56.906605ms-57.101711ms 13 ███████████████████ (7.7%) |
| 183 | + 57.101711ms-57.297486ms 10 ██████████████ (6.0%) |
| 184 | + 57.297486ms-57.493933ms 3 ████ (1.8%) |
| 185 | + 57.493933ms-57.691053ms 3 ████ (1.8%) |
| 186 | + 57.691053ms-57.888849ms 4 █████ (2.4%) |
| 187 | + 57.888849ms-58.087323ms 3 ████ (1.8%) |
| 188 | + 58.087323ms-58.286478ms 2 ██ (1.2%) |
| 189 | + 58.286478ms-58.486315ms 2 ██ (1.2%) |
| 190 | + 58.486315ms-58.686837ms 2 ██ (1.2%) |
| 191 | + 58.686837ms-58.888047ms 2 ██ (1.2%) |
| 192 | + 58.888047ms-59.089958ms 1 █ (0.6%) |
| 193 | + |
| 194 | +--- Analysis & Recommendations --- |
| 195 | +[WARN] Low sample size (168). Results may not be statistically significant. Run for longer. |
| 196 | +[INFO] High Allocations (18817/op). This will trigger frequent GC cycles and increase Max Latency. |
| 197 | +---------------------------------- |
| 198 | +=== RUN TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_10_workers |
| 199 | +Metric Value Description |
| 200 | +------ ----- ----------- |
| 201 | +Workers 10 |
| 202 | +Total Ops 1232 (Low Sample Size) |
| 203 | +Duration 10.070877291s (Good Duration) |
| 204 | +Real Throughput 122.33/s Observed Ops/sec (Wall Clock) |
| 205 | +Pure Throughput 130.12/s Theoretical Max (Low Overhead) |
| 206 | + |
| 207 | +Latency Distribution: |
| 208 | + Min 61.2545ms |
| 209 | + P50 (Median) 75.461375ms |
| 210 | + Average 76.852256ms |
| 211 | + P95 93.50851ms |
| 212 | + P99 106.198982ms |
| 213 | + Max 144.872375ms (Stable Tail) |
| 214 | + |
| 215 | +Stability Metrics: |
| 216 | + Std Dev 9.28799ms |
| 217 | + IQR 10.909229ms Interquartile Range |
| 218 | + Jitter 9.755984ms Avg delta per worker |
| 219 | + CV 12.09% Moderate Variance (10-20%) |
| 220 | + |
| 221 | +Memory 1282384 B/op Allocated bytes per operation |
| 222 | +Allocs 18668 allocs/op Allocations per operation |
| 223 | + |
| 224 | +Latency Heatmap (Dynamic Range): |
| 225 | +Range Freq Distribution Graph |
| 226 | + 61.2545ms-63.948502ms 36 ███████ (2.9%) |
| 227 | + 63.948502ms-66.760987ms 86 █████████████████ (7.0%) |
| 228 | + 66.760987ms-69.697167ms 152 ███████████████████████████████ (12.3%) |
| 229 | + 69.697167ms-72.762481ms 181 █████████████████████████████████████ (14.7%) |
| 230 | + 72.762481ms-75.962609ms 195 ████████████████████████████████████████ (15.8%) |
| 231 | + 75.962609ms-79.303481ms 179 ████████████████████████████████████ (14.5%) |
| 232 | + 79.303481ms-82.791286ms 152 ███████████████████████████████ (12.3%) |
| 233 | + 82.791286ms-86.432486ms 94 ███████████████████ (7.6%) |
| 234 | + 86.432486ms-90.233828ms 59 ████████████ (4.8%) |
| 235 | + 90.233828ms-94.202355ms 40 ████████ (3.2%) |
| 236 | + 94.202355ms-98.345419ms 29 █████ (2.4%) |
| 237 | + 98.345419ms-102.670697ms 9 █ (0.7%) |
| 238 | + 102.670697ms-107.186203ms 8 █ (0.6%) |
| 239 | + 107.186203ms-111.900303ms 4 (0.3%) |
| 240 | + 111.900303ms-116.821732ms 2 (0.2%) |
| 241 | + 116.821732ms-121.959608ms 3 (0.2%) |
| 242 | + 121.959608ms-127.32345ms 1 (0.1%) |
| 243 | + 127.32345ms-132.923196ms 1 (0.1%) |
| 244 | + 138.769222ms-144.872375ms 1 (0.1%) |
| 245 | + |
| 246 | +--- Analysis & Recommendations --- |
| 247 | +[WARN] Low sample size (1232). Results may not be statistically significant. Run for longer. |
| 248 | +[INFO] High Allocations (18668/op). This will trigger frequent GC cycles and increase Max Latency. |
| 249 | +---------------------------------- |
| 250 | +--- PASS: TestParallelBenchmarkSender (20.83s) |
| 251 | + --- PASS: TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_1_workers (10.39s) |
| 252 | + --- PASS: TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_10_workers (10.44s) |
| 253 | +PASS |
| 254 | +ok github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/transfer 21.409s |
| 255 | +``` |
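The summary figures in the output above can be cross-checked with a small calculation. The formulas in this sketch are inferred from the reported numbers, not taken from the runner's documentation: real throughput is treated as total operations divided by wall-clock duration, pure throughput as workers divided by average latency, and CV as standard deviation divided by average latency. The `run` type and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// run holds the figures copied from one block of the report above.
type run struct {
	Workers    int
	TotalOps   int
	Duration   time.Duration
	AvgLatency time.Duration
	StdDev     time.Duration
}

// Values copied from the 1-worker and 10-worker reports (in nanoseconds).
var (
	oneWorker  = run{Workers: 1, TotalOps: 168, Duration: 10023390959, AvgLatency: 56290356, StdDev: 898087}
	tenWorkers = run{Workers: 10, TotalOps: 1232, Duration: 10070877291, AvgLatency: 76852256, StdDev: 9287990}
)

// realThroughput is operations per second of wall-clock time (assumed formula).
func realThroughput(r run) float64 { return float64(r.TotalOps) / r.Duration.Seconds() }

// pureThroughput is workers divided by average latency (assumed formula).
func pureThroughput(r run) float64 { return float64(r.Workers) / r.AvgLatency.Seconds() }

// cvPercent is the coefficient of variation of the latency, in percent.
func cvPercent(r run) float64 { return 100 * float64(r.StdDev) / float64(r.AvgLatency) }

func main() {
	for _, r := range []run{oneWorker, tenWorkers} {
		fmt.Printf("workers=%d real=%.2f/s pure=%.2f/s cv=%.2f%%\n",
			r.Workers, realThroughput(r), pureThroughput(r), cvPercent(r))
	}
	fmt.Printf("speedup=%.1fx\n", realThroughput(tenWorkers)/realThroughput(oneWorker))
}
```

Under these assumptions the sketch reproduces the reported throughput and CV values to within rounding. The ratio of the two real throughputs is about 7.3x for 10 workers, which together with the higher median latency (about 75 ms versus 56 ms) suggests less than linear scaling on this machine.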