Commit 39a2b82

dlog transfer service: benchmark #1281 (#1282)
Signed-off-by: Angelo De Caro <adc@zurich.ibm.com>
1 parent bc2617d commit 39a2b82

42 files changed: 4489 additions and 938 deletions

docs/benchmark/benchmark.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Benchmark
 
 - [Go Tools for Benchmarks](./tools.md)
-- [ZKAT DLog No Graph-Hiding](./dlognogh.md)
+- [ZKAT DLog No Graph-Hiding](dlognogh/dlognogh.md)

docs/benchmark/dlognogh.md

Lines changed: 0 additions & 605 deletions
This file was deleted.
Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@
# ZKAT DLog No Graph Hiding Benchmark

Packages with benchmark tests:

- `token/core/zkatdlog/nogh/v1/transfer`:
  - `BenchmarkSender`, `BenchmarkVerificationSenderProof`, `TestParallelBenchmarkSender`, and `TestParallelBenchmarkVerificationSenderProof` are used to benchmark the generation of a transfer action. This also includes the generation of the ZK proof for the transfer operation.
  - `BenchmarkTransferProofGeneration` and `TestParallelBenchmarkTransferProofGeneration` are used to benchmark the generation of the ZK proof alone.
- `token/core/zkatdlog/nogh/v1/issue`: `BenchmarkIssuer` and `BenchmarkProofVerificationIssuer`
- `token/core/zkatdlog/nogh/v1`: `BenchmarkTransfer`

The steps necessary to run the benchmarks are very similar.
We give two examples here:
- `token/core/zkatdlog/nogh/v1/transfer#BenchmarkSender`, and
- `token/core/zkatdlog/nogh/v1/transfer#TestParallelBenchmarkSender`

## Benchmark: `token/core/zkatdlog/nogh/v1/transfer#BenchmarkSender`

In this section, we go through the steps necessary to run the benchmark and interpret the results.
The process is the same for the other benchmarks.

### Overview

`BenchmarkSender` measures the cost of generating a zero-knowledge transfer (ZK transfer) using the DLog no-graph-hiding sender implementation and serializing the resulting transfer object.
Concretely, each benchmark iteration constructs the required sender environment, invokes `GenerateZKTransfer(...)`, and calls `Serialize()` on the returned transfer, so the measured time includes both ZK transfer construction and serialization.

The benchmark is implemented to run the same workload across a matrix of parameters (bit sizes, curve choices, number of inputs and outputs).
A helper inside the test (`generateBenchmarkCases`) programmatically generates all combinations of the selected parameters.

### Parameters

The benchmark accepts the following tunable parameters:

- Bits: integer bit sizes used for some setup (e.g., 32, 64). This is passed to the test setup code.
- CurveID: the `math.CurveID` used (examples: `BN254`, `BLS12_381_BBS_GURVY`).
- NumInputs: number of input tokens provided to the sender (1, 2, ...).
- NumOutputs: number of outputs produced by the transfer (1, 2, ...).

These parameters can be configured from the command line using the following flags:

```shell
  -bits string
        a comma-separated list of bit sizes (32, 64,...)
  -curves string
        comma-separated list of curves. Supported curves are: FP256BN_AMCL, BN254, FP256BN_AMCL_MIRACL, BLS12_381_BBS, BLS12_381_BBS_GURVY, BLS12_381_BBS_GURVY_FAST_RNG
  -num_inputs string
        a comma-separate list of number of inputs (1,2,3,...)
  -num_outputs string
        a comma-separate list of number of outputs (1,2,3,...)
```

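The list-valued flags above take comma-separated strings. As a rough illustration of how such a value could be turned into integers, here is a minimal sketch. `parseIntList` is a hypothetical helper written for this note; it is an assumption, not the function the benchmark code actually uses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIntList is a hypothetical helper (not taken from the repository).
// It splits a comma-separated string such as "32, 64" into integers,
// ignoring surrounding whitespace and empty entries.
func parseIntList(s string) ([]int, error) {
	var out []int
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		v, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid entry %q: %w", field, err)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	bits, err := parseIntList("32, 64")
	if err != nil {
		panic(err)
	}
	fmt.Println(bits)
}
```

Rejecting malformed entries with an error, rather than skipping them silently, keeps a typo on the command line from shrinking the benchmark matrix unnoticed.
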
### Default parameter set used in the benchmark

If no flags are given, the test file currently uses the following parameter slices (so the resulting combinations are the Cartesian product of these lists):

- bits: [32, 64]
- curves: [BN254, BLS12_381_BBS_GURVY, BLS12_381_BBS_GURVY_FAST_RNG]
- inputs: [1, 2, 3]
- outputs: [1, 2, 3]

This produces 2 (bits) * 3 (curves) * 3 (inputs) * 3 (outputs) = 54 sub-benchmarks.
Each sub-benchmark runs the standard `b.N` iterations and reports time and allocation statistics.

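To make the Cartesian product concrete, the following sketch enumerates the default lists and counts the combinations. The `benchCase` type and the `cartesianCases` function are illustrative assumptions; the real `generateBenchmarkCases` helper may be shaped differently.

```go
package main

import "fmt"

// benchCase is an illustrative type. Its fields mirror the parameters
// described above, but the real test file may use a different layout.
type benchCase struct {
	Bits       int
	Curve      string
	NumInputs  int
	NumOutputs int
}

// cartesianCases returns every combination of the given parameter lists.
func cartesianCases(bits []int, curves []string, inputs, outputs []int) []benchCase {
	cases := make([]benchCase, 0, len(bits)*len(curves)*len(inputs)*len(outputs))
	for _, b := range bits {
		for _, c := range curves {
			for _, in := range inputs {
				for _, out := range outputs {
					cases = append(cases, benchCase{Bits: b, Curve: c, NumInputs: in, NumOutputs: out})
				}
			}
		}
	}
	return cases
}

// defaultCases applies cartesianCases to the default lists listed above.
func defaultCases() []benchCase {
	return cartesianCases(
		[]int{32, 64},
		[]string{"BN254", "BLS12_381_BBS_GURVY", "BLS12_381_BBS_GURVY_FAST_RNG"},
		[]int{1, 2, 3},
		[]int{1, 2, 3},
	)
}

func main() {
	fmt.Println(len(defaultCases())) // 2 * 3 * 3 * 3 = 54
}
```
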
### How to run

Run the benchmark for the package containing the sender benchmarks:

```sh
# run the BenchmarkSender benchmarks in the transfer package
go test ./token/core/zkatdlog/nogh/v1/transfer -bench=BenchmarkSender -benchmem -count=1 -cpu=1 -timeout 0 -run=^$
```
> Notice that:
> - `-run=^$` matches no test name, so none of the unit tests in the package are run.
> - `-timeout 0` disables the test timeout.

If you want to run the benchmark repeatedly and save results to a file:

```sh
go test ./token/core/zkatdlog/nogh/v1/transfer -bench=BenchmarkSender -benchmem -count=10 -cpu=1 -timeout 0 -run=^$ | tee bench.txt
```

Note: `-count` controls how many times each benchmark is run (useful to reduce variance); `-benchmem` reports allocation statistics.

You can also change the parameters:

```shell
go test ./token/core/zkatdlog/nogh/v1/transfer -test.bench=BenchmarkSender -test.benchmem -test.count=10 -test.cpu=1 -test.timeout 0 -test.run=^$ -bits="32" -curves="BN254" -num_inputs="2" -num_outputs="2" | tee bench.txt
```

> Notice that in the above case, the `go test` options must be prefixed with `test.`; otherwise the tool will fail.

### Notes and best practices

- Be mindful of the Cartesian explosion: combining many bit sizes, curves, input counts and output counts can produce many sub-benchmarks.
  For CI or quick local runs, reduce the parameter lists to a small subset (for example: one bit size, one curve, and 1-2 input/output sizes).
- The benchmark creates `b.N` independent sender environments (via `NewBenchmarkSenderEnv`) and runs `GenerateZKTransfer` for each environment in the inner loop, so the memory and time spent on setup scale with `b.N`.
- If you need to measure only the transfer-generation time, keep any expensive one-time setup outside the measured region, before the call to `b.ResetTimer()`. The current benchmark already calls `b.ResetTimer()` before the inner loop.

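The pattern described in these notes (build `b.N` environments, reset the timer, then measure only the loop) can be sketched as follows. Everything here is a stand-in: `env`, `work`, and `benchSketch` are hypothetical names, and the workload is plain repeated hashing rather than `GenerateZKTransfer`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"testing"
)

// env is a stand-in for a per-iteration sender environment.
type env struct{ seed [8]byte }

// work is a placeholder workload (repeated hashing). It only stands in for
// proof generation plus serialization and has nothing to do with ZK proofs.
func work(e *env) [32]byte {
	sum := sha256.Sum256(e.seed[:])
	for i := 0; i < 2000; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum
}

// benchSketch mirrors the structure described above: b.N environments are
// built first, the timer is reset, and only the loop is measured.
func benchSketch(b *testing.B) {
	envs := make([]*env, b.N)
	for i := range envs {
		e := &env{}
		binary.LittleEndian.PutUint64(e.seed[:], uint64(i))
		envs[i] = e
	}
	b.ReportAllocs()
	b.ResetTimer() // discard the time and allocations spent on setup
	for i := 0; i < b.N; i++ {
		_ = work(envs[i])
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchSketch)
	fmt.Println(res.String(), res.MemString())
}
```

Because the slice of environments is sized by `b.N`, its memory footprint grows with the iteration count even though its construction is excluded from the reported time.
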
### Collecting and interpreting results

A typical run prints, for each sub-benchmark, the time per operation (ns/op) and, with `-benchmem`, allocation statistics (B/op and allocs/op). Example command to persist results:

```sh
go test ./token/core/zkatdlog/nogh/v1/transfer -bench=BenchmarkSender -benchmem -count=10 -cpu=1 -timeout 0 -run=^$ | tee bench.txt
```

You can then aggregate the output with a tool such as `benchstat` to compute averages across the `-count` repetitions.

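As an illustration of what aggregating across repetitions means, the sketch below averages the ns/op column per benchmark name. It is a deliberately small stand-in, not a replacement for `benchstat`, which also reports variation. The `meanNsPerOp` function is hypothetical and the `sample` input is made up for the example; it is not measured data.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// meanNsPerOp parses "go test -bench" output and returns the mean ns/op
// per benchmark name, averaged over all repetitions found in the input.
func meanNsPerOp(output string) map[string]float64 {
	sum := map[string]float64{}
	count := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// A result line looks like: <name> <iterations> <value> ns/op ...
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") || f[3] != "ns/op" {
			continue
		}
		v, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			continue
		}
		sum[f[0]] += v
		count[f[0]]++
	}
	mean := make(map[string]float64, len(sum))
	for name, s := range sum {
		mean[name] = s / float64(count[name])
	}
	return mean
}

// sample is made-up input for illustration only; it is not measured data.
const sample = `
BenchmarkExample/case_a    100    1000 ns/op    512 B/op     8 allocs/op
BenchmarkExample/case_a    100    3000 ns/op    512 B/op     8 allocs/op
BenchmarkExample/case_b     50    4000 ns/op   1024 B/op    16 allocs/op
PASS
`

func main() {
	for name, m := range meanNsPerOp(sample) {
		fmt.Printf("%s: %.0f ns/op\n", name, m)
	}
}
```
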
### Results

Example results produced on an Apple M1 Max are available [here](./transfer_BenchmarkSender_results.md).

## Benchmark: `token/core/zkatdlog/nogh/v1/transfer#TestParallelBenchmarkSender`

This test runs multiple instances of the above benchmark in parallel.
This helps the analyst understand whether shared data structures are actual bottlenecks.

It uses a custom-made runner whose documentation can be found [here](../../../token/core/common/benchmark/runner.md).

```shell
go test ./token/core/zkatdlog/nogh/v1/transfer -test.run=TestParallelBenchmarkSender -test.v -test.benchmem -test.timeout 0 -bits="32" -curves="BN254" -num_inputs="2" -num_outputs="2" -workers="1,10" -duration="10s" | tee bench.txt
```

The test supports the following flags:
```shell
  -bits string
        a comma-separated list of bit sizes (32, 64,...)
  -curves string
        comma-separated list of curves. Supported curves are: FP256BN_AMCL, BN254, FP256BN_AMCL_MIRACL, BLS12_381_BBS, BLS12_381_BBS_GURVY, BLS12_381_BBS_GURVY_FAST_RNG
  -duration duration
        test duration (1s, 1m, 1h,...) (default 1s)
  -num_inputs string
        a comma-separate list of number of inputs (1,2,3,...)
  -num_outputs string
        a comma-separate list of number of outputs (1,2,3,...)
  -workers string
        a comma-separate list of workers (1,2,3,...,NumCPU), where NumCPU is converted to the number of available CPUs
```

### Results

```go
=== RUN TestParallelBenchmarkSender
=== RUN TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_1_workers
Metric            Value             Description
------            -----             -----------
Workers           1
Total Ops         168               (Low Sample Size)
Duration          10.023390959s     (Good Duration)
Real Throughput   16.76/s           Observed Ops/sec (Wall Clock)
Pure Throughput   17.77/s           Theoretical Max (Low Overhead)

Latency Distribution:
Min               55.180375ms
P50 (Median)      55.945812ms
Average           56.290356ms
P95               58.108814ms
P99               58.758087ms
Max               59.089958ms       (Stable Tail)

Stability Metrics:
Std Dev           898.087µs
IQR               1.383083ms        Interquartile Range
Jitter            590.076µs         Avg delta per worker
CV                1.60%             Excellent Stability (<5%)

Memory            1301420 B/op      Allocated bytes per operation
Allocs            18817 allocs/op   Allocations per operation

Latency Heatmap (Dynamic Range):
Range Freq Distribution Graph
55.180375ms-55.369563ms 17 █████████████████████████ (10.1%)
55.369563ms-55.5594ms 18 ██████████████████████████ (10.7%)
55.5594ms-55.749887ms 27 ████████████████████████████████████████ (16.1%)
55.749887ms-55.941028ms 20 █████████████████████████████ (11.9%)
55.941028ms-56.132824ms 13 ███████████████████ (7.7%)
56.132824ms-56.325277ms 9 █████████████ (5.4%)
56.325277ms-56.51839ms 4 █████ (2.4%)
56.51839ms-56.712165ms 6 ████████ (3.6%)
56.712165ms-56.906605ms 9 █████████████ (5.4%)
56.906605ms-57.101711ms 13 ███████████████████ (7.7%)
57.101711ms-57.297486ms 10 ██████████████ (6.0%)
57.297486ms-57.493933ms 3 ████ (1.8%)
57.493933ms-57.691053ms 3 ████ (1.8%)
57.691053ms-57.888849ms 4 █████ (2.4%)
57.888849ms-58.087323ms 3 ████ (1.8%)
58.087323ms-58.286478ms 2 ██ (1.2%)
58.286478ms-58.486315ms 2 ██ (1.2%)
58.486315ms-58.686837ms 2 ██ (1.2%)
58.686837ms-58.888047ms 2 ██ (1.2%)
58.888047ms-59.089958ms 1 █ (0.6%)

--- Analysis & Recommendations ---
[WARN] Low sample size (168). Results may not be statistically significant. Run for longer.
[INFO] High Allocations (18817/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------
=== RUN TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_10_workers
Metric            Value             Description
------            -----             -----------
Workers           10
Total Ops         1232              (Low Sample Size)
Duration          10.070877291s     (Good Duration)
Real Throughput   122.33/s          Observed Ops/sec (Wall Clock)
Pure Throughput   130.12/s          Theoretical Max (Low Overhead)

Latency Distribution:
Min               61.2545ms
P50 (Median)      75.461375ms
Average           76.852256ms
P95               93.50851ms
P99               106.198982ms
Max               144.872375ms      (Stable Tail)

Stability Metrics:
Std Dev           9.28799ms
IQR               10.909229ms       Interquartile Range
Jitter            9.755984ms        Avg delta per worker
CV                12.09%            Moderate Variance (10-20%)

Memory            1282384 B/op      Allocated bytes per operation
Allocs            18668 allocs/op   Allocations per operation

Latency Heatmap (Dynamic Range):
Range Freq Distribution Graph
61.2545ms-63.948502ms 36 ███████ (2.9%)
63.948502ms-66.760987ms 86 █████████████████ (7.0%)
66.760987ms-69.697167ms 152 ███████████████████████████████ (12.3%)
69.697167ms-72.762481ms 181 █████████████████████████████████████ (14.7%)
72.762481ms-75.962609ms 195 ████████████████████████████████████████ (15.8%)
75.962609ms-79.303481ms 179 ████████████████████████████████████ (14.5%)
79.303481ms-82.791286ms 152 ███████████████████████████████ (12.3%)
82.791286ms-86.432486ms 94 ███████████████████ (7.6%)
86.432486ms-90.233828ms 59 ████████████ (4.8%)
90.233828ms-94.202355ms 40 ████████ (3.2%)
94.202355ms-98.345419ms 29 █████ (2.4%)
98.345419ms-102.670697ms 9 █ (0.7%)
102.670697ms-107.186203ms 8 █ (0.6%)
107.186203ms-111.900303ms 4 (0.3%)
111.900303ms-116.821732ms 2 (0.2%)
116.821732ms-121.959608ms 3 (0.2%)
121.959608ms-127.32345ms 1 (0.1%)
127.32345ms-132.923196ms 1 (0.1%)
138.769222ms-144.872375ms 1 (0.1%)

--- Analysis & Recommendations ---
[WARN] Low sample size (1232). Results may not be statistically significant. Run for longer.
[INFO] High Allocations (18668/op). This will trigger frequent GC cycles and increase Max Latency.
----------------------------------
--- PASS: TestParallelBenchmarkSender (20.83s)
--- PASS: TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_1_workers (10.39s)
--- PASS: TestParallelBenchmarkSender/Setup(bits_32,_curve_BN254,_#i_2,_#o_2)_with_10_workers (10.44s)
PASS
ok github.com/hyperledger-labs/fabric-token-sdk/token/core/zkatdlog/nogh/v1/transfer 21.409s
```
