@@ -37,8 +37,6 @@ are 3 levels of fidelity that we consider:
37371 . The suite's performance counters match production.
38381 . An optimization's impact on the suite matches the impact on production.
3939
40- The goal of the suite for Y22 is to achieve the first level of fidelity.
41-
4240## Versioning
4341
4442Fleetbench uses [semantic versioning](http://semver.org) for its releases, where
@@ -67,14 +65,15 @@ optimizations.
6765
6866### TCMalloc per-CPU Mode
6967
70- TCMalloc is the underlying memory allocator in this benchmark suite. The
71- supposed default operation mode should be
72- [per-CPU mode](https://google.github.io/tcmalloc/overview.html). RSEQ is
73- required for this mode, however, glibc took control of it since version 2.35,
74- and TCMalloc reverts to using per-thread caching instead
75- ([more info](https://github.com/google/tcmalloc/issues/144)). We **strongly
76- recommend** adding environment variable: `GLIBC_TUNABLES=glibc.pthread.rseq=0`
77- to ensure per-CPU mode is being applied when running the benchmark. For example:
68+ TCMalloc is the underlying memory allocator in this benchmark suite. By default
69+ it operates in [per-CPU mode](https://google.github.io/tcmalloc/overview.html).
70+
71+ However, this mode requires [RSEQ](https://lwn.net/Articles/883104/).
72+
73+ To avoid [conflicts](https://github.com/google/tcmalloc/issues/144) with glibc's
74+ use of RSEQ, we **strongly recommend** setting the environment variable
75+ `GLIBC_TUNABLES=glibc.pthread.rseq=0` to ensure per-CPU mode is applied
76+ when running the benchmark. For example:
7877
7978```
8079GLIBC_TUNABLES=glibc.pthread.rseq=0 bazel run --config=opt fleetbench/swissmap:hot_swissmap_benchmark
@@ -109,57 +108,38 @@ Use `--config=westmere` for Westmere-era processors.
109108
110109### Running Benchmarks
111110
112- Swissmap benchmark for cold access setup takes much longer to run to completion,
113- so by default it has a `--benchmark_filter` flag set to narrow down to smaller
114- set sizes of `16` and `64` elements:
111+ The Swissmap benchmark for the cold access setup takes a long time to run to
112+ completion. We suggest using the `--benchmark_filter` flag to narrow it down to
113+ smaller set sizes, e.g. `16` and `64` elements:
115114
116115```
117- bazel run --config=opt fleetbench/swissmap:cold_swissmap_benchmark
116+ bazel run --config=opt fleetbench/swissmap:cold_swissmap_benchmark -- \
117+ --benchmark_filter=".*set_size:(16|64).*"
118118```
119119
120120To change this filter, you can specify a regex in the `--benchmark_filter` flag
121121([more info](https://github.com/google/benchmark/blob/main/docs/user_guide.md#running-a-subset-of-benchmarks)).
122122For example, to run only sets of `16` and `512` elements:
123123
124124```
125- bazel run --config=opt fleetbench/swissmap:cold_swissmap_benchmark -- --benchmark_filter=".*set_size:(16|512).*"
125+ bazel run --config=opt fleetbench/swissmap:cold_swissmap_benchmark -- \
126+ --benchmark_filter=".*set_size:(16|512).*"
126127```
127128
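To preview which benchmarks a filter would select before starting a long run, the regex can be tried against a list of names. The sketch below is only an illustration: the benchmark names are hypothetical (the real ones can be listed with Google Benchmark's `--benchmark_list_tests` flag), and `grep -E` stands in for the framework's regex matching, which may differ in corner cases.

```shell
# Hypothetical benchmark names, for illustration only.
names='BM_Cold/set_size:16
BM_Cold/set_size:64
BM_Cold/set_size:512'

# Same pattern as passed to --benchmark_filter in the example above.
filter='.*set_size:(16|512).*'

# grep -E approximates the framework's regex matching (an assumption).
selected=$(printf '%s\n' "$names" | grep -E "$filter")
printf '%s\n' "$selected"
```

Because the pattern is unanchored, it would also select names such as `set_size:160`, should any exist.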
128- The protocol buffer benchmark is set to run for at least 3s by default:
129-
130- ```
131- bazel run --config=opt fleetbench/proto:proto_benchmark
132- ```
133-
134- To change the duration to 30s, run the following:
129+ To extend the runtime of a benchmark, e.g. to collect more profile samples, use
130+ the `--benchmark_min_time` flag:
135131
136132```
137133bazel run --config=opt fleetbench/proto:proto_benchmark -- --benchmark_min_time=30s
138134```
139135
140- The TCMalloc Empirical Driver benchmark can take ~1hr to run all benchmarks:
136+ The TCMalloc Empirical Driver benchmark can take ~1hr to run all benchmarks, so
137+ we advise running a subset.
141138
142139```
143140bazel run --config=opt fleetbench/tcmalloc:empirical_driver -- --benchmark_counters_tabular=true
144141```
145142
146- To build and execute the benchmark in separate steps, run the commands below.
147-
148- NOTE: you'll need to specify the flags `--benchmark_filter` and
149- `--benchmark_min_time` explicitly when build and execution are split into two
150- separate steps.
151-
152- ```
153- bazel build --config=opt fleetbench/swissmap:hot_swissmap_benchmark
154- bazel-bin/fleetbench/swissmap/hot_swissmap_benchmark --benchmark_filter=all
155- ```
156-
157- NOTE: the suite will be expanded with the ability to execute all benchmarks with
158- one target.
159-
160- WARNING: MacOS and Windows have not been tested, and are not currently supported
161- by Fleetbench.
162-
163143### Reducing run-to-run variance
164144
165145It is expected that there will be some variance in the reported CPU times across
@@ -175,15 +155,15 @@ list of techniques that help with reducing run-to-run latency variance:
175155 `--benchmark_repetitions`.
176156* Recommended by the benchmarking framework
177157 [here](https://github.com/google/benchmark/blob/main/docs/reducing_variance.md#reducing-variance-in-benchmarks):
178- * Disable frequently scaling,
179- * Bind the process to a core by setting its affinity,
180- * Disable processor boosting,
158+ * Disable frequency scaling
159+ * Bind the process to a core by setting its affinity
160+ * Disable processor boosting
181161 * Disable Hyperthreading/SMT (should not affect single-threaded
182- benchmarks).
162+ benchmarks)
183163 * NOTE: We do not recommend reducing the working set of the benchmark to
184164 fit into L1 cache, contrary to the recommendations in the link, as it
185165 would significantly reduce this benchmarking suite's representativeness.
186- * Disable memory randomization (ASLR).
166+ * Disable memory randomization (ASLR)
187167
188168## Future Work
189169
@@ -243,10 +223,18 @@ bazel run --config=clang --config=opt --features=thin_lto fleetbench/proto:proto
243223
2442241 . Q: Can I run Fleetbench without TCMalloc?
245225
246- A: Fleetbench is built with Bazel, which supports --custom_malloc option
247- ([ bazel docs] ( https://bazel.build/docs/user-manual#custom-malloc ) ). This
248- should allow you to override the malloc attributed configured to take
249- tcmalloc as the default.
226+ A: Yes. Specify `--custom_malloc="@bazel_tools//tools/cpp:malloc"` on the
227+ bazel command line to override TCMalloc with the system allocator.
228+
229+ 1. Q: Can I run with AddressSanitizer?
230+
231+ A: Yes. Note that you need to override TCMalloc as well for ASan to work.
232+
233+ Example:
234+
235+ ```
236+ bazel build --custom_malloc="@bazel_tools//tools/cpp:malloc" -c opt fleetbench/proto:proto_benchmark --copt=-fsanitize=address --linkopt=-fsanitize=address
237+ ```
250238
2512391 . Q: Are the benchmarks fixed in nature?
252240