Commit 07d94b5

Merge pull request #94 from zamazan4ik/feature/pgo-improvements
Extend PGO section of the Rust Engineering practices
2 parents ce0354c + f5ac822 commit 07d94b5

1 file changed

Lines changed: 30 additions & 1 deletion

engineering-book/src/ch03-benchmarking-measuring-what-matters.md

@@ -280,7 +280,13 @@ perf script | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg
280280
- **Bottom** = entry point, **Top** = leaf functions doing actual work
281281
- Look for wide plateaus at the top — those are your hot spots
282282

283-
**Profile-guided optimization (PGO):**
283+
### Profile-Guided Optimization (PGO)
284+
285+
Profile-Guided Optimization (PGO) is a compiler optimization technique that improves the performance of CPU-intensive applications. The basic idea is to collect data about a program's typical execution (e.g., which branches it is likely to take) and then use that data to inform optimizations such as inlining, machine-code layout, and register allocation.
286+
287+
There are different ways to collect data about a program’s execution. One is to run the program inside a profiler (such as `perf`); another is to build an instrumented binary, that is, a binary with data collection built into it, and run that. The latter usually provides more accurate data, and it is also the approach that `rustc` supports.
288+
289+
The following is an example of instrumentation-based PGO:
284290

285291
```bash
286292
# Step 1: Build with instrumentation
@@ -302,9 +308,32 @@ RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
302308
# because the CPU is mostly waiting, not executing hot loops.
303309
```
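
Before the final rebuild, it can be worth confirming that the instrumented run actually wrote profile data, since a workload that crashed or was killed early may leave the directory empty. The following is a minimal sketch, not part of `rustc` or Cargo: the helper name `count_profraw` is hypothetical, and it assumes the profiles use the default `.profraw` extension and were written to `/tmp/pgo-data` as in the example above.

```shell
# Hypothetical helper (not provided by rustc or cargo): count the raw profile
# files in a directory. Assumes the default ".profraw" file extension.
count_profraw() {
    dir="$1"
    # A missing directory counts as zero profiles.
    [ -d "$dir" ] || { echo 0; return 0; }
    find "$dir" -type f -name '*.profraw' | wc -l | tr -d ' '
}

# Assumed location, matching the example above; override with PGO_DIR.
pgo_dir="${PGO_DIR:-/tmp/pgo-data}"
if [ "$(count_profraw "$pgo_dir")" -eq 0 ]; then
    echo "no .profraw files in $pgo_dir; re-run the instrumented binary" >&2
fi
```

If the check prints a warning, re-run the representative workload and make sure the process exits normally before merging the profiles.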
304310

311+
As an alternative to driving the compiler directly, you can use [cargo-pgo](https://github.com/kobzol/cargo-pgo), which has an intuitive command-line interface and saves you from doing the manual steps yourself.
312+
313+
With `cargo-pgo`, the workflow above looks like this:
314+
315+
```bash
316+
# Step 1: Build with instrumentation
317+
cargo pgo build
318+
319+
# Step 2: Run representative workloads
320+
cargo pgo run -- --run-full
321+
322+
# Step 3: Rebuild with profiling feedback
323+
cargo pgo optimize
324+
```
325+
326+
Sampling PGO (SPGO) is more complicated to set up than instrumentation-based PGO, but it has lower runtime overhead while profiles are being collected. For now, the best place to read about it is the Clang PGO [manual](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers).
327+
305328
> **Tip**: Before spending time on PGO, ensure your [release profile](ch07-release-profiles-and-binary-size.md)
306329
> already has LTO enabled — it typically delivers a bigger win for less effort.
307330
331+
Further reading:
332+
333+
* The official `rustc` [guide](https://doc.rust-lang.org/rustc/profile-guided-optimization.html) to PGO.
334+
* [Awesome PGO](https://github.com/zamazan4ik/awesome-pgo) - a collection of PGO benchmarks for real applications, plus PGO guides for different compilers (Sampling PGO is covered too).
335+
* [LLVM BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) - a Post-Link Optimization (PLO) tool. PLO can be applied on top of PGO to get additional performance. `cargo-pgo` supports `llvm-bolt` too.
336+
308337
### `hyperfine` — Quick End-to-End Timing
309338

310339
[`hyperfine`](https://github.com/sharkdp/hyperfine) benchmarks entire commands,
