
Using the unit tests as the PGO task has problems #130701

Description

@nascheme

Feature or enhancement

Proposal:

When Python is compiled with --enable-optimizations, which turns on PGO (profile-guided optimization), the build runs a subset of the unit tests as the "task" that generates the profile information. The profile includes data such as how often each side of a CPU conditional branch instruction is taken. To get the best optimization, the PGO task should match the branch behavior of your real workloads.
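As background, the command used for the task is stored in the Makefile as PROFILE_TASK. Below is a minimal sketch for inspecting how a given build was configured, assuming the variable names currently used by configure.ac (PROFILE_TASK, PGO_PROF_GEN_FLAG, PGO_PROF_USE_FLAG) and that sysconfig exposes the Makefile variables (get_config_var() simply returns None for anything missing):

    # Print the PGO-related build settings of the running interpreter.
    # Assumption: these Makefile variable names match the current configure.ac.
    import sysconfig

    for var in ("CONFIG_ARGS", "PROFILE_TASK",
                "PGO_PROF_GEN_FLAG", "PGO_PROF_USE_FLAG"):
        print(f"{var} = {sysconfig.get_config_var(var)!r}")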

Using the unit tests has the advantage of good code coverage: most branches and code paths get executed. It also has the advantage of being available without any external dependencies. The disadvantage is that the code executed during the unit tests is likely quite atypical of what real applications execute. Running ./python -X perf -m test --pgo under the "perf" tool, I see the following results:

Children Self Symbol
97.39% 21.16% _PyEval_EvalFrameDefault
34.37% 2.39% deduce_unreachable
24.82% 1.50% _PyGC_Collect
21.11% 1.32% gc_collect_region
20.95% 0.00% gc_collect
19.50% 0.00% py::gc_collect:/home/nas/src/cpython/Lib/test/support/__init__.py
14.98% 0.54% PyObject_Vectorcall
10.95% 1.48% dict_traverse
7.57% 0.00% _PyPegen_run_parser_from_string
7.46% 0.00% _PyPegen_run_parser
7.46% 0.00% _PyPegen_parse
7.33% 0.24% py::_make_iterencode.<locals>._iterencode_dict:/home/nas/src/cpython/Lib/json/encoder.py
7.18% 7.09% visit_reachable
6.43% 0.02% expression_rule
6.13% 0.00% PyRun_StringFlags
6.06% 1.90% _PyEval_Vector
5.99% 5.91% visit_decref
5.80% 0.00% builtin_eval
5.66% 0.04% disjunction_rule
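As an aside, the py:: entries in the listing above come from the -X perf stack trampolines added in Python 3.12. Here is a minimal sketch of enabling the same thing programmatically, on platforms where the perf trampoline is supported, with run_workload() as a placeholder for whatever is being profiled:

    # Programmatic equivalent of the -X perf option (Python 3.12+): Python
    # frames are exposed to perf, which is how symbols like
    # py::gc_collect:.../test/support/__init__.py get attributed above.
    import sys

    def run_workload():
        # Placeholder for the real PGO task.
        sum(i * i for i in range(10_000))

    sys.activate_stack_trampoline("perf")
    try:
        run_workload()
    finally:
        sys.deactivate_stack_trampoline()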

This profile reveals a number of problems. First, a large fraction of time is spent in the cyclic GC. That's because the unit test framework calls test.support.gc_collect() before each test case, and that function triggers three full GC collections. Other tests also call the GC explicitly. This is not typical behavior for a real program.
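For context, the helper behaves roughly like the sketch below (the real version in Lib/test/support has extra handling for alternative implementations):

    # Rough sketch of test.support.gc_collect(): force several full
    # collections in a row so that garbage only reachable after an earlier
    # pass (e.g. via finalizers) is also reclaimed. Run around many test
    # cases, this makes the cyclic GC dominate the PGO profile.
    import gc

    def gc_collect():
        gc.collect()
        gc.collect()
        gc.collect()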

Functions related to parsing and compiling Python code also take a lot of time. Notice the builtin_eval() function, for example; I suspect that's mostly a result of using "doctest". Again, this would not be typical of real programs.
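To illustrate why doctest-heavy tests pull the parser and compiler into the profile, here is a minimal example; each docstring example is re-parsed, compiled and executed from its source text every time the test runs:

    # Minimal doctest example: the ">>> add(2, 3)" source string is parsed
    # and compiled at runtime, which is why parser functions such as
    # _PyPegen_parse show up prominently in the --pgo profile.
    import doctest

    def add(a, b):
        """
        >>> add(2, 3)
        5
        """
        return a + b

    if __name__ == "__main__":
        doctest.testmod()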

I think we should replace the PGO task with a program that more closely represents the behavior of real Python programs. There are at least two potential advantages: it could make the compiled Python binary faster for real programs, and it could make our benchmark results less noisy, since the compiler would be doing a better and more consistent job of generating optimal code.
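Purely as an illustration, and not a concrete proposal, a replacement task could be a small script that exercises common operations (attribute access, container manipulation, string formatting, JSON round-trips) without forcing GC passes or repeatedly invoking the compiler:

    # Hypothetical sketch of a more application-like PGO task. The workload
    # below is invented for illustration; a real task would more likely draw
    # on something like the pyperformance benchmarks.
    import json

    def workload():
        data = [{"id": i, "name": f"item-{i}", "values": list(range(20))}
                for i in range(1_000)]
        blob = json.dumps(data)
        decoded = json.loads(blob)
        total = sum(v for item in decoded for v in item["values"])
        by_name = {item["name"]: item for item in decoded}
        return total, len(by_name)

    if __name__ == "__main__":
        for _ in range(50):
            workload()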

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
