Description
Feature or enhancement
Proposal:
When Python is compiled with `--enable-optimizations`, which turns on PGO (profile-guided optimization), the build will run a subset of the unit tests as the "task" to generate profile information. Included in the profile is information such as counts of the taken side of CPU conditional branch instructions. To get the best optimization, your PGO task should match the branch-taken behavior of your real workloads.
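To make the branch-taken point concrete, here is a purely conceptual sketch (in CPython's case PGO optimizes the interpreter's C code; the example below is written in Python only for readability):

```python
# Conceptual sketch: PGO records how often each side of a conditional branch
# is taken while running the profile task, and the compiler then lays out the
# machine code so the commonly taken side becomes the cheap fall-through path.
def clamp(x, lo=0, hi=100):
    if x < lo:       # if the profile task almost never takes this branch,
        return lo    # the optimized layout assumes real workloads won't either
    if x > hi:
        return hi
    return x
```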
Using the unit tests has the advantage that we have good code coverage in terms of executing most branches and code paths. It also has the advantage of being available without any external dependencies. It has the disadvantage that the code executed during unit tests is likely quite atypical of what's executed by real applications. Running `./python -X perf -m test --pgo` under the "perf" tool, I see the following results:
Children | Self | Symbol |
---|---|---|
97.39% | 21.16% | _PyEval_EvalFrameDefault |
34.37% | 2.39% | deduce_unreachable |
24.82% | 1.50% | _PyGC_Collect |
21.11% | 1.32% | gc_collect_region |
20.95% | 0.00% | gc_collect |
19.50% | 0.00% | py::gc_collect:/home/nas/src/cpython/Lib/test/support/__init__.py |
14.98% | 0.54% | PyObject_Vectorcall |
10.95% | 1.48% | dict_traverse |
7.57% | 0.00% | _PyPegen_run_parser_from_string |
7.46% | 0.00% | _PyPegen_run_parser |
7.46% | 0.00% | _PyPegen_parse |
7.33% | 0.24% | py::_make_iterencode.<locals>._iterencode_dict:/home/nas/src/cpython/Lib/json/encoder.py |
7.18% | 7.09% | visit_reachable |
6.43% | 0.02% | expression_rule |
6.13% | 0.00% | PyRun_StringFlags |
6.06% | 1.90% | _PyEval_Vector |
5.99% | 5.91% | visit_decref |
5.80% | 0.00% | builtin_eval |
5.66% | 0.04% | disjunction_rule |
This profile reveals a number of problems. First, a large fraction of time is spent in the cyclic GC. That's because the unit test framework calls `test.support.gc_collect()` before each test case. That function triggers three full GC collections. Other tests also call the GC explicitly. This is not behavior typical of a real program.
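For reference, a simplified sketch of what `test.support.gc_collect()` amounts to (the real helper in `Lib/test/support/__init__.py` has some extra logic, but the cost is the same: multiple full collections per call):

```python
import gc

def gc_collect():
    # Simplified sketch of the test.support helper: run a full collection
    # several times so objects freed by finalizers in one pass can themselves
    # be collected in the next. Called before each test case, this dominates
    # the profile with cyclic-GC work.
    gc.collect()
    gc.collect()
    gc.collect()
```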
Also taking a lot of time are functions related to parsing and compiling Python code. Notice the `builtin_eval()` function, for example. I suspect that's mostly a result of using "doctest". Again, this would not be typical of real programs.
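As a small illustration of why doctest-heavy tests push time into the parser and compiler: every `>>>` example is compiled from its source string when the doctests run (the module and function below are made up for the example):

```python
import doctest

def add(a, b):
    """
    >>> add(2, 3)
    5
    """
    return a + b

# Each ">>>" example is parsed, compiled, and executed from its source string
# at run time, so running many doctests spends a lot of time in the
# parser/compiler paths (e.g. _PyPegen_run_parser_from_string) rather than in
# the code paths a typical application exercises.
doctest.testmod()
```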
I think we should replace the PGO task with a program that more closely represents the behavior of real Python programs. There are at least two potential advantages: it could make the compiled Python binary faster for real programs, and it could make our benchmark results less noisy, since the compiler would be doing a better and more consistent job of generating optimal code.
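As a rough sketch of the direction (the workload mix below is hypothetical, not a concrete proposal), the PGO task could be a small driver that exercises common application patterns instead of the unit tests:

```python
# Hypothetical sketch of a more application-like PGO task: exercise common
# patterns (attribute access, dict/list manipulation, string formatting,
# function calls, JSON round-trips) in a loop, without the test framework's
# explicit GC calls or doctest-driven parsing.
import json

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def moved(self, dx, dy):
        return Point(self.x + dx, self.y + dy)

def workload(n=100_000):
    points = [Point(i, i * 2) for i in range(100)]
    total = 0
    for i in range(n):
        p = points[i % len(points)].moved(1, -1)
        total += p.x + p.y
        record = {"x": p.x, "y": p.y, "label": f"p{i}"}
        total += len(json.loads(json.dumps(record)))
    return total

if __name__ == "__main__":
    workload()
```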
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response